Introduction to Trace-Based Collection
Introduction: Instead of collecting garbage as it is created, trace-based collectors run periodically to find unreachable objects and reclaim their space. Typically, we run the trace-based collector whenever the free space is exhausted or its amount drops below some threshold.
We begin this section by introducing the simplest "mark-and-sweep" garbage collection algorithm. We then describe the variety of trace-based algorithms in terms of four states that chunks of memory can be put in. This section also contains a number of improvements on the basic algorithm, including those in which object relocation is a part of the garbage-collection function.
A Basic Mark-and-Sweep Collector: Mark-and-sweep garbage-collection algorithms are straightforward, stop-the world algorithms that find all the unreachable objects, and put them on the list of free space. Following Algorithm visits and "marks" all the reachable objects in the first tracing step and then "sweeps" the entire heap to free up unreachable objects.
Following Algorithm, which we consider after introducing a general framework for trace-based algorithms, is an optimization of following Algorithm. By using an additional list to hold all the allocated objects, it visits the reachable objects only once.
Algorithm: Mark-and-sweep garbage collection.
INPUT: A root set of objects, a heap, and a free list, called Free, with all the unallocated chunks of the heap. All chunks of space are marked with boundary tags to indicate their free/used status and size.
OUTPUT: A modified free list after all the garbage has been removed.
METHOD: The algorithm, shown in Fig. 7.21, uses several simple data structures. List Free holds objects known to be free. A list called Unscanned, holds objects that we have determined are reached, but whose successors we have not yet considered. That is, we have not scanned these objects to see what other objects can be reached through them. The Unscanned list is empty initially.
Additionally, each object includes a bit to indicate whether it has been reached (the reached-bit). Before the algorithm begins, all allocated objects have the reached-bit set to 0.
In line (1) of Fig. 7.21, we initialize the Unscanned list by placing there all the objects referenced by the root set. The reached-bit for these objects is also set to 1. Lines (2) through (7) are a loop, in which we, in turn, examine each object o that is ever placed on the Unscanned list. The for-loop of lines (4) through (7) implements the scanning of object o.
We examine each object d for which we find a reference within o. If d has already been reached (its reached-bit is 1), then there is no need to do anything about d; it either has been scanned previously, or it is on the Unscanned list to be scanned later. However, if d was not reached already, then we need to set its reached-bit to 1 in line (6) and add d to the Unscanned list in line (7). Figure 7.22 illustrates this process. It shows an Unscanned list with four objects. The first object on this list, corresponding to object o in the discussion above, is in the process of being scanned. The dashed lines correspond to the three kinds of objects that might be reached from o:
1. A previously scanned object that need not be scanned again.
2. An object currently on the Unscanned list.
3. An item that is reachable, but was previously thought to be unreached.
Lines (8) through (11), the sweeping phase, reclaim the space of all the objects that remain unreached at the end of the marking phase. Note that these will include any objects that were on the Free list originally. Because the set of unreached objects cannot be enumerated directly, the algorithm sweeps through the entire heap. Line (10) puts free and unreached objects on the Free list, one at a time. Line (11) handles the reachable objects. We set their reached-bit to 0, in order to maintain the proper preconditions for the next execution of the garbage-collection algorithm.