Short-Pause Garbage Collection
Introduction: Short-Pause Garbage Collection is Incremental by time and partial and generational by space. Simple trace-based collectors do stop-the-world-style garbage collection, which may introduce long pauses into the execution of user programs. We can reduce the length of the pauses by performing garbage collection one part at a time.
We can divide the work in time, by interleaving garbage collection with the mutation, or we can divide the work in space by collecting a subset of the garbage at a time. The former is known as incremental collection and the latter is known as partial collection.
An incremental collector breaks up the reachability analysis into smaller units, allowing the mutator to run between these execution units. The reachable set changes as the mutator executes, so incremental collection is complex.
The best known of partial-collection algorithms is generational garbage collection] it partitions objects according to how long they have been allocated and collects the newly created objects more often because they tend to have a shorter lifetime. An alternative algorithm, the train algorithm, also collects a subset of garbage at a time, and is best applied to more mature objects. These two algorithms can be used together to create a partial collector that handles younger and older objects differently.
Incremental Garbage Collection: Incremental collectors are conservative. While a garbage collector must not collect objects that are not garbage, it does not have to collect all the garbage in each round. We refer to the garbage left behind after collection as floating garbage. Of course it is desirable to minimize floating garbage. In particular, an incremental collector should not leave behind any garbage that was not reachable at the beginning of a collection cycle. If we can be sure of such a collection guarantee, then any garbage not collected in one round will be collected in the next, and no memory is leaked because of this approach to garbage collection.
In other words, incremental collectors play it safe by overestimating the set of reachable objects. They first process the program's root set atomically, without interference from the mutator. After finding the initial set of unscanned objects, the mutator's actions are interleaved with the tracing step. During this period, any of the mutator's actions that may change reachability are recorded succinctly, in a side table, so that the collector can make the necessary adjustments when it resumes execution. If space is exhausted before tracing completes, the collector completes the tracing process, without allowing the mutator to execute. In any event, when tracing is done, space is reclaimed atomically.
Precision of Incremental Collection: Once an object becomes unreachable, it is not possible for the object to become reachable again. Thus, as garbage collection and mutation proceed, the set of reachable objects can only
1. Grow due to new objects allocated after garbage collection starts, and
2. Shrink by losing references to allocated objects.
Let the set of reachable objects at the beginning of garbage collection be R; let New be the set of allocated objects during garbage collection, and let Lost be the set of objects that have become unreachable due to lost references since tracing began. The set of objects reachable when tracing completes is (R U New) - Lost.
It is expensive to reestablish an object's reachability every time a mutator loses a reference to the object, so incremental collectors do not attempt to collect all the garbage at the end of tracing. Any garbage left behind — floating garbage — should be a subset of the Lost objects. Expressed formally, the set S of objects found by tracing must satisfy
(R U New) - Lost C 5 C (R U New)
Simple Incremental Tracing: We first describe a straightforward tracing algorithm that finds the upper bound R U New. The behavior of the mutator is modified during the tracing as follows:
- All references that existed before garbage collection are preserved; that is, before the mutator overwrites a reference, its old value is remembered and treated like an additional unscanned object containing just that reference.
- All objects created are considered reachable immediately and are placed in the unscanned state.
This scheme is conservative but correct, because it finds R, the set of all the objects reachable before garbage collection, plus New, the set of all the newly allocated objects. However, the cost is high, because the algorithm intercepts all write operations and remembers all the overwritten references. Some of this work is unnecessary because it may involve objects that are unreachable at the end of garbage collection. We could avoid some of this work and also improve the algorithm's precision if we could detect when the overwritten references point to objects that are unreachable when this round of garbage collection ends. The next algorithm goes fairly far in these two directions.