The Value-Number Method for Constructing DAG\'s
Introduction: The nodes of a syntax tree or DAG are stored in an array of records. Each row of the array represents one record, and therefore one node. In each record, the first field is an operation code, indicating the label of the node. In Fig. 6.6(b), leaves have one additional field, which holds the lexical value (either a symbol-table pointer or a constant, in this case), and interior nodes have two additional fields indicating the left and right children.
In this array, we refer to nodes by giving the integer index of the record for that node within the array. This integer historically has been called the value number for the node or for the expression represented by the node. For instance, in Fig. 6.6, the node labeled -I- has value number 3, and its left and right children have value numbers 1 and 2, respectively. In practice, we could use pointers to records or references to objects instead of integer indexes, but we shall still refer to the reference to a node as its "value number." If stored in an appropriate data structure, value numbers help us construct expression
DAG's efficiently; the next algorithm shows how.
Suppose that nodes are stored in an array, as in Fig. 6.6, and each node is referred to by its value number. Let the signature of an interior node be the triple (op,/,r), where op is the label, I its left child's value number, and r its right child's value number. A unary operator may be assumed to have r = 0-
Algorithm: The value-number method for constructing the nodes of a DAG.
INPUT: Label op, node /, and node r.
OUTPUT: The value number of a node in the array with signature (op, l,r).
METHOD: Search the array for a node M with label op, left child I, and right child r. If there is such a node, return the value number of M. If not, create in the array a new node N with label op, left child I, and right child r, and return its value number.
While Algorithm yields the desired output, searching the entire array every time we are asked to locate one node is expensive, especially if the array holds expressions from an entire program. A more efficient approach is to use a hash table, in which the nodes are put into "buckets," each of which typically will have only a few nodes. The hash table is one of several data structures that support dictionaries efficiently.1 A dictionary is an abstract data type that allows us to insert and delete elements of a set, and to determine whether a given element is currently in the set. A good data structure for dictionaries, such as a hash table, performs each of these operations in time that is constant or close to constant, independent of the size of the set.
To construct a hash table for the nodes of a DAG, we need a hash function h that computes the index of the bucket for a signature (op, I, r), in a way that distributes the signatures across buckets, so that it is unlikely that any one bucket will get much more than a fair share of the nodes. The bucket index h(op, I, r) is computed deterministically from op, I, and r, so that we may repeat the calculation and always get to the same bucket index for node (op, I, r).
The buckets can be implemented as linked lists, as in Fig. 6.7. An array, indexed by hash value, holds the bucket headers, each of which points to the first cell of a list. Within the linked list for a bucket, each cell holds the value number of one of the nodes that hash to that bucket. That is, node (op,l,r) can be found on the list whose header is at index h(op,l,r) of the array.
Thus, given the input node op, I, and r, we compute the bucket index h(op,l,r) and search the list of cells in this bucket for the given input node. Typically, there are enough buckets so that no list has more than a few cells. We may need to look at all the cells within a bucket, however, and for each value number v found in a cell, we must check whether the signature (op,l,r) of the input node matches the node with value number v in the list of cells (as in Fig. 6.7). If we find a match, we return v. If we find no match, we know no such node can exist in any other bucket, so we create a new cell, add it to the list of cells for bucket index h(op, l,r), and return the value number in that new cell.