## Reaching Definitions

**Introduction:** "Reaching definitions" is one of the most common and useful data-flow schemas. By knowing where in a program each variable x may have been defined when control reaches each point p, we can determine many things about x. For just two examples, a compiler then knows whether x is a constant at point p, and a debugger can tell whether it is possible for x to be an undefined variable, should x be used at p.

We say a definition d reaches a point p if there is a path from the point immediately following d to p, such that d is not "killed" along that path. We kill a definition of a variable x if there is any other definition of x anywhere along the path.3 Intuitively, if a definition d of some variable x reaches point p, then d might be the place at which the value of x used at p was last defined.

A definition of a variable x is a statement that assigns, or may assign, a value to x. Procedure parameters, array accesses, and indirect references all may have aliases, and it is not easy to tell if a statement is referring to a particular variable x. Program analysis must be conservative; if we do not know whether a statement s is assigning a value to x, we must assume that it may assign to it; that is, variable x after statement s may have either its original value before s or the new value created by s. For the sake of simplicity, the rest of the chapter assumes that we are dealing only with variables that have no aliases. This class of variables includes all local scalar variables in most languages; in the case of C and C , local variables whose addresses have been computed at some point are excluded.

**Example:** Shown in Fig. 9.13 is a flow graph with seven definitions. Let us focus on the definitions reaching block B2. All the definitions in block B\ reach the beginning of block B2. The definition d§: j = j - 1 in block B2 also reaches the beginning of block B2, because no other definitions of j can be found in the loop leading back to B2. This definition, however, kills the definition d2: j = n, preventing it from reaching B% or J34. The statement d4: i = i 1 in B2 does not reach the beginning of B2 though, because the variable i is always redefined by d^: i = u3. Finally, the definition d6 : a = u2 also reaches the beginning of block B2.

By defining reaching definitions as we have, we sometimes allow inaccuracies. However, they are all in the "safe," or "conservative," direction. For example, notice our assumption that all edges of a flow graph can be traversed. This assumption may not be true in practice. For example, for no values of a and b can the flow of control actually reach statement 2 in the following program fragment:

if (a == b) statement 1; else if (a == b) statement 2;

To decide in general whether each path in a flow graph can be taken is an undecidable problem. Thus, we simply assume that every path in the flow graph can be followed in some execution of the program. In most applications of reaching definitions, it is conservative to assume that a definition can reach a point even if it might not. Thus, we may allow paths that are never be traversed in any execution of the program, and we may allow definitions to pass through ambiguous definitions of the same variable safely.

**Transfer Equations for Reaching Definitions: **We shall now set up the constraints for the reaching definitions problem. We start by examining the details of a single statement. Consider a definition

ri: u = v w

Here, and frequently in what follows, is used as a generic binary operator. This statement "generates" a definition d of variable u and "kills" all the other definitions in the program that define variable u, while leaving the remaining incoming definitions unaffected. The transfer function of definition d thus can be expressed as

fd(x) = gend U (x - killd) (9.1)

Where g end = {d}, the set of definitions generated by the statement, and killd is the set of all other definitions of u in the program.

As discussed in Section 9.2.2, the transfer function of a basic block can be found by composing the transfer functions of the statements contained therein. The composition of functions of the form (9.1), which we shall refer to as "genkill form," is also of that form, as we can see as follows. Suppose there are two functions fi(x) = gen\ U (x - kill\) and f2(x) = gen2 U (x — kill2). Then

This rule extends to a block consisting of any number of statements. Suppose block B has n statements, with transfer functions fi(x) = geni U (x — kilh) for i = 1,2,... , n. Then the transfer function for block B may be written as: where

!B{X) = genB U (x - kills),

kills = kilh U kill2 U . . . U killn

and

gens — genn U (genn_i — killn) U (genn-2 — killn-\ — killn) U

. . . U (#eni — kill2 — kilh — . . — killn)

Thus, like a statement, a basic block also generates a set of definitions and kills a set of definitions. The gen set contains all the definitions inside the block that are "visible" immediately after the block — we refer to them as downwards exposed. A definition is downwards exposed in a basic block only if it is not "killed" by a subsequent definition to the same variable inside the same basic block. A basic block's kill set is simply the union of all the definitions killed by the individual statements. Notice that a definition may appear in both the gen and kill set of a basic block. If so, the fact that it is in gen takes precedence, because in gen-kill form, the kill set is applied before the gen set.