Flow Insensitivity
Introduction: We start by showing a very simple example to illustrate the effect of ignoring control flow in points-to analysis.
Example: In Fig. 12.20, three objects, h, i, and j, are created and assigned to variables a, b, and c, respectively. Thus, surely a points to h, b points to i, and c points to j by the end of line (3). If you follow the statements (4) through (6), you discover that after line (4) a points only to i. After line (5), b points only to j, and after line (6), c points only to i.
The above analysis is flow sensitive because we follow the control flow and compute what each variable can point to after each statement. In other words, in addition to considering what points-to information each statement "generates," we also account for what points-to information each statement "kills."
For instance, the statement b = c; kills the previous fact "6 points to f and generates the new relationship "6 points to what c points to." A flow-insensitive analysis ignores the control flow, which essentially assumes that every statement in the program can be executed in any order. It computes only one global point-to map indicating what each variable can possibly point to at any point of the program execution. If a variable can point to two different objects after two different statements in a program, we simply record that it can point to both objects. In other words, in flow-insensitive analysis, an assignment does not "kill" any points-to relations but can only "generate" more points-to relations. To compute the flow-insensitive results, we repeatedly add the points to effects of each statement on the points-to relationships until no new relations are found. Clearly, lack of flow sensitivity weakens the analysis results greatly, but it tends to reduce the size of the representation of the results and make the algorithm converge faster.
The Formulation in Datalog: Let us now formalize a flow-insensitive pointer-alias analysis based on the discussion above. We shall ignore procedure calls for now and concentrate on the four kinds of statements that can affect pointers:
1. Object creation, h: T v = new T ( ) ; This statement creates a new heap object, and variable v can point to it.
2. Copy statement, v = w; Here, v and w are variables. The statement makes v point to whatever heap object w currently points to; i.e., w is copied into v.
3. Field store, v.f = w; the type of object that v points to must have a field /, and this field must be of some reference type. Let v point to heap object h, and let w point to g. This statement makes the field /, in h now point to g. Note that the variable v is unchanged.
4. Field load, v = w. f; Here, if is a variable pointing to some heap object that has a field /, and / points to some heap object h. The statement makes variable v point to h.
Note that compound field accesses in the source code such as v = w. f. g will be broken down into two primitive field-load statements:
vl = w. f;
v = vl.g;
Let us now express the analysis formally in Datalog rules. First, there are only two IDB predicates we need to compute:
1. pts(V,H) means that variable V can point to heap object H.
2. hpts(H, F, G) means that field F of heap object H can point to heap object G.
The EDB relations are constructed from the program itself. Since the location of statements in a program is irrelevant when the analysis is flow insensitive, we only have to assert in the EDB the existence of statements that have certain forms. In what follows, we shall make a convenient simplification. Instead of defining EDB relations to hold the information garnered from the
program, we shall use a quoted statement form to suggest the EDB relation or relations that represent the existence of such a statement. For example, "H : TV = new T" is an EDB fact asserting that at statement H there is an assignment that makes variable V point to a new object of type T. We assume that in practice, there would be a corresponding EDB relation that would be populated with ground atoms, one for each statement of this form in the program. With this convention, all we need to write the Datalog program is one rule for each of the four types of statements. The program is shown in Fig, 12.21. Rule (1) says that variable V can point to heap object H if statement H is an assignment of a new object to V. Rule (2) says that if there is a copy statement V = W, and W can point to H, then V can point to H.
Rule (3) says that if there is a statement of the form V.F = W,W can point to C7, and V can point to H, then the F field of H can point to G. Finally, rule (4) says that if there is a statement of the form V = W.F, W can point to G, and the F field of G can point to H, then V can point to H. Notice that pts and hpts are mutually recursive, but this Datalog program can be evaluated by either of the iterative algorithms discussed in Section 12.3.4.