## Symbolic Analysis

**Introduction:** We use symbolic analysis to illustrate the use of region based analysis. In this analysis, we track the values of variables in programs symbolically as expressions of input variables and other variables, which we call reference variables. Expressing variables in terms of the same set of reference variables draws out their relationships. Symbolic analysis can be used for a range of purposes such as optimization, parallelization, and analyses for program understanding.

**Example:** Consider the simple example program in Fig. 9.54. Here, we use x as the sole reference variable. Symbolic analysis will find that y has the value x — 1 and z has the value x — 2 after their respective assignment statements in lines (2) and (3). This information is useful, for example, in determining that the two assignments in lines (4) and (5) write to different memory locations and can thus be executed in parallel. Furthermore, we can tell that the condition z > x is never true, thus allowing the optimizer to remove the conditional statement in lines (6) and (7) all together.

**Affine Expressions of Reference Variables: **Since we cannot create succinct and closed-form symbolic expressions for allvalues computed, we choose an abstract domain and approximate the computationswith the most precise expressions within the domain. We have alreadyseen an example of this strategy before: constant propagation. In constantpropagation, our abstract domain consists of the constants, an UNDEF symbolif we have not yet determined if the value is a constant, and a special NACsymbol that is used whenever a variable is found not to be a constant.

The symbolic analysis we present here expresses values as affine expressions of reference variables whenever possible. An expression is affine with respect to variables i>2 , . . . ,vn if it can be expressed as Co c\V\ . . . cnvn, where are constants. Such expressions are informally known as linear expressions. Strictly speaking, an affine expression is linear only if CQ is zero. We are interested in affine expressions because they are often used to index arrays in loops—such information is useful for optimizations and parallelization.

**Induction Variables: **Instead of using program variables as reference variables, an affine expressioncan also be written in terms of the count of iterations through the loop. Variableswhose values can be expressed as cii c 0, where i is the count of iterationsthrough the closest enclosing loop, are known as induction variables.

**Example:** Consider the code fragment

- for (m = 10; m < 20; m )
- { x = m*3; A[x] = 0 ; }

Suppose we introduce for the loop a variable, say i, representing the number of iterations executed. The value i is 0 in the first iteration of the loop, 1 in the second, and so on. We can express variable m as an affine expression of i, namely m — i 10. Variable x, which is 3m, takes on values 30,33,... ,57 during successive iterations of the loop. Thus, x has the affine expression x = 30 Si. We conclude that both m and x are induction variables of this loop.

Expressing variables as affine expressions of loop indexes makes the series of values being computed explicit and enables several transformations. The series of values taken on by an induction variable can be computed with additions rather than multiplications. This transformation is known as "strength reduction" and was introduced in Sections 8.7 and 9.1. For instance, we can eliminate the multiplication x=m*3 from the loop of Example 9.57 by rewriting the loop as

- x = 27;
- for (m = 10; m < 20; m )
- { x = x 3; A[x] = 0 ; }

In addition, notice that the locations assigned 0 in that loop, SzA S0, &,A 33,. . . , & A 57, are also affine expressions of the loop index. In fact, this series of integers is the only one that needs to be computed; We need neither m nor x. The code above can be replaced simply by:

- f o r (x = &A 30; x <= &A 57; x = x 3)
- *x = 0;

Besides speeding up the computation, symbolic analysis is also useful for parallelization. When the array indexes in a loop are affine expressions of loop indexes, we can reason about relations of data accessed across the iterations. For example, we can tell that the locations written are different in each iteration and therefore all the iterations in the loop can be executed in parallel on different processors. Such information is used in Chapters 10 and 11 to extract parallelism from sequential programs.

**Other Reference Variables: **If a variable is not a linear function of the reference variables already chosen, we have the option of treating its value as reference for future operations. For example, consider the code fragment:

While the value held by a after the function call cannot itself be expressed as a linear function of any reference variables, it can be used as reference for subsequent statements. For example, using a as a reference variable, we can discover that c is one larger than b at the end of the program.

- a = 0;
- for (f = 100; f < 200; f ) {
- a = a 1 • >
- b = 10 * a;
- c = 0;
- f or (g = 10; g <
- d = b c;
- c = c 1;
- }
- }

**Figure 9.55: Source code for Example 9.58**