Finding Dependences Among Memory Accesses
Introduction: To check whether two memory accesses share a data dependence, we need only determine if they can refer to the same location; we do not need to know which location is being accessed. For example, we can tell that the two accesses *p and *(p+4) cannot refer to the same location, even though we may not know what p points to. Data dependence is generally undecidable at compile time. The compiler must assume that operations may refer to the same location unless it can prove otherwise.
Example: Given the code sequence

    (1) a = 1;
    (2) *p = 2;
    (3) x = a;

unless the compiler knows that p cannot possibly point to a, it must conclude that the three operations need to execute serially. There is an output dependence flowing from statement (1) to statement (2), and there are two true dependences flowing from statements (1) and (2) to statement (3).
Data-dependence analysis is highly sensitive to the programming language used in the program. For type-unsafe languages like C and C++, where a pointer can be cast to point to any kind of object, sophisticated analysis is necessary to prove independence between any pair of pointer-based memory accesses. Even local or global scalar variables can be accessed indirectly unless we can prove that their addresses have not been stored anywhere by any instruction in the program. In type-safe languages like Java, objects of different types are necessarily distinct from each other. Similarly, local primitive variables on the stack cannot be aliased with accesses through other names.
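As a sketch of the difficulty (the code below is illustrative, not from the original text): in C, a pointer can be laundered through a cast, so a write that mentions no variable by name may still reach it.

    #include <stdint.h>

    void example(void) {
        int a = 1;
        intptr_t addr = (intptr_t)&a;  /* the address of a is stored as an integer */
        int *p = (int *)addr;          /* ... and recovered through a cast          */
        *p = 2;                        /* this write aliases a                      */
        int x = a;                     /* so x reads 2; the compiler must assume    */
        (void)x;                       /* a and *p refer to the same location       */
    }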
A correct discovery of data dependences requires a number of different forms of analysis. We shall focus on the major questions that must be resolved if the compiler is to detect all the dependences that exist in a program, and how to use this information in code scheduling. Later chapters show how these analyses are performed.
- Array Data-Dependence Analysis: Array data dependence is the problem of disambiguating between the values of indexes in array-element accesses. For example, the loop

      for (i = 0; i < n; i++)
          A[2*i] = A[2*i+1];

  copies odd elements in the array A to the even elements just preceding them. Because all the locations read and written in the loop are distinct from each other, there are no dependences between the accesses, and all the iterations in the loop can execute in parallel. Array data-dependence analysis, often referred to simply as data-dependence analysis, is very important for the optimization of numerical applications. This topic will be discussed in detail in Section 11.6. A contrasting loop that does carry a dependence is sketched below.
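  By contrast, consider the following loop (a minimal sketch using the same array A, not from the original text). Each iteration reads the element written by the previous iteration, so the iterations carry a true dependence and cannot run in parallel:

      void shift(int A[], int n) {
          /* A[i] reads the value written to A[i-1] one iteration earlier */
          for (int i = 1; i < n; i++)
              A[i] = A[i-1] + 1;
      }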
- Pointer-Alias Analysis: We say that two pointers are aliased if they can refer to the same object. Pointer-alias analysis is difficult because there are many potentially aliased pointers in a program, and they can each point to an unbounded number of dynamic objects over time. To get any precision, pointer-alias analysis must be applied across all the functions in a program. A small illustration is sketched below.
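  A minimal sketch (the variable names are hypothetical): two pointers alias whenever they hold the same address, so a write through one reaches a read through the other.

      void alias_demo(void) {
          int x = 0;
          int *p = &x;
          int *q = p;     /* p and q are aliased: both refer to x        */
          *p = 1;         /* this write ...                               */
          int y = *q;     /* ... reaches this read: a true dependence     */
          (void)y;
      }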
- Interprocedural Analysis: For languages that pass parameters by reference, interprocedural analysis is needed to determine if the same variable is passed as two or more different arguments. Such aliases can create dependences between seemingly distinct parameters. Similarly, global variables can be used as parameters and thus create dependences between parameter accesses and global-variable accesses. Interprocedural analysis, discussed in Chapter 12, is necessary to determine these aliases. A small sketch follows below.
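  A minimal sketch in C, where by-reference passing is modeled with pointers (the function f and the global g are hypothetical):

      int g;                  /* a global, also passed by address below  */

      void f(int *x, int *y) {
          *x = 1;             /* if x and y alias, this write ...        */
          int t = *y;         /* ... flows into this read                */
          (void)t;
      }

      void caller(void) {
          f(&g, &g);          /* the same variable as two arguments      */
      }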
- Tradeoff between Register Usage and Parallelism: In this chapter we shall assume that the machine-independent intermediate representation of the source program uses an unbounded number of pseudo registers to represent variables that can be allocated to registers. These variables include scalar variables in the source program that cannot be referred to by any other names, as well as temporary variables that are generated by the compiler to hold the partial results in expressions. Unlike memory locations, registers are uniquely named. Thus, precise data-dependence constraints can be generated for register accesses easily.
The unbounded number of pseudo registers used in the intermediate representation must eventually be mapped to the small number of physical registers available on the target machine. Mapping several pseudo registers to the same physical register has the unfortunate side effect of creating artificial storage dependences that constrain instruction-level parallelism. Conversely, executing instructions in parallel creates the need for more storage to hold the values being computed simultaneously. Thus, the goal of minimizing the number of registers used conflicts directly with the goal of maximizing instruction-level parallelism. Examples 10.2 and 10.3 below illustrate this classic trade-off between storage and parallelism.
Example: The code below copies the values of variables in locations a and c to variables in locations b and d, respectively, using pseudo registers t1 and t2.
    LD t1, a    // t1 = a
    ST b, t1    // b = t1
    LD t2, c    // t2 = c
    ST d, t2    // d = t2
If all the memory locations accessed are known to be distinct from each other, then the copies can proceed in parallel. However, if t1 and t2 are assigned the same physical register so as to minimize the number of registers used, the copies are necessarily serialized, as sketched below.
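A minimal sketch of the serialized schedule, assuming both pseudo registers are mapped to a single physical register r1 (the register name is illustrative):

    LD r1, a    // r1 = a
    ST b, r1    // b = r1; r1 is free only after this store
    LD r1, c    // antidependence: must wait for the store of b
    ST d, r1    // d = r1

Reusing r1 introduces an antidependence from the store of b to the load of c, forcing the four operations to execute in order.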
Example: Traditional register-allocation techniques aim to minimize the number of registers used when performing a computation. Consider the expression
(a + b) + c + (d + e)
shown as a syntax tree in Fig. 10.2. It is possible to perform this computation using three registers, as illustrated by the machine code in Fig. 10.3.
The reuse of registers, however, serializes the computation. The only operations allowed to execute in parallel are the loads of the values in locations a and b, and the loads of the values in locations d and e. It thus takes a total of 7 steps to complete the computation in parallel.
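A plausible three-register sequence along the lines of Fig. 10.3, not reproduced here (register names r1 through r3 are illustrative):

    LD r1, a          // step 1 (can pair with the load of b)
    LD r2, b          // step 1
    ADD r1, r1, r2    // step 2: r1 = a + b
    LD r2, c          // step 3: reuses r2, so it must wait for the ADD
    ADD r1, r1, r2    // step 4: r1 = (a + b) + c
    LD r2, d          // step 5 (can pair with the load of e)
    LD r3, e          // step 5
    ADD r2, r2, r3    // step 6: r2 = d + e
    ADD r1, r1, r2    // step 7: r1 = (a + b) + c + (d + e)

With only the two pairs of loads overlapped, the nine instructions complete in 7 steps.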
Had we used different registers for every partial sum, the expression could be evaluated in 4 steps, which is the height of the expression tree in Fig. 10.2. The parallel computation is suggested by Fig. 10.4.
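Again as a sketch in the spirit of Fig. 10.4 (register names are illustrative), with a distinct register for every loaded value and partial sum, the schedule collapses to the height of the tree:

    LD r1, a          // step 1: all five loads issue in parallel
    LD r2, b          // step 1
    LD r3, c          // step 1
    LD r4, d          // step 1
    LD r5, e          // step 1
    ADD r6, r1, r2    // step 2: r6 = a + b
    ADD r7, r4, r5    // step 2: r7 = d + e (independent, runs in parallel)
    ADD r8, r6, r3    // step 3: r8 = (a + b) + c
    ADD r9, r8, r7    // step 4: r9 = (a + b) + c + (d + e)

Four steps, at the cost of nine registers instead of three.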