A Simple Pointer-Analysis Algorithm
Introduction: In this section, we begin to show in subsequent sections how to handle procedures first context insensitively, then context sensitively. Flow sensitivity adds a lot of complexity, and is less important to context sensitivity for languages like Java where methods tend to be small.
The fundamental question that we wish to ask in pointer-alias analysis is whether a given pair of pointers may be aliased. One way to answer this question is to compute for each pointer the answer to the question "what objects can this pointer point to?" If two pointers can point to the same object, then the pointers may be aliased.
Pointer Analysis’s Difficulties: Pointer-alias analysis for C programs is particularly difficult, because C programs can perform arbitrary computations on pointers. In fact, one can read in an integer and assign it to a pointer, which would render this pointer a potential alias of all other pointer variables in the program. Pointers in Java, known as references, are much simpler. No arithmetic is allowed, and pointers can only point to the beginning of an object.
Pointer-alias analysis must be interprocedural. Without interprocedural analysis, one must assume that any method called can change the contents of all accessible pointer variables, thus rendering any intraprocedural pointer-alias analysis ineffective.
Languages allowing indirect function calls present an additional challenge for pointer-alias analysis. In C, one can call a function indirectly by calling a dereferenced function pointer. We need to know what the function pointer can point to before we can analyze the function called. And clearly, after analyzing the function called, one may discover more functions that the function pointer can point to, and therefore the process needs to be iterated.
While most functions are called directly in C, virtual methods in Java cause many invocations to be indirect. Given an invocation x.m() in a Java program, there may be many classes to which object x might belong and that have a method named m. The more precise our knowledge of the actual type of x, the more precise our call graph is. Ideally, we can determine at compile time the exact class of x and thus know exactly which method m refers to.
Example: Consider the following sequence of Java statements
= new String ( );
n = o . length O;
Here o is declared to be an Object. Without analyzing what o refers to, all possible methods called "length" declared for all classes must be considered as possible targets. Knowing that o points to a S t r i n g will narrow interprocedural analysis to precisely the method declared for String.
It is possible to apply approximations to reduce the number of targets. For example, statically we can determine what all the types of objects created are, and we can limit the analysis to those. But we can be more accurate if we can discover the call graph on the fly, based on the points-to analysis obtained at the same time. More accurate call graphs lead not only to more precise results but also can reduce greatly the analysis time otherwise needed. Points-to analysis is complicated. It is not one of those "easy" data flow problems where we only need to simulate the effect of going around a loop of statements once. Rather, as we discover new targets for a pointer, all statements assigning the contents of that pointer to another pointer need to be re-analyzed.
For simplicity, we shall focus mainly on Java. We shall start with flow insensitive and context-insensitive analysis, assuming for now that no methods are called in the program. Then, we describe how we can discover the call graph on the fly as the points-to results is computed. Finally, we describe one way of handling context sensitivity.
A Model for Pointers and References: Let us suppose that our language has the following ways to represent and manipulatereferences:
1. Certain program variables are of type "pointer to T" or "reference to T," where T is a type. These variables are either static or live on the run-time stack. We call them simply variables.
2. There is a heap of objects. All variables point to heap objects, not to other variables. These objects will be referred to as heap objects.
3. A heap object can have fields, and the value of a field can be a reference to a heap object (but not to a variable).
Java is modeled well by this structure, and we shall use Java syntax in examples. Note that C is modeled less well, since pointer variables can point to other pointer variables in C, and in principle, any C value can be coerced into a pointer.
Since we are performing an insensitive analysis, we only need to assert that a given variable v can point to a given heap object h; we do not have to address the issue of where in the program v can point to h, or in what contexts v can point to h. Note, however, that variables can be named by their full name. In Java, this name can incorporate the module, class, method, and block within a method, as well as the variable name itself. Thus, we can distinguish many variables that have the same identifier.
Heap objects do not have names. Approximation often is used to name the objects, because an unbounded number of objects may be created dynamically. One convention is to refer to objects by the statement at which they are created. As a statement can be executed many times and create a new object each time, an assertion like "v can point to /i" really means "v can point to One or more of the objects created at the statement labeled h."
The goal of the analysis is to determine what each variable and each field of each heap object can point to. We refer to this as points-to analysis; two pointers are aliased if their points-to sets intersect. We describe here an inclusion-based analysis; that is, a statement such as v = w causes variable v to point to all the objects w points to, but not vice versa. While this approach may seem obvious, there are other alternatives to how we define points-to analysis. For example, we can define an equivalence-based analysis such that statements like v = w would turn variables v and w into one equivalence class, pointing to all the variables that each can point to. While this formulation does not approximate aliases well, it provides a quick, and often good, answer to the question of which variables point to the same kind of objects.