Context-Sensitive Pointer Analysis
Introduction: Context sensitivity can improve greatly the precision of interprocedural analysis. We talked about two approaches to interprocedural analysis, one based on cloning and one on summaries.
There are several difficulties in computing the summaries of points-to information. First, the summaries are large. Each method's summary must include the effect of all the updates that the function and all its callees can make, in terms of the incoming parameters. That is, a method can change the points-to sets of all data reachable through static variables, incoming parameters and all objects created by the method and its callees. While complicated schemes have been proposed, there is no known solution that can scale to large programs. Even if the summaries can be computed in a bottom-up pass, computing the points-to sets for all the exponentially many contexts in a typical top-down pass presents an even greater problem. Such information is necessary for global queries like finding all points in the code that touch a certain object.
In this section, we discuss a cloning-based context-sensitive analysis. A cloning-based analysis simply clones the methods, one for each context of interest. We then apply the context-insensitive analysis to the cloned call graph. While this approach seems simple, the devil is in the details of handling the large number of clones. How many contexts are there? Even if we use the idea of collapsing all recursive cycles, as discussed in Section 12.1.3, it is not uncommon to find 101 4 contexts in a Java application. Representing the results of these many contexts is the challenge.
We separate the discussion of context sensitivity into two parts:
1. How to handle context sensitivity logically? This part is easy, because we simply apply the context-insensitive algorithm to the cloned call graph.
2. How to represent the exponentially many contexts? One way is to represent the information as binary decision diagrams (BDD's), a highly optimized data structure that has been used for many other applications.
This approach to context sensitivity is an excellent example of the importance of abstraction. As we are going to show, we eliminate algorithmic complexity by leveraging the years of work that went into the BDD abstraction. We can specify a context-sensitive point-to analysis in just a few lines of Datalog, which in turn takes advantage of many thousands of lines of existing code for BDD manipulation. This approach has several important advantages. First, it makes possible the easy expression of further analyses that use the results of the points-to analysis. After all, the points-to results on their own are not interesting. Second, it makes it much easier to write the analysis correctly, as it leverages many lines of well-debugged code.
Contexts and Call Strings: The context-sensitive points-to analysis described below assumes that a callgraph has been already computed. This step helps make possible a compactrepresentation of the many calling contexts. To get the call graph, we first runa context-insensitive point-to analysis that computes the call graph on the fly,as discussed in Section 12.5. We now describe how to create a cloned call graph.
A context is a representation of the call string that forms the history of the active function calls. Another way to look at the context is that it is a summary of the sequence of calls whose activation records are currently on the run-time stack. If there are no recursive functions on the stack, then the call string - the sequence of locations from which the calls on the stack were made — is a complete representation. It is also an acceptable representation, in the sense that there is only a finite number of different contexts, although that number may be exponential in the number of functions in the program.
However, if there are recursive functions in the program, then the number of possible call strings is infinite, and we cannot allow all possible call strings to represent distinct contexts. There are various ways we can limit the number of distinct contexts. For example, we can write a regular expression that describes all possible call strings and convert that regular expression to a deterministic finite automaton, using the methods of Section 3.7. The contexts can then be identified with the states of this automaton.
Here, we shall adopt a simpler scheme that captures the history of non-recursive calls but considers recursive calls to be "too hard to unravel." We begin by finding all the mutually recursive sets of functions in the program. The process is simple and will not be elaborated in detail here. Think of a graph whose nodes are the functions, with an edge from p to q if function p calls function q. The strongly connected components (SCC's) of this graph are the sets of mutually recursive functions. As a common special function p that calls itself, but is not in an SCC with any other function is an SCC by itself. The non-recursive functions are also SCC's by them-selves. Call an SCC nontrivial if it either has more than one member (the mutually recursive case), or it has a single, recursive member. The SCC's that are single, nonrecursive functions are trivial SCC's.
Our modification of the rule that any call string is a context is as follows. Given a call string, delete the occurrence of a call site s if
1. s is in a function p.
2. Function q is called at site s (q = p is possible).
3. p and q are in the same strong component (i.e., p and q are mutually recursive, or p = q and p is recursive).
The result is that when a member of a nontrivial SCC S is called, the call site for that call becomes part of the context, but calls within S to other functions in the same SCC are not part of the context. Finally, when a call outside 5 is made, we record that call site as part of the context.