## Region-Based Analysis

**Introduction**: The iterative data-flow analysis algorithm we have discussed so far is just one approach to solving data-flow problems. Here we discuss another approach called region-based analysis. Recall that in the iterative-analysis approach, we create transfer functions for basic blocks, and then find the fixed-point solution by repeated passes over the blocks. Instead of creating transfer functions just for individual blocks, a region-based analysis finds transfer functions that summarize the execution of progressively larger regions of the program. Ultimately, transfer functions for entire procedures are constructed and then applied, to get the desired data-flow values directly.

While a data-flow framework using an iterative algorithm is specified by a semi-lattice of data-flow values and a family of transfer functions closed under composition, region-based analysis requires more elements. A region-based framework includes both a semi-lattice of data-flow values and a semi-lattice of transfer functions that must possess a meet operator, a composition operator, and a closure operator. We shall see what all these elements entail in Section 9.7.4.

A region-based analysis is particularly useful for data-flow problems where paths that have cycles may change the data-flow values. The closure operator allows the effect of a loop to be summarized more effectively than does iterative analysis. The technique is also useful for inter-procedural analysis, where transfer functions associated with a procedure call may be treated like the transfer functions associated with basic blocks.

For simplicity, we shall consider only forward data-flow problems in this section. We first illustrate how region-based analysis works by using the familiar example of reaching definitions. In Section 9.8 we show a more compelling use of this technique, when we study the analysis of induction variables.

**Regions: **In region-based analysis, a program is viewed as a hierarchy of regions, whichare (roughly) portions of a flow graph that have only one point of entry. Weshould find this concept of viewing code as a hierarchy of regions intuitive,because a block-structured procedure is naturally organized as a hierarchy ofregions. Each statement in a block-structured program is a region, as controlflow can only enter at the beginning of a statement. Each level of statementnesting corresponds to a level in the region hierarchy.Formally, a region of a flow graph is a collection of nodes N and edges Esuch that

1. There is a header h in N that dominates all the nodes in N.

2. If some node m can reach a node n in N without going through h, then m is also in N.

3. *E *is the set of all the control flow edges between nodes *n\ *and *n2 in N, *except (possibly) for some that enter *h**.*

**Example:**Clearly a natural loop is a region, but a region does not necessarily have a back edge and need not contain any cycles. For example, in Fig. 9.47, nodes *B\ *and *B2*, together with the edge *B\ —> B2: *form a region; so do nodes *Bi,B2, *and *B3 *with edges *Bx B2, B2 ->• B3, *and *Bi -> B3.* However, the subgraph with nodes *B2 *and **#3 **with edge *B2 —> B3 *does not form a region, because control may enter the subgraph at both nodes *B2 *and *B3. *More precisely, neither *B2 *nor *B3 *dominates the other, so condition (1) for a region is violated. Even if we picked, say, *B2 *to be the "header," we would violate condition (2), since we can reach *B3 *from *B\ *without going through *B2,* and J5i is not in the "region."

**Region Hierarchies for Reducible Flow Graphs: **In what follows, we shall assume the flow graph is reducible. If occasionally wemust deal with non-reducible flow graphs, then we can use a technique called"node splitting" that will be discussed in Section 9.7.6.To construct a hierarchy of regions, we identify the natural loops. Recallfrom Section 9.6.6 that in a reducible flow graph, any two natural loops areeither disjoint or one is nested within the other. The process of "parsing" areducible flow graph into its hierarchy of loops begins with every block as aregion by itself. We call these regions leaf regions. Then, we order the naturalloops from the inside out, i.e., starting with the innermost loops. To process aloop, we replace the entire loop by a node in two steps:

1. First, the body of the loop L (all nodes and edges except the back edges to the header) is replaced by a node representing a region R. Edges to the header of L now enters the node for R. An edge from any exit of loop L is replaced by an edge from R to the same destination. However, if the edge is a back edge, then it becomes a loop on R. We call R a body region.

*2. *Next, we construct a region *R' *that represents the entire natural loop *L. *We call *R' *a *loop region. *The only difference between *R *and *R' *is thatthe latter includes the back edges to the header of loop *L. *Put anotherway, when *R' *replaces *R *in the flow graph, all we have to do is removethe edge from *R *to itself.

We proceed this way, reducing larger and larger loops to single nodes, first witha looping edge and then without. Since loops of a reducible flow graph arenested or disjoint, the loop region's node can represent all the nodes of thenatural loop in the series of flow graphs that are constructed by this reductionprocess.

Eventually, all natural loops are reduced to single nodes. At that point, the flow graph may be reduced to a single node, or there may be several nodes remaining, with ho loops; i.e., the reduced flow graph is an acyclic graph of more than one node. In the former case we are done constructing the region hierarchy, while in the latter case, we construct one more body region for the entire flow graph.

**Algorithm:**Constructing a bottom-up order of regions of a reducible flow graph.

**INPUT:**A reducible flow graph G.

**OUTPUT:**A list of regions of G that can be used in region-based data-flow problems.

**METHOD:**

1. Begin the list with all the leaf regions consisting of single blocks of G, in any order.

2. Repeatedly choose a natural loop L such that if there are any natural loops contained within L, then these loops have had their body and loop regions added to the list already. Add first the region consisting of the body of L (i.e., L without the back edges to the header of L), and then the loop region of L.

3. If the entire flow graph is not itself a natural loop, add at the end of the list the region consisting of the entire flow graph.