## A Logical Representation of Data Flow

**Introduction:** To this point, our representation of data-flow problems and solutions can be termed "set-theoretic." That is, we represent information as sets and compute results using operators like union and intersection. For instance, when we introduced the reaching-definitions problem in Section 9.2.4, we computed IN[B] and OUT[J3] for a block B, and we described these as sets of definitions. We represented the contents of the block B by its gen and kill sets.

To cope with the complexity of interprocedural analysis, we now introduce a more general and succinct notation based on logic. Instead of saying something like "definition D is in IN[£?]," we shall use a notation like in(B,D) to mean the same thing. Doing so allows us to express succinct "rules" about inferring program facts. It also allows us to implement these rules efficiently, in a way that generalizes the bit-vector approach to set-theoretic operations. Finally, the logical approach allows us to combine what appear to be several independent analyses into one, integrated algorithm. For example, in Section 9.5 we described partial-redundancy elimination by a sequence of four data-flow analyses and two other intermediate steps. In the logical notation, all these steps could be combined into one collection of logical rules that are solved simultaneously.

**Introduction to Datalog: **Catalog is a language that uses a Prolog-like notation, but whose semantics is far simpler than that of Prolog. To begin, the elements of Datalog are atoms of the form p(X1,X2,... , Xn). Here,

1. p is a predicate — a symbol that represents a type of statement such as "a definition reaches the beginning of a block."

2. Xi,X2,... ,Xn are terms such as variables or constants. We shall also allow simple expressions as arguments of a predicate.2

A ground atom is a predicate with only constants as arguments. Every ground atom asserts a particular fact, and its value is either true or false. It is often convenient to represent a predicate by a relation, or table of its true ground atoms. Each ground atom is represented by a single row, or tuple, of the relation. The columns of the relation are named by attributes, and each tuple has a component for each attribute. The attributes correspond to the components of the ground atoms represented by the relation. Any ground atom in the relation is true, and ground atoms not in the relation are false.

**Datalog Rules: **Rules are a way of expressing logical inferences. In Datalog, rules also serve to suggest how a computation of the true facts should be carried out. The form of a rule is

H : - Bi k B2 k • • • & Bn

**The components are as follows:**

H and Bi, B2, • •. ,Bn are literals — either atoms or negated atoms.

H is the head and Bi,B2,..- ,Bn form the body of the rule.

Each of the IV s is sometimes called a subgoal of the rule.

We should read the: - symbol as "if." The meaning of a rule is "the head is true if the body is true." More precisely, we apply a rule to a given set of ground atoms as follows. Consider all possible substitutions of constants for the variables of the rule. If this substitution makes every subgoal of the body true (assuming that all and only the given ground atoms are true), then we can infer that the head with this substitution of constants for variables is a true fact. Substitutions that do not make all subgoals true give us no information; the head may or may not be true.

A Datalog program is a collection of rules. This program is applied to "data," that is, to a set of ground atoms for some of the predicates. The result of the program is the set of ground atoms inferred by applying the rules until no more inferences can be made.

**Intensional and Extensional Predicates: **It is conventional in Datalog programs to distinguish predicates as follows:

1. EDB, or extensional database, predicates are those that are defined apriori. That is, their true facts are either given in a relation or table, or they are given by the meaning of the predicate (as would be the case for a comparison predicate, e.g.).

2. IDB, or intensional database, predicates are defined only by the rules.

A predicate must be IDB or EDB, and it can be only one of these. As a result, any predicate that appears in the head of one or more rules must be an IDB predicate. Predicates appearing in the body can be either IDB or EDB. For instance, in Example 12.12, edge is an EDB predicate and path is an IDB predicate. Recall that we were given some edge facts, such as edge(l,2), but the path facts were inferred by the rules.

When Datalog programs are used to express data-flow algorithms, the EDB predicates are computed from the flow graph itself. IDB predicates are then expressed by rules, and the data-flow problem is solved by inferring all possible IDB facts from the rules and the given EDB facts.