## Scheduling Cyclic Dependence Graphs

**Introduction:** Dependence cycles complicate software pipelining significantly. When scheduling operations in an acyclic graph in topological order, data dependences with scheduled operations can impose only a lower bound on the placement of each operation. As a result, it is always possible to satisfy the data-dependence constraints by delaying operations. The concept of "topological order" does not apply to cyclic graphs. In fact, given a pair of operations sharing a cycle, placing one operation will impose both a lower and upper bound on the placement of the second.

Let ni and n2 be two operations in a dependence cycle, 5 be a software pipeline schedule, and T be the initiation interval for the schedule. A dependence edge m -»• n2 with label (6i,di) imposes the following constraint on S(ni) and S(n2):

(S_{1} xT) S(n_{2})-S(_{ni}) >di.

Similarly, a dependence edge ( n i , n 2 ) with label (52,d2) imposes constraint

(S_{2}xT) S(m)-S(n_{2})>d_{2}.

**Thus** S(_{ni}) d_{x}~ (51 x T) < S(n_{2}) < S(_{ni}) ~d_{2} {d_{2} x T).

A strongly connected component (SCC) in a graph is a set of nodes where every node in the component can be reached by every other node in the component. Scheduling one node in an SCC will bound the time of every other node in the component both from above and from below. Transitively, if there exists a path p leading from n\ to n 2 , then

**Observe that**

- Around any cycle, the sum of the S's must be positive. If it were 0 or negative, then it would say that an operation in the cycle either had to precede itself or be executed at the same clock for all iterations.
- The schedule of operations within an iteration is the same for all iterations; that requirement is essentially the meaning of a "software pipeline." As a result, the sum of the delays (second components of edge labels in a data-dependence graph) around a cycle is a lower bound on the initiation interval T.

When we combine these two points, we see that for any feasible initiation interval T, the value of the right side of Equation (10.1) must be negative or zero. As a result, the strongest constraints on the placement of nodes is obtained from the simple paths — those paths that contain no cycles.

Thus, for each feasible T, computing the transitive effect of data dependences on each pair of nodes is equivalent to finding the length of the longest simple path from the first node to the second. Moreover, since cycles cannot increase the length of a path, we can use a simple dynamic-programming algorithm to find the longest paths without the "simple-path" requirement, and be sure that the resulting lengths will also be the lengths of the longest simple paths (see Exercise 10.5.7).

**Algorithm**: **Software pipelining**.

**INPUT**: A machine-resource vector R = [ri,r2:...], where r« is the number of units available of the ith kind of resource, and a data-dependence graph G = (N,E). Each operation n in N is labeled with its resource-reservation table RTn] each edge e = ri\ -> n2 in E is labeled with (Se,de) indicating that n2 must execute no earlier than de clocks after node n\ from the 5eth preceding iteration.

**OUTPUT:** A software-pipelined schedule S and an initiation interval T.

**METHOD:**Execute the program in Fig. 10.29. Above Algorithm has a high-level structure similar to that of Algorithm Software pipelining an acyclic dependence graph, which only handles acyclic graphs. The minimum initiation interval in this case is bounded not just by resource requirements, but also by the data-dependence cycles in the graph. The graph is scheduled one strongly connected component at a time. By treating each strongly connected component as a unit, edges between strongly connected components necessarily form an acyclic graph. While the top-level loop in Algorithm Software pipelining an acyclic dependence graph schedules nodes in the graph in topological order, the top-level loop in above Algorithm schedules strongly connected components in topological order. As before, if the algorithm fails to schedule all the components, then a larger initiation interval is tried.