Conditional Statements and Hardware Support for Software Pipelining
Introduction: If predicated instructions are available, we can convert control-dependent instructions into predicated ones. Predicated instructions can be software-pipelined like any other operations. However, if there is a large amount of data-dependent control flow within the loop body, other scheduling techniques may be more appropriate.
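As a minimal sketch of this conversion, the two functions below compute the same result, first with a branch in the loop body and then with the branch replaced by guarded assignments. Python has no predicated instructions, so the conditional expressions only emulate them; the function names and the example computation are hypothetical and chosen purely for illustration.

```python
def count_branchy(values):
    """Loop body with control flow: which update runs depends on a branch."""
    pos_sum, neg_count = 0, 0
    for v in values:
        if v > 0:
            pos_sum += v
        else:
            neg_count += 1
    return pos_sum, neg_count


def count_if_converted(values):
    """Same loop after conversion: straight-line body, each update guarded by a predicate."""
    pos_sum, neg_count = 0, 0
    for v in values:
        p = v > 0                                            # compare sets the predicate
        pos_sum = (pos_sum + v) if p else pos_sum            # (p)  pos_sum = pos_sum + v
        neg_count = (neg_count + 1) if not p else neg_count  # (!p) neg_count = neg_count + 1
    return pos_sum, neg_count
```

In the second version the loop body contains no branch, so every operation in it can be treated as an ordinary operation that depends on the predicate `p` as data.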
If a machine does not have predicated instructions, we can use the concept of hierarchical reduction, described below, to handle a small amount of data-dependent control flow. As in Algorithm 10.11, hierarchical reduction schedules the control constructs in the loop inside-out, starting with the most deeply nested structures. As each construct is scheduled, the entire construct is reduced to a single node representing all the scheduling constraints of its components with respect to the other parts of the program. This node can then be scheduled as if it were a simple node within the surrounding control construct.
The scheduling process is complete when the entire program is reduced to a single node.
In the case of a conditional statement with "then" and "else" branches, we schedule each of the branches independently. Then:
1. The constraints of the entire conditional statement are conservatively taken to be the union of the constraints from both branches.
2. Its resource usage is the maximum of the resources used in each branch.
3. Its precedence constraints are the union of those in each branch, obtained by pretending that both branches are executed.
This node can then be scheduled like any other node. Two sets of code, corresponding to the two branches, are generated. Any code scheduled in parallel with the conditional statement is duplicated in both branches. If multiple conditional statements are overlapped, separate code must be generated for each combination of branches executed in parallel.
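The reduction of a conditional statement to a single node can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as any particular compiler implements it: each scheduled branch is represented by a hypothetical dictionary holding a resource-reservation table (one row per clock, one count per resource) and a set of precedence constraints to nodes outside the conditional. The name `reduce_conditional` and the data layout are invented for this example.

```python
from itertools import zip_longest


def reduce_conditional(then_branch, else_branch):
    """Collapse two independently scheduled branches into one summary node.

    Each branch is a dict (a hypothetical representation) with:
      "table": list of rows, one per clock; row[k] = units of resource k in use
      "edges": set of (outside_node, delay) precedence constraints
    Both tables are assumed to be non-empty and to list the same resources.
    """
    num_resources = len(then_branch["table"][0])
    idle = (0,) * num_resources  # pads the shorter branch with idle clocks
    # Resource usage: per clock and per resource, the maximum over both branches.
    table = [
        tuple(max(t, e) for t, e in zip(then_row, else_row))
        for then_row, else_row in zip_longest(
            then_branch["table"], else_branch["table"], fillvalue=idle
        )
    ]
    # Precedence constraints: the union, as if both branches were executed.
    edges = then_branch["edges"] | else_branch["edges"]
    return {"table": table, "edges": edges}


then_branch = {"table": [(1, 0), (0, 1)], "edges": {("load_a", 2)}}
else_branch = {"table": [(1, 1)], "edges": {("load_b", 1)}}
node = reduce_conditional(then_branch, else_branch)
```

Taking the maximum rather than the sum reflects the fact that only one branch executes at run time, while taking the union of the precedence constraints keeps the reduced node safe whichever branch is taken.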
Hardware Support for Software Pipelining: Specialized hardware support has been proposed for minimizing the size of software-pipelined code. The rotating register file in the Itanium architecture is one such example. A rotating register file has a base register, which is added to the register number specified in the code to derive the actual register accessed. We can get different iterations in a loop to use different registers simply by changing the contents of the base register at the boundary of each iteration.
The Itanium architecture also has extensive predicated instruction support. Not only can predication be used to convert control dependence to data dependence, but it can also be used to avoid generating the prologs and epilogs. The body of a software-pipelined loop contains a superset of the instructions issued in the prolog and epilog. We can simply generate the code for the steady state and use predication appropriately to suppress the extra operations to get the effects of having a prolog and an epilog.
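Both mechanisms can be illustrated together with a small simulation. The sketch below is a toy model, not a description of the actual Itanium instruction set: the class `RotatingRegisterFile`, the function `run_kernel_only`, the three-stage load/multiply/store pipeline, and the choice to decrement the base on each rotation are all assumptions made for this example.

```python
NUM_STAGES = 3  # load, multiply, store


class RotatingRegisterFile:
    """Toy model: physical index = (base + register number) mod size."""

    def __init__(self, size):
        self.regs = [0] * size
        self.base = 0

    def _physical(self, r):
        return (self.base + r) % len(self.regs)

    def read(self, r):
        return self.regs[self._physical(r)]

    def write(self, r, value):
        self.regs[self._physical(r)] = value

    def rotate(self):
        # Assumption of this sketch: the base is decremented, so the value
        # written to r0 in one kernel iteration is visible as r1 in the next.
        self.base = (self.base - 1) % len(self.regs)


def run_kernel_only(xs):
    """Compute 2*x for every x using only steady-state code, with no prolog or epilog."""
    n = len(xs)
    rrf = RotatingRegisterFile(size=4)
    out = []
    for t in range(n + NUM_STAGES - 1):
        # Stage s works on source iteration t - s; its predicate is true only
        # while that iteration exists. This suppresses the extra operations
        # while the pipeline fills and drains.
        p = [0 <= t - s < n for s in range(NUM_STAGES)]
        if p[0]:
            rrf.write(0, xs[t])            # stage 0: load into r0
        if p[1]:
            rrf.write(1, rrf.read(1) * 2)  # stage 1: multiply the value now named r1
        if p[2]:
            out.append(rrf.read(2))        # stage 2: store the value now named r2
        rrf.rotate()                       # iteration boundary: change the base
    return out
```

For example, `run_kernel_only([1, 2, 3])` returns `[2, 4, 6]`. The same loop body runs on every pass; the predicates alone decide which stages are active, and the rotation gives each overlapped iteration its own physical register even though the code always names `r0`, `r1`, and `r2`.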
While Itanium's hardware support improves the density of software-pipelined code, we must also realize that the support is not cheap. Since software pipelining is a technique intended for tight innermost loops, pipelined loops tend to be small anyway. Specialized support for software pipelining is warranted principally for machines that are intended to execute many software-pipelined loops and in situations where it is very important to minimize code size.