Introduction to Machine-Independent Optimizations
Introduction: When code is modified to make it more efficient on a particular machine, a machine-dependent optimization is being performed. Machine-dependent optimization uses information about the limits and special features of the target machine to produce code that is shorter or that executes more quickly on the machine. Clearly, in a general treatment of compiling, we cannot go into the details of machine-dependent optimization for specific machines. Instead, we review the most important techniques of machine-dependent optimization and present a variety of examples illustrating the application of these techniques.
High-level language constructs can introduce substantial run-time overhead if we naively translate each construct independently into machine code. The elimination of unnecessary instructions in object code, or the replacement of one sequence of instructions by a faster sequence that does the same thing, is usually called "code improvement" or "code optimization." Most global optimizations are based on data-flow analyses, which are algorithms that gather information about a program. The results of data-flow analyses all have the same form: for each instruction in the program, they specify some property that must hold every time that instruction is executed. The analyses differ in the properties they compute. For example, a constant-propagation analysis computes, for each point in the program and for each variable used by the program, whether that variable has a unique constant value at that point.
This information may be used to replace variable references by constant values, for instance. As another example, a liveness analysis determines, for each point in the program, whether the value held by a particular variable at that point is sure to be overwritten before it is read. If so, we do not need to preserve that value, either in a register or in a memory location. We can use essentially the same algorithms for all these instances of data-flow analysis, and we can measure the performance of these algorithms and show their correctness on all instances, as well. Section 9.4 presents a general framework that supports more powerful analyses than the earlier examples. Then, in Section 9.5, we consider a powerful technique, called "partial redundancy elimination," for optimizing the placement of each expression evaluation in the program. The solution to this problem requires the solution of a variety of different data-flow problems.
The code produced by the compiler should take advantage of the special features of the target machine. For example, consider code intended for machines of the PDP-11 family. These computers have autoincrement and autodecrement modes for instructions. When an instruction is given in the autoincrement mode, the contents of the register are incremented after being used. The register is incremented by one for byte instructions and by two for word instructions. The use of instructions in these modes reduces the code necessary for pushing and popping stacks. The PDP-11 computers also have machine-level instructions, INC and DEC, that increment or decrement a value stored in memory by one. Whenever possible, the INC and DEC operations should be used instead of creating a constant with value 1 and adding or subtracting this constant from the value stored in memory.
- The PDP-11 machines have left- and right-shift operations. Shifting the bits one position to the left is equivalent to multiplying by 2. Since shifting is faster than multiplication or division, more efficient code is generated if multiplication and division by powers of 2 are implemented with shift operations.
- If two instructions do not depend on one another, they may be executed in parallel. Consider an expression such as A * B + Y * Z. The two products A * B and Y * Z can be evaluated at the same time. If the target machine allows parallel processing of two multiplications, the compiler should generate code to take advantage of this capability. Also, compilers generating code for pipelined vector machines should take advantage of these machines’ ability to perform vector operations in parallel.
- When different instructions are available to perform a task, the most efficient should be chosen. For instance, on the IBM 370 and 30XX computers, a number of move instructions exist. When more than 256 characters are to be moved, the MVCL (move character long) instruction should be used; otherwise, the MVC (move character) instruction should be employed. The compiler should always make the best use of the instruction set. Some machines have immediate instructions. Immediate instructions are those instructions which have a value included as part of the instruction. For example, on the IBM 370, the MVI (move immediate) instruction can be used when only a single byte needs to be moved. Similarly, the PDP-11 immediate mode allows fast access to constant operands by placing the desired constant in the word of memory just after the instruction.
- Often appreciably better code is generated if the addressing features provided by the computer are considered. The indexing and indirect-addressing capabilities of the machine should be used effectively. For an indexed instruction, the contents of a specified register, called the index register, are added to the given address to obtain the address of the operand.
- Indirect addressing is provided by many machines to shorten access time. When indirect addressing is used, the address of the true operand is given as the operand of the instruction. The distinction between direct and indirect addressing is illustrated in the following example. Consider a PDP-11 machine with register 0 (r0) containing the value 15, register 1 (r1) containing 2575, and memory location 2575 containing 1. An ADD instruction without indirect addressing is as follows: