Pattern Matching by Parsing
Introduction: Consider a specialized approach that uses an LR parser to do the pattern matching. The input tree can be treated as a string by using its prefix representation. For example, the prefix representation for the tree in Fig. 8.19 is
= ind Ca i ? S P ind d R$P Mb d
The tree-translation scheme can be converted into a syntax-directed translation scheme by replacing the tree-rewriting rules with the productions of a context-free grammar in which the right sides are prefix representations of the instruction templates.
Example: The syntax-directed translation scheme in Fig. 8.21 is based on the tree-translation scheme.
The nonterminals of the underlying grammar are R and M. The terminal m represents a specific memory location, such as the location for the global variable b in Example 8.18. The production M m in Rule (10) can be thought of as matching M with m prior to using one of the templates involving M. Similarly, we introduce a terminal sp for register SP and add the production R SP. Finally, terminal c represents constants. Using these terminals, the string for the input tree in Fig. 8.19 is
= ind ca sp ind c$ sp m& ci
From the productions of the translation scheme we build an LR parser using one of the LR-parser construction techniques of Chapter 4. The target code is generated by emitting the machine instruction corresponding to each reduction.
A code-generation grammar is usually highly ambiguous, and some care needs to be given to how the parsing-action conflicts are resolved when the parser is constructed. In the absence of cost information, a general rule is to favor larger reductions over smaller ones. This means that in a reduce-reduce conflict, the longer reduction is favored; in a shift-reduce conflict, the shift move is chosen. This "maximal munch" approach causes a larger number of operations to be performed with a single machine instruction.
There are some benefits to using LR parsing in code generation. First, the parsing method is efficient and well understood, so reliable and efficient code generators can be produced using the algorithms described in Chapter 4. Second, it is relatively easy to retarget the resulting code generator; a code selector for a new machine can be constructed by writing a grammar to describe the instructions of the new machine. Third, the quality of the code generated can be made efficient by adding special-case productions to take advantage of machine idioms.
However, there are some challenges as well. A left-to-right order of evaluation is fixed by the parsing method. Also, for some machines with large numbers of addressing modes, the machine-description grammar and resulting parser can become inordinately large. As a consequence, specialized techniques are necessary to encode and process the machine-description grammars. We must also be careful that the resulting parser does not block (has no next move) while parsing an expression tree, either because the grammar does not handle some operator patterns or because the parser has made the wrong resolution of some parsing-action conflict. We must also make sure the parser does not get into an infinite loop of reductions of productions with single symbols on the right side. The looping problem can be solved using a state-splitting technique at the time the parser tables are generated.
Routines for Semantic Checking: In a code-generation translation scheme, the same attributes appear as in aninput tree, but often with restrictions on what values the subscripts can have.For example, a machine instruction may require that an attribute value fall ina certain range or that the values of two attributes be related.These restrictions on attribute values can be specified as predicates that areinvoked before a reduction is made. In fact, the general use of semantic actionsand predicates can provide greater flexibility and ease of description than apurely grammatical specification of a code generator. Generic templates canbe used to represent classes of instructions and the semantic actions can thenbe used to pick instructions for specific cases. For example, two forms of theaddition instruction can be represented with one template:
Parsing-action conflicts can be resolved by disambiguating predicates that can allow different selection strategies to be used in different contexts. A smaller description of a target machine is possible because certain aspects of the machine architecture, such as addressing modes, can be factored into the attributes. The complication in this approach is that it may become difficult to verify the accuracy of the translation scheme as a faithful description of the target machine, although this problem is shared to some degree by all code generators.