Methods for rewriting grammars for LL(1) parsing
Introduction: Now we will look at methods for rewriting grammars such that they are more palatable for LL(1) parsing. In particular, we will look at elimination of left recursion and at left factorisation. It must, however, be noted that not all grammars can be rewritten to allow LL(1) parsing. In these cases stronger parsing techniques must be used.
Eliminating left-recursion The unambiguous expression grammar) is notLL(1). The reason is that all productions in Exp and Exp2 have the same FIRSTsets. Overlap like this will always happen when there are left-recursive productionsin the grammar, as the FIRST set of a left-recursive production will include theFIRST set of the nonterminal itself and hence be a superset of the FIRST sets of allthe other productions for that nonterminal. To solve this problem, we must avoidleft-recursion in the grammar. We start by looking at direct left-recursion.
When we have a nonterminal with some left-recursive and some productions that are not, i.e.,
- N -> N α 1
- ...
- N -> N α m
- N -> β1
- ...
- N -> βn
where the βi do not start with N, we observe that this generates all sequences that start with one of the βi and continues with any number (inluding 0) of the α j. In other words, the grammar is equivalent to the regular expression (β1 | : : : |βn)( α1| : : : | αm)*.
We see in this figure a method for converting regular expressions into context free grammars can generate the same set of strings. By following this procedure and simplifying a bit afterwards, we get this equivalent grammar:
Exp -> Exp2 Exp*
Exp* -> Exp2 Exp*
Exp* -> - Exp2 Exp*
Exp* ->
Exp2 -> Exp3 Exp2*
Exp2* -> * Exp3 Exp2*
Exp2* -> / Exp3 Exp2*
Exp2* ->
Exp3 -> num
Exp3 -> ( Exp )
- N -> b1N*
- ...
- N -> bnN*
- N* -> a1N*
- ...
- N* -> amN*
- N* ->
where N* is a new nonterminal that generates a sequence of αs. Note that, since the βi do not start with N, there is no direct left-recursion in the first n productions. Since N* is a new nonterminal, the αj cannot start with this, so the last m productions can’t have direct left-recursion either. There may, however, still be indirect left-recursion if any of the αj are nullable or the bi can derive something starting with N. We will briefly look at indirect left-recursion below.
While we have eliminated direct left-recursion, we have also changed the syntax trees that are built from the strings that are parsed. Hence, after parsing, the syntax tree must be re-structured to obtain the structure that the original grammar describes.