Introduction: Our predictive parser uses an e-production as a default when no other production can be used. With the input of Fig. 2.18, after the terminals for and (are matched, the lookahead symbol is ;. At this point procedure optexpr is called, and the code
if ( lookahead === expr ) match (expr);
in its body is executed. Nonterminal optexpr has two productions, with bodies expr and e. The lookahead symbol ";" does not match the terminal expr, so the production with body expr cannot apply. In fact, the procedure returns without changing the lookahead symbol or doing anything else. Doing nothing corresponds to applying an e-production.
More generally, consider a variant of the productions in Fig. 2.16 where optexpr generates an expression nonterminal instead of the terminal expr:
optexpr —>• expr
Thus, optexpr either generates an expression using nonterminal expr or it generates e. While parsing optexpr, if the lookahead symbol is not in FIRST (espr), then the e-production is used.
Designing a Predictive Parser: We can generalize the technique introduced informally in Section 2.4.2, to apply to any grammar that has disjoint FIRST sets for the production bodies belonging to any nonterminal. We shall also see that when we have a translation scheme — that is, a grammar with embedded actions — it is possible to execute those actions as part of the procedures designed for the parser.
Recall that a predictive parser is a program consisting of a procedure for every nonterminal. The procedure for nonterminal A does two things.
1. It decides which A-production to use by examining the lookahead symbol. The production with body a (where a is not e, the empty string) is used if the lookahead symbol is in FiRST(a). If there is a conflict between two nonempty bodies for any lookahead symbol, then we cannot use this parsing method on this grammar. In addition, the e-production for A, if it exists, is used if the lookahead symbol is not in the FIRST set for any other production body for A.
2. The procedure then mimics the body of the chosen production. That is, the symbols of the body are "executed" in turn, from the left. A nonterminal is "executed" by a call to the procedure for that nonterminal, and a terminal matching the lookahead symbol is "executed" by reading the next input symbol. If at some point the terminal in the body does not match the lookahead symbol, a syntax error is reported.
Just as a translation scheme is formed by extending a grammar, a syntax directed translator can be formed by extending a predictive parser. An algorithm for this purpose is given in Section 5.4. The following limited construction suffices for the present:
1. Construct a predictive parser, ignoring the actions in productions.
2. Copy the actions from the translation scheme into the parser. If an action appears after grammar symbol X in production p, then it is copied after the implementation of X in the code for p. Otherwise, if it appears at the beginning of the production, then it is copied just before the code for the production body.
Left Recursion: It is possible for a recursive-descent parser to loop forever. A problem arises with "left-recursive" productions like
expr —y expr term
where the leftmost symbol of the body is the same as the nonterminal at the head of the production. Suppose the procedure for expr decides to apply this production. The body begins with expr so the procedure for expr is called recursively. Since the lookahead symbol changes only when a terminal in the body is matched, no change to the input took place between recursive calls of expr. As a result, the second call to expr does exactly what the first call did, which means a third call to expr, and so on, forever. A left-recursive production can be eliminated by rewriting the offending production. Consider a nonterminal A with two productions
A ->• Aa | /3
where a and /3 are sequences of terminals and non-terminals that do not start with A. For example, in
expr —y expr term \ term
nonterminal A — expr, string a = term, and string (3 = term
The nonterminal A and its production are said to be left recursive, because the production A -> Aa has A itself as the leftmost symbol on the right side. Repeated application of this production builds up a sequence of a ' s to the right of A, as in Fig. 2.20(a). When A is finally replaced by /3, we have a /3 followed by a sequence of zero or more a's.