Predictive parsing revisited
Introduction:
If the right-hand sides of the productions for a non-terminal have disjoint FIRST sets, we can use the next input symbol to choose among the productions. In Predictive parsing, we picked the empty production (if any) on any symbol that was not in the FIRST sets of the non-empty productions for the same non-terminal. We can extend this, so we in case of no matching FIRST sets can select a production if it is Nullable. The idea is that a Nullable production can derive the empty string, so the input symbol need not be read by the production itself. But if there are several Nullable productions, we have no way of choosing between them. Hence, we do not allow more than one production for a non-terminal to be Nullable.
We said in section 3.3.1 that our syntax analysis methods will detect ambiguous grammars. However, this is not true with the method as stated above: We can get unique choice of production even for some ambiguous grammars, including grammar 3.4. The syntax analysis will in this case just choose one of several possible syntax trees for a given input string. In many cases, we do not consider such behavior acceptable. In fact, we would very much like our parser construction method to tell us if we by mistake write an ambiguous grammar.
Even worse, the rules for predictive parsing as presented here might even for some unambiguous grammars give deterministic choice of production, but reject strings that actually belong to the language described by the grammar. If we, for example, change the second production in grammar 3.9 to
T -> aTb
this will not change the choices made by the predictive parser for non-terminal R. However, always choosing the last production for R on a b will lead to erroneous rejection of many strings, including ab.
This kind of behavior is clearly unacceptable. We should, at least, get a warning that this might occur, so we can rewrite the grammar or choose another syntax analysis method.
Hence, we add to our construction of predictive parsers a test that will reject all ambiguous grammars and those unambiguous grammars that can cause the parser to fail erroneously.
We have so far simply chosen a nullable production if and only if no other choice is possible. This is, however, not always the right thing to do, so we must change the rule to say that we choose a production N -> a on symbol c if one of the two conditions below are satisfied:
1) c ∈ FIRST(α).
2) αis nullable and the sequence Nc can occur somewhere in a derivation starting from the start symbol of the grammar.
The first rule is obvious, but the second requires a bit of explanation: If αis nullable, we can construct a syntax tree for it without reading any input, so it seems like a nullable production could be a valid choice regardless of the next input symbol.
Not only would this give multiple valid choices of production whenever there are both nullable and non-nullable productions for the same non-terminal, but it is also not always correct to choose the nullable production: Predictive parsing makes a leftmost derivation so we always rewrite the leftmost non-terminal N in the current sequence of grammar symbols. So whatever input is not matched by N must be matched by the sequence of grammar symbols that occurs after N in the current sequence. If this is not possible, we have made a bad choice when deriving N. In particular, if N derives to the empty sequence, the next input symbol c should begin the derivation of the sequence that follows N. So at the very least, the sequence of symbols that follow N should have a derivation that begins with c. If we (using a different derivation order) derive the symbols after N before deriving N, we should, hence, see the sequence Nc during the derivation. If we don’t, we cannot rewrite N to the empty sequence without later getting stuck when rewriting the remaining sequence. Since the derivation order doesn’t change the syntax tree, we can see that if the alternative derivation order gets stuck, so will the leftmost derivation order. Hence, we can only rewrite N to the empty sequence if the next input symbol c can occur in combination with N in a legal derivation. We will in the next section see how we can determine this.
Note that a nullable production N -> αcan validly be selected if c ∈FIRST(α). Even with the restriction on choosing nullable productions, we can still have situations where both nullable and non-nullable productions are valid choices. This includes the example above with the modified grammar 3.9 (since Rb can occur in a derivation) and all ambiguous grammars that are not detected as being ambiguous by the original method where we only choose nullable productions if no other choice is possible.