Writing ambiguous expression grammars
Introduction: If we have an ambiguous grammar
E -> E ⊕ E
E -> num
we can rewrite this to an unambiguous grammar that generates the correct structure. As this depends on the associativity of ⊕, we use different rewrite rules for different associativities.
If ⊕ is left-associative, we make the grammar left-recursive by having a recursive reference to the left only of the operator symbol:
E -> E ⊕ E’
E -> E’
E’ -> num
Now, the expression 2⊕3⊕4 can only be parsed as
We get a slightly more complex syntax tree than in figure 3.10, but not enormously so.
We handle right-associativity in a similar fashion: We make the offending production right-recursive:
E -> E’ ⊕ E
E -> E’
E’ -> num
Non-associative operators are handled by non-recursive productions:
E -> E’ ⊕ E’
E -> E’
E’ -> num
Note that the latter transformation actually changes the language that the grammar generates, as it makes expressions of the form num⊕ num ⊕num illegal. So far, we have handled only cases where an operator interacts with itself. This is easily extended to the case where several operators with the same precedence and associativity interact with each other, as for example and -:
E -> E E’
E -> E –E’
E -> E’
E’ -> num
Operators with the same precedence must have the same associativity for this to work, as mixing left-recursive and right-recursive productions for the same non-terminal makes the grammar ambiguous. As an example, the grammar
- E -> E E’
- E -> E’ ⊕ E
- E -> E’
- E’ -> num
- Exp -> Exp Exp2
- Exp -> Exp-Exp2
- Exp -> Exp2
- Exp2 -> Exp2*Exp3
- Exp2 -> Exp2/Exp3
- Exp2 -> Exp3
- Exp3 -> num
- Exp3 -> (Exp)
Grammar 3.11: Unambiguous expression grammar
seems like an obvious generalization of the principles used above, giving and ⊕ the same precedence and different associativity. But not only is the grammar ambiguous, it does not even accept the intended language. For example, the string num num ⊕ num is not derivable by this grammar.
In general, there is no obvious way to resolve ambiguity in an expression like 1 2⊕3, where is left-associative and ⊕ is right-associative (or vice-versa). Hence, most programming languages (and most parser generators) require operators at the same precedence level to have identical associativity.
We also need to handle operators with different precedences. This is done by using a non-terminal for each precedence level. The idea is that if an expression uses an operator of a certain precedence level, then its sub-expressions cannot use operators of lower precedence (unless these are inside parentheses).