Syntax trees and ambiguity
We can draw a derivation as a tree: The root of the tree is the start symbol of the grammar, and whenever we rewrite a non-terminal we add as its children the symbols on the right-hand side of the production that was used. The leaves of the tree are terminals which, when read from left to right, form the derived string. If a non-terminal is rewritten using an empty production, an ε is shown as its child. This is also a leaf node, but it is ignored when reading the string from the leaves of the tree.
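To make the reading of leaves concrete, here is a minimal sketch. The representation is our own assumption, not something the text prescribes: a node is a pair of a symbol and a list of children, a leaf has None in place of the list, and the name leaves is hypothetical. The example tree uses the productions of grammar 3.9, shown later in this section.

```python
EPSILON = "ε"  # marks the child of a non-terminal rewritten by an empty production

def leaves(tree):
    """Return the terminals at the leaves, left to right, skipping ε leaves."""
    symbol, children = tree
    if children is None:                      # a leaf node
        return [] if symbol == EPSILON else [symbol]
    result = []
    for child in children:                    # children are kept in left-to-right order
        result.extend(leaves(child))
    return result

# Tree built with T -> aTc, T -> R, R -> bR and R -> (empty).
tree = ("T", [("a", None),
              ("T", [("R", [("b", None),
                            ("R", [(EPSILON, None)])])]),
              ("c", None)])

print("".join(leaves(tree)))  # abc
```

The ε leaf is part of the tree, but it contributes nothing to the derived string.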
When we write such a syntax tree, the order of derivation is irrelevant: We get the same tree for left derivation, right derivation or any other derivation order. Only the choice of production for rewriting each non-terminal matters.
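The following sketch illustrates this under a few assumptions of our own: trees are pairs of a symbol and a list of children (None for leaves), the function name derivation is hypothetical, and the small grammar with productions R -> RbR and R -> (empty) is used only because it puts two non-terminals in one sentential form. From one fixed tree we replay a left derivation and a right derivation.

```python
EPSILON = "ε"

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding `tree` in a fixed order."""
    frontier = [tree]                         # nodes of the current sentential form
    forms = ["".join(symbol for symbol, _ in frontier)]
    while True:
        # Positions of nodes that still have to be rewritten (non-terminals).
        pending = [i for i, (_, children) in enumerate(frontier)
                   if children is not None]
        if not pending:
            return forms
        i = pending[0] if leftmost else pending[-1]
        children = frontier[i][1]
        # Replace the node by its children; an ε child leaves nothing behind.
        frontier[i:i + 1] = [c for c in children if c[0] != EPSILON]
        forms.append("".join(symbol for symbol, _ in frontier))

tree = ("R", [("R", [(EPSILON, None)]),
              ("b", None),
              ("R", [(EPSILON, None)])])

print(derivation(tree, leftmost=True))   # ['R', 'RbR', 'bR', 'b']
print(derivation(tree, leftmost=False))  # ['R', 'RbR', 'Rb', 'b']
```

The intermediate sentential forms differ, but both sequences are read off the same tree and end in the same string.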
As an example, the derivations in figures 3.5 and 3.6 yield the same syntax tree, which is shown in figure 3.7.
The syntax tree adds structure to the string that it derives. It is this structure that we exploit in the later phases of the compiler. For compilation, we do the derivation backwards: We start with a string and want to produce a syntax tree. This process is called syntax analysis or parsing.
Even though the order of derivation does not matter when constructing a syntax tree, the choice of production for each non-terminal does. Obviously, different choices can lead to different strings being derived, but it may also happen that several different syntax trees can be built for the same string. As an example, figure 3.8 shows an alternative syntax tree for the same string that was derived in figure 3.7.
When a grammar permits several different syntax trees for some strings, we call the grammar ambiguous. If our only use of a grammar is to describe sets of strings, ambiguity is not a problem. However, when we want to use the grammar to impose structure on strings, the structure had better be the same every time. Hence, it is a desirable feature for a grammar to be unambiguous. In most (but not all) cases, an ambiguous grammar can be rewritten to an unambiguous grammar that generates the same set of strings, or external rules can be applied to decide which of the many possible syntax trees is the “right one”. An unambiguous version of grammar 3.4 is shown in figure 3.9.
T -> R
T -> aTc
R ->
R -> bR
Grammar 3.9: Unambiguous version of grammar 3.4
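One way to see the difference between the two grammars is to count the syntax trees for a given string. The sketch below does this under stated assumptions. Grammar 3.4 is not reproduced in this section, so we assume it differs from grammar 3.9 only by having R -> RbR in place of R -> bR. The encoding of grammars as dictionaries and the name count_trees are our own. The counting scheme terminates for these two grammars, but it is not a general method: a grammar in which a non-terminal can derive itself without consuming input would make it loop.

```python
from functools import lru_cache

# A grammar maps each non-terminal to its right-hand sides (tuples of symbols).
# Symbols that are not keys are terminals; () is the empty production.
GRAMMAR_3_9 = {
    "T": [("R",), ("a", "T", "c")],
    "R": [(), ("b", "R")],
}
# ASSUMPTION: grammar 3.4 is not shown here; we take it to use R -> RbR.
ASSUMED_3_4 = {
    "T": [("R",), ("a", "T", "c")],
    "R": [(), ("R", "b", "R")],
}

def count_trees(grammar, start, text):
    """Count the syntax trees that derive `text` from `start`."""

    @lru_cache(maxsize=None)
    def trees(nonterminal, i, j):
        # Trees for text[i:j] rooted at `nonterminal`: one term per production.
        return sum(ways(rhs, i, j) for rhs in grammar[nonterminal])

    @lru_cache(maxsize=None)
    def ways(symbols, i, j):
        # Ways in which the symbol sequence can derive text[i:j].
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        if first not in grammar:              # terminal: must match one character
            if i < j and text[i] == first:
                return ways(rest, i + 1, j)
            return 0
        total = 0
        for k in range(i, j + 1):             # `first` covers text[i:k]
            tail = ways(rest, k, j)           # check the tail first, so that a
            if tail:                          # left-recursive production only
                total += trees(first, i, k) * tail  # recurses on shorter spans
        return total

    return trees(start, 0, len(text))

print(count_trees(GRAMMAR_3_9, "T", "aabbbcc"))  # 1
print(count_trees(ASSUMED_3_4, "T", "aabbbcc"))  # 5
```

A count above one for any string is a proof of ambiguity. A count of one for the strings we happen to try proves nothing about the grammar as a whole.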
How do we know if a grammar is ambiguous? If we can find a string and show two alternative syntax trees for it, this is a proof of ambiguity. It may, however, be hard to find such a string and, when the grammar is unambiguous, even harder to show that this is the case. In fact, the problem is formally undecidable, i.e., there is no method that for all grammars can answer the question “Is this grammar ambiguous?”. But in many cases it is not difficult to detect and prove ambiguity. For example, any grammar that has a production of the form
N -> NαN, where α is any sequence of grammar symbols, is ambiguous.
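Spotting productions of this form is a purely syntactic check, which can be sketched as follows. The dictionary encoding of grammars and the function name are our own assumptions, and the second grammar is a hypothetical example. The check finds only this one source of ambiguity, and it presupposes that N is actually used in derivations of strings from the start symbol.

```python
def n_alpha_n_productions(grammar):
    """Return all productions whose right-hand side starts and ends with
    the non-terminal being rewritten, i.e., N -> N alpha N (alpha may be empty)."""
    return [(lhs, rhs)
            for lhs, alternatives in grammar.items()
            for rhs in alternatives
            if len(rhs) >= 2 and rhs[0] == lhs and rhs[-1] == lhs]

# Grammar 3.9 from this section; () is the empty production.
GRAMMAR_3_9 = {
    "T": [("R",), ("a", "T", "c")],
    "R": [(), ("b", "R")],
}
# Hypothetical variant with a production of the form N -> N alpha N.
WITH_RBR = {
    "T": [("R",), ("a", "T", "c")],
    "R": [(), ("R", "b", "R")],
}

print(n_alpha_n_productions(GRAMMAR_3_9))  # []
print(n_alpha_n_productions(WITH_RBR))     # [('R', ('R', 'b', 'R'))]
```

An empty result says nothing about unambiguity, since a grammar can be ambiguous for other reasons.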
We will, in sections 3.12 and 3.14, see methods for constructing parsers from grammars. These methods have the property that they only work on unambiguous grammars, so successful construction of a parser is a proof of unambiguity. However, the methods may also fail on certain unambiguous grammars, so they cannot be used to prove ambiguity.
In the next section, we will see ways of rewriting a grammar to get rid of some sources of ambiguity. These transformations preserve the language that the grammar generates. By using such transformations (and others, which we will see later), we can create a large set of equivalent grammars, i.e., grammars that generate the same language (though they may impose different structures on the strings of the language).
Given two grammars, it would be nice to be able to tell if they are equivalent. Unfortunately, like ambiguity, equivalence of context-free grammars is undecidable, i.e., there is no method that can decide it in all cases. Sometimes, equivalence can be proven, e.g., by induction over the set of strings that the grammars produce. Non-equivalence can be proven by finding an example of a string that one grammar can generate but the other cannot. But in some cases, we just have to take claims of equivalence on faith or give up on deciding the issue.
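Searching for a string that only one of two grammars generates can be sketched as a bounded enumeration. Everything here is an assumption made for illustration: the dictionary encoding of grammars, the names derives and find_difference, the guess that grammar 3.4 uses R -> RbR, and the deliberately different grammar. The membership test terminates for these grammars but is not a general parsing method. Finding a string proves that two grammars are not equivalent; finding none up to the bound proves nothing.

```python
from functools import lru_cache
from itertools import product

GRAMMAR_3_9 = {"T": [("R",), ("a", "T", "c")], "R": [(), ("b", "R")]}
# ASSUMPTION: grammar 3.4 is not shown in this section; we take it to use R -> RbR.
ASSUMED_3_4 = {"T": [("R",), ("a", "T", "c")], "R": [(), ("R", "b", "R")]}
# Hypothetical grammar that drops the matching c's.
NO_C = {"T": [("R",), ("a", "T")], "R": [(), ("b", "R")]}

def derives(grammar, start, text):
    """Return True if `start` can derive `text`."""

    @lru_cache(maxsize=None)
    def match(symbols, i, j):
        # Can the symbol sequence derive text[i:j]?
        if not symbols:
            return i == j
        first, rest = symbols[0], symbols[1:]
        if first not in grammar:              # terminal symbol
            return i < j and text[i] == first and match(rest, i + 1, j)
        # Try every split; the tail is checked first so that a left-recursive
        # production is only retried on a strictly shorter span.
        return any(match(rest, k, j)
                   and any(match(rhs, i, k) for rhs in grammar[first])
                   for k in range(i, j + 1))

    return match((start,), 0, len(text))

def find_difference(g1, g2, start, alphabet, max_length):
    """Return a shortest string (up to max_length) generated by exactly one
    of the grammars, or None if there is no such string within the bound."""
    for n in range(max_length + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            if derives(g1, start, text) != derives(g2, start, text):
                return text
    return None

print(find_difference(GRAMMAR_3_9, ASSUMED_3_4, "T", "abc", 5))  # None
print(find_difference(GRAMMAR_3_9, NO_C, "T", "abc", 5))         # a
```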