Operator precedence
Introduction:
We can describe traditional arithmetic expressions by grammar. Num is a terminal that denotes all integer constants and that, here, the parentheses are terminal symbols (unlike in regular expressions, where they are used to impose structure on the regular expressions). This grammar is ambiguous, as evidenced by, e.g., the production
Exp -> Exp Exp
which has the form that in section 3.3.1 was claimed to imply ambiguity. This ambiguity is not surprising, as we are used to the fact that an expression like 2 3*4 can be read in two ways: Either as multiplying the sum of 2 and 3 by 4 or as adding 2 to the product of 3 and 4. Simple electronic calculators will choose the first of these interpretations (as they always calculate from left to right), whereas scientific calculators and most programming languages will choose the second, as they use a hierarchy of operator precedences which dictate that the product must be calculated before the sum. The hierarchy can be overridden by explicit parenthesisation, e.g.,
(2 3)*4.
Most programming languages use the same convention as scientific calculators, so we want to make this explicit in the grammar. Ideally, we would like the expression 2 3*4 to generate the syntax tree shown in figure 3.10, which reflects the operator precedence’s by grouping of sub expressions: When evaluating an expression, the sub expressions represented by subtrees of the syntax tree are evaluated before the topmost operator is applied.
A possible way of resolving the ambiguity is to use precedence rules during syntax analysis to select among the possible syntax trees. Many parser generators allow this approach, as we shall see in section 3.16. However, some parsing methods require the grammars to be unambiguous, so we have to express the operator hierarchy in the grammar itself. To clarify this, we first define some concepts:
- An operator ⊕ is left-associative if the expression a⊕b⊕c must be evaluated from left to right, i.e., as (a⊕b)⊕c.
- An operator ⊕ is right-associative if the expression a⊕b⊕c must be evaluated from right to left, i.e., as a⊕ (b⊕c).
- An operator ⊕ is non-associative if expressions of the form a⊕b⊕c are illegal.
By the usual convention, - and / are left-associative, as e.g., 2-3-4 is calculated as (2-3)-4. and * are associative in the mathematical sense, meaning that it does not matter if we calculate from left to right or from right to left. However, to avoid ambiguity we have to choose one of these. By convention (and similarity to – and /) we choose to let these be left-associative as well. Also, having a left-associative - and right-associative would not help resolving the ambiguity of 2-3 4, as the operators so-to-speak “pull in different directions”.
List construction operators in functional languages, e.g., :: and @ in SML, are typically right-associative, as are function arrows in types: a -> b -> c is read as a -> (b -> c). The assignment operator in C is also right-associative: a=b=c is read as a= (b=c).
In some languages (like Pascal), comparison operators (like < or >) are non associative, i.e., you are not allowed to write 2 < 3 < 4.