Writing context free grammars
Introduction: As hinted above, a regular expression can systematically be rewritten as a context free grammar by using a non-terminal for every sub-expression in the regular expression and using one or two productions for each non-terminal.
The construction is shown in figure 3.1. So, if we can think of a way of expressing a language as a regular expression, it is easy to make a grammar for it. However, we will also want to use grammars to describe non-regular languages. An example is the kind of arithmetic expressions that are part of most programming languages (and also found on electronic calculators). Such expressions can be described by grammar 3.2. Note that, as mentioned in section 2.10, the matching parentheses cannot be described by regular expressions, as these cannot “count” the number of unmatched opening parentheses at a particular point in the string. However, if we did not have parentheses in the language, it could be described by the regular expression
Num (( |-|*j/) num)*
Even so, the regular description is not useful if you want operators to have different precedence, as it treats the expression as a flat string rather than as having structure. We will look at structure in sections 3.3.1 and 3.4.
Most constructions from programming languages are easily expressed by context free grammars. In fact, most modern languages are designed this way.
Exp -> Exp Exp
Exp -> Exp-Exp
Exp -> Exp*Exp
Exp -> Exp/Exp
Exp -> num
Exp -> (Exp)
Grammar 3.2: Simple expression grammar
Stat -> id :=Exp
Stat -> Stat ; Stat
Stat -> if Exp then Stat else Stat
Stat -> if Exp then Stat
Grammar 3.3: Simple statement grammar
When writing a grammar for a programming language, one normally starts by dividing the constructs of the language into different syntactic categories. A syntactic category is a sub-language that embodies a particular concept. Examples of common syntactic categories in programming languages are:
Expressions are used to express calculation of values.
Statements express actions that occur in a particular sequence.
Declarations express properties of names used in other parts of the program.
Each syntactic category is denoted by a main non-terminal, e.g., Exp from grammar 3.2. More non-terminals might be needed to describe a syntactic category or provide structure to it, as we shall see, and productions for one syntactic category can refer to non-terminals for other syntactic categories. For example, statements may contain expressions, so some of the productions for statements use the main non-terminal for expressions. A simple grammar for statements might look like grammar 3.3, which refers to the Exp non-terminal from grammar 3.2.