## Derivation

**Introduction: **So far, we have just appealed to intuitive notions of recursion when we describe the set of strings that a grammar produces. Since the productions are similar to recursive set equations, we might expect to use the techniques from section 2.6.1 to find the set of strings denoted by a grammar. However, though these methods in theory apply to infinite sets by considering limits of chains of sets, they are only practically useful when the sets are finite. Instead, we below introduce the concept of derivation. An added advantage of this approach is, as we will later see, that syntax analysis is closely related to derivation.

The basic idea of derivation is to consider productions as rewrite rules: Whenever we have a non-terminal, we can replace this by the right-hand side of any production in which the non-terminal appears on the left-hand side. We can do this anywhere in a sequence of symbols (terminals and non-terminals) and repeat doing so until we have only terminals left. The resulting sequence of terminals is a string in the language defined by the grammar. Formally, we define the derivation relation) by the three rules

- α N β -> α -> β if there is a production N -> γ
- α -> α
- α -> γ if there is α β such that α ->β and β -> γ

where a; b and g are (possibly empty) sequences of grammar symbols (terminals and non-terminals). The first rule states that using a production as a rewrite rule (anywhere in a sequence of grammar symbols) is a derivation step. The second states that the derivation relation is reflexive, i.e., that a sequence derives itself. The third rule describes transitivity, i.e., that a sequence of derivations is in itself a derivation2. We can use derivation to formally define the language that a context-free grammar generates.

Definition: Given a context-free grammar G with start symbol S, terminal symbols T and productions P, the language L(G) that G generates is defined to be the set of strings of terminal symbols that can be obtained by derivation from S using the productions P, i.e., the set {w ∈ T* |S)w}.

As an example, we see that grammar 3.4 generates the string aabbbcc by the derivation shown in figure 3.5. We have, for clarity, in each sequence of symbols underlined the non-terminal that is rewritten in the following step.

- T -> R
- T -> aTc
- R ->
- R -> RbR

Grammar 3.4: Example grammar

- T
- -> aTc
- ->aaTcc
- ->aaRcc
- ->aaRbRcc
- ->aaRbcc
- ->aaRbRbcc
- ->aaRbRbRbcc
- -> aaRbbRbcc
- ->aabbRbcc
- ->aabbbcc

**Figure 3.5: Derivation of the string aabbbcc using grammar 3.4**

- T
- -> aTc
- ->aaTcc
- -> aaRcc
- ->aaRbRcc
- ->aaRbRbRcc
- ->aabRbRcc
- ->aabRbRbRcc
- ->aabbRbRcc
- ->aabbbRcc
- ->aabbbcc

Figure 3.6: Leftmost derivation of the string aabbbcc using grammar 3.4

In this derivation, we have applied derivation steps sometimes to the leftmost non-terminal, sometimes to the rightmost and sometimes to a non-terminal that was neither. However, since derivation steps are local, the order does not matter. So, we might as well decide to always rewrite the leftmost non-terminal, as shown in figure 3.6. A derivation that always rewrites the leftmost non-terminal is called a leftmost derivation. Similarly, a derivation that always rewrites the rightmost non-terminal is called a rightmost derivation.