## Nullable and FIRST

**Introduction:** In simple cases, like the above, all productions for a non-terminal start with distinct terminals except at most one production that does not start with a terminal. We chose the latter whenever the input symbol did not match any of the terminal symbols starting the other productions. We can extend the method to work also for grammars where several productions start with non-terminals. We just need to be able to select between them based on the input symbol. In other words, if the strings these productions can derive begin with symbols from disjoint sets, we check if the input symbol is in one of these sets and choose the corresponding production if it is. If not, and there is an empty production, we choose this. Otherwise, we report a syntax error message.

Hence, we define the function FIRST, which given a sequence of grammar symbols (e.g, the right-hand side of a production) returns the set of symbols with which strings derived from that sequence can begin: Definition 3.2 A symbol c is in FIRST(a) if and only if a)cb for some (possibly empty) sequence b of grammar symbols.

To calculate FIRST, we need an auxiliary function Null able, which for a sequence a of grammar symbols indicates whether or not that sequence can derive the empty string:

**Definition: **A sequence a of grammar symbols is Nullable (we write this as Nullable (a)) if and only if a)e.

A production N -> a is called nullable if Nullable(a). We describe calculation of Nullable by case analysis over the possible forms of sequences of grammar symbols

Algorithm 3.4

Nullable(ε) = true

Nullable(a) = false

Nullable(αβ) = Nullable(α)^Nullable(β)

Nullable(N) = Nullable(α1)∨: : :∨Nullable(αn);

where the productions for N are

N -> α 1; : : : ; N -> α n

where a is a terminal, N is a non-terminal, α and β are sequences of grammar symbols and e represents the empty sequence of grammar symbols.

The equations are quite natural: Any occurrence of a terminal on a right-hand side makes Nullable false for that right-hand side, but a non-terminal is nullable if any production has a nullable right-hand side.

Note that this is a recursive definition since Nullable for a non-terminal is defined in terms of Nullable for its right-hand sides, which may contain that same non-terminal. We can solve this in much the same way that we solved set equations in section 2.6.1. We have, however, now booleans instead of sets and several equations instead of one. Still, the method is essentially the same: We have a set of boolean equations:

X1 = F1(X1; : : : ;Xn)

...

Xn = Fn(X1; : : : ;Xn)

We initially assume X1; : : : ;Xn to be all false. We then, in any order, calculate the right-hand sides of the equations and update the variable on the left-hand side by the calculated value. We continue until all equations are satisfied. In appendix A and section 2.6.1, we required the functions to be monotonic with respect to subset. Correspondingly, we now require the boolean functions to be monotonic with respect to truth: If we make more arguments true, the result will also be more true

(i.e., it may stay unchanged, change from false to true, but never change from true to false).

If we look at grammar 3.9, we get these equations for non-terminals and right hand sides:

In a fixed-point calculation, we initially assume that Nullable is false for all non-terminals and use this as a basis for calculating Nullable for first the right-hand sides and then the non-terminals. We repeat recalculating these until there is no change between two iterations. Figure 3.14 shows the fixed-point iteration for the above equations. In each iteration, we first evaluate the formulae for the right-hand sides and then use the results of this to evaluate the non-terminals. The right-most column shows the final result.

We can calculate FIRST in a similar fashion to Nullable: