## FOLLOW

**Introduction:**

To determine when we can select nullable productions during predictive parsing, we introduce FOLLOW sets for non-terminals.

Definition: A terminal symbol a is in FOLLOW (N) if and only if there is a derivation from the start symbol S of the grammar such that S -> αNaβ, where a and b are (possibly empty) sequences of grammar symbols.

In other words, a terminal c is in FOLLOW (N) if c may follow N at some point in a derivation. Unlike FIRST(N), this is not a property of the productions for N, but of the productions that (directly or indirectly) use N on their right-hand side. To correctly handle end-of-string conditions, we want to detect if S-> αN, i.e., if there are derivations where N can be followed by the end of input. It turns out to be easy to do this by adding an extra production to the grammar:

**S****0 ****-> ****S****$**

where S0 is a new non-terminal that replaces S as start symbol and $ is a new terminal symbol representing the end of input. Hence, in the new grammar, $ will be in FOLLOW (N) exactly if S’ -> αN$ which is the case exactly when S) αN. The easiest way to calculate FOLLOW is to generate a collection of set constraints, which are subsequently solved to find the least sets that obey the constraints. A production

**M -> α****Nβ**

generates the constraint FIRST(β) ⊆FOLLOW(N), since β, obviously, can follow N. Furthermore, if Nullable (β) the production also generates the constraint FOLLOW(M) ⊆FOLLOW(N) (note the direction of the inclusion). The reason is that, if a symbol c is in FOLLOW(M), then there (by definition) is a derivation S’ -> γMcδ. But since M -> αNβ and b is nullable, we can continue this by γMcδ-> γαNcδ, so c is also in FOLLOW (N).

If a right-hand side contains several occurrences of non-terminals, we add constraints for all occurrences, i.e., splitting the right-hand side into different as, Ns and bs. For example, the production A -> BcB generates the constraint {c} ⊆FOLLOW(B) by splitting after the first B and, by splitting after the last B, we also get the constraint FOLLOW(A) ⊆ FOLLOW(B). We solve the constraints in the following fashion:

We start by assuming empty FOLLOW sets for all non-terminals. We then handle the constraints of the form FIRST (b) ⊆FOLLOW (N): We compute FIRST (β) and add this to FOLLOW (N). Then, we handle the second type of constraints: For each constraint FOLLOW (M) ⊆FOLLOW (N), we add all elements of FOLLOW (M) to FOLLOW (N). We iterate these last steps until no further changes happen.

**The steps taken to calculate the follow sets of a grammar are, hence:**

1. Add a new non-terminal S’ -> S$, where S is the start symbol for the original grammar. S0 is the start symbol for the extended grammar.

2. For each non-terminal N, locate all occurrences of N on the right-hand sides of productions. For each occurrence do the following:

a. Let β be the rest of the right-hand side after the occurrence of N. Note that b may be empty. In other words, the production is of the form M -> αNβ, where M is a non-terminal (possibly equal to N) and αand β are (possibly empty) sequences of grammar symbols. Note that if a right-hand-side contains several occurrences of N, we make a split for each occurrence.

b. Let m = FIRST(β). Add the constraint m ⊆FOLLOW(N) to the set of constraints. If b is empty, you can omit this constraint, as it does not add anything.

c. If Nullable(β), find the non-terminal M at the left-hand side of the production and add the constraint FOLLOW(M) ⊆FOLLOW(N). If M = N, you can omit the constraint, as it does not add anything. Note that if b is empty, Nullable(β) is true.

3. Solve the constraints using the following steps:

a. Start with empty sets for FOLLOW(N) for all non-terminals N (not including S’).

b. For each constraint of the form m ⊆FOLLOW (N) constructed in step 2.1, add the contents of m to FOLLOW (N).

c. Iterating until a fixed-point is reached, for each constraint of the form FOLLOW (M) ⊆FOLLOW (N), add the contents of FOLLOW (M) to FOLLOW (N).