# Bottom-up Parsing

Bottom-up parsers work by gradually reducing the input program to the start symbol $S$.

## LR(0) Parsing

### Handles

A handle is a string that can be reduced (though a sequence of steps) to the start symbol.

The intuition behind choosing between shift and reduce is that we only want to reduce if the result can be reduced to the start symbol eventually.

For every rightmost derivation $S \rightarrow^* \alpha X \omega \rightarrow \alpha \beta \omega$, $X\rightarrow \beta$ in the position after $\alpha$ is a handle of $\alpha \beta \omega$.

#### Recognizing handles

There are no known efficient algorithms for recognizing handles. We can only use heuristics to guess whether a stack is a handle.

For some CFGs, the heuristics always guess correctly. For example, if the CFG is LL(1)1, then the heuristics always guess correctly.

#### Viable Prefixes

$\alpha$ is a viable prefix if there is an $\omega$ such that $\alpha \mid \omega$ is a legal state of a shift-reduce parser.

#### Items

An item is a production with a $\texttt{.}$ on the right-hand side, denoting a focus point.

The focus point tells us where we are in the production.

For example, the items for T $\to$ (E) are:

• T $\to$ .(E)
• T $\to$ (.E)
• T $\to$ (E.)
• T $\to$ (E).
###### Remarks
• The only item for $X \to \epsilon$ is $X \rightarrow \texttt{.}$.
• Items are also known as LR(0) items.
###### Example

Consider the input (int).

• (E|) is a state in the shift-reduce parse.
• (E is a prefix of the RHS of T $\to$ (E)
• Item T $\to$ (E.) says: so far, we’ve seen (E, and we’re hoping to see ).

#### Generalization

The stack may have many prefixes of RHS’s of productions:

$\textnormal{Prefix}_1 \ \textnormal{Prefix}_2 \ \ldots \ \textnormal{Prefix}_n$

Since $\textnormal{Prefix}_i$ is a prefix of the RHS of $X_i \to \alpha_i$, we know that $\textnormal{Prefix}_i$ will eventually be reduced to $X_i$. This means that the missing part of $\textnormal{Prefix}_{i-1}$ of the RHS of $X_{i-1} \to \alpha_{i-1}$ starts with $X_i$ (i.e. $\exists \beta$ such that $X_{i-1} \to \textnormal{Prefix}_{i-1}X_i\beta$).

Recursively, we know that $\textnormal{Prefix}_{k+1}\ \ldots\ \textnormal{Prefix}_{n}$ eventually reduces to the missing part of $\alpha_k$.

Consider the input (int * int).

Recall that our grammar is:

E -> T + E | TT -> int * T | int | (E)
• (int * | int) is a state in the shift-reduce parse.
• From top of the stack:
• int * is a prefix of the RHS of T $\to$ int * T
• $\epsilon$ is a prefix of the RHS of E $\to$ T
• ( is a prefix of the RHS of T $\to$ (E)

Our stack of items:

• T $\to$ int * .T says that we’ve seen int *, and we’re hoping to see T.
• E $\to$ .T says that we’ve seen $\epsilon$, and we’re hoping to see T.
• T $\to$ (.E) says that we’ve seen (, and we’re hoping to see E).

#### Recognizing Viable Prefixes

To recognize viable prefixes, we must recognize a sequence of partial RHS’s of productions, each of which can eventually reduce to part of the missing suffix of its predecessor.

##### An NFA for Recognizing Viable Prefixes
1. Add a new start production $S' \to S$.
2. The NFA states are the items of $G$ (including the new start production).
3. For item $E \to \alpha \texttt{.} X \beta$, add the transition2
$E \to \alpha \texttt{.} X \beta \longrightarrow^X E \to \alpha X \texttt{.}\beta$
4. For item $E \to \alpha \texttt{.} X \beta$ and production $X \to \gamma$, add the transition3
$E \to \alpha \texttt{.} X \beta \longrightarrow^{\epsilon} X \to \texttt{.} \gamma$
5. Every state is an accepting state.
6. Start state is $S' \to \texttt{.} S$.
###### Example

Recall our running example grammar:

E -> T + E | TT -> int * T | int | (E)
1. Create $S' \to \texttt{.} E$.
2. Add transition $S' \to \texttt{.} E \longrightarrow^E S' \to E \texttt{.}$
3. Add transitions 4
$S' \to \texttt{.} E \longrightarrow^{\epsilon} E \to \texttt{.} T$
$S' \to \texttt{.} E \longrightarrow^{\epsilon} E \to \texttt{.} T + E$
4. For the state $E \to \texttt{.} T$, add transitions
$\longrightarrow^T T \to \texttt{.int * } T$
$\longrightarrow^T T \to \texttt{.int}$
$\longrightarrow^T T \to \texttt{.}(E)$
5. Repeat the process until we have all the transitions to get the NFA.

Once we have the NFA, we can convert it to a DFA as we have seen before by applying the epsilon closures and the subset construction.

###### Remarks:
• We call the states of the DFA “canonical collections of LR(0) items”.
• The dragon book also shows another way of constructing the LR(0) items.

#### Valid Items

We say that an item $X \to \beta \texttt{.} \gamma$ is a valid item for a viable prefix $\alpha \beta$ if

$S' \to^* \alpha X \omega \to \alpha \beta \gamma \omega$

by a right-most derivation.

The $\gamma$’s of the valid items tell us what we hope to see next from the input.

An item is often valid for many prefixes. For example, $T \to (\texttt{.}E)$ is valid for prefixes with any number of starting parentheses.

### LR(0) - When to Shift / Reduce?

Let’s say:

• We are at a state $\alpha \mid t$ in the shift-reduce parse, i.e. the stack contains $\alpha$ and the next input starts with $t$.
• DFA on input $\alpha$ leads us to the state $s$.

The idea is to:

• Reduce by $X \to \beta$ if $s$ contains $X \to \beta \texttt{.}$.
• Shift if $s$ contains $X \to \beta \texttt{.} t \omega$ (i.e. $s$ has a transition on $t$).

#### LR(0) Conflicts

• Shift-reduce conflict: $s$ contains $X \to \beta \texttt{.} t \omega$ and $X \to \beta$.
• Reduce-reduce conflict: $s$ contains $X \to \beta$ and $Y \to \gamma$.

## SLR Parsing

SLR stands for “Simple Left-to-Right”. It improves on LR(0) with one change (shown below in bold):

• Reduce by $X \to \beta$ if $s$ contains $X \to \beta \texttt{.}$ and ${\bf t \in \textbf{Follow}(X)}$
• Shift if $s$ contains $X \to \beta \texttt{.} t \omega$ (i.e. $s$ has a transition on $t$).

This ensures we don’t do the reduction if we know we’d get after the reduction, which is $Xt$, cannot be further reduced.

If there are still conflicts under these rules, we say the grammar is not SLR.

###### Remark
• Even though the SLR algorithm does a sort of “one-step lookahead”, we still call it an LR(0) algorithm because the DFA states still only contain LR(0) items.
• Many grammars are not SLR.
• In practice, we can use precedence declarations and rewrite our grammar into a form that does not have conflicts (i.e. is SLR).

### SLR Parsing Algorithm

1. Let $M$ be the DFA for viable prefixes of $G$.
2. Let $\mid x_1 \ldots x_n$ be the initial configuration.
3. Repeat until configuration is $S \mid$:
• Let $\alpha \mid w$ be the current configuration.
• Run $M$ on current stack $\alpha$.
• If $M$ rejects $\alpha$, report parsing error.
• (Because this means $\alpha$ is not a viable prefix.)
• Else let $t$ be the next input symbol.
• Use SLR heuristics to decide whether to shift or reduce.

Note: We can improve the performance by caching the results of the DFA runs.

#### Action Table

The SLR parsing algorithm uses an action table to decide whether to shift or reduce.

The action table is a 2D table with rows for states and columns for input symbols.

For each DFA state $s$ and input symbol $t$, we have an entry in the table action[s, t] that is either:

• shift s' (where $s'$ is the state $s$ leads to on input $t$)
• reduce by $X \to \alpha$
• accept (if $s$ is the accepting state and $t$ is the end-of-input marker)
• error otherwise

## LR(1) Parsing

The LR(1) parsing algorithm is similar to the SLR parsing algorithm, but its DFA states contain LR(1) items instead of LR(0) items.

The idea is to build lookahead into the items. An LR(1) item is a pair of an LR(0) item and a lookahead symbol.

For example, [T -> . int * T, $] means: after seeing T -> int * T, reduce by T -> int * T if the next input is $.

The LR(1) algorithm is not covered in detail in this class.

# Remarks

Bottom-up parsers are more efficient than top-down parsers, but top-down parsers are easier to implement. This is why the latter are becoming more common, especially for big languages like C++. Tools like Bison and Flex that generate bottom-up parsers are still useful for small languages, like the one we are using in this class.

## Footnotes

1. A CFG is LL(1) if for every nonterminal $X$ and every terminal $a$, there is at most one production $X \rightarrow \alpha$ such that $a \in \operatorname{FIRST}(\alpha)$.

2. The transition takes care of the situation where we have seen $\alpha$ and we’re waiting to see $X$.

3. This takes care of the same situation where we are hoping to see $X$, but it’s also fine to get parts of a production that will eventually reduce to $X$.

4. These two transitions take care of the situation from the start state where it’s fine to not directly get a $E$, but get things that can reduce to $E$.