Top-Down Parsing
Bottom-Up Parsing
Bottom-up parsers work by gradually reducing the input program to the start symbol $S$.
LR(0) Parsing
Handles
A handle is a string that can be reduced (through a sequence of steps) to the start symbol.
The intuition behind choosing between shift and reduce is that we only want to reduce if the result can be reduced to the start symbol eventually.
For every rightmost derivation $S \rightarrow^* \alpha X \omega \rightarrow \alpha \beta \omega$, $X\rightarrow \beta$ in the position after $\alpha$ is a handle of $\alpha \beta \omega$.
Recognizing handles
There are no known efficient algorithms for recognizing handles. We can only use heuristics to guess whether the stack contains a handle.
For some CFGs, the heuristics always guess correctly. For example, if the CFG is LL(1)^{1}, then the heuristics always guess correctly.
Viable Prefixes
$\alpha$ is a viable prefix if there is an $\omega$ such that $\alpha \mid \omega$ is a legal state of a shift-reduce parser.
Items
An item is a production with a $\texttt{.}$ on the right-hand side, denoting a focus point.
The focus point tells us where we are in the production.
For example, the items for $T \to (E)$ are:

- $T \to \texttt{.}(E)$
- $T \to (\texttt{.}E)$
- $T \to (E\texttt{.})$
- $T \to (E)\texttt{.}$
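The dot positions above can be enumerated mechanically. Here is a minimal Python sketch; the tuple encoding `(lhs, rhs, dot)` is our own convention for illustration, not something from the notes:

```python
# Enumerate the LR(0) items of a production by placing the dot at every
# position in the RHS (positions 0 .. len(rhs), giving len(rhs)+1 items).
def items(lhs, rhs):
    """rhs is a list of grammar symbols; an item is (lhs, rhs, dot)."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])}".replace("  ", " ").strip()

for it in items("T", ["(", "E", ")"]):
    print(show(it))
```

Running this prints the four items of $T \to (E)$ listed above, one per line.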
Remarks
 The only item for $X \to \epsilon$ is $X \rightarrow \texttt{.}$.
 Items are also known as LR(0) items.
Example
Consider the input (int).

$(E \mid\ )$ is a state in the shift-reduce parse. $(E$ is a prefix of the RHS of $T \to (E)$.

The item $T \to (E\texttt{.})$ says: so far, we've seen $(E$, and we're hoping to see $)$.
Generalization
The stack may have many prefixes of RHS’s of productions:
Since $\textnormal{Prefix}_i$ is a prefix of the RHS of $X_i \to \alpha_i$, we know that $\textnormal{Prefix}_i$ will eventually be reduced to $X_i$. This means that the missing part of the RHS of $X_{i-1} \to \alpha_{i-1}$ (the part after $\textnormal{Prefix}_{i-1}$) starts with $X_i$ (i.e. $\exists \beta$ such that $X_{i-1} \to \textnormal{Prefix}_{i-1}\,X_i\,\beta$).
Recursively, we know that $\textnormal{Prefix}_{k+1}\ \ldots\ \textnormal{Prefix}_{n}$ eventually reduces to the missing part of $\alpha_k$.
Consider the input (int * int).

Recall that our grammar is:

E → T + E | T
T → int * T | int | (E)
$(\textnormal{int} * \mid \textnormal{int})$ is a state in the shift-reduce parse. From top of the stack:

- int * is a prefix of the RHS of $T \to \textnormal{int} * T$
- $\epsilon$ is a prefix of the RHS of $E \to T$
- ( is a prefix of the RHS of $T \to (E)$

Our stack of items:

- $T \to \textnormal{int} * \texttt{.}T$ says that we've seen int *, and we're hoping to see $T$.
- $E \to \texttt{.}T$ says that we've seen $\epsilon$, and we're hoping to see $T$.
- $T \to (\texttt{.}E)$ says that we've seen (, and we're hoping to see $E)$.
Recognizing Viable Prefixes
To recognize viable prefixes, we must recognize a sequence of partial RHS’s of productions, each of which can eventually reduce to part of the missing suffix of its predecessor.
An NFA for Recognizing Viable Prefixes
 Add a new start production $S' \to S$.
 The NFA states are the items of $G$ (including the new start production).
 For item $E \to \alpha \texttt{.} X \beta$, add the transition^{2}
$E \to \alpha \texttt{.} X \beta \longrightarrow^X E \to \alpha X \texttt{.}\beta$
 For item $E \to \alpha \texttt{.} X \beta$ and production $X \to \gamma$, add the transition^{3}
$E \to \alpha \texttt{.} X \beta \longrightarrow^{\epsilon} X \to \texttt{.} \gamma$
 Every state is an accepting state.
 Start state is $S' \to \texttt{.} S$.
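The construction above can be sketched directly in Python as a sanity check. The grammar encoding (a dict from nonterminal to a list of RHS tuples) and the item encoding `(lhs, rhs, dot)` are assumptions for illustration:

```python
# Build the viable-prefix NFA: states are items, transitions follow the two
# rules above; None is used as the epsilon label.
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}

def nfa(grammar):
    trans = {}
    all_items = [(lhs, rhs, d)
                 for lhs, rhss in grammar.items()
                 for rhs in rhss
                 for d in range(len(rhs) + 1)]
    for (lhs, rhs, d) in all_items:
        if d < len(rhs):
            X = rhs[d]
            # Rule 1: on symbol X, move the dot past X.
            trans.setdefault(((lhs, rhs, d), X), set()).add((lhs, rhs, d + 1))
            # Rule 2: if X is a nonterminal, epsilon-move to X -> .gamma
            # for every production X -> gamma.
            for gamma in grammar.get(X, []):
                trans.setdefault(((lhs, rhs, d), None), set()).add((X, gamma, 0))
    return trans

trans = nfa(GRAMMAR)
start = ("S'", ("E",), 0)  # the item S' -> .E
```

For example, the start item has a transition on $E$ to $S' \to E\texttt{.}$ and $\epsilon$-transitions to both items $E \to \texttt{.}T$ and $E \to \texttt{.}T + E$.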
Example
Recall our running example grammar:
E → T + E | T
T → int * T | int | (E)
 Create $S' \to \texttt{.} E$.
 Add transition $S' \to \texttt{.} E \longrightarrow^E S' \to E \texttt{.}$
 Add transitions ^{4}
$S' \to \texttt{.} E \longrightarrow^{\epsilon} E \to \texttt{.} T$
$S' \to \texttt{.} E \longrightarrow^{\epsilon} E \to \texttt{.} T + E$
 For the state $E \to \texttt{.} T$, add transitions
$E \to \texttt{.} T \longrightarrow^{\epsilon} T \to \texttt{.int * } T$
$E \to \texttt{.} T \longrightarrow^{\epsilon} T \to \texttt{.int}$
$E \to \texttt{.} T \longrightarrow^{\epsilon} T \to \texttt{.}(E)$
 Repeat the process until we have all the transitions to get the NFA.
Once we have the NFA, we can convert it to a DFA as we have seen before by applying the epsilon closures and the subset construction.
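A generic epsilon-closure plus subset construction might look like the following sketch; the NFA encoding (transitions keyed by `(state, symbol)`, with `None` as the epsilon label) is an assumption, and the example NFA is a toy one:

```python
# Convert an NFA to a DFA: take the epsilon-closure of the start state, then
# repeatedly follow each symbol from every reachable set of NFA states.
def eclose(states, trans):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in trans.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, trans, symbols):
    start_set = eclose({start}, trans)
    dfa, work = {}, [start_set]
    while work:
        S = work.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in symbols:
            moved = set().union(*(trans.get((s, a), set()) for s in S))
            if moved:
                dfa[S][a] = eclose(moved, trans)
                work.append(dfa[S][a])
    return start_set, dfa

# Toy NFA: 0 -eps-> 1, 1 -a-> 2.
start_set, dfa = subset_construction(0, {(0, None): {1}, (1, "a"): {2}}, {"a"})
```

Applied to the viable-prefix NFA, each DFA state produced here is a set of LR(0) items, i.e. one of the canonical collections mentioned below.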
Remarks:
 We call the states of the DFA “canonical collections of LR(0) items”.
 The dragon book also shows another way of constructing the LR(0) items.
Valid Items
We say that an item $X \to \beta \texttt{.} \gamma$ is a valid item for a viable prefix $\alpha \beta$ if
$S' \rightarrow^* \alpha X \omega \rightarrow \alpha \beta \gamma \omega$
by a rightmost derivation.
The $\gamma$’s of the valid items tell us what we hope to see next from the input.
An item is often valid for many prefixes. For example, $T \to (\texttt{.}E)$ is valid for prefixes with any number of starting parentheses.
LR(0): When to Shift / Reduce?
Let’s say:
 We are at a state $\alpha \mid t$ in the shift-reduce parse, i.e. the stack contains $\alpha$ and the next input starts with $t$.
 DFA on input $\alpha$ leads us to the state $s$.
The idea is to:
 Reduce by $X \to \beta$ if $s$ contains $X \to \beta \texttt{.}$.
 Shift if $s$ contains $X \to \beta \texttt{.} t \omega$ (i.e. $s$ has a transition on $t$).
LR(0) Conflicts
 Shift-reduce conflict: $s$ contains $X \to \beta \texttt{.} t \omega$ and $Y \to \gamma \texttt{.}$
 Reduce-reduce conflict: $s$ contains $X \to \beta \texttt{.}$ and $Y \to \gamma \texttt{.}$
SLR Parsing
SLR stands for “Simple LR”. It improves on LR(0) with one change (shown below in bold):
 Reduce by $X \to \beta$ if $s$ contains $X \to \beta \texttt{.}$ and ${\bf t \in \textbf{Follow}(X)}$
 Shift if $s$ contains $X \to \beta \texttt{.} t \omega$ (i.e. $s$ has a transition on $t$).
This ensures we don’t perform a reduction when the result we’d get after the reduction, namely $X$ followed by $t$, cannot be further reduced to the start symbol.
If there are still conflicts under these rules, we say the grammar is not SLR.
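Since the SLR reduce rule needs $\operatorname{Follow}(X)$, here is a small sketch of computing Follow sets for the running grammar. It is simplified under the assumption that no nonterminal derives $\epsilon$ (true for this grammar); the encoding is illustrative:

```python
GRAMMAR = {
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}

def first(sym, grammar):
    # First of a terminal is itself; with no epsilon productions, only the
    # leading RHS symbol of each production matters.
    if sym not in grammar:
        return {sym}
    return set().union(*(first(rhs[0], grammar) for rhs in grammar[sym]))

def follow(grammar, start):
    fol = {X: set() for X in grammar}
    fol[start].add("$")  # the end-of-input marker follows the start symbol
    changed = True
    while changed:       # iterate to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # terminals don't get Follow sets
                    # Whatever can follow sym: First of the next symbol,
                    # or Follow(lhs) if sym is last in the RHS.
                    new = first(rhs[i + 1], grammar) if i + 1 < len(rhs) else fol[lhs]
                    if not new <= fol[sym]:
                        fol[sym] |= new
                        changed = True
    return fol
```

For this grammar, $\operatorname{Follow}(E) = \{\$, )\}$ and $\operatorname{Follow}(T) = \{+, \$, )\}$, so e.g. in a state containing $T \to \textnormal{int}\texttt{.}$ we reduce only on +, ), or $.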
Remark
 Even though the SLR algorithm does a sort of “one-step lookahead”, we still call it an LR(0) algorithm because the DFA states still only contain LR(0) items.
 Many grammars are not SLR.
 In practice, we can use precedence declarations and rewrite our grammar into a form that does not have conflicts (i.e. is SLR).
SLR Parsing Algorithm
 Let $M$ be the DFA for viable prefixes of $G$.
 Let $\mid x_1 \ldots x_n \$$ be the initial configuration.
 Repeat until configuration is $S \mid \$$:
 Let $\alpha \mid w$ be the current configuration.
 Run $M$ on current stack $\alpha$.
 If $M$ rejects $\alpha$, report parsing error.
 (Because this means $\alpha$ is not a viable prefix.)
 Else let $t$ be the next input symbol.
 Use SLR heuristics to decide whether to shift or reduce.
Note: We can improve the performance by caching the results of the DFA runs.
Action Table
The SLR parsing algorithm uses an action table to decide whether to shift or reduce.
The action table is a 2D table with rows for states and columns for input symbols.
For each DFA state $s$ and input symbol $t$, we have an entry in the table action[s, t] that is either:

- shift s' (where $s'$ is the state $s$ leads to on input $t$)
- reduce by $X \to \alpha$
- accept (if $s$ is the accepting state and $t$ is the end-of-input marker)
- error otherwise
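To make the table concrete, here is a hand-built SLR action/goto table for the tiny grammar $S' \to E$, $E \to (E) \mid \textnormal{int}$ (smaller than our running grammar to keep the table short), plus the standard table-driven driver. The tuple encodings for actions are our own convention:

```python
# Action encodings: ("s", n) = shift to state n; ("r", X, k) = reduce by a
# production X -> beta with |beta| = k; ("acc",) = accept; a missing entry
# means error. States: 0 start, 1 after E, 2 after (, 3 after int,
# 4 after ( E, 5 after ( E ).
ACTION = {
    (0, "("): ("s", 2), (0, "int"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "int"): ("s", 3),
    (3, ")"): ("r", "E", 1), (3, "$"): ("r", "E", 1),  # E -> int
    (4, ")"): ("s", 5),
    (5, ")"): ("r", "E", 3), (5, "$"): ("r", "E", 3),  # E -> ( E )
}
GOTO = {(0, "E"): 1, (2, "E"): 4}

def parse(tokens):
    tokens = tokens + ["$"]   # append the end-of-input marker
    stack = [0]               # stack of DFA states
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                      # error entry
        if act[0] == "acc":
            return True
        if act[0] == "s":                     # shift: push the next state
            stack.append(act[1])
            i += 1
        else:                                 # reduce by X -> beta
            _, lhs, k = act
            del stack[len(stack) - k:]        # pop |beta| states
            stack.append(GOTO[(stack[-1], lhs)])
```

For example, `parse(["(", "int", ")"])` returns `True`, while `parse(["(", "int"])` hits a missing (error) entry and returns `False`.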
LR(1) Parsing
The LR(1) parsing algorithm is similar to the SLR parsing algorithm, but its DFA states contain LR(1) items instead of LR(0) items.
The idea is to build lookahead into the items. An LR(1) item is a pair of an LR(0) item and a lookahead symbol.
For example, $[T \to \textnormal{int} * T\ \texttt{.},\ \$]$ means: after seeing the full RHS int * T, reduce by $T \to \textnormal{int} * T$ if the next input is $\$$.
The LR(1) algorithm is not covered in detail in this class.
Remarks
Bottom-up parsers are more efficient than top-down parsers, but top-down parsers are easier to implement. This is why the latter are becoming more common, especially for big languages like C++. Tools like Bison (paired with Flex for lexing) that generate bottom-up parsers are still useful for smaller languages, like the one we are using in this class.
Footnotes

A CFG is LL(1) if for every nonterminal $X$ and every terminal $a$, there is at most one production $X \rightarrow \alpha$ such that $a \in \operatorname{FIRST}(\alpha)$. ↩

The transition takes care of the situation where we have seen $\alpha$ and we’re waiting to see $X$. ↩

This takes care of the same situation where we are hoping to see $X$, but it’s also fine to get parts of a production that will eventually reduce to $X$. ↩

These two transitions take care of the situation from the start state where it’s fine to not directly get an $E$, but to get things that can reduce to $E$. ↩