Bottom-up Parsing
Bottom-up parsers work by gradually reducing the input program to the start symbol.
LR(0) Parsing
Handles
A handle is a string that can be reduced (through a sequence of steps) to the start symbol.
The intuition behind choosing between shift and reduce is that we only want to reduce if the result can be reduced to the start symbol eventually.
For every rightmost derivation `S →* αXω → αβω`, the production `X → β` in the position after `α` is a handle of `αβω`.
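To make the connection between derivations and reductions concrete, here is a small sketch assuming the running grammar `E → T + E | T`, `T → int * T | int | (E)`: it lists a rightmost derivation of `int * int`; read in reverse, it is exactly the sequence of reductions a bottom-up parser performs, each step replacing the current handle.

```python
# A rightmost derivation of "int * int" in the grammar
#   E -> T + E | T      T -> int * T | int | (E)
# Each sentential form is a list of symbols; the comment names the
# production applied to reach it.
derivation = [
    ["E"],
    ["T"],                # E -> T
    ["int", "*", "T"],    # T -> int * T
    ["int", "*", "int"],  # T -> int
]

# Reversing the derivation gives the reductions a shift-reduce parser
# performs, starting from the input and ending at the start symbol.
reductions = list(reversed(derivation))
print(reductions[0], "...", reductions[-1])
```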
Recognizing handles
There are no known efficient algorithms for recognizing handles in general. We can only use heuristics to guess whether the top of the stack contains a handle.
For some CFGs, such as LL(1)[^1] grammars, the heuristics always guess correctly.
Viable Prefixes
`α` is a viable prefix if there is an `ω` such that `α|ω` is a legal state of a shift-reduce parser (where `α` is the stack contents and `ω` is the remaining input).
Items
An item is a production with a `.` somewhere on its right-hand side, denoting a focus point.
The focus point tells us where we are in the production.
For example, the items for `T → (E)` are:

- `T → .(E)`
- `T → (.E)`
- `T → (E.)`
- `T → (E).`
Remarks
- The only item for a production `X → ε` is `X → .`.
- Items are also known as LR(0) items.
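The items of a production can be enumerated mechanically by placing the focus point at every position of the RHS. A minimal Python sketch (the `(lhs, seen, expected)` tuple representation is my own illustrative choice, not a fixed convention):

```python
def lr0_items(lhs, rhs):
    """Yield every LR(0) item of lhs -> rhs as (lhs, seen, expected).

    `rhs` is a list of grammar symbols; the empty list models an
    epsilon production, whose only item is "lhs -> .".
    """
    for i in range(len(rhs) + 1):
        yield (lhs, tuple(rhs[:i]), tuple(rhs[i:]))

def show(item):
    """Render an item with the dot between seen and expected symbols."""
    lhs, seen, expected = item
    return f"{lhs} -> {' '.join(list(seen) + ['.'] + list(expected))}"

for item in lr0_items("T", ["(", "E", ")"]):
    print(show(item))
# Prints the four items: T -> . ( E ), T -> ( . E ), T -> ( E . ), T -> ( E ) .
```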
Example
Consider the input `(int)`.

- `(E|)` is a state in the shift-reduce parse. `(E` is a prefix of the RHS of `T → (E)`.
- Item `T → (E.)` says: so far, we've seen `(E`, and we're hoping to see `)`.
Generalization
The stack may have many prefixes of RHS's of productions: `Prefix_1 Prefix_2 … Prefix_n`.

Since `Prefix_i` is a prefix of the RHS of some production `X_i → Prefix_i α_i`, we know that whatever sits above `Prefix_i` on the stack will eventually be reduced to the missing part `α_i` of the RHS of `X_i`. In particular, `α_i` starts with something that `Prefix_(i+1)` can be reduced to.

Recursively, `Prefix_(i+1) … Prefix_n` eventually reduces to the missing part `α_i` of `Prefix_i`'s production.
Consider the input `(int * int)`.

Recall that our grammar is:

```
E -> T + E | T
T -> int * T | int | (E)
```
`(int * | int)` is a state in the shift-reduce parse. From the top of the stack:

- `int *` is a prefix of the RHS of `T → int * T`.
- `ε` is a prefix of the RHS of `E → T`.
- `(` is a prefix of the RHS of `T → (E)`.
Our stack of items:

- `T → int * .T` says that we've seen `int *`, and we're hoping to see `T`.
- `E → .T` says that we've seen `ε`, and we're hoping to see `T`.
- `T → (.E)` says that we've seen `(`, and we're hoping to see `E)`.
Recognizing Viable Prefixes
To recognize viable prefixes, we must recognize a sequence of partial RHS’s of productions, each of which can eventually reduce to part of the missing suffix of its predecessor.
An NFA for Recognizing Viable Prefixes
- Add a new start production `S' → E`.
- The NFA states are the items of `G` (including the items of the new start production).
- For each item `E → α.Xβ`, add a transition labeled `X` from `E → α.Xβ` to `E → αX.β`.[^2]
- For each item `E → α.Xβ` where `X` is a nonterminal, and each production `X → γ`, add an ε-transition from `E → α.Xβ` to `X → .γ`.[^3]
- Every state is an accepting state.
- The start state is `S' → .E`.
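The two transition rules can be implemented directly. Below is a sketch for the running grammar; the item triples and the grammar-as-dict representation are illustrative choices, not a fixed convention.

```python
# Items are (lhs, seen, expected) triples; the dot sits between
# `seen` and `expected`. `None` labels an epsilon transition.
GRAMMAR = {
    "S'": [["E"]],                      # added start production
    "E": [["T", "+", "E"], ["T"]],
    "T": [["int", "*", "T"], ["int"], ["(", "E", ")"]],
}

def nfa_transitions(grammar):
    """Build {(item, symbol_or_None): set_of_items} per the two rules."""
    nonterminals = set(grammar)
    moves = {}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            for i, x in enumerate(rhs):
                item = (lhs, tuple(rhs[:i]), tuple(rhs[i:]))
                # Rule 1: a transition labeled x advances the dot over x.
                nxt = (lhs, tuple(rhs[:i + 1]), tuple(rhs[i + 1:]))
                moves.setdefault((item, x), set()).add(nxt)
                # Rule 2: if the dot is before a nonterminal x, add
                # epsilon transitions to all of x's initial items.
                if x in nonterminals:
                    for g in grammar[x]:
                        moves.setdefault((item, None), set()).add((x, (), tuple(g)))
    return moves

moves = nfa_transitions(GRAMMAR)
start = ("S'", (), ("E",))
print(moves[(start, "E")])   # the single item S' -> E.
print(moves[(start, None)])  # epsilon moves to E -> .T+E and E -> .T
```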
Example
Recall our running example grammar:
E -> T + E | TT -> int * T | int | (E)
- Create the start production `S' → E`.
- Add a transition labeled `E` from `S' → .E` to `S' → E.`.
- Add ε-transitions from `S' → .E` to `E → .T + E` and to `E → .T`.[^4]
- For the state `E → .T + E`, add a transition labeled `T` to `E → T. + E`, plus ε-transitions to `T → .int * T`, `T → .int`, and `T → .(E)`.
- Repeat the process until we have all the transitions of the NFA.
Once we have the NFA, we can convert it to a DFA as we have seen before by applying the epsilon closures and the subset construction.
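The subset construction specializes nicely for item NFAs: the ε-closure just keeps adding initial items `X → .γ` whenever the dot sits before `X`. A self-contained sketch (same illustrative item representation as used elsewhere in these notes' examples):

```python
GRAMMAR = {
    "S'": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["int", "*", "T"], ["int"], ["(", "E", ")"]],
}

def closure(items, grammar):
    """Epsilon-closure: add X -> .gamma whenever a dot sits before X."""
    result = set(items)
    work = list(items)
    while work:
        _lhs, _seen, expected = work.pop()
        if expected and expected[0] in grammar:
            for rhs in grammar[expected[0]]:
                new = (expected[0], (), tuple(rhs))
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(state, symbol, grammar):
    """DFA transition: advance the dot over `symbol`, then close."""
    moved = {(l, seen + (symbol,), exp[1:])
             for (l, seen, exp) in state if exp and exp[0] == symbol}
    return closure(moved, grammar)

start = closure({("S'", (), ("E",))}, GRAMMAR)
print(len(start))  # 6 items: S'->.E, E->.T+E, E->.T, T->.int*T, T->.int, T->.(E)
```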
Remarks:
- We call the states of the DFA “canonical collections of LR(0) items”.
- The Dragon Book also shows another way of constructing the canonical collections of LR(0) items.
Valid Items
We say that an item `X → β.γ` is a valid item for a viable prefix `αβ` if

`S' →* αXω → αβγω`

by a right-most derivation.

The `γ`'s of the valid items tell us what we hope to see next from the input.

An item is often valid for many prefixes. For example, the item `T → (.E)` is valid for every prefix consisting of any number of opening parentheses: `(`, `((`, `(((`, and so on.
LR(0) - When to Shift / Reduce?
Let’s say:
- We are at a state `α|tω` in the shift-reduce parse, i.e. the stack contains `α` and the next input starts with `t`.
- The DFA, run on input `α`, leads us to a state `s`.

The idea is to:

- Reduce by `X → β` if `s` contains the item `X → β.`.
- Shift if `s` contains an item `X → β.tγ` (i.e. `s` has a transition labeled `t`).
LR(0) Conflicts
- Shift-reduce conflict: `s` contains both a reduce item `X → β.` and a shift item `Y → δ.tγ`.
- Reduce-reduce conflict: `s` contains two reduce items `X → β.` and `Y → δ.`.
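The running grammar actually exhibits a shift-reduce conflict: the DFA state reached on the viable prefix `( int` contains both the reduce item `T → int.` and the shift item `T → int.* T`. A sketch that demonstrates this, redefining the standard closure/goto operations so the snippet is self-contained:

```python
# Run the viable-prefix DFA on the stack "( int" and inspect the state.
GRAMMAR = {
    "S'": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["int", "*", "T"], ["int"], ["(", "E", ")"]],
}

def closure(items, grammar):
    """Epsilon-closure over items represented as (lhs, seen, expected)."""
    result, work = set(items), list(items)
    while work:
        _lhs, _seen, exp = work.pop()
        if exp and exp[0] in grammar:
            for rhs in grammar[exp[0]]:
                new = (exp[0], (), tuple(rhs))
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(state, symbol, grammar):
    """Advance the dot over `symbol`, then take the closure."""
    moved = {(l, seen + (symbol,), exp[1:])
             for (l, seen, exp) in state if exp and exp[0] == symbol}
    return closure(moved, grammar)

state = closure({("S'", (), ("E",))}, GRAMMAR)
for symbol in ["(", "int"]:          # drive the DFA along the stack
    state = goto(state, symbol, GRAMMAR)

reduce_items = [it for it in state if not it[2]]                    # dot at end
shift_items = [it for it in state if it[2] and it[2][0] not in GRAMMAR]
print(reduce_items)  # the reduce item T -> int.
print(shift_items)   # the shift item T -> int.* T
```

Both kinds of item occur in the same state, so a pure LR(0) parser cannot decide between shifting and reducing here; SLR resolves this below because `*` is not in Follow(T).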
SLR Parsing
SLR stands for "Simple LR". It improves on LR(0) with one change (shown below in bold):
- Reduce by `X → β` if `s` contains the item `X → β.` **and `t ∈ Follow(X)`**.
- Shift if `s` contains an item `X → β.tγ` (i.e. `s` has a transition labeled `t`).
This ensures we don't do a reduction when we know that what we'd get after the reduction, which is `αXtω`, cannot be further reduced to the start symbol (since `t` can never follow `X` in any derivation).
If there are still conflicts under these rules, we say the grammar is not SLR.
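SLR's `Follow` test is cheap to compute. A minimal sketch for grammars without ε-productions (which covers the running grammar; handling nullable symbols would need the usual extra cases):

```python
GRAMMAR = {
    "S'": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["int", "*", "T"], ["int"], ["(", "E", ")"]],
}

def first_sets(grammar):
    """First(X) for each nonterminal, assuming no epsilon productions."""
    nts = set(grammar)
    first = {a: set() for a in nts}
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                f = {rhs[0]} if rhs[0] not in nts else first[rhs[0]]
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
    return first

def follow_sets(grammar, start="S'"):
    nts = set(grammar)
    first = first_sets(grammar)
    follow = {a: set() for a in nts}
    follow[start].add("$")       # end-of-input can follow the start symbol
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, x in enumerate(rhs):
                    if x not in nts:
                        continue
                    if i + 1 < len(rhs):   # Follow(x) gains First(next symbol)
                        nxt = rhs[i + 1]
                        f = {nxt} if nxt not in nts else first[nxt]
                    else:                  # x ends the RHS: Follow(x) gains Follow(a)
                        f = follow[a]
                    if not f <= follow[x]:
                        follow[x] |= f
                        changed = True
    return follow

follow = follow_sets(GRAMMAR)
print(follow["T"])  # contains '+', ')', '$' but not '*'
```

Since `*` is not in Follow(T), SLR does not reduce by `T → int` when the lookahead is `*`; it shifts instead, which resolves the LR(0) shift-reduce conflict on this grammar.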
Remark
- Even though the SLR algorithm does a sort of “one-step lookahead”, we still call it an LR(0) algorithm because the DFA states still only contain LR(0) items.
- Many grammars are not SLR.
- In practice, we can use precedence declarations and rewrite our grammar into a form that does not have conflicts (i.e. is SLR).
SLR Parsing Algorithm
- Let `M` be the DFA for viable prefixes of `G`.
- Let `|x₁…xₙ$` be the initial configuration.
- Repeat until the configuration is `S|$`:
  - Let `α|ω` be the current configuration.
  - Run `M` on the current stack `α`.
  - If `M` rejects `α`, report a parsing error.
    - (Because this means `α` is not a viable prefix.)
  - Else, `M` terminates in some state `s`; let `t` be the next input symbol.
  - Use the SLR heuristics to decide whether to shift or reduce.
Note: We can improve the performance by caching the results of the DFA runs.
Action Table
The SLR parsing algorithm uses an action table to decide whether to shift or reduce.
The action table is a 2D table with rows for states and columns for input symbols.
For each DFA state `s` and input symbol `t`, we have an entry `action[s, t]` in the table that is one of:

- `shift s'` (where `s'` is the state `s` leads to on input `t`)
- `reduce X → β`
- `accept` (if `s` is the accepting state and `t` is the end-of-input marker `$`)
- `error` otherwise
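To show the table in action, here is an end-to-end sketch for a deliberately tiny grammar, `S' → S`, `S → (S) | ()`. The action and goto tables are built by hand; the state numbering is my own, obtained by running the LR(0) construction on this grammar.

```python
# Hand-built SLR tables for the toy grammar S' -> S, S -> (S) | ().
# States: 0 = start, 1 = accept, 2 = after "(", 3 = after "(S",
#         4 = after "()", 5 = after "(S)".
ACTION = {
    (0, "("): ("shift", 2),
    (1, "$"): ("accept",),
    (2, "("): ("shift", 2),
    (2, ")"): ("shift", 4),
    (3, ")"): ("shift", 5),
    # Reduce rows: Follow(S) = {')', '$'}
    (4, ")"): ("reduce", "S", 2), (4, "$"): ("reduce", "S", 2),
    (5, ")"): ("reduce", "S", 3), (5, "$"): ("reduce", "S", 3),
}
GOTO = {(0, "S"): 1, (2, "S"): 3}

def parse(tokens):
    """Return True iff `tokens` (without '$') is a balanced-paren string."""
    stack = [0]                        # stack of DFA states
    tokens = tokens + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]), ("error",))
        if act[0] == "shift":
            stack.append(act[1])
            i += 1
        elif act[0] == "reduce":       # pop the RHS, then take the goto on the LHS
            _, lhs, rhs_len = act
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        elif act[0] == "accept":
            return True
        else:
            return False

print(parse(["(", "(", ")", ")"]))  # True
print(parse(["(", ")", ")"]))       # False
```

Note that the driver keeps states rather than grammar symbols on the stack, which is how this is usually implemented: re-running the DFA from scratch after every move (as in the algorithm above) is correct but wasteful, and caching the DFA state on the stack avoids it.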
LR(1) Parsing
The LR(1) parsing algorithm is similar to the SLR parsing algorithm, but its DFA states contain LR(1) items instead of LR(0) items.
The idea is to build lookahead into the items. An LR(1) item is a pair of an LR(0) item and a lookahead symbol.
For example, the item `[T → .int * T, $]` means: after seeing `int * T`, reduce by `T → int * T` only if the next input is `$`.
The LR(1) algorithm is not covered in detail in this class.
Remarks
Bottom-up parsers are more efficient than top-down parsers, but top-down parsers are easier to implement. This is why top-down parsers are becoming more common, especially for big languages like C++. Tools like Bison (often paired with the lexer generator Flex) that generate bottom-up parsers are still useful for small languages, like the one we are using in this class.
Footnotes
[^1]: A CFG is LL(1) if for every nonterminal `X` and every terminal `t`, there is at most one production `X → α` such that `α` can derive a string beginning with `t` (i.e. `t ∈ First(α)`).

[^2]: The transition labeled `X` takes care of the situation where we have seen `α` and we're waiting to see `Xβ`.

[^3]: This takes care of the same situation where we are hoping to see `X`, but it's also fine to get parts of a production that will eventually reduce to `X`.

[^4]: These two ε-transitions take care of the situation from the start state where it's fine to not directly get an `E`, but to get things that can eventually reduce to `E`.