Last Update Time-stamp: "97/06/30 13:50:16 umrigar"
This is a brief intuitive introduction to shift-reduce bottom-up parsing. Any compiler text should provide more details. A elementary introduction to grammars and language analysis is also available.
The operation of the parser is controlled by a couple of tables:
[
s][
t]
,
which can contain four different kinds of entries:
[
s][
N]
.
[
s][
t]
:
[
s'][
N]
onto the stack. The lookahead token is not changed by this step.
For example, consider the following simple grammar
0) $S: stmt <EOF> 1) stmt: ID ':=' expr 2) expr: expr '+' ID 3) expr: expr '-' ID 4) expr: IDwhich describes assignment statements like
a:= b + c - d
. (Rule
0 is a special augmenting production added to the grammar).
One possible set of shift-reduce parsing tables is shown below
(s
n denotes shift n,
r
n denotes reduce n,
acc denotes accept and blank entries denote error
entries):
Action Table | Goto Table | ||||||
---|---|---|---|---|---|---|---|
ID |
':=' |
'+' |
'-' |
<EOF> |
stmt |
expr |
|
0 | s1 |
g2 |
|||||
1 | s3 |
||||||
2 | s4 |
||||||
3 | s5 |
g6 |
|||||
4 | acc | acc | acc | acc | acc | ||
5 | r4 |
r4 |
r4 |
r4 |
r4 |
||
6 | r1 |
r1 |
s7 |
s8 |
r1 |
||
7 | s9 |
||||||
8 | s10 |
||||||
9 | r2 |
r2 |
r2 |
r2 |
r2 |
||
10 | r3 |
r3 |
r3 |
r3 |
r3 |
A trace of the parser on the input a:= b + c - d
is shown
below:
Stack | Remaining Input | Action |
---|---|---|
0/$S |
a:= b + c - d |
s1 |
0/$S 1/a |
:= b + c - d |
s3 |
0/$S 1/a 3/:= |
b + c - d |
s5 |
0/$S 1/a 3/:= 5/b |
+ c - d |
r4 |
0/$S 1/a 3/:= |
+ c - d |
g6 on expr |
0/$S 1/a 3/:= 6/expr |
+ c - d |
s7 |
0/$S 1/a 3/:= 6/expr 7/+ |
c - d |
s9 |
0/$S 1/a 3/:= 6/expr 7/+ 9/c |
- d |
r2 |
0/$S 1/a 3/:= |
- d |
g6 on expr |
0/$S 1/a 3/:= 6/expr |
- d |
s8 |
0/$S 1/a 3/:= 6/expr 8/- |
d |
s10 |
0/$S 1/a 3/:= 6/expr 8/- 10/d |
<EOF> |
r3 |
0/$S 1/a 3/:= |
<EOF> |
g6 on expr |
0/$S 1/a 3/:= 6/expr |
<EOF> |
r1 |
0/$S |
<EOF> |
g2 on stmt |
0/$S 2/stmt |
<EOF> |
s4 |
0/$S 2/stmt 4/<EOF> |
|
accept |
The general idea of bottom-up parsing is to repeatedly match the RHS of some rule and reduce it to the rule's LHS. To identify the matching RHS's, the parser needs to keep track of all possible rules which may match. This is done by means of the parser state, where each state keeps track of the set of rules the parser may currently be in, and how far along the parser may be within each rule. This idea of states will become clearer if we attempt to build the tables for a small example.
Consider the grammar used in the previous section, repeated below for convenience:
0) $S: stmt <EOF> 1) stmt: ID ':=' expr 2) expr: expr '+' ID 3) expr: expr '-' ID 4) expr: IDThe input must be ultimately reducible to the augmenting nonterminal
$S
. Hence the parser should initially be in rule 0; more
specifically, it should be expecting the stmt
in rule 0. To
show precisely which symbol is expected in a rule RHS, we define an
item to be a rule, along with a position on the RHS
specifying the next symbol to be expected in that RHS. We denote an item as
a rule with a dot .
just before the next expected symbol.
Hence, returning to our example, the parser is initially expecting the item
0) $S: . stmt <EOF>However, if the parser is expecting to see a
stmt
, it could
be at the beginning of any of the rules for stmt
. Hence the
initial state should include the initial item for
stmt
. (The process of including these additional induced items
is referred to as forming the closure of the state).
State 0 0) $S: . stmt <EOF> 1) stmt: . ID ':=' exprNow if the parser sees an
ID
in state 0, then it can move the
dot past any ID
symbols in state 0. We get a new state; let's
call it State 1:
State 1 1) stmt: ID . ':=' exprIf the parser has seen a
stmt
in state 0, then it can move
the dot past any stmt
symbols in state 0. We get a new state;
let's call it State 2:
State 2 0) $S: stmt . <EOF>If the parser sees a
':='
in state 1, we get a new state; let's
call it State 3:
State 3 1) stmt: ID ':=' . exprHowever since the dot is before the nonterminal
expr
, the
parser could be in any of the rules for expr
. Hence we need to
include the rules for expr
in a new state 3:
State 3 1) stmt: ID ':=' . expr 2) expr: . expr '+' ID 3) expr: . expr '-' ID 4) expr: . IDWe continue this process of following all possible transitions out of states until we cannot construct any new states. Completing this construction results in an automaton called a LR(0) machine. The transitions on terminal symbols correspond to shift actions in the parser; the transitions on nonterminal symbols correspond to goto actions in the parser.
Note that the construction guarantees that each state is entered by a unique grammar symbol; that is why we can map a state stack into a symbol stack as mentioned earlier.
For our example grammar, we get the following LR(0) machine:
State 0 0) $S: . stmtThe LR(0) machine defines the shift and goto actions in our parse tables. But what corresponds to the reduce actions in the action table?1) stmt: . ID ':=' expr GOTO 2 on stmt SHIFT 1 on ID State 1 1) stmt: ID . ':=' expr SHIFT 3 on ':=' State 2 0) $S: stmt . SHIFT 4 on State 3 1) stmt: ID ':=' . expr 2) expr: . expr '+' ID 3) expr: . expr '-' ID 4) expr: . ID GOTO 6 on expr SHIFT 5 on ID State 4 0) $S: stmt . State 5 4) expr: ID . State 6 1) stmt: ID ':=' expr . 2) expr: expr . '+' ID 3) expr: expr . '-' ID SHIFT 7 on '+' SHIFT 8 on '-' State 7 2) expr: expr '+' . ID SHIFT 9 on ID State 8 3) expr: expr '-' . ID SHIFT 10 on ID State 9 2) expr: expr '+' ID . State 10 3) expr: expr '-' ID .
We should reduce by a rule when that rule has been completely recognized; i.e., when the state contains a reducing item with the dot at the extreme right. However, if a state has reducing items, under what conditions should the corresponding reductions be applied?
It is worth emphasizing that the only essential difference between SLR(1), LALR(1) and LR(1) parsers is how the decision is made as to when a reduction should be applied. If multiple actions are possible for a particular state and lookahead, then that state has conflicts. The parsers are listed above in order of increasing precision of the information used for the reduce decision; hence it is possible that a SLR(1) parser has conflicts when a LALR(1) parser has none, and a LALR(1) parser has conflicts when a LR(1) parser has none.
Parse tables can often be represented more compactly by designating a particular reduction to be the default reduction for a state. This means that if no other action is possible in a state, the default reduction for that state is applied. This explains why in the example, many rows of the table have reduce entries in almost all their columns.
Default reductions do not affect the correctness of the parse. Their only effect is on error detection: an error may not be detected until after several (useless) reductions have been applied. However, the error is still detected before any additional symbols are shifted onto the parse stack.
Copyright (C) 1997 Zerksis D. Umrigar
Feedback: Please email any feedback to zdu@acm.org.