Introduction to Shift-Reduce Parsing

Last Update Time-stamp: "97/06/30 13:50:16 umrigar"

This is a brief intuitive introduction to shift-reduce bottom-up parsing. Any compiler text should provide more details. A elementary introduction to grammars and language analysis is also available.

A shift-reduce parser uses a parse stack which (conceptually) contains grammar symbols. During the operation of the parser, symbols from the input are shifted onto the stack. If a prefix of the symbols on top of the stack matches the RHS of a grammar rule which is the correct rule to use within the current context, then the parser reduces the RHS of the rule to its LHS, replacing the RHS symbols on top of the stack with the nonterminal occurring on the LHS of the rule. This shift-reduce process continues until the parser terminates, reporting either success or failure. It terminates with success when the input is legal and is accepted by the parser. It terminates with failure if an error is detected in the input.

The parser is nothing but a stack automaton which may be in one of several discrete states. A state is usually represented simply as an integer. In reality, the parse stack contains states, rather than grammar symbols. However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the grammar symbol stack mentioned earlier.

The operation of the parser is controlled by a couple of tables:

Action Table

The action table is a table with rows indexed by states and columns indexed by terminal symbols. When the parser is in some state s and the current lookahead terminal is t, the action taken by the parser depends on the contents of action[s][t], which can contain four different kinds of entries:

Shift s': Shift state s' onto the parse stack.
Reduce r: Reduce by rule r. This is explained in more detail below.
Accept: Terminate the parse with success, accepting the input.
Error: Signal a parse error.

Goto Table

The goto table is a table with rows indexed by states and columns indexed by nonterminal symbols. When the parser is in state s immediately after reducing by rule N, then the next state to enter is given by goto[s][N].

The current state of a shift-reduce parser is the state on top of the state stack. The detailed operation of such a parser is as follows:

Initialize the parse stack to contain a single state s₀, where s₀ is the distinguished initial state of the parser.
Use the state s on top of the parse stack and the current lookahead t to consult the action table entry action[s][t]:
- If the action table entry is shift s' then push state s' onto the stack and advance the input so that the lookahead is set to the next token.
- If the action table entry is reduce r and rule r has m symbols in its RHS, then pop m symbols off the parse stack. Let s' be the state now revealed on top of the parse stack and N be the LHS nonterminal for rule r. Then consult the goto table and push the state given by goto[s'][N] onto the stack. The lookahead token is not changed by this step.
- If the action table entry is accept, then terminate the parse with success.
- If the action table entry is error, then signal an error.
Repeat step (2) until the parser terminates.

For example, consider the following simple grammar

0)	$S:	stmt <EOF> 
1)	stmt:	ID ':=' expr 
2)	expr:	expr '+' ID 
3)	expr:	expr '-' ID 
4)	expr:	ID

which describes assignment statements like a:= b + c - d. (Rule 0 is a special augmenting production added to the grammar).

One possible set of shift-reduce parsing tables is shown below (sn denotes shift n, rn denotes reduce n, acc denotes accept and blank entries denote error entries):

Parser Tables
	Action Table					Goto Table
	`ID`	`':='`	`'+'`	`'-'`	`<EOF>`	`stmt`	`expr`
0	`s1`					`g2`
1		`s3`
2					`s4`
3	`s5`						`g6`
4	acc	acc	acc	acc	acc
5	`r4`	`r4`	`r4`	`r4`	`r4`
6	`r1`	`r1`	`s7`	`s8`	`r1`
7	`s9`
8	`s10`
9	`r2`	`r2`	`r2`	`r2`	`r2`
10	`r3`	`r3`	`r3`	`r3`	`r3`

A trace of the parser on the input a:= b + c - d is shown below:

Stack Remaining Input Action

0/$S a:= b + c - d s1

0/$S 1/a := b + c - d s3

0/$S 1/a 3/:= b + c - d s5

0/$S 1/a 3/:= 5/b + c - d r4

0/$S 1/a 3/:= + c - d g6 on expr

0/$S 1/a 3/:= 6/expr + c - d s7

0/$S 1/a 3/:= 6/expr 7/+ c - d s9

0/$S 1/a 3/:= 6/expr 7/+ 9/c - d r2

0/$S 1/a 3/:= - d g6 on expr

0/$S 1/a 3/:= 6/expr - d s8

0/$S 1/a 3/:= 6/expr 8/- d s10

0/$S 1/a 3/:= 6/expr 8/- 10/d <EOF> r3

0/$S 1/a 3/:= <EOF> g6 on expr

0/$S 1/a 3/:= 6/expr <EOF> r1

0/$S <EOF> g2 on stmt

0/$S 2/stmt <EOF> s4

0/$S 2/stmt 4/<EOF> accept

Stack	Remaining Input	Action
`0/$S`	`a:= b + c - d`	`s1`
`0/$S 1/a`	`:= b + c - d`	`s3`
`0/$S 1/a 3/:=`	`b + c - d`	`s5`
`0/$S 1/a 3/:= 5/b`	`+ c - d`	`r4`
`0/$S 1/a 3/:=`	`+ c - d`	`g6` on `expr`
`0/$S 1/a 3/:= 6/expr`	`+ c - d`	`s7`
`0/$S 1/a 3/:= 6/expr 7/+`	`c - d`	`s9`
`0/$S 1/a 3/:= 6/expr 7/+ 9/c`	`- d`	`r2`
`0/$S 1/a 3/:=`	`- d`	`g6` on `expr`
`0/$S 1/a 3/:= 6/expr`	`- d`	`s8`
`0/$S 1/a 3/:= 6/expr 8/-`	`d`	`s10`
`0/$S 1/a 3/:= 6/expr 8/- 10/d`	`<EOF>`	`r3`
`0/$S 1/a 3/:=`	`<EOF>`	`g6` on `expr`
`0/$S 1/a 3/:= 6/expr`	`<EOF>`	`r1`
`0/$S`	`<EOF>`	`g2` on `stmt`
`0/$S 2/stmt`	`<EOF>`	`s4`
`0/$S 2/stmt 4/<EOF>`		accept

Each stack entry is shown as a state number followed by the symbol which caused the transition to that state.

Construction of Shift-Reduce Parsing Tables

The general idea of bottom-up parsing is to repeatedly match the RHS of some rule and reduce it to the rule's LHS. To identify the matching RHS's, the parser needs to keep track of all possible rules which may match. This is done by means of the parser state, where each state keeps track of the set of rules the parser may currently be in, and how far along the parser may be within each rule. This idea of states will become clearer if we attempt to build the tables for a small example.

Consider the grammar used in the previous section, repeated below for convenience:

0)	$S:	stmt <EOF> 
1)	stmt:	ID ':=' expr 
2)	expr:	expr '+' ID 
3)	expr:	expr '-' ID 
4)	expr:	ID

The input must be ultimately reducible to the augmenting nonterminal $S. Hence the parser should initially be in rule 0; more specifically, it should be expecting the stmt in rule 0. To show precisely which symbol is expected in a rule RHS, we define an item to be a rule, along with a position on the RHS specifying the next symbol to be expected in that RHS. We denote an item as a rule with a dot . just before the next expected symbol. Hence, returning to our example, the parser is initially expecting the item

0)	$S:	. stmt <EOF>

However, if the parser is expecting to see a stmt, it could be at the beginning of any of the rules for stmt. Hence the initial state should include the initial item for stmt. (The process of including these additional induced items is referred to as forming the closure of the state).

State 0
0)	$S:	. stmt <EOF> 
1)	stmt:	. ID ':=' expr

Now if the parser sees an ID in state 0, then it can move the dot past any ID symbols in state 0. We get a new state; let's call it State 1:

State 1
1)	stmt:	ID . ':=' expr

If the parser has seen a stmt in state 0, then it can move the dot past any stmt symbols in state 0. We get a new state; let's call it State 2:

State 2
0)	$S:	stmt . <EOF>

If the parser sees a ':=' in state 1, we get a new state; let's call it State 3:

State 3
1)	stmt:	ID ':=' . expr

However since the dot is before the nonterminal expr, the parser could be in any of the rules for expr. Hence we need to include the rules for expr in a new state 3:

State 3
1)	stmt:	ID ':=' . expr 
2)	expr:	. expr '+' ID 
3)	expr:	. expr '-' ID 
4)	expr:   . ID

We continue this process of following all possible transitions out of states until we cannot construct any new states. Completing this construction results in an automaton called a LR(0) machine. The transitions on terminal symbols correspond to shift actions in the parser; the transitions on nonterminal symbols correspond to goto actions in the parser.

Note that the construction guarantees that each state is entered by a unique grammar symbol; that is why we can map a state stack into a symbol stack as mentioned earlier.

For our example grammar, we get the following LR(0) machine:

State 0   	
0)	$S: . stmt  
1)	stmt: . ID ':=' expr 
	GOTO   2 on stmt
	SHIFT 1 on ID

State 1   	
1)	stmt: ID . ':=' expr 
	SHIFT 3 on ':='

State 2   	
0)	$S: stmt .  
	SHIFT 4 on 

State 3   	
1)	stmt: ID ':=' . expr 
2)	expr: . expr '+' ID 
3)	expr: . expr '-' ID 
4)	expr: . ID 
	GOTO   6 on expr
        SHIFT 5 on ID

State 4   	
0)	$S: stmt  . 

State 5   	
4)	expr: ID . 

State 6   	
1)	stmt: ID ':=' expr . 
2)	expr: expr . '+' ID 
3)	expr: expr . '-' ID 
        SHIFT 7 on '+'
        SHIFT 8 on '-'

State 7   	
2)	expr: expr '+' . ID 
        SHIFT 9 on ID

State 8   	
3)	expr: expr '-' . ID 
        SHIFT 10 on ID

State 9   	
2)	expr: expr '+' ID . 

State 10  
3)	expr: expr '-' ID .

The LR(0) machine defines the shift and goto actions in our parse tables. But what corresponds to the reduce actions in the action table?

We should reduce by a rule when that rule has been completely recognized; i.e., when the state contains a reducing item with the dot at the extreme right. However, if a state has reducing items, under what conditions should the corresponding reductions be applied?

If there are no other shift or reducing items in the state, then whenever that state is entered, it is correct to always apply the reduction corresponding to the single reducing item. Such a state is referred to as an LR(0) state.
If there are other shift or reducing items in the state, then the current lookahead token should be used to make the decision. The exact way this decision is made depends on the precise flavor of shift-reduce parser being constructed:

SLR(1) Parser Such a parser will reduce by the rule corresponding to a reducing item in a state, whenever the current lookahead is among the symbols which may follow the rule's LHS nonterminal anywhere in the grammar.
LALR(1) Parser Such a parser will reduce by the rule corresponding to a reducing item in a state, whenever the current lookahead is among the symbols which may follow the rule's LHS nonterminal only in the context of that state . These kinds of LALR(1) parsers are usually generated by popular parser generators like yacc, bison and zyacc.
LR(1) Parser Instead of constructing a LR(0) machine, a LR(1) machine is constructed, where lookahead information forms part of each item. In this case a reducing item triggers a reduction by the corresponding rule only when the current lookahead is in the lookahead set computed for that item. Such a parser is referred to as an LR(1) parser.

It is worth emphasizing that the only essential difference between SLR(1), LALR(1) and LR(1) parsers is how the decision is made as to when a reduction should be applied. If multiple actions are possible for a particular state and lookahead, then that state has conflicts. The parsers are listed above in order of increasing precision of the information used for the reduce decision; hence it is possible that a SLR(1) parser has conflicts when a LALR(1) parser has none, and a LALR(1) parser has conflicts when a LR(1) parser has none.

Parse tables can often be represented more compactly by designating a particular reduction to be the default reduction for a state. This means that if no other action is possible in a state, the default reduction for that state is applied. This explains why in the example, many rows of the table have reduce entries in almost all their columns.

Default reductions do not affect the correctness of the parse. Their only effect is on error detection: an error may not be detected until after several (useless) reductions have been applied. However, the error is still detected before any additional symbols are shifted onto the parse stack.

Feedback: Please email any feedback to zdu@acm.org.

Up to Parsing Demos Main Page