LR parsers
Types of LR parsers
LR(0) parsers
LR(0) parsers are a basic type of LR parser that uses zero lookahead, i.e., makes decisions about branches and reductions without analyzing the next characters of the input string.
LR(0) parsers are the basis for understanding more complex parsers. They use a stack to store state and a queue to store input characters.
Because they do not use lookahead, their ability to resolve ambiguities in grammar is limited.
The main problem with LR(0) parsers is shift/reduce and reduce/reduce conflicts, which can arise due to lack of information about the following characters.
Let's look at the grammar:
S → A
A → aA | b
For the string “aab”, LR(0) the parser will perform the following steps:
Shift: move
a
on the stack.Shift: move second
a
on the stack.Reduce: transformation
aA
VA
.Shift: move
b
on the stack.Reduce: transformation
A
VS
.
SLR (Simple LR) parsers
SLR parsers are an update to LR(0) parsers that use single-character lookaheads to resolve ambiguities.
SLR parsers resolve many of the conflicts present in LR(0) parsers by using additional transition rules.
The transition and action tables for SLR parsers are smaller than those for more complex LR(1) parsers, which makes them more memory efficient.
For the same grammar:
S → A
A → aA | b
The SLR parser will use lookahead to determine which rule to apply in case of ambiguity.
LALR (Look-Ahead LR) parsers
LALR parsers are another step in the evolution of LR parsers, representing a compromise between power and efficiency.
LALR parsers are capable of processing most of the grammars that full LR(1) parsers can handle, but with less memory overhead.
LALR parser tables are more compact than full LR(1) parser tables.
Let's look at the grammar:
S → E
E → E + T | T
T → T * F | F
F → ( E ) | id
The LALR parser will use lookahead to parse expressions correctly, resolving ambiguities using compact jump tables.
Canonical LR(1) parsers
Canonical LR(1) parsers are the most powerful type of LR parser, capable of processing any context-free grammar without restrictions.
These parsers can handle any grammar that any LR parser technique can handle.
The transition and action tables for LR(1) parsers are significantly larger than those for other types of LR parsers.
For the same grammar:
S → E
E → E + T | T
T → T * F | F
F → ( E ) | id
The Canonical LR(1) parser will create complete transition and action tables for every possible combination of states and characters, ensuring the most accurate parsing possible.
What do LR parsers consist of and the algorithm for their work
Main components of an LR parser
Stack:
The stack is used to store the states and symbols of the grammar. During parsing, the parser moves symbols and states between the stack and the input buffer.Input buffer:
The input buffer contains a string that needs to be parsed. The line ends with a special end character ($).Action table:
The action table specifies what the parser should do (shift, reduce, accept, or error) based on the current state and the symbol at the top of the stack.Goto table:
The transition table determines the new state that the parser should go to after performing a reduce operation, based on the current state and the symbol to which the reduction was applied.
Algorithm execution steps
The LR parser algorithm consists of repeated steps, including shift and reduce operations:
Shift:
Moves the next character from the input buffer onto the stack.
Update the parser state in accordance with the action table.
Reduce:
Applying a grammar rule to symbols at the top of the stack.
Replacing the right side of a rule with the left side in the stack.
Transition to a new state according to the transition table.
Let's look at an example grammar and the execution of parsing steps:
Grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
Input line: id + id * id
Execution steps:
Shift:
id
is moved onto the stack, the state is updated.Reduce:
id
converted toF
thenF
VT
.Shift:
+
moves onto the stack.Shift: next
id
is moved onto the stack, the state is updated.Reduce:
id
converted toF
thenF
VT
AndT
VE
.Shift:
*
moves onto the stack.Shift: next
id
is moved onto the stack, the state is updated.Reduce:
id
converted toF
thenF
VT
.Reduce:
T * F
converted toT
.Reduce:
E + T
converted toE
.Accept: string parsed successfully.
Code examples
LR(0) parser
class LR0Parser:
def __init__(self, grammar, parsing_table):
self.grammar = grammar
self.parsing_table = parsing_table
def parse(self, input_string):
stack = [0]
input_string = input_string.split() + ['$']
cursor = 0
while True:
state = stack[-1]
lookahead = input_string[cursor]
action = self.parsing_table[state][lookahead]
if action.startswith('s'):
stack.append(lookahead)
stack.append(int(action[1:]))
cursor += 1
elif action.startswith('r'):
production = self.grammar[int(action[1:])]
for _ in range(2 * len(production[1])):
stack.pop()
state = stack[-1]
stack.append(production[0])
stack.append(self.parsing_table[state][production[0]])
elif action == 'accept':
print("Parsing completed successfully!")
return True
else:
print("Error: Invalid syntax")
return False
grammar = [
('S', ['A']),
('A', ['a', 'A']),
('A', ['b'])
]
parsing_table = {
0: {'a': 's2', 'b': 's3', 'S': 1, 'A': 4},
2: {'a': 's2', 'b': 's3', 'A': 5},
3: {'$': 'r2'},
4: {'$': 'accept'},
5: {'a': 's2', 'b': 's3', 'A': 6},
6: {'$': 'r1'}
}
parser = LR0Parser(grammar, parsing_table)
parser.parse('a a b')
SLR parser
class SLRParser:
def __init__(self, grammar, parsing_table, follow_set):
self.grammar = grammar
self.parsing_table = parsing_table
self.follow_set = follow_set
def parse(self, input_string):
stack = [0]
input_string = input_string.split() + ['$']
cursor = 0
while True:
state = stack[-1]
lookahead = input_string[cursor]
action = self.parsing_table[state].get(lookahead, None)
if action is None:
print("Error: Invalid syntax")
return False
elif action.startswith('s'):
stack.append(lookahead)
stack.append(int(action[1:]))
cursor += 1
elif action.startswith('r'):
production = self.grammar[int(action[1:])]
for _ in range(2 * len(production[1])):
stack.pop()
state = stack[-1]
stack.append(production[0])
stack.append(self.parsing_table[state][production[0]])
elif action == 'accept':
print("Parsing completed successfully!")
return True
else:
print("Error: Invalid syntax")
return False
grammar = [
('S', ['E']),
('E', ['E', '+', 'T']),
('E', ['T']),
('T', ['T', '*', 'F']),
('T', ['F']),
('F', ['(', 'E', ')']),
('F', ['id'])
]
parsing_table = {
0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
1: {'+': 's6', '$': 'accept'},
2: {'+': 'r2', '*': 's7', '$': 'r2'},
3: {'+': 'r4', '*': 'r4', '$': 'r4'},
4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
5: {'+': 'r6', '*': 'r6', '$': 'r6'},
6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
7: {'id': 's5', '(': 's4', 'F': 10},
8: {')': 's11', '+': 's6'},
9: {'+': 'r1', '*': 's7', '$': 'r1'},
10: {'+': 'r3', '*': 'r3', '$': 'r3'},
11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}
follow_set = {
'S': ['$'],
'E': ['$', '+', ')'],
'T': ['$', '+', '*', ')'],
'F': ['$', '+', '*', ')']
}
parser = SLRParser(grammar, parsing_table, follow_set)
parser.parse('id + id * id')
LALR parser
class LALRParser:
def __init__(self, grammar, parsing_table):
self.grammar = grammar
self.parsing_table = parsing_table
def parse(self, input_string):
stack = [0]
input_string = input_string.split() + ['$']
cursor = 0
while True:
state = stack[-1]
lookahead = input_string[cursor]
action = self.parsing_table[state].get(lookahead, None)
if action is None:
print("Error: Invalid syntax")
return False
elif action.startswith('s'):
stack.append(lookahead)
stack.append(int(action[1:]))
cursor += 1
elif action.startswith('r'):
production = self.grammar[int(action[1:])]
for _ in range(2 * len(production[1])):
stack.pop()
state = stack[-1]
stack.append(production[0])
stack.append(self.parsing_table[state][production[0]])
elif action == 'accept':
print("Parsing completed successfully!")
return True
else:
print("Error: Invalid syntax")
return False
grammar = [
('E', ['E', '+', 'T']),
('E', ['T']),
('T', ['T', '*', 'F']),
('T', ['F']),
('F', ['(', 'E', ')']),
('F', ['id'])
]
parsing_table = {
0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
1: {'+': 's6', '$': 'accept'},
2: {'+': 'r2', '*': 's7', '$': 'r2'},
3: {'+': 'r4', '*': 'r4', '$': 'r4'},
4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
5: {'+': 'r6', '*': 'r6', '$': 'r6'},
6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
7: {'id': 's5', '(': 's4', 'F': 10},
8: {')': 's11', '+': 's6'},
9: {'+': 'r1', '*': 's7', '$': 'r1'},
10: {'+': 'r3', '*': 'r3', '$': 'r3'},
11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}
parser = LALRParser(grammar, parsing_table)
parser.parse('id + id * id')
Canonical LR(1) parser
class LR1Parser:
def __init__(self, grammar, parsing_table):
self.grammar = grammar
self.parsing_table = parsing_table
def parse(self, input_string):
stack = [0]
input_string = input_string.split() + ['$']
cursor = 0
while True:
state = stack[-1]
lookahead = input_string[cursor]
action = self.parsing_table[state].get(lookahead, None)
if action is None:
print("Error: Invalid syntax")
return False
elif action.startswith('s'):
stack.append(lookahead)
stack.append(int(action[1:]))
cursor += 1
elif action.startswith('r'):
production = self.grammar[int(action[1:])]
for _ in range(2 * len(production[1])):
stack.pop()
state = stack[-1]
stack.append(production[0])
stack.append(self.parsing_table[state][production[0]])
elif action == 'accept':
print("Parsing completed successfully!")
return True
else:
print("Error: Invalid syntax")
return False
grammar = [
('E', ['E', '+', 'T']),
('E', ['T']),
('T', ['T', '*', 'F']),
('T', ['F']),
('F', ['(', 'E', ')']),
('F', ['id'])
]
parsing_table = {
0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
1: {'+': 's6', '$': 'accept'},
2: {'+': 'r2', '*': 's7', '$': 'r2'},
3: {'+': 'r4', '*': 'r4', '$': 'r4'},
4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
5: {'+': 'r6', '*': 'r6', '$': 'r6'},
6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
7: {'id': 's5', '(': 's4', 'F': 10},
8: {')': 's11', '+': 's6'},
9: {'+': 'r1', '*': 's7', '$': 'r1'},
10: {'+': 'r3', '*': 'r3', '$': 'r3'},
11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}
parser = LR1Parser(grammar, parsing_table)
parser.parse('id + id * id'
Each of the parsers processes the string following the rules of grammar, and performs shift and reduce operations to build a parse tree or detect errors in syntax.
In conclusion, let me remind you about the upcoming open lessons:
June 13, “Stages, tasks and types of design”: we’ll look at how to correctly determine the approach to designing an information system and software. Sign up via link
June 19, “Event storming as a tool for studying the subject area.” Sign up via link