LR parsers

Types of LR parsers

LR(0) parsers

LR(0) parsers are a basic type of LR parser that uses zero lookahead, i.e., makes decisions about branches and reductions without analyzing the next characters of the input string.

LR(0) parsers are the basis for understanding more complex parsers. They use a stack to store state and a queue to store input characters.

Because they do not use lookahead, their ability to resolve ambiguities in grammar is limited.

The main problem with LR(0) parsers is shift/reduce and reduce/reduce conflicts, which can arise due to lack of information about the following characters.

Let's look at the grammar:

S → A
A → aA | b

For the string “aab”, LR(0) the parser will perform the following steps:

  1. Shift: move a on the stack.

  2. Shift: move second a on the stack.

  3. Reduce: transformation aA V A.

  4. Shift: move b on the stack.

  5. Reduce: transformation A V S.

SLR (Simple LR) parsers

SLR parsers are an update to LR(0) parsers that use single-character lookaheads to resolve ambiguities.

SLR parsers resolve many of the conflicts present in LR(0) parsers by using additional transition rules.

The transition and action tables for SLR parsers are smaller than those for more complex LR(1) parsers, which makes them more memory efficient.

For the same grammar:

S → A
A → aA | b

The SLR parser will use lookahead to determine which rule to apply in case of ambiguity.

LALR (Look-Ahead LR) parsers

LALR parsers are another step in the evolution of LR parsers, representing a compromise between power and efficiency.

LALR parsers are capable of processing most of the grammars that full LR(1) parsers can handle, but with less memory overhead.

LALR parser tables are more compact than full LR(1) parser tables.

Let's look at the grammar:

S → E
E → E + T | T
T → T * F | F
F → ( E ) | id

The LALR parser will use lookahead to parse expressions correctly, resolving ambiguities using compact jump tables.

Canonical LR(1) parsers

Canonical LR(1) parsers are the most powerful type of LR parser, capable of processing any context-free grammar without restrictions.

These parsers can handle any grammar that any LR parser technique can handle.

The transition and action tables for LR(1) parsers are significantly larger than those for other types of LR parsers.

For the same grammar:

S → E
E → E + T | T
T → T * F | F
F → ( E ) | id

The Canonical LR(1) parser will create complete transition and action tables for every possible combination of states and characters, ensuring the most accurate parsing possible.

What do LR parsers consist of and the algorithm for their work

Main components of an LR parser

  1. Stack:
    The stack is used to store the states and symbols of the grammar. During parsing, the parser moves symbols and states between the stack and the input buffer.

  2. Input buffer:
    The input buffer contains a string that needs to be parsed. The line ends with a special end character ($).

  3. Action table:
    The action table specifies what the parser should do (shift, reduce, accept, or error) based on the current state and the symbol at the top of the stack.

  4. Goto table:
    The transition table determines the new state that the parser should go to after performing a reduce operation, based on the current state and the symbol to which the reduction was applied.

Algorithm execution steps

The LR parser algorithm consists of repeated steps, including shift and reduce operations:

  1. Shift:

    • Moves the next character from the input buffer onto the stack.

    • Update the parser state in accordance with the action table.

  2. Reduce:

    • Applying a grammar rule to symbols at the top of the stack.

    • Replacing the right side of a rule with the left side in the stack.

    • Transition to a new state according to the transition table.

Let's look at an example grammar and the execution of parsing steps:

Grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

Input line: id + id * id

Execution steps:

  1. Shift: id is moved onto the stack, the state is updated.

  2. Reduce: id converted to Fthen F V T.

  3. Shift: + moves onto the stack.

  4. Shift: next id is moved onto the stack, the state is updated.

  5. Reduce: id converted to Fthen F V TAnd T V E.

  6. Shift: * moves onto the stack.

  7. Shift: next id is moved onto the stack, the state is updated.

  8. Reduce: id converted to Fthen F V T.

  9. Reduce: T * F converted to T.

  10. Reduce: E + T converted to E.

  11. Accept: string parsed successfully.

Code examples

LR(0) parser

class LR0Parser:
    def __init__(self, grammar, parsing_table):
        self.grammar = grammar
        self.parsing_table = parsing_table

    def parse(self, input_string):
        stack = [0]
        input_string = input_string.split() + ['$']
        cursor = 0

        while True:
            state = stack[-1]
            lookahead = input_string[cursor]
            action = self.parsing_table[state][lookahead]

            if action.startswith('s'):
                stack.append(lookahead)
                stack.append(int(action[1:]))
                cursor += 1
            elif action.startswith('r'):
                production = self.grammar[int(action[1:])]
                for _ in range(2 * len(production[1])):
                    stack.pop()
                state = stack[-1]
                stack.append(production[0])
                stack.append(self.parsing_table[state][production[0]])
            elif action == 'accept':
                print("Parsing completed successfully!")
                return True
            else:
                print("Error: Invalid syntax")
                return False

grammar = [
    ('S', ['A']),
    ('A', ['a', 'A']),
    ('A', ['b'])
]

parsing_table = {
    0: {'a': 's2', 'b': 's3', 'S': 1, 'A': 4},
    2: {'a': 's2', 'b': 's3', 'A': 5},
    3: {'$': 'r2'},
    4: {'$': 'accept'},
    5: {'a': 's2', 'b': 's3', 'A': 6},
    6: {'$': 'r1'}
}

parser = LR0Parser(grammar, parsing_table)
parser.parse('a a b')

SLR parser

class SLRParser:
    def __init__(self, grammar, parsing_table, follow_set):
        self.grammar = grammar
        self.parsing_table = parsing_table
        self.follow_set = follow_set

    def parse(self, input_string):
        stack = [0]
        input_string = input_string.split() + ['$']
        cursor = 0

        while True:
            state = stack[-1]
            lookahead = input_string[cursor]
            action = self.parsing_table[state].get(lookahead, None)

            if action is None:
                print("Error: Invalid syntax")
                return False
            elif action.startswith('s'):
                stack.append(lookahead)
                stack.append(int(action[1:]))
                cursor += 1
            elif action.startswith('r'):
                production = self.grammar[int(action[1:])]
                for _ in range(2 * len(production[1])):
                    stack.pop()
                state = stack[-1]
                stack.append(production[0])
                stack.append(self.parsing_table[state][production[0]])
            elif action == 'accept':
                print("Parsing completed successfully!")
                return True
            else:
                print("Error: Invalid syntax")
                return False

grammar = [
    ('S', ['E']),
    ('E', ['E', '+', 'T']),
    ('E', ['T']),
    ('T', ['T', '*', 'F']),
    ('T', ['F']),
    ('F', ['(', 'E', ')']),
    ('F', ['id'])
]

parsing_table = {
    0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
    1: {'+': 's6', '$': 'accept'},
    2: {'+': 'r2', '*': 's7', '$': 'r2'},
    3: {'+': 'r4', '*': 'r4', '$': 'r4'},
    4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
    5: {'+': 'r6', '*': 'r6', '$': 'r6'},
    6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
    7: {'id': 's5', '(': 's4', 'F': 10},
    8: {')': 's11', '+': 's6'},
    9: {'+': 'r1', '*': 's7', '$': 'r1'},
    10: {'+': 'r3', '*': 'r3', '$': 'r3'},
    11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}

follow_set = {
    'S': ['$'],
    'E': ['$', '+', ')'],
    'T': ['$', '+', '*', ')'],
    'F': ['$', '+', '*', ')']
}

parser = SLRParser(grammar, parsing_table, follow_set)
parser.parse('id + id * id')

LALR parser

class LALRParser:
    def __init__(self, grammar, parsing_table):
        self.grammar = grammar
        self.parsing_table = parsing_table

    def parse(self, input_string):
        stack = [0]
        input_string = input_string.split() + ['$']
        cursor = 0

        while True:
            state = stack[-1]
            lookahead = input_string[cursor]
            action = self.parsing_table[state].get(lookahead, None)

            if action is None:
                print("Error: Invalid syntax")
                return False
            elif action.startswith('s'):
                stack.append(lookahead)
                stack.append(int(action[1:]))
                cursor += 1
            elif action.startswith('r'):
                production = self.grammar[int(action[1:])]
                for _ in range(2 * len(production[1])):
                    stack.pop()
                state = stack[-1]
                stack.append(production[0])
                stack.append(self.parsing_table[state][production[0]])
            elif action == 'accept':
                print("Parsing completed successfully!")
                return True
            else:
                print("Error: Invalid syntax")
                return False

grammar = [
    ('E', ['E', '+', 'T']),
    ('E', ['T']),
    ('T', ['T', '*', 'F']),
    ('T', ['F']),
    ('F', ['(', 'E', ')']),
    ('F', ['id'])
]

parsing_table = {
    0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
    1: {'+': 's6', '$': 'accept'},
    2: {'+': 'r2', '*': 's7', '$': 'r2'},
    3: {'+': 'r4', '*': 'r4', '$': 'r4'},
    4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
    5: {'+': 'r6', '*': 'r6', '$': 'r6'},
    6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
    7: {'id': 's5', '(': 's4', 'F': 10},
    8: {')': 's11', '+': 's6'},
    9: {'+': 'r1', '*': 's7', '$': 'r1'},
    10: {'+': 'r3', '*': 'r3', '$': 'r3'},
    11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}

parser = LALRParser(grammar, parsing_table)
parser.parse('id + id * id')

Canonical LR(1) parser

class LR1Parser:
    def __init__(self, grammar, parsing_table):
        self.grammar = grammar
        self.parsing_table = parsing_table

    def parse(self, input_string):
        stack = [0]
        input_string = input_string.split() + ['$']
        cursor = 0

        while True:
            state = stack[-1]
            lookahead = input_string[cursor]
            action = self.parsing_table[state].get(lookahead, None)

            if action is None:
                print("Error: Invalid syntax")
                return False
            elif action.startswith('s'):
                stack.append(lookahead)
                stack.append(int(action[1:]))
                cursor += 1
            elif action.startswith('r'):
                production = self.grammar[int(action[1:])]
                for _ in range(2 * len(production[1])):
                    stack.pop()
                state = stack[-1]
                stack.append(production[0])
                stack.append(self.parsing_table[state][production[0]])
            elif action == 'accept':
                print("Parsing completed successfully!")
                return True
            else:
                print("Error: Invalid syntax")
                return False

grammar = [
    ('E', ['E', '+', 'T']),
    ('E', ['T']),
    ('T', ['T', '*', 'F']),
    ('T', ['F']),
    ('F', ['(', 'E', ')']),
    ('F', ['id'])
]

parsing_table = {
    0: {'id': 's5', '(': 's4', 'E': 1, 'T': 2, 'F': 3},
    1: {'+': 's6', '$': 'accept'},
    2: {'+': 'r2', '*': 's7', '$': 'r2'},
    3: {'+': 'r4', '*': 'r4', '$': 'r4'},
    4: {'id': 's5', '(': 's4', 'E': 8, 'T': 2, 'F': 3},
    5: {'+': 'r6', '*': 'r6', '$': 'r6'},
    6: {'id': 's5', '(': 's4', 'T': 9, 'F': 3},
    7: {'id': 's5', '(': 's4', 'F': 10},
    8: {')': 's11', '+': 's6'},
    9: {'+': 'r1', '*': 's7', '$': 'r1'},
    10: {'+': 'r3', '*': 'r3', '$': 'r3'},
    11: {'+': 'r5', '*': 'r5', '$': 'r5'}
}

parser = LR1Parser(grammar, parsing_table)
parser.parse('id + id * id'

Each of the parsers processes the string following the rules of grammar, and performs shift and reduce operations to build a parse tree or detect errors in syntax.


In conclusion, let me remind you about the upcoming open lessons:

  • June 13, “Stages, tasks and types of design”: we’ll look at how to correctly determine the approach to designing an information system and software. Sign up via link

  • June 19, “Event storming as a tool for studying the subject area.” Sign up via link

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *