what is it and how to use it

Hello, Habrites!

I'm Dima, a Python developer from 21 YARDa service for finding construction contractors.

In this article I will tell you about the pattern Interpreter. Let's figure out when to use it, what concepts underlie it. Then we'll use the pattern to write a program for solving mathematical expressions. Look for the code in the repository at link.

When to use

Interpreter are used when it is necessary to process lexical constructions with certain grammatical rules. As an example – regular expressions.

Examples of situations in which the pattern is useful:

  • when it is necessary to develop a language to describe operations or behavior;

  • when you often have to process complex expressions;

  • when the grammar of a system is frequently supplemented with new rules.

The interpreter allows you to easily add new rules and symbols to the language grammar. This makes the system flexible and extensible. The interpreter also simplifies the processing of complex expressions: it allows you to break complex expressions into simple parts and process them step by step.

Concepts at the core

The interpreter is based on three main concepts:

  • lexical processor: analyzes the expression and creates from it tokens. A token is an object that stores the value of a literal and its type;

  • token processor: builds an abstract syntax tree from tokens. The tree is built according to the precedence of the operations in the expression. The root of the tree is the last operation in the expression;

  • the interpreter itself: interprets the user's expression. If there are no errors in the expression, processes it. Otherwise, reports an error.

Implementation

Let's write an interpreter for mathematical expressions that will process user input and return an answer.

The interpreter will support:

  • Basic arithmetic operations: addition (+), subtraction (-), multiplication

  • division (/), raising to a power (^);

  • Performing operations taking into account priority;

  • Ability to use parentheses to specify the priority of operations;

  • Basic mathematical functions such as sin, cos, tan, log, sqrt and exp;

  • Variables: The user will be able to define variables, assign values ​​to them and use them in subsequent operations;

Error handling.

Lexical analysis of the expressionLet us describe the types of tokens that define the grammar of the expression language (src/enums.py

class TokenTypesEnum(str, Enum):
	NUMBER = 'number'
	PLUS = 'plus'
	MINUS = 'minus'
	STAR = 'star'
	SLASH = 'slash'
	CARET = 'caret'
	LEFT_PARENTHESIS = 'left_parenthesis'
	RIGHT_PARENTHESIS = 'right_parenthesis'
	EOF = 'EOF'

):Let's create a token class that will store the literal and its type (src/tokens.py

@dataclass
class Token:
	type: TokenTypesEnum
	literal: str

	def __str__(self) -> str:
    	return f'{self.__class__.__name__}({self.type}, {self.literal})'

): It is necessary to divide a mathematical expression into tokens. To do this, it is necessary to define grammar rules. We use for this purpose RegEx– regular expressions (src/config.py

LEXICAL_RULES: Dict[TokenTypesEnum, str] = {
	TokenTypesEnum.NUMBER: r'(\d+(\.\d+)?)',
	TokenTypesEnum.PLUS: r'(\+)',
	TokenTypesEnum.MINUS: r'(\-)',
	TokenTypesEnum.STAR: r'(\*)',
	TokenTypesEnum.SLASH: r'(/)',
	TokenTypesEnum.CARET: r'(\^)',
	TokenTypesEnum.LEFT_PARENTHESIS: r'(\()',
	TokenTypesEnum.RIGHT_PARENTHESIS: r'(\))'
}

):The mathematical expression needs to be converted into a set of tokens. To do this, we implement a lexical processor (src/lexical_processor.py

class LexicalProcessor(Processor):

	def __init__(self) -> None:
    	self._expression: str=""
    	self._results: List[Token] = []

	def process_expression(self, expression: str) -> List[Token]:
    	"""
    	Processes expression and returns the list of tokens generated from expression, if expression is valid.
    	"""

    	# Initializes for each new expression processing:
    	self._expression = expression
    	self._results = []

    	self._extract_regex_pattern_from_expression()

    	# Add a token symbolizing the end of the line for further operations on the preprocessed expression:
    	self._results.append(
        	Token(
            	literal="",
            	type=TokenTypesEnum.EOF
        	)
    	)

    	return self._results

	def _extract_regex_pattern_from_expression(self) -> None:
    	"""
    	Extracts the RegEx pattern from expression, starting from the beginning.
    	If one of RegEx patterns is found in the expression, the corresponding token will be created.
    	In other case, expression is not valid and ExpressionSyntaxError will be raised.

    	After token is created, the expression is reduced by the characters used for tokenization
    	and processed recursively.
    	"""

    	while len(self._expression) > 0:
        	max_rule: TokenTypesEnum = TokenTypesEnum.EOF
        	max_lit: str=""
        	self._expression = self._expression.strip()

        	# Finding the longest part of an expression using RegEx:
        	for rule in LEXICAL_RULES.keys():
            	regex_pattern: re.Pattern[str] = re.compile(pattern=LEXICAL_RULES[rule])
            	regex_match: Optional[re.Match[str]] = regex_pattern.match(string=self._expression)
            	if (regex_match is not None) and (len(regex_match.group(0)) > len(max_lit)):
                	max_lit = regex_match.group(0)
                	max_rule = rule

        	if max_rule == TokenTypesEnum.EOF:
            	raise ExpressionSyntaxError()

        	self._expression = self._expression[len(max_lit):]  # Reduce the expression by the processed part
        	self._results.append(
            	Token(
                	literal=max_lit,
                	type=max_rule
            	)
        	)

): Method self._extract_regex_pattern_from_expression

removes spaces from both sides of the expression. It then processes the first literal in the expression, determining its type using grammar rules. If the literal does not match any of the rules, the expression is invalid and an error is raised. If the expression is valid, a new token is created and the expression is truncated to the processed portion. The remainder of the expression is re-processed. After all literals in the expression are processed, a token with the type is createdTokenTypesEnum.EOF

. It signals the end of the expression and is needed for further processing of tokens.

Token processing

The purpose of token processing is to create an abstract syntax tree based on them. The tree is built according to the priorities in the mathematical expression.Let us introduce the classes of tree, expression and concrete types of possible expressions (src/expressions.py

dataclass
class TreeNode:
	"""
	Abstract Syntax Tree.
	"""

	pass


@dataclass
class Expression(TreeNode):
	pass


@dataclass
class UnaryOperation(Expression):
	"""
	-(2 + 3), where minus is operation and (2 + 3) is an expression.
	"""

	operation: str
	expression: Expression


@dataclass
class BinaryOperation(Expression):
	"""
	1 * 2 + 3 * 3, where "1 * 2" is left expression, "+" is operation and "3 * 3" is right expression.
	"""

	operation: str
	left: Expression
	right: Expression


@dataclass
class Number(Expression):
	value: float

):

The simplest expression is a number. A unary expression sets the sign for another expression. The most complex expression is a binary expression. A binary expression has child left and right expressions, as well as an operation to perform on them.We implement a token handler that builds an abstract syntax tree (src/tokens_parser.py

class TokensParser(Parser):
	"""
	Creates Abstract Syntax Tree according to operations priority in provided expression.

	program := computation
	computation := term ( (PLUS | MINUS) term )*
	term := unary ( (STAR | SLASH ) unary )*
	unary := PLUS unary | MINUS unary | exponentiation
	exponentiation := atom CARET unary | atom
	atom := LEFT_PARENTHESIS computation RIGHT_PARENTHESIS | number
	number := INT
	"""

	def __init__(self) -> None:
    	self._tokens: List[Token] = []
    	self._next_token_index: int = 0

	def parse(self, tokens: List[Token]) -> Expression:
    	"""
    	Parses the expression, created by user.
    	"""

    	# Initializes for each new parsing process:
    	self._next_token_index = 0
    	self._tokens = tokens

    	computation: Expression = self._parse_computation()
    	self._get_next_token(expected_token_type=TokenTypesEnum.EOF)
    	return computation

	def _parse_computation(self) -> Expression:
    	result: Expression = self._parse_term()
    	while (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.PLUS, TokenTypesEnum.MINUS}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	right: Expression = self._parse_term()
        	result = BinaryOperation(operation=operation, left=result, right=right)

    	return result

	def _parse_term(self) -> Expression:
    	"""
    	Parses an expression with multiplications and divisions.
    	"""

    	result: Expression = self._parse_unary()
    	while (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.STAR, TokenTypesEnum.SLASH}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	right: Expression = self._parse_unary()
        	result = BinaryOperation(operation=operation, left=result, right=right)

    	return result

	def _parse_unary(self) -> Expression:
    	"""
    	Parses a unary operator.
    	"""

    	if (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.PLUS, TokenTypesEnum.MINUS}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	expression: Expression = self._parse_unary()
        	return UnaryOperation(operation=operation, expression=expression)
    	else:  # No unary operators in sight.
        	return self._parse_exponentiation()

	def _parse_exponentiation(self) -> Expression:
    	"""
    	Parses a caret operator.
    	"""

    	expression: Expression = self._parse_atom()
    	next_token_type: TokenTypesEnum = self._get_next_token_type()
    	if next_token_type == TokenTypesEnum.CARET:
        	self._get_next_token(expected_token_type=TokenTypesEnum.CARET)
        	right: Expression = self._parse_unary()
        	expression = BinaryOperation(operation=OPERATIONS[next_token_type], left=expression, right=right)

    	return expression

	def _parse_atom(self) -> Expression:
    	"""
    	Parses a parenthesised expression or a number.
    	"""

    	expression: Expression
    	if self._get_next_token_type() == TokenTypesEnum.LEFT_PARENTHESIS:
        	self._get_next_token(expected_token_type=TokenTypesEnum.LEFT_PARENTHESIS)
        	expression = self._parse_computation()
        	self._get_next_token(expected_token_type=TokenTypesEnum.RIGHT_PARENTHESIS)
    	else:
        	expression = self._parse_number()

    	return expression

	def _parse_number(self) -> Number:
    	return Number(float(self._get_next_token(expected_token_type=TokenTypesEnum.NUMBER).literal))

	def _get_next_token(self, expected_token_type: TokenTypesEnum) -> Token:
    	"""
    	Returns next token if token's type matches expected token type.

    	In other case raises error.
    	"""

    	next_token: Token = self._tokens[self._next_token_index]
    	self._next_token_index += 1

    	if next_token.type != expected_token_type:
        	raise ParseError(f'Expected {expected_token_type}, but received {next_token!r}.')

    	return next_token

	def _get_next_token_type(self) -> TokenTypesEnum:
    	"""
    	Checks type of upcoming token without consuming it.
    	"""

    	return self._tokens[self._next_token_index].type

):

Prioritizing operations and waiting for the next token type is the core of processing. If a different token is received than expected, an error will be raised.

InterpreterThe interpreter works with the lexical processor, token parser, and commands through abstractions. This allows the implementation of these objects to be changed without having to change the code of the interpreter itself ( src/interfaces.py Andsrc/commands/interfaces.py

class Parser(ABC):

	@abstractmethod
	def parse(self, tokens: List[Token]) -> Expression:
    	raise NotImplementedError


class Processor(ABC):

	@abstractmethod
	def process_expression(self, expression: str) -> List[Token]:
    	raise NotImplementedError

class BaseCommand(ABC):

	def __init__(self, a: float, b: float) -> None:
    	self._a: float = a
    	self._b: float = b

	@abstractmethod
	def execute(self) -> float:
    	raise NotImplementedError


class MathCommand(ABC):

	def __init__(self, value: float) -> None:
    	self._value: float = value

	@abstractmethod
	def execute(self) -> float:
    	raise NotImplementedError

): Each of the commands is represented by a separate object and is a simplified version of the pattern. Team( src/commands/base_commands.py Andsrc/commands/math_commands.py

class MultiplyCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a * self._b


class DivideCommand(BaseCommand):

	def execute(self) -> float:
    	try:
        	return self._a / self._b
    	except ZeroDivisionError:
        	raise CustomZeroDivisionError()


class SubtractCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a - self._b


class AddCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a + self._b


class ExponentialCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a ** self._b

class SqrtCommand(MathCommand):

	def execute(self) -> float:
    	return math.sqrt(self._value)


class SinCommand(MathCommand):

	def execute(self) -> float:
    	return math.sin(self._value)


class CosCommand(MathCommand):

	def execute(self) -> float:
    	return math.cos(self._value)


class LogCommand(MathCommand):

	def execute(self) -> float:
    	return math.log(self._value)


class TanCommand(MathCommand):

	def execute(self) -> float:
    	return math.tan(self._value)


class ExpCommand(MathCommand):

	def execute(self) -> float:
    	return math.exp(self._value)

):Let's associate commands with symbols that represent them in a mathematical expression (src/config.py


BASE_COMMANDS: Dict[str, Type[BaseCommand]] = {
	'+': AddCommand,
	'-': SubtractCommand,
	'*': MultiplyCommand,
	'/': DivideCommand,
	'^': ExponentialCommand,
}

MATH_COMMANDS: Dict[str, Type[MathCommand]] = {
	'sin': SinCommand,
	'cos': CosCommand,
	'tan': TanCommand,
	'log': LogCommand,
	'exp': ExpCommand,
	'sqrt': SqrtCommand,
}

):

  1. The work of the mathematical expression interpreter is implemented through the following steps:

  2. Get user input;

  3. Ensure that user input is valid or notify the user of an error. Valid user input contains a variable, an assignment symbol, and an expression to assign. The variable name consists of only letters;

  4. Replace all variables in the expression with their value. This is possible if the variables exist in the storage and were previously calculated based on other user expressions;

  5. Calculate the values ​​of mathematical functions and replace them in the expression with the obtained value. The argument for the mathematical function is calculated in steps #5, #6, and #7;

  6. Transform an expression into a set of tokens using a lexical processor;

  7. Build a syntax tree from tokens using a token handler;

Recursively in ascending order evaluate the value of each tree node to obtain the final value of the expression.We implement an interpreter class that performs all the listed steps in the required order (src/interpreter.py

class MathOperationsInterpreter:

	def __init__(
        	self,
        	interpreter_base_commands: Dict[str, Type[BaseCommand]],
        	interpreter_math_commands: Dict[str, Type[MathCommand]],
        	parser: Parser,
        	lexical_processor: Processor
	) -> None:

    	self._base_commands: Dict[str, Type[BaseCommand]] = interpreter_base_commands
    	self._math_commands: Dict[str, Type[MathCommand]] = interpreter_math_commands
    	self._parser: Parser = parser
    	self._lexical_processor: Processor = lexical_processor

    	# Storage for executed expressions, which can be user in future expressions:
    	self._user_variables: Dict[str, float] = {}

	def interpret(self, user_input: str) -> None:
    	"""
    	1) Receives user input and checks it validity;
    	2) Interprets the expression in the user input and executes it;
    	3) Assigns executed result of the expression to a given variable.
    	"""

    	try:
        	key: str
        	expression: str
        	key, expression = self._validate_user_input(user_input=user_input.lower())

        	expression_result: float = self._execute(expression=expression)
        	self._user_variables[key] = expression_result
    	except (
            	ParseError,
            	ExpressionSyntaxError,
            	IncorrectVariableAssignmentError,
            	UnknownExpressionTypeError,
            	CustomZeroDivisionError
    	) as e:
        	print(e)

	def _validate_user_input(self, user_input: str) -> Tuple[str, str]:
    	"""
    	Validates user input. If input is invalid, raises IncorrectVariableAssignmentError.

    	User input must contain a variable, an assignment sign, and an assignment expression.
    	Variable must contain only alphabetic characters.

    	Examples of correct user input are:
    	1) "x = 2";
    	2) "x= 2";
    	3) "x=2";
    	"""

    	user_input_sep: str=" "
    	values: List[str] = user_input.split(user_input_sep)

    	# Check, if input is "variable = expression":
    	user_variable: str
    	expression: str
    	equal_sign: str="="
    	if values[0].endswith(equal_sign):
        	user_variable = values[0][: -1]  # Variable name without "=" symbol
        	expression = user_input_sep.join(values[1:])
    	elif len(values) == 1 and equal_sign in values[0]:
        	use_var_index: int = values[0].find(equal_sign)
        	user_variable = values[0][: use_var_index]
        	expression = values[0][use_var_index + 1:]
    	elif values[1] == equal_sign:
        	user_variable = values[0]
        	expression = user_input_sep.join(values[2:])
    	else:
        	raise IncorrectVariableAssignmentError()

    	if not user_variable.isalpha():
        	raise IncorrectVariableAssignmentError()

    	expression = self._substitute_user_variables(expression=expression)
    	return user_variable, expression

	def _substitute_user_variables(self, expression: str) -> str:
    	"""
    	Substitutes already interpreted variable in expression if exists.
    	"""

    	for key in self._user_variables.keys():
        	if key in expression:
            	expression = expression.replace(key, str(self._user_variables[key]))

    	return expression

	def _execute(self, expression: str) -> float:
    	"""
    	1) All basic mathematical functions, such as sin, cos, tan, log, sqrt and exp,
    	are executed if any in the expression;
    	2) Generates tokens from the expression for parsing purpose;
    	3) Creates AST (Abstract Syntax Tree) on tokens basis;
    	4) Recursively calculates the value of a parsed expression based on the AST.
    	"""

    	expression = self._execute_math_operations(expression=expression)
    	tokens: List[Token] = self._lexical_processor.process_expression(expression=expression)

    	try:
        	operations_tree: TreeNode = self._parser.parse(tokens=tokens)
    	except ParseError:
        	raise ExpressionSyntaxError()

    	return self._calculate_node_value(node=operations_tree)

	def _execute_math_operations(self, expression: str) -> str:
    	"""
    	Searches for a math function in expression.
    	If such function is found, calculates its value and replaces the subexpression related to the
    	math function with the calculated value.

    	Example:
    	:param expression: "sqrt(25) + 5 * 2"
    	:return: "5.0 + 5 * 2"
    	"""

    	for math_command in self._math_commands.keys():
        	math_command_index: int = expression.find(math_command)
        	if math_command_index != -1:  # Not found
            	math_command_expression: str
            	postfix: str
            	math_command_expression, postfix = self._extract_expression_from_parentless(
                	expression=expression[math_command_index + len(math_command):]
            	)

            	command = self._math_commands[math_command](
                	value=self._execute(
                    	expression=math_command_expression
                	)
            	)

            	prefix: str = expression[: math_command_index]
            	expression = prefix + str(command.execute()) + postfix

    	return expression

	@staticmethod
	def _extract_expression_from_parentless(expression: str) -> Tuple[str, str]:
    	"""
    	Extracts subexpression from parentless in original expression for math functions purposes.
    	If original expression is not valid or there is no parentless in it, raises ExpressionSyntaxError.
    	Returns extracted subexpression and the remaining part of original expression.

    	Example:
    	:param expression: "(5 + 2) * 3"
    	:return: ("5 + 2", " * 3")
    	"""

    	opening_parenthesis: str="("
    	closing_parenthesis: str=")"
    	if not expression.startswith(opening_parenthesis):
        	raise ExpressionSyntaxError()

    	# Starting from -1, because it's index and will be straightaway incremented
    	# at first iteration of for loop below:
    	parentless_expression_end_index: int = -1

    	parentless_stack: List[str] = []
    	for index in range(len(expression)):
        	parentless_expression_end_index += 1
        	symbol: str = expression[index]
        	if symbol == opening_parenthesis:
            	parentless_stack.append(symbol)
        	elif symbol == closing_parenthesis:
            	parentless_stack.pop(0)

        	if len(parentless_stack) == 0:
            	break

    	if len(parentless_stack) != 0:
        	raise ExpressionSyntaxError()

    	# Expression and postfix should be without parentless from original expression:
    	parentless_expression: str = expression[1: parentless_expression_end_index]
    	postfix: str = expression[parentless_expression_end_index + 1:]
    	return parentless_expression, postfix

	def _calculate_node_value(self, node: TreeNode) -> float:
    	"""
    	1) Gets an AST tree node, which is one of next types: UnaryOperation, BinaryOperation or Number;
    	2) If node is a Number type - returns it, else recursively calculates expression, stored in node.
    	"""

    	command: BaseCommand
    	if isinstance(node, UnaryOperation):
        	command = self._base_commands[node.operation](
            	a=0,  # Unary operation has only one part of expression
            	b=self._calculate_node_value(node=node.expression)
        	)

        	return command.execute()
    	elif isinstance(node, BinaryOperation):
        	command = self._base_commands[node.operation](
            	a=self._calculate_node_value(node=node.left),
            	b=self._calculate_node_value(node=node.right)
        	)

        	return command.execute()
    	elif isinstance(node, Number):
        	return node.value
    	else:
        	raise UnknownExpressionTypeError()

	def get_result(self) -> Optional[float]:
    	"""
    	Returns the value for a result variable if it exists.

    	This variable signals that all expressions have been interpreted and variable storage is no longer required.
    	"""

    	result: Optional[float] = self._user_variables.get(RESULT_VARIABLE)
    	if result:
        	self._user_variables.clear()

    	return result

): To get the result of an expression using all variables – the final expression must be assigned to a variableresult

.
See you in the next episodes.

I would appreciate your support and constructive criticism!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *