Generating AST in Rust

AST generation process

So, an abstract syntax tree is a data structure that describes the syntactic structure of the source code as a tree. The tree nodes represent syntactic constructs: operators, variables, expressions, etc.

In compilers, the AST serves an important purpose: it is an intermediate representation of the program that can be easily processed, analyzed, and transformed by other stages of the compilation, such as optimization and code generation.

Defining AST Structure in Rust

First, you need to create a data structure that will represent the nodes of the tree. In Rust, it is most convenient to use enum to represent different types of tree nodes:

#[derive(Debug, Clone)]
enum Expr {
    Number(i32),
    Variable(String),
    Add(Box<Expr>, Box<Expr>),
    Subtract(Box<Expr>, Box<Expr>),
    Multiply(Box<Expr>, Box<Expr>),
    Divide(Box<Expr>, Box<Expr>),
    FunctionCall(String, Vec<Expr>),
}

Here Expr — is an enumeration that describes the tree nodes for expressions. We use Box<Expr> for recursive structures such as arithmetic operations, to avoid problems with determining the size of structures at compile time. Why is it used Box? In Rust, for recursive structures, the size of the type must be known at compile time. Box points to data on the heap and has a fixed size.

Creating a simple parser

Now let's create a simple parser that will process strings with arithmetic expressions, variables and function calls and transform them into our tree. For this, we will use the library nom. This is a powerful parser combinator library that allows you to build parsers from small components. Parser combinators are functions that take a string as input and return the parsing result or an error. They can be combined to build complex parsers.

Let's add a dependency to the project:

[dependencies]
nom = "7.0"

Now let's start writing the parser:

use nom::{
    IResult,
    character::complete::{alpha1, digit1, char},
    branch::alt,
    combinator::map_res,
    sequence::tuple,
    multi::fold_many0,
    multi::separated_list0,
    bytes::complete::tag,
};

fn parse_number(input: &str) -> IResult<&str, Expr> {
    let (input, num) = map_res(digit1, |s: &str| s.parse::<i32>())(input)?;
    Ok((input, Expr::Number(num)))
}

fn parse_variable(input: &str) -> IResult<&str, Expr> {
    let (input, var) = alpha1(input)?; // Переменные начинаются с букв
    Ok((input, Expr::Variable(var.to_string())))
}

fn parse_function_call(input: &str) -> IResult<&str, Expr> {
    let (input, (func_name, _, args, _)) = tuple((
        alpha1,          // имя функции
        char('('),       // открывающая скобка
        separated_list0(char(','), parse_expr), // аргументы функции
        char(')')        // закрывающая скобка
    ))(input)?;
    Ok((input, Expr::FunctionCall(func_name.to_string(), args)))
}

fn parse_factor(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_number,
        parse_variable,
        parse_function_call,
    ))(input)
}

fn parse_term(input: &str) -> IResult<&str, Expr> {
    let (input, initial) = parse_factor(input)?;

    fold_many0(
        tuple((alt((tag("*"), tag("/"))), parse_factor)),
        initial,
        |acc, (op, val)| match op {
            "*" => Expr::Multiply(Box::new(acc), Box::new(val)),
            "/" => Expr::Divide(Box::new(acc), Box::new(val)),
            _ => unreachable!(),
        },
    )(input)
}

fn parse_expr(input: &str) -> IResult<&str, Expr> {
    let (input, initial) = parse_term(input)?;

    fold_many0(
        tuple((alt((tag("+"), tag("-"))), parse_term)),
        initial,
        |acc, (op, val)| match op {
            "+" => Expr::Add(Box::new(acc), Box::new(val)),
            "-" => Expr::Subtract(Box::new(acc), Box::new(val)),
            _ => unreachable!(),
        },
    )(input)
}

What we did:

  • Added parse_variable for parsing variables consisting of letters.

  • Added parse_function_callto parse function calls with arguments.

  • Function parse_factor can now parse numbers, variables and function calls.

Parsing can always fail, especially when the input is invalid, so add error handling:

fn parse_expr_with_errors(input: &str) -> Result<Expr, String> {
    match parse_expr(input) {
        Ok((_, expr)) => Ok(expr),
        Err(nom::Err::Error(e)) | Err(nom::Err::Failure(e)) => Err(format!("Parsing error: {:?}", e)),
        Err(nom::Err::Incomplete(_)) => Err("Input is incomplete".to_string()),
    }
}

Now the parser processes errors and displays them to the user.

AST visualization

Now let's visualize the AST to clearly show the tree structure. We've already implemented the function pretty_printwhich displays the tree on the screen:

impl Expr {
    fn pretty_print(&self, indent: usize) {
        let padding = " ".repeat(indent);
        match self {
            Expr::Number(n) => println!("{}Number({})", padding, n),
            Expr::Variable(v) => println!("{}Variable({})", padding, v),
            Expr::FunctionCall(name, args) => {
                println!("{}FunctionCall: {}", padding, name);
                for arg in args {
                    arg.pretty_print(indent + 2);
                }
            }
            Expr::Add(lhs, rhs) => {
                println!("{}Add:", padding);
                lhs.pretty_print(indent + 2);
                rhs.pretty_print(indent + 2);
            }
            Expr::Subtract(lhs, rhs) => {
                println!("{}Subtract:", padding);
                lhs.pretty_print(indent + 2);
                rhs.pretty_print(indent + 2);
            }
            Expr::Multiply(lhs, rhs) => {
                println!("{}Multiply:", padding);
                lhs.pretty_print(indent + 2);
                rhs.pretty_print(indent + 2);
            }
            Expr::Divide(lhs, rhs) => {
                println!("{}Divide:", padding);
                lhs.pretty_print(indent + 2);
                rhs.pretty_print(indent + 2);
            }
        }
    }
}

Conclusion:

Add:
  Number(5)
  Multiply:
    Variable(x)
    Number(3)

This conclusion demonstrates the following:

  • The root node is an addition operation Add.

  • The left operand is the number 5 Number(5)it is displayed with an indent of 2 spaces.

  • The right operand is the multiplication operation. Multiplywhich in turn consists of a variable x and numbers 3.

    • Variable x is output as Variable(x) with an indent of 4 spaces.

    • Number 3 is output as Number(3) with the same 4-space indent, since this is the right side of the multiplication operation.

This output shows the AST structure corresponding to the expression 5 + (x * 3).

Optimization: Constant folding and dead code elimination

At compile time, you can perform optimizations such as constant folding and dead code elimination:

impl Expr {
    fn optimize(&self) -> Expr {
        match self {
            Expr::Add(lhs, rhs) => {
                let left = lhs.optimize();
                let right = rhs.optimize();
                if let Expr::Number(l) = left {
                    if let Expr::Number(r) = right {
                        return Expr::Number(l + r); // Оптимизация констант
                    }
                }
                if let Expr::Number(0) = right {
                    return left; // x + 0 => x
                }
                if let Expr::Number(0) = left {
                    return right; // 0 + x => x
                }
                Expr::Add(Box::new(left

), Box::new(right))
            }
            Expr::Multiply(lhs, rhs) => {
                let left = lhs.optimize();
                let right = rhs.optimize();
                if let Expr::Number(l) = left {
                    if let Expr::Number(r) = right {
                        return Expr::Number(l * r); // Оптимизация констант
                    }
                }
                Expr::Multiply(Box::new(left), Box::new(right))
            }
            Expr::FunctionCall(_, _) => self.clone(), // В будущем можно добавить интерпретацию вызовов функций
            _ => self.clone(),
        }
    }
}

Function optimize automatically simplifies expressions and calculates constants at compile time.

Conclusion

AST makes the process of working with code more convenient by providing a clear structure for further manipulations and optimizations.


Archi vs. Everyone: How a Free Architectural Tool Wins the Hearts of Millions? We'll discuss this at an open lesson on September 23. During the lesson, we'll:

  • Let's consider the criteria for choosing architectural tools in a company

  • Let's review the leading global and Russian instruments

  • Let's touch on the main features of the free Archi tool

  • Let's model a simple case

You can sign up for a lesson on the course page “Archimate. Basic”.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *