Marshalling yard algorithm

Hello everyone! In this article we continue and finish writing a console engineering calculator. To understand what is happening, I strongly recommend that you first read the first part.

Introduction

In the last part we learned how to break down the original mathematical expression of the format (log2(18)/3.14)*sqrt(0.11^(-3)/0.02)for tokens. The output is an array of tokens, each of which contains information about the type (operator, bracket, number, …) and associativity, if it has one.

Now we want to convert the expression to reverse Polish notation (RPN) so that we can then conveniently calculate it. The marshalling yard algorithm invented by Edsger Dijkstra allows us to do this.

Marshalling yard algorithm

Description

Without further ado and without reinventing the wheel, I will say that a detailed description of the algorithm’s operation is Russian and on English is on Wikipedia. The English version also clearly shows why the algorithm is called that way – its work is reminiscent of the work of a railway marshalling yard.

Complexity O(n) which is very, very pleasing.

Algorithm

Let’s start implementation. Let me remind you that the tokenizer and the class Token – in the first part.

Implementation

Function shuntingYard takes a vector of tokens as input; the output also produces a vector of tokens, the so-called output queue, but in RPN,

void shuntingYard(const std::vector<Token> &expr, std::vector<Token> &outQueue)
{ // ...

The basis of the algorithm is the stack, into which operators, functions, parentheses and delimiters are placed, and then transferred to the output queue. This operation is performed frequently, so for convenience we use a lambda that moves the token from the top of the stack to the output queue.

// ...
std::stack<Token> stack;
auto fromStackToQueue = [&]() { outQueue.push_back(stack.top()); stack.pop(); };
// ...

Now we go through all the tokens. If the token is a numeric literal, we immediately transfer it to the output queue; if a function or closing parenthesis – on the stack.

// ...
for(const auto& token : expr)
    {
        switch(token.getType())
        {
        case Token::INT_LITERAL:
        case Token::FLOAT_LITERAL:
            outQueue.push_back(token);
            break;
        case Token::L_PARANTHESIS:
        case Token::FUNCTION:
            stack.push(token);
            break;
        // ...

If the token is an operator:

// ...
case Token::OPERATOR:
    if(!stack.empty())
    {
        /* Пока на вершине стека присутствует правоассоциативный оператор с приоритетом выше,
           или левоассоциативный с равным приоритетом, перекладываем его в очередь вывода   */
        while(stack.top().getType() == Token::OPERATOR && ((stack.top().getPrecendance() > token.getPrecendance())
                || (stack.top().getPrecendance() == token.getPrecendance() && token.getAsc() == Token::LEFT)))
        {
            fromStackToQueue();
            if(stack.empty()) 
                break;
        }
    }
    // Перекладываем текущий токен на вершину стека
    stack.push(token);
    break;
// ...

If the token is a closing parenthesis:

// ...
case Token::R_PARANTHESIS:
    // Если стек уже пуст, даже не начинаем цикл
    if(stack.empty())
        throw Error("Non-balanced on paranthesis expression!", Error::Syntax);

    // Пока не встретили открывающую скобку, перекладываем токен с вершины стека в очередь вывода
    while (stack.top().getType() != Token::L_PARANTHESIS)
    {
        fromStackToQueue();
        // Если стек кончился, а открывающую скобку не встретили - в исходном выражении не хватает скобок
        if (stack.empty())
            throw Error("Non-balanced on paranthesis expression!", Error::Syntax);
    }

    // Удаляем открывающую скобку со стека, не перенося её в очередь вывода
    stack.pop();

    // Если на вершине оказалась функция, перекладываем её а стек
    if(!stack.empty() && stack.top().getType() == Token::FUNCTION)
        fromStackToQueue();
    break;
    // ...

If the token is a function argument separator (aka comma):

// ...
case Token::SEPARATOR:
    // Если стек уже пуст, даже не начинаем цикл
    if(stack.empty())
        throw Error("Paranthesis or separator missed!", Error::Syntax);

    // Пока не наткнулись на открывающую скобку, перекладываем токены со стека в очередь вывода
    while(stack.top().getType() != Token::L_PARANTHESIS)
    {
        fromStackToQueue();
        // Если так и не встретили закрывающую скобку - в исходном выражении не хватает скобки или разделителя
        if(stack.empty())
            throw Error("Paranthesis or separator missed!", Error::Syntax);
    }
    break;
} // конец цикла
// ...

Now we move the operators remaining on the stack to the output queue:

// ...
while(!stack.empty())
{
    if(stack.top().getType() == Token::L_PARANTHESIS)
        throw Error("Paranthesis-unbalanced expression!", Error::Syntax);
    else
        fromStackToQueue();
} // конец функции shuntingYard

Great, we’ve converted the expression.

For example, from(log2(18)/3.14)*sqrt(0.11^(-3)/0.02) we get the sequence23 log2 2 - 3.14 / 0.1 10 3 - ^ * 0.02 / sqrt * +

RPN calculation

Now that’s why we’ve all gathered here – to count. As detailed Here, the whole point of RPN is that we no longer have to worry about the precedence of operations – all the operands and operators are already in the right order. So all we have to do is go through the operators sequentially, writing intermediate calculations onto the stack.

Algorithm
  1. Processing the input symbol

    • If an operand is given as input, it is pushed onto the top of the stack.

    • If an operation sign is given as input, then the corresponding operation is performed on the required number of values ​​taken from the stack, taken in the order of addition. The result of the performed operation is placed on the top of the stack.

  2. If the input character set is not fully processed, go to step 1.

  3. After the input character set has been completely processed, the result of the expression evaluation lies on the top of the stack.

Implementation

Function countRPN We will take as input a vector of tokens in RPN (oh well!) and return the result of the calculation.

double countRPN(const std::vector<Token> &expr)
{ // ...

Now we need a stack where we will put the operands and the results of intermediate calculations:

// ...
std::stack<double> stack;
// ...

For convenience, we introduce lambdas that will take one or two arguments from the stack:


// ...
auto getOneToken = [&]() 
{
    if(stack.empty()) throw Error("Not enough arguments in function!", Error::Syntax);
    double x = stack.top(); 
    stack.pop(); 
    return x; 
};

auto getTwoTokens = [&]()
    { double x = getOneToken(), y = getOneToken(); return std::tuple{y,x}; };
// ...

Now, actually, do the math:

// ...
double res;
for (auto &token : expr)
{
    const std::string &str = token.getStr();
    switch(token.getType())
    {
    // Если токен - числовой литерал, он же операнд, кладем на стек
    case Token::INT_LITERAL:
        stack.push(std::stof(str));
        break;
    case Token::FLOAT_LITERAL:
        stack.push(std::stof(str));
        break;
// ...

Now the operators:

// ...
case Token::OPERATOR:
    switch(token.getAsc())
    {
    // Левоассоциативные операторы
    case Token::LEFT:
    {
        auto [a,b] = getTwoTokens(); 
        if      (str == "+") res = a + b;
        else if (str == "-") res = a - b;
        else if (str == "*") res = a * b;
        else if (str == "/") res = checkedDivision(a, b);
        else if (str == "^") res = std::pow(a,b);
        else    throw Error("Unknown operator!", Error::Syntax);
        break;
    }
    // Правоассоциативные операторы
    case Token::RIGHT:
    {
        auto a = getOneToken();
        if   (str == "-") res = -a;
        else throw Error("Unknown operator!", Error::Syntax);
        break;
    }
    }
    stack.push(res);
    break;
// ...

And functions:

// ...
    case Token::FUNCTION:
        if(str == "log") 
        {
            auto [a,b] = getTwoTokens();
            if(a <= 0.f || a == 1.0f) throw Error(std::format("log(a,x): not defined for a = {}", a), Error::Math);
            if(b <= 0.f) throw Error("log(a,x): out of function's domain", Error::Math);
            res = std::log(b) / std::log(a);
        }
        else if 
        // сколько хотите, столько и добавляйте
        else
            throw Error("Unknown function!", Error::Syntax);
        stack.push(res);
        break;
    }
} // конец цикла
// ...

The result will be the only remaining element on the stack:

// ...
  return stack.top();
} // конец функции countRPN

Conclusion

All project code is in github. I hope it was interesting.

UPD

As noted in the comments to the first part, there is a much simpler way to do what was described in the article – the GNU Bison parser generator, developed back in 1985. The official documentation has example creating an RPN calculator.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *