getting rid of variables

How complex programs can you write in Python without using variables at all (as well as function arguments), with the exception of a couple of global arrays? Correct answer: yes, of any complexity. If something can be done in assembler, then even more so in Python! True, it’s better to let a machine generate the code instead of us 🙂

We continue the conversation about a minimalistic compiler, which is quite possible to write in a weekend. The task is to translate code from the language I invented into x86 assembler. My compiler consists of 611 lines of code, and does not have a single dependency:

ssloy@khronos:~/tinycompiler$ cat *.py|wc -l
611

Despite the fact that the target language is assembly, I am not a masochist, and I came to this gradually. First, I translated the code into Python, and gradually cut down the functionality of the target language until I was left with bare assembler. The topic of today’s conversation: generating code in Python without using variables.

Table of contents

  1. Syntax trees and a naive Python translator

  2. Lexer/parser

    2′. Cursed fire, or the magic of the C preprocessor

  3. Symbol Tables: Variable Scope and Type Checking

  4. Stack and translator to Python without using Python variables (this article)

  5. Translator to assembler

  6. Raytracing 🙂

  7. Getting rid of dependencies: writing a lexer and parser ourselves

  8. Optimizing compiler?

  9. Garbage collector?

GFX!

My language is extremely primitive, but at the same time it allows you to create fully-fledged programs; here is the result of several programs that I wrote specifically for this course:

Code for these examples lies here. I have already described in detail the program for drawing fire (and at the same time programming using only a lexer), other examples are in line.

I know that I have a professional bias towards computer graphics, but I can’t stand examples like calculating Fibonacci numbers and all sorts of factorials (literally the next example in this article is a factorial :)). I want my students to be able to test their compilers against something pleasing to the eye.

Recursion for the little ones

The factorial of a positive integer n is the product of all positive integers less than or equal to n. For example, the factorial of 7 is 7*6*5*4*3*2*1, which is equal to 5040. The factorial of n is written as n!, and the problem of calculating it is obviously reduced to calculating (n-1)!, namely , n! = n * (n – 1)!

The recursive implementation suggests itself:

ssloy@khronos:~$ python3 <<<'
def factorial(n):
    result = 1
    if n > 1:
        result = n * factorial(n-1)
    return result

print(factorial(7))
'
5040

An introduction to programming often begins with code like this, but usually leaves out the explanation of how the machine actually executes it. The main tool in such code is a function call, but how does it work?

A typical (not the only possible, but a typical) mechanism for implementing a function call is based on saving arguments and local variables of the function on the stack and looks like this:

  1. At the call point, the parameters passed to the function are placed on the stack (+ usually the number of the instruction to return to after the function completes).

  2. The called function puts its own local variables on the stack during operation.

  3. Upon completion of the calculation, the function clears the stack of its local variables and writes the result (usually to one of the processor registers).

  4. The return instruction from a function reads the return address from the stack and branches to that address. Either immediately before or immediately after the function returns, the stack is cleared of the function’s arguments.

It is easy to see that the need to expand the stack is dictated by the requirement to restore the state of the calling function instance (that is, its parameters, local data, and return address) after the called function completes.

Thus, with each recursive call to a function, a new set of its parameters and local variables is created, which, together with the return address, is placed on the stack. In order not to be unfounded and not to theorize too much, let’s add one line of introspection to our code with the factorial:

ssloy@khronos:~$ python3 <<<'
import inspect

def factorial(n):
    result = 1
    # вот эту строчку я добавил:
    print("instances of n:", [frame[0].f_locals["n"] for frame in reversed(inspect.stack()[:-1])])
    if n > 1:
        result = n * factorial(n-1)
    return result

print(factorial(7))
'
instances of n: [7]
instances of n: [7, 6]
instances of n: [7, 6, 5]
instances of n: [7, 6, 5, 4]
instances of n: [7, 6, 5, 4, 3]
instances of n: [7, 6, 5, 4, 3, 2]
instances of n: [7, 6, 5, 4, 3, 2, 1]
5040

On every call factorial(n) The line I added goes through the stack and displays all instances of the variable nwhat it finds. To calculate 7! function factorial(n) is called 7 times, and it is easy to see that at some point the machine stores all 7 instances of the variable. Of course, for the variable result exactly the same picture.

The main idea of ​​this article

Python does a lot of work for us, but this does not mean that we cannot do without its help. The assembler does not give you the luxury of automatically passing arguments; you have to work with the stack manually. I propose to still use Python as the target language, but reduce the capabilities used to the assembler level: in fact, we will write in assembly language, but with Python syntax. This approach has two advantages:

  1. We are not required to strictly follow all restrictions at once. For example, I can limit the ability to pass arguments to functions, but I don’t have to jump straight into the four-register world. I can easily use Python’s expression parser, writing a comfortable 5//-3, if necessary, instead of the kilometer-long footcloth of initializing all the necessary registers with the required flags. This way, I can have a working compiler at any time and finish it piece by piece.

  2. In addition, we do not have to jump headlong into a world that is not very friendly for beginners, in which often even primitive print() does not work, and how debuggers work is a separate story. We can use our favorite IDE, set breakpoints, display anything on the screen, and comfortably change the generated code with our hands to understand what went wrong in it.

We are not required to use the built-in Python stack; we can emulate its operation using a custom stack. Let’s create a global array stack, and we will store arguments and local variables of functions in it. Then the calculation of the factorial can be rewritten (by hand!) as follows:

ssloy@khronos:~$ python3 <<<'
def factorial():                             # n is stack[-1]
    stack.append(1)
    if stack[-2] > 1:                        # n is stack[-2], result is stack[-1]
        stack.append(stack[-2]-1)
        stack[-2] = stack[-3] * factorial()  # n is stack[-3], result is stack[-2]
        stack.pop()
    return stack.pop()

stack = []
stack.append(7)
print(factorial())
stack.pop()
'
5040

In this code, we no longer use Python variables at all, with the exception of a single array stack. This program is very similar to what I want to achieve automatically with my compiler, and the transition from the previous one to this one is the main point of the article. Put down your cup of tea, look at the code carefully, if necessary, draw a stack on a piece of paper.

I wrote the code by hand, and I don’t really like that I had to manually keep track of the position of variables on the stack. To refer to the same variable nI was forced to call three different expressions stack[-1], stack[-2] And stack[-3]. This is not very good, we need to find a more oaky way. It’s time to remember about the symbol tables from the last joke.

Guinea pig

Let’s put factorial aside and look at a simple non-trivial example with several different function calls. On the left you see the source code in wend, on the right it is translated into Python using the version v0.0.3 my compiler. This is exactly what was described in the previous article about symbol tables. The current task is to rewrite the right hand side without using variables, much the same as I did with the factorial.

If suddenly the code in the picture is too confusing for your eyes, let me give it in plain text:

wend
fun main() {
    fun sopfr(n:int):int {
        var div:int;
        fun sopfr_aux(n:int):int {
            var rec:int;
            rec = 0;
            if n % div == 0 {
                rec = div;
                if n!=div {
                    rec = rec + sopfr_aux(n/div);
                }
            } else {
                div = div + 1;
                rec = sopfr_aux(n);
            }
            return rec;
        }

        div = 2;
        return sopfr_aux(n);
    }

    println sopfr(42);
}
python
def main():
    def sopfr(n):
        def sopfr_aux(n):
            nonlocal div
            rec = 0
            if n % div == 0:
                rec = div
                if n != div:
                    rec = rec + sopfr_aux(n // div)
            else:
                div = div + 1
                rec = sopfr_aux(n)
            return rec

        div = 2
        return sopfr_aux(n)

    print(sopfr(42))

main()

This program calculates the sum of its prime factors for a given number. For example, 20 = 5*2*2, so sopfr(20) = 2 + 2 + 5 = 9. This function is often called integer logarithm, which is not surprising, since it is quite obvious that sopfr(a * b) = sopfr(a) + sopfr(b). Mathematicians are quite actively studying the properties of this function, but this is somewhat beyond the scope of our discussion 🙂

There are several things that interest me in this code:

  1. We have three nested scopes.

  2. We have functions with different numbers of arguments and different numbers of local functions.

  3. We have an appeal to non-local variable div from function sopfr_aux.

The latter is of particular interest. Let’s remember that in the last example with factorial, I had to manually track the position of local variables on the stack, the size of which was changing. This is unpleasant, but not at all scary, since all stack changes within one function are known at compile time: first I access the variable n How stack[-1]and after adding a local variable result the stack grew, and n became the penultimate element stack[-2]. What to do when the stack changes in runtime? But this happens extremely regularly.

Exercise 1

Let’s return to our guinea pig of calculation. sopfr(n). Let’s put a breakpoint on the line rec = div:

42 is factored into factors 2, 3, 7, that is, the interpreter must go through this line three times. Take a piece of paper and a pencil, and draw the state of the stack each of the three times (I’m only interested in variables, all sorts of addresses are useless).

Let me help you with the first breakpoint. Function main() puts 42 on the stack, and this corresponds to the variable n in function sopfr(n). She, in turn, puts 2 on the stack (variable div), and then puts 42 again (function argument sopfr_aux(n)). well and sopfr_aux creates a local variable recputting 0 on the stack, and hits the breakpoint because 42 is divisible by 2. So, when we hit the breakpoint the first time, the stack will look like this: [42,2,42,0]. Here’s a short animation of the process:

Now look up from the article and try to draw a stack for the second and third breakpoints. The task is trivial, but I’m willing to bet that many will get it wrong. If anyone made a mistake, don’t be shy, write in the comments 🙂

Answer

Don’t be surprised, this is a completely normal answer, so I added just a little introspection to the code to check the result:

Verification code
ssloy@khronos:~$ python3 <<<'
import inspect

def main():
    def sopfr(n):
        def sopfr_aux(n):
            nonlocal div
            rec = 0
            if n % div == 0:
                # breakpoint, print stack
                var = {"main":[], "sopfr":["n","div"], "sopfr_aux":["n","rec"]}
                print([frame[0].f_locals[name] for frame in reversed(inspect.stack()[:-1]) for name in var[frame[3]]])
                rec = div
                if n != div:
                    rec = rec + sopfr_aux(n // div)
            else:
                div = div + 1
                rec = sopfr_aux(n)
            return rec

        div = 2
        return sopfr_aux(n)

    print(sopfr(42))

main()
'
[42, 2, 42, 0]
[42, 3, 42, 2, 21, 0, 21, 0]
[42, 7, 42, 2, 21, 0, 21, 3, 7, 0, 7, 0, 7, 0, 7, 0, 7, 0]
12

Symbol tables to the rescue

When I rewrote the factorial in assembly Python, I accessed the variables simply by the shift from the top of the stack, and even for local variables this shift was different, but at least it was known at compile time. But here everything is very different. Once we reach our breakpoint, we’ll want to write div V rec: one variable is local, and the second is non-local, and it is buried very deep on the stack; however, the depth is unknown at compile time. On the other hand… Let’s see what the symbol table for our program looks like:

We have three nested variable scopes: main, sopfr, sopfr_aux.

In function main there are no variables in sopfr there are two integer variables n And divand in sopfr_aux there are two integer variables n And rec.

Until now, my compiler referred to variables by their names, but now it’s time to number them and forget about identifiers. Here is the commit diff, in which I simply added scope and variable counters within each scope. In fact, I added these purple numbers to the symbol table:

Now whenever I want to access a variable divI don’t need its identifier, it’s enough for me to know that it’s a variable with index 1 in scope 1, that is, a couple of numbers are enough for identification.

Now we have everything we need to get rid of variables and sopfr, which is much more complex than factorial. Let me remind you that I still write code by hand, I need this in order to understand how my compiler will work.

Here is the code with a picture:

And he is the same in text
ssloy@khronos:~$ python3 <<<'
def main():
    def sopfr():
        def sopfr_aux():
            global display,stack,eax
            stack += [None]*1        # allocate memory for one local variable rec
            stack += [display[2]]    # save old frame pointer
            display[2] = len(stack)-3 # frame pointer for sopfr_aux()
            stack[display[2]+1] = 0   # rec = 0

            if stack[display[2]+0] % stack[display[1]+1] == 0:  # if n % div == 0
                print(display,stack)  # BREAKPOINT!
                stack[display[2]+1] = stack[display[1]+1]       # rec = div
                if  stack[display[2]+0] != stack[display[1]+1]: # if n != div
                    stack += [stack[display[2]+0]//stack[display[1]+1]] # push n/div ┐
                    sopfr_aux()                                         #            > sopfr_aux(n/div)
                    del stack[-1:]                                      #  pop n/div ┘
                    stack[display[2]+1] = stack[display[2]+1] + eax
            else:
                stack[display[1]+1] = stack[display[1]+1] + 1
                stack +=[stack[display[2]+0]] # push n  ┐
                sopfr_aux()                   #         > sopfr_aux(n) call
                del stack[-1:]                #  pop n  ┘
                stack[display[2]+1] = eax

            eax = stack[display[2]+1]
            display[2] = stack.pop()  # restore frame pointer
            del stack[-1:]            # remove rec from stack

        global display,stack,eax
        stack += [None]*1         # allocate memory for one local variable div
        stack += [display[1]]     # save old frame pointer
        display[1] = len(stack)-3 # frame pointer for sopfr()
        stack[display[1]+1] = 2   # div = 2
        stack += [stack[display[1]+0]] # push n ┐
        sopfr_aux()                    #        > sopfr_aux(n)
        del stack[-1:]                 #  pop n ┘
        display[1] = stack.pop()  # restore frame pointer
        del stack[-1:]            # remove div from stack

    global display,stack,eax
    stack += [None]*0           # no local variables
    stack += [display[0]]       # save old frame pointer
    display[0] = len(stack)-1   # frame pointer for main()
    stack += [42]               # push 42 ┐
    sopfr()                     #         > sopfr(42)
    del stack[-1:]              #  pop 42 ┘
    print(eax)
    display[0] = stack.pop()    # restore frame pointer

display = [ None ]*3 # uninitialized frame pointers
stack   = []         # empty stack
eax     = None       # registers
main()
'
[0, 1, 4] [None, 42, 2, None, 42, 0, None]
[0, 1, 10] [None, 42, 3, None, 42, 2, None, 21, 0, 4, 21, 0, 7]
[0, 1, 25] [None, 42, 7, None, 42, 2, None, 21, 0, 4, 21, 3, 7, 7, 0, 10, 7, 0, 13, 7, 0, 16, 7, 0, 19, 7, 0, 22]
12

At first glance it’s scary, but in reality it’s nothing complicated. Let’s figure out what’s going on here. First of all, let’s look at the fact that we have added global variables. Now we not only have stackbut the array also appeared display and variable eax.

The stack did not change its role, the variable eax emulates a processor register, and according to x86 function calling conventionsI will put the return value of the functions into this “register”.

But with a global array display more interesting. Note that it is not empty. This array will not change its size, the number of its cells is equal to the number of different scopes in our symbol table, in this case 3. The point is that when calling each function, we will write the index of the top of the stack into the corresponding cell, which will allow us to access variable div How stack[display[1]+1]and to the variable rec How stack[display[2]+1]. This technique completely removes the dependence on runtime; each variable is always identified by the same, constant pair of numbers, which we simply take from the symbol table. Of course, when calling the function, we will need to save (on the stack, where else) the old display value, and restore it upon completion.

Thus, the body of any function with nlocals local variables and nargs Our parameters will look like this:

def foo():
  global display,stack,eax
  stack += [None]*nlocals                     # reserve place for local variables
  stack += [display[scope]]                   # save old frame pointer
  display[scope] = len(stack)-nlocals-nargs-1 # set new frame pointer for foo()
  ...                                         # function body
  display[scope] = stack.pop()                # restore old frame pointer
  eax = ...                                   # return value
  if nlocals>0:                               # remove local variables from stack
    del stack[-nlocals:]

Well, calling such a function will look very simple:

  1. put on stack nargs values,

  2. call the function,

  3. remove from stack nargs values.

I didn’t just settle on sopfr, I spent a whole day choosing an example. It’s good because it has three different functions, which have different numbers of parameters, different numbers of local variables, and even have a non-local variable. Thus, it has three function bodies, and even more different calls, which allows you to see that they are executed in exactly the same way and exclusively using information available from the symbol table.

Exercise 2

We rewrote the code, but its structure has not changed. Let’s leave the breakpoint exactly where it was before and draw the state as an exercise display And stack all three times. Now I won’t hide the result, you won’t remember the arrows right away anyway. Don’t look too much at the answer, draw the stack honestly, it will help you understand what we are saving and at what point.

Answer:

The Home Straight: Evaluating the Values ​​of Expressions

There is very little left to assemble Python. The final touch is to evaluate the meanings of expressions. Let’s imagine that we have a program like this:

fun main() {
  var a:int;
  var b:int;
  [...]
  a+b*2
  [...]
}

Then in our symbol table we will have one variable scope with ID 0, and inside it we will have two variables with IDs 0 and 1, respectively. At the moment I’m interested in how to calculate the expression a+b*2.

In assembler it will look something like this (I remind you that in GNU assembler the operators look like op src dst):

mov display+0, %eax  # найти фрейм функции main
mov -1(%eax), %ebx   # положить значение b в регистр ebx
mov -0(%eax), %eax   # положить значение a в регистр eax
imul 2, %ebx         # ebx = 2 * ebx
add %ebx, %eax       # eax = eax + ebx

Everything would be fine, but here I had to be cunning: in order to put b V ebxI need to mess up the meaning eaxso first I read the argument band only then a. I also cleverly put the result 2*ebx back to ebx…And we must also remember that there are very few registers, and the expression a + b * 2 - foo(c - 3, d + 10*d, bar(a/4)) There will not be enough registers guaranteed.

Well, no problem. Let’s remember that we are writing a compiler, and that the parser has built us a syntactic expression tree that looks like this:

We can then generate code for evaluating the value of an expression using a depth-first traversal of the tree, using the stack to store intermediate expressions. Imagine that we agree on something very simple: each expression node stores its value in the same register, for example, %eax (by the way, a function call is also an expression, and in the example with sopfr the function call was precisely what saved its result in %eax).

And then a single (recursive) expression evaluation operation might look like this (for simplicity, I’m talking here only about binary operations like arithmetic):

оценить левое выражение (результат сохранён в %eax)
push(%eax)
оценить правое выражение (результат сохранён в %eax)
переложить %eax в %ebx
%eax = pop()
%eax = операция над %eax и %ebx

Using this approach, the expression a+2*b can be evaluated with the following pseudocode:

положить a в %eax                       \
push(%eax)                               |
положить 2 в %eax      \                 |
push(%eax)              |                |
положить b в %eax       | 2*b сохранено  | a + 2*b сохранено
переложить %eax в %ebx  |    в %eax      |    в %eax
%eax = pop()            |                |
%eax = %eax * %ebx     /                 |
переложить %eax в %ebx                   |
%eax = pop()                             |
%eax = %eax + %ebx                      /

Please note that the code is clearly not optimal, you can very easily save a lot of things (in fact, the assembly code that I gave earlier is much simpler and does not use a stack). But at this stage I’m not interested in optimization in principle. I need to find a general pattern for evaluating expressions, and it looks like I’ve found it! We will talk about the optimizing compiler separately (planned).

But this approach has an undeniable advantage: it uses only two registers and a stack, it doesn’t need anything else! So finishing touch over ours sopfr: all expressions are case evaluated eax. I still wrote the code by hand, but again, solely to test the functionality of the approach. Behold Python assembly code!

He is the same in text
ssloy@khronos:~$ python3 <<<'
def main():
    def sopfr():
        def sopfr_aux():
            global display,stack,eax,ebx
            stack += [None]*1         # allocate memory for one local variable rec
            stack += [display[2]]     # save old frame pointer
            display[2] = len(stack)-3 # frame pointer for sopfr_aux()
            eax = 0
            stack[display[2]+1] = eax # rec = 0

            eax = stack[display[2]+0] # n
            stack += [eax]            # stash left argument
            eax = stack[display[1]+1] # div
            ebx = eax                 # right argument
            eax = stack.pop()         # recall left arg
            eax = eax % ebx           # n % div
            stack += [eax]            # stash left argument
            eax = 0
            ebx = eax                 # right argument
            eax = stack.pop()         # recall left arg
            eax = eax == ebx          # n % div == 0
            if eax:  # if n % div == 0
                print(display,stack)  # BREAKPOINT!
                eax = stack[display[1]+1] # div
                stack[display[2]+1] = eax # rec = div
                eax = stack[display[2]+0] # n
                stack += [eax]            # stash left argument
                eax = stack[display[1]+1] # div
                ebx = eax                 # right argument
                eax = stack.pop()         # recall left arg
                eax = eax != ebx          # n != div
                if eax: # if n != div
                    eax = stack[display[2]+1] # rec
                    stack += [eax]            # stash left argument
                    eax = stack[display[2]+0] # n
                    stack += [eax]            # stash left argument
                    eax = stack[display[1]+1] # div
                    ebx = eax                 # right argument
                    eax = stack.pop()         # recall left arg
                    eax = eax // ebx          # n/div      ┐
                    stack += [eax]            #            │ sopfr_aux(n/div)
                    sopfr_aux()               #            │
                    del stack[-1:]            #  pop n/div ┘
                    ebx = eax                 # right argument
                    eax = stack.pop()         # recall left arg
                    eax = eax + ebx           # rec + sopfr_aux(n/div)
                    stack[display[2]+1] = eax # rec = rec + sopfr_aux(n/div)
            else:
                eax = stack[display[1]+1] # div
                stack += [eax]            # stash left argument
                eax = 1
                ebx = eax                 # right argument
                eax = stack.pop()         # recall left arg
                eax = eax + ebx           # div + 1
                stack[display[1]+1] = eax # div = div + 1
                eax = stack[display[2]+0] # push n ┐
                stack +=[eax]             #        │ sopfr_aux(n) call
                sopfr_aux()               #        │
                del stack[-1:]            #  pop n ┘
                stack[display[2]+1] = eax

            eax = stack[display[2]+1]
            display[2] = stack.pop()  # restore frame pointer
            del stack[-1:]            # remove rec from stack

        global display,stack,eax,ebx
        stack += [None]*1         # allocate memory for one local variable div
        stack += [display[1]]     # save old frame pointer
        display[1] = len(stack)-3 # frame pointer for sopfr()
        eax = 2
        stack[display[1]+1] = eax # div = 2
        eax = stack[display[1]+0] # push n ┐
        stack += [eax]            #        │ sopfr_aux(n)
        sopfr_aux()               #        │
        del stack[-1:]            #  pop n ┘
        display[1] = stack.pop()  # restore frame pointer
        del stack[-1:]            # remove div from stack

    global display,stack,eax,ebx
    stack += [None]*0           # no local variables
    stack += [display[0]]       # save old frame pointer
    display[0] = len(stack)-1   # frame pointer for main()
    eax = 42                    # push 42 ┐
    stack += [eax]              #         │ sopfr(42)
    sopfr()                     #         │
    del stack[-1:]              #  pop 42 ┘
    print(eax)
    display[0] = stack.pop()    # restore frame pointer

display = [ None ]*3 # uninitialized frame pointers
stack   = []         # empty stack
eax,ebx = None,None  # registers
main()
'
[0, 1, 4] [None, 42, 2, None, 42, 0, None]
[0, 1, 11] [None, 42, 3, None, 42, 2, None, 2, 21, 0, 4, 21, 0, 8]
[0, 1, 27] [None, 42, 7, None, 42, 2, None, 2, 21, 0, 4, 21, 3, 8, 3, 7, 0, 11, 7, 0, 15, 7, 0, 18, 7, 0, 21, 7, 0, 24]
12

Let’s sum it up

I repeat, what is important in this code is the repetition of the same patterns of the function body, calling the function and evaluating the value of expressions; in principle, they do not require the work of the brain, everything is written exclusively using the brain, which is excellent for automating the process.

Here you can look at all the changes in the repository since the previous release, there are very few of them. Compiler version for testing – v0.0.4.

As a result, we (almost) do not use anything from Python that assembler cannot do directly. Almost the only thing needed to generate assembly code is to change the stencil through which we write the code. Next time we’ll talk about this!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *