Breaking Brainfuck Haskell

A bit about bfc

Brainfuck is a very stupid language. There is a tape of 30k cells, one byte each. The bfc commands are:

  • Movement along the tape to the left and right (symbols < And >)
  • Increase and decrease the value in the cell (symbols + And -)
  • Input and output of the current cell (symbols . And ,)
  • And a while loop that continues until the value in the current cell is zero. [ и ] this is the beginning and end of the cycle, respectively

Programming in bfc is hard. But, as you know, any problem can be solved by adding an abstraction layer (except for the problem of a large number of abstractions).

Let’s start small, as usual.

Basic definitions

The main constructions are simply transferred one to one to Haskell

data Instr = Inc | Dec | Shl | Shr | In | Out | Loop [Instr] deriving (Eq, Show)
--            +     -     <     >     ,    .     [...]

I will not give the bfc interpreter, a huge number of them have been written. Yes, and I wrote it as a blunder only so that it would not be necessary to exit the REPL each time.

The main abstraction above the instructions will be the code generator:

type CodeGen a = Writer [Instr] a

Looks scary, but CodeGen simply returns a list of instructions. If there are several code generators, then they are called in turn, and the instructions are glued into a large list.

The basis of fundamentals in bfc is offsets. In order not to write them manually, we create functions rshift And lshift:

rshift :: Int -> CodeGen ()
rshift x | x > 0 = tell $ replicate   x  Shr
         | x < 0 = tell $ replicate (-x) Shl
         | otherwise = return ()

lshift :: Int -> CodeGen ()
lshift x = rshift (-x)

If the offset is positive, repeat x times shift to the right, if negative, then repeat -x once shift to the left, otherwise we do nothing. Offset to the left x is the same as shifting to the right by -x

Let’s add a convenient way to loop the code:

loop :: CodeGen () -> CodeGen ()
loop body = tell $ [Loop (execWriter body)]

Inside we get a list of instructions bodyinsert them into Loopand with the help tell make it new CodeGen ()

Let’s write instructions reset. It will reset the value in the current cell.
The zeroing algorithm is simple: while the value is not zero, we decrease it. In bfc it is written like this: [-]. And in our functions it will be written like this:

reset :: CodeGen ()
reset = loop $ tell [Dec]

It remains to build a few more functions, and you can build the next level of abstraction. And the functions that we need are move-s. They will assist in moving and copying the value.

Let’s digress a bit from Haskell and see how it’s done in bfc. In order to transfer the value from the current cell to the next one, we write [->+<]. This code should be read like this:
while there is something in the cell, subtract one, go to the next cell, add one there and go back.
This code will not work correctly if there is already something in the neighboring cell, so you need to reset it before starting the transfer. In the same way, you can transfer to two or even N cells at once. The code looks similar: while there is a value in the cell, we subtract one, go through all the cells into which we transfer this one and add one there. After we return back.

First, let’s write a simple version without nulling:

moveN' :: From Int -> To [Int] -> CodeGen ()
moveN' (From src) (To dsts) = do
  rshift src -- Переходим к ячейке-источнику
  loop $ do
    tell $ [Dec] -- Однимаем единицу
    lshift src   -- Возвращаемся к базовому адресу

    -- Проходимся по всем ячейкам и добавляем туда единицу
    forM_ dsts $ \dst -> do
      rshift dst   -- Смещение
      tell $ [Inc] -- Инкремент
      lshift dst   -- Возвращение к базовому адресу

    rshift src -- Возвращение к ячейке-источнику

  lshift src -- Возвращение к базовому адресу

And now, based on it, a more complex one:

moveN :: From Int -> To [Int] -> CodeGen ()
moveN (From src) (To dsts) = do
  -- Обнуляем dst ячейки
  forM_ dsts $ \dst -> do
    rshift dst
    lshift dst

  moveN' (From src) (To dsts)

From And To, which are used above, in fact, do nothing. These are just wrapper words so as not to get confused from where and where we are passing everything. Because of them you need to write move (From a) (To b)instead of move a b. The first one is clearer to me personally, so I will stick to this style.

Let’s add move-s that will be used most often into separate functions to write less

move :: From Int -> To Int -> CodeGen ()  
move (From src) (To dst) = moveN (From src) (To [dst])

move2 :: From Int -> To Int -> To Int -> CodeGen ()  
move2 (From src) (To dst1) (To dst2) = moveN (From src) (To [dst1, dst2])

The safe (unshaded) version will be used almost everywhere.

From rags to riches, from cells to registers

As in a real Turing machine, let’s start a writing head that will be able to move along the tape. Almost like the one in bfc out of the box. But! This writing head will have state. It will store a certain number of registers that will not lose their state when moving.

Let’s define buffers, general purpose registers, and temporary registers.

data Register = BackBuf | GP Int | T Int | FrontBuf deriving (Eq, Show)

gpRegisters :: Int
gpRegisters = 16

tmpRegisters :: Int
tmpRegisters = 16

Together with it, we define the function relativePoswhich is actually a synonym for fromEnum. I will not give the code, it is cumbersome and boring, but in a nutshell: the register is taken as zero GP 0, BackBuf has a position -1after sixteen GP registers go 16 T registers, and after them FrontBuf.
After each assembler instruction, we will return to the position GP 0so that relative addresses never change.

inc and dec for registers

The first and easiest thing to do with registers is to learn how to increase and decrease them by one. To do this, we find the position of the register, go there, increase the cell by one and return to GP 0.

withShift :: Int -> CodeGen () -> CodeGen ()
withShift shift body = do
  rshift shift
  lshift shift

inc :: Register -> CodeGen ()
inc reg = do
  let pos = relativePos reg
  withShift pos $ tell [Inc]

dec :: Register -> CodeGen ()
dec reg = do
  let pos = relativePos reg
  withShift pos $ tell [Dec]

Similarly, we define input and output. The corresponding functions are named inp And out.

Loading a constant into a register

The code is almost the same: shift, reset the register, increase to the desired value, return to the base address

set :: Register -> Int -> CodeGen ()
set reg val = do
  let pos = relativePos reg

  withShift pos $ do
    tell (replicate val Inc)

At the same time, we define a beautiful synonym

($=) :: Register -> Int -> CodeGen ()
($=) = set

Now you can write GP 0 $= 10

Assembly instruction mov

mov is the first high-level instruction to use temporary registers. By default, we will always take the first unoccupied register. In our case, this T 0
The transfer algorithm is as follows: first transfer the value from the register x V y And T 0. Then from T 0 return the value back to xsince move2 messed up the value in x

mov :: From Register -> To Register -> CodeGen ()
mov (From x) (To y) = do
  let src = relativePos x
      dst = relativePos y
      buf = relativePos (T 0)

  move2 (From src) (To dst) (To buf)
  move (From buf) (To src)

Yes, mov with register T 0 will not work. Such a special temporary register, into which even mov-nothing is possible.

Add and subtract registers

I only give addition, since subtraction is similar:

add :: Register -> To Register -> CodeGen ()
add x (To y) = do
  let src = relativePos x
      dst = relativePos y
      buf = relativePos $ T 0

  -- Переходим к регистру x, чтобы цикл работал правильно
  withShift src $ do
    loop $ do
      tell [Dec] -- Отнимаем единицу
      withShift (-src) $ do -- Относительно базового адреса делаем:
        withShift dst $ tell [Inc] -- Прибавляем 1 к регистру y
        withShift buf $ tell [Inc] -- Прибавляем 1 к буферу

  move (From buf) (To src) -- Переносим буфер обратно в x

IN sub one word changed: Inc changed to Dec in the line “Add 1 to register y”

Loops and branches

Cycle while takes a register as input and repeats the actions until this register is zero. To do this, we need that when we start the loop, we are in a cell of this register, so we shift to pos. But the assembler instructions (loop body) require us to always be in the cell GP 0so at the beginning of the loop we move back and only then we call the loop body itself.

while :: Register -> CodeGen () -> CodeGen ()
while reg body = do
  let pos = relativePos reg

  withShift pos $ 
    loop $ withShift (-pos) $ body

Branching is, one might say, a cycle into which we go no more than once. At the end of the loop, we take the register to the buffer, the loop ends, since the register is zero, and we return the value back.

when :: Register -> CodeGen () -> CodeGen ()
when reg body = do
  let pos = relativePos reg
      buf = relativePos (T 0)

  while reg $ do
    move (From pos) (To buf)

  move (From buf) (To pos)

Working with the Ribbon

It’s time to teach our machine to work with memory. Of course, we have a lot of registers, but you won’t be full of them alone. The first thing you need to learn how to do is load and unload values ​​from registers, since access to an arbitrary cell in memory will work on their basis

Loading and unloading values

Loading and unloading will happen to the cell immediately after the front buffer

There is little new here, since this is literally the same code that was in mov-ax, with the exception that one “register” (the cell before the writing head is not a register) is fixed

load :: Register -> CodeGen ()
load reg = do
  let dst = relativePos reg
      src = relativePos FrontBuf + 1
      buf = relativePos (T 0)

  move2 (From src) (To dst) (To buf)
  move (From buf) (To src)

store :: Register -> CodeGen ()
store reg = do
  let src = relativePos reg
      dst = relativePos FrontBuf + 1
      buf = relativePos (T 0)

  move2 (From src) (To dst) (To buf)
  move (From buf) (To src)

Mini-boss: Head Offset

Let’s move to the rightmost case first. This is the last of T-registers or first before FrontBuf. Shift the registers one to the right.

When this happens, a collision occurs. FrontBuf and cells in front of the typewriter, so move it to an empty space in front of BackBuf

shrCaret :: CodeGen ()
shrCaret = do
  let buf = relativePos FrontBuf
      buf2 = relativePos BackBuf
  rshift buf
  replicateM (gpRegisters + tmpRegisters) $ do
    move (From $ -1) (To 0)
    lshift 1

  rshift 1
  move (From buf) (To $ buf2 - 1)

Shift to the left is similar, I will not give the code

Random access to memory

Phew, finally got to this part. This is the last fundamental abstraction, we will soon move on to examples.

derefRead :: From (Ptr Register) -> To Register -> CodeGen ()
derefRead (From (Ptr ptr)) (To reg) = do
  -- Используем T 1, так как T 0 будет испорчен mov инструкцией
  let counter = T 1

  -- Путешествие до участка памяти
  mov (From ptr) (To counter)
  while counter $ do
    dec counter
  -- Сохраняем на сколько мы ушли, вдруг ptr и reg совпадают
  -- Тогда load потеряет данные
  mov (From ptr) (To counter)

  load reg
  -- Путешествуем назад
  while counter $ do
    dec counter

derefWrite works exactly the same.
Here, as with move i put in a wrapper Ptrto clarify


fibonacci numbers

Calculate the fifth fibonacci number

program :: CodeGen ()
program = do
  -- Чтобы BackBuf не выползал за начало ленты (его позиция -- -1)
  -- Смещаемся на 1 вправо

  -- Определяем говорящие синонимы для регистров
  let counter = GP 0
      prev    = GP 1
      next    = GP 2
      tmp     = T 1

  counter $= 5 -- Делаем цикл пять раз
  prev $= 0
  next $= 1

  -- Классический алгоритм для подсчёта чисел Фибоначчи
  while counter $ do
    dec counter

    mov (From next) (To tmp)
    add prev (To next)
    mov (From tmp) (To prev)

  -- Чтобы вывести символ, нужно чтобы он был в каком-то регистре
  -- Выводить будем в одинарной системе счисления, потому что так проще
  let char = tmp
  char $= fromEnum '1'

  while prev $ do
    dec prev
    out char

  -- Перенос строки для красоты
  char $= fromEnum '\n'
  out char

Compiled code on bfc


Your Haskell is only for factorials and is fine

Pure truth. Write your factorial

mul :: Register -> To Register -> CodeGen ()
mul x (To y) = do
  -- Т 0 уже занят mov и add 
  let acc = T 1
      counter = T 2

  acc $= 0
  mov (From y) (To counter)

  -- Умножение это сложение, просто много раз
  -- y раз к acc прибавляем x 
  while counter $ do
    add x (To acc)
    dec counter

  -- Сохраняем результат
  mov (From acc) (To y)

program :: CodeGen ()
program = do
  let n   = GP 0
      acc = GP 1

  n $= 5
  acc $= 1

  while n $ do
    mul n (To acc)
    dec n

  let char = T 1
  char $= fromEnum '1'

  while acc $ do
    dec acc
    out char

  char $= fromEnum '\n'
  out char

Can it be output in binary?

Yes Easy!

-- Булево НЕ
inv :: Register -> CodeGen ()
inv reg = do
  let tmp = T 1

  mov (From reg) (To tmp)
  reg $= 1
  when tmp $ do
    reg $= 0

-- Работа с лентой как со стеком
push reg = do
  store reg

pop reg = do
  load reg

program :: CodeGen ()
program = do
  let number = GP 0
      digits = GP 1
      char = T 2

  number $= 64 
  digits $= 0 -- Сохраняем количество цифр, чтобы не провалить стек

  while number $ do
    let tmp    = T 3
        isEven = T 4

    -- Находим чётность числа
    isEven $= 1

    mov (From number) (To tmp)    
    while tmp $ do
      inv isEven
      dec tmp

    -- Если делится на два 
    when isEven $ do
      char $= fromEnum '0'

    -- Если не делится на два
    inv isEven
    when isEven $ do
      char $= fromEnum '1'
      -- Делаем так, чтобы делилось
      dec number

    -- Записываем цифру в стек
    push char
    inc digits

    -- Делим число на два
    mov (From number) (To tmp)
    number $= 0
    while tmp $ do
      tmp -= 2
      number += 1

  -- Выводим по одной цифре
  while digits $ do
    pop char
    out char
    dec digits
  -- Ну и перенос строки
  char $= fromEnum '\n'
  out char

Compiled code on bfc


Well, the article has come to an end. Time of the treasured picture


Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *