How I wrote my first esoteric programming language

I became interested in programming in the 7th grade, when we started learning Python in computer science class. It was new and a little scary, because until then I had only programmed in block-based languages (visual builders like Scratch). But I got into this wonderful language very quickly, and I have now been writing programs in it for three years. I have written a lot in it, from “Hello, World!” to my own game engine. It was an incredible experience.

But most of all I wanted to write my own, unusual programming language. I made many attempts, but they all failed because I lacked experience. Then, a week ago, I saw a YouTube video about 10 esoteric languages: languages deliberately designed so that writing programs in them is difficult or nearly impossible. It inspired me to write a language of the same kind. This is how the C42 language was born.

About the language

C42 is an assembly-like language with only 42 instructions. Data is stored in special cells (similar to variables), each of which can hold only one type of data (int, string, or float). Here's an example of code that prints everyone's favorite phrase to the console:

#1 1

41 -1 1
04 -1 "Hello, World!"
02 -1

#0

In this language, code lives in blocks (analogous to functions) that can be called. The code above has a single block named 1, and for good reason: the block with identifier 1 is the entry point (like the main function in C), where the main code of the program must be placed. A block starts with #1 blockID and ends with #0. Note that a block name can only be a number. The commands are then written inside the block. Only single-line comments are available: they start with the symbol $, and everything after it on that line is treated as a comment.
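To make the block structure concrete, here is a minimal Python sketch that groups the lines of a program by block ID. It is not the interpreter's actual code: the function name split_blocks is made up for illustration, and the comment handling is a simplifying assumption (each line is cut at the first $, even if it sits inside a string literal).

```python
def split_blocks(source: str) -> dict[str, list[str]]:
    """Group the command lines of a C42 program by block ID."""
    blocks: dict[str, list[str]] = {}
    current = None  # ID of the block being read, or None outside any block

    for raw_line in source.splitlines():
        # Assumption for this sketch: drop everything after the first '$'.
        line = raw_line.split("$", 1)[0].strip()
        if not line:
            continue

        words = line.split()
        if words[0] == "#1":
            # "#1 blockID" opens a block; a malformed header opens nothing.
            current = words[1] if len(words) == 2 else None
            if current is not None:
                blocks[current] = []
        elif words[0] == "#0":
            current = None
        elif current is not None:
            blocks[current].append(line)

    return blocks


program = '''
#1 1
41 -1 1  $ create string cell -1
04 -1 "Hello, World!"
02 -1
#0
'''
print(split_blocks(program))
# {'1': ['41 -1 1', '04 -1 "Hello, World!"', '02 -1']}
```

Lines outside any block are simply ignored in this sketch.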

Each command in C42 takes a fixed number of arguments (or none at all). For example, command 41 (creating a new cell) takes 2 arguments: the name of the new cell, which can only be a negative number starting from -1, and the type of data the cell can store: 0 for int, 1 for string, 2 for float.
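The argument rules for command 41 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names TYPE_CODES, CELL_NAME and check_new_cell are hypothetical, and raising ValueError is a choice made for the sketch rather than the interpreter's own error handling.

```python
import re

# Type codes as described above: 0 = int, 1 = string, 2 = float.
TYPE_CODES = {"0": int, "1": str, "2": float}

# Cell names are negative integers starting from -1, without leading zeros.
CELL_NAME = re.compile(r"-[1-9][0-9]*")


def check_new_cell(arguments: list[str]) -> tuple[str, type]:
    """Validate the two arguments of command 41; return (name, Python type)."""
    if len(arguments) != 2:
        raise ValueError("command 41 expects exactly 2 arguments")
    name, type_code = arguments
    if CELL_NAME.fullmatch(name) is None:
        raise ValueError(f"invalid cell name: {name!r}")
    if type_code not in TYPE_CODES:
        raise ValueError(f"unknown data type: {type_code!r}")
    return name, TYPE_CODES[type_code]


print(check_new_cell(["-1", "1"]))  # ('-1', <class 'str'>)
```

With these rules, a declaration such as 41 0 1 or 41 -1 7 would be rejected before any cell is created.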

How I wrote the interpreter

When I started implementing the language, I chose to write an interpreter without hesitation. A compiler seemed pointless in my case, and I had never written one before. I decided to create a very simple interpreter without a separate lexer and parser (although a partial parser is still present). I ended up with 4 files:

  • main.py – the main file that launches the interpreter.

  • interpreter.py – the interpreter itself.

  • exception.py – the file that reports errors in the code.

  • constants.py – all the necessary constants.

My main file was quite short:

from interpreter import Interpreter


code: str = ""

with open("code.cft", "r", encoding = "utf-8") as file:
    code = file.read()

C42 = Interpreter(code)
C42.Interpret()

Here I simply import the interpreter class, read the code file, create a new instance and call the main method of the class.

The error handling file also turned out to be quite compact, only 49 lines of code:

import sys


class Error:
    def __init__(self, message, line, command):
        print()
        if command is not None:
            # Show the offending command, then the line number and message
            print(f"> {command}")
            print(f"Error : on line {line} : {message}")
        else:
            print(f"Error : {message}")
        # Any error stops the interpreter immediately
        sys.exit(1)

class BlockNotFound(Error):
    def __init__(self, name):
        super().__init__(f"Could not find block number {name}", None, None)
...

Here I created a base class that prints the command that caused the error, the line number, and the error message itself, and then exits. The child classes are built on top of it. There is no need to walk through the constants file: it just defines a variable holding the identifier of each command. The interpreter file is 576 lines long, since I implemented all the commands directly in the class. This may not be the best approach, but it works for my case. The file contains two classes. The first is Cell, for convenient work with cells:

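import re

from constants import INT, STRING, FLOAT  # assumed import: constants.py defines these type identifiers


class Cell:

    # Global registry of every cell created so far
    CELLS: list['Cell'] = []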

    def __init__(self, name: str, defaultValue: int | float | str, dataType: str):

        # Redeclaring an existing name replaces the old cell
        cell = Cell.GetCellByName(name)
        if cell is not None:
            Cell.CELLS.remove(cell)
        Cell.CELLS.append(self)

        self.name = name
        self.defaultValue = defaultValue
        self.value = defaultValue
        self.dataType = dataType
    
    @staticmethod
    def GetCellByName(name: str) -> 'Cell | None':
        return next((cell for cell in Cell.CELLS if cell.name == name), None)

    @staticmethod
    def isFloat(s):
        try:
            float(s)
            return True
        except (TypeError, ValueError):
            return False

    @staticmethod
    def isInt(s):
        try:
            int(s)
            return True
        except (TypeError, ValueError):
            return False

    @staticmethod
    def isString(s):
        # An empty value is accepted; otherwise it must be wrapped in double quotes
        if s == "":
            return True
        return len(s) >= 2 and s[0] == '"' and s[-1] == '"'
    
    @staticmethod
    def isCorrectName(name: str):
        return bool(re.match(r"^-[1-9][0-9]*$", name))
    
    @staticmethod
    def isCorrectDataType(char: str):
        return char in [INT, STRING, FLOAT]
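As a possible variation (not the author's code), the cell registry could be a dictionary keyed by cell name instead of a list. Then redeclaring a name replaces the old cell automatically, and lookups need no scan. The names SimpleCell and CellRegistry are hypothetical, and the default values passed below are purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SimpleCell:
    name: str
    value: Union[int, float, str]
    data_type: str  # "0" = int, "1" = string, "2" = float


class CellRegistry:
    def __init__(self) -> None:
        self._cells: dict[str, SimpleCell] = {}

    def declare(self, name: str, default: Union[int, float, str], data_type: str) -> SimpleCell:
        cell = SimpleCell(name, default, data_type)
        self._cells[name] = cell  # overwrites any earlier cell with this name
        return cell

    def get(self, name: str) -> Optional[SimpleCell]:
        return self._cells.get(name)


registry = CellRegistry()
registry.declare("-1", "", "1")  # string cell
registry.declare("-1", 0, "0")   # redeclared as an int cell
print(registry.get("-1").data_type)  # 0
print(registry.get("-2"))            # None
```

The list-based version in the article behaves the same way from the outside; the dictionary only changes how the lookup is done.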

And the interpreter itself:

class Interpreter:
    def __init__(self, code: str):

        self.cells: list[Cell] = []

        self.code = code
        self.blocks: dict[str, list[tuple[int, list[str]]]] = {}  # block ID -> [(line number, tokens), ...]
        self.currentLine = 0
        self.currentCommand = ""
        self.skipNextCommand = False
        self.executionStack: list[list] = []  # each entry is [blockId, isLoop]
        self.returnCalled = False

        self.Parse()
    
    def Interpret(self, blockId: str = "1"):
        if blockId not in self.blocks:
            BlockNotFound(blockId)
        
        self.executionStack.append([blockId, False])
        idx = {block: -1 for block in self.blocks}

        while self.executionStack:
            currentBlock = self.executionStack.pop()
            commands = self.blocks[currentBlock[0]]
            blockName = currentBlock[0]
            blockIsLoop = currentBlock[1]

            while True:
                # Advance to the next command, but never past the last one
                if idx[blockName] < len(commands) - 1:
                    idx[blockName] += 1
                self.currentLine = commands[idx[blockName]][0]
                self.currentCommand = " ".join(commands[idx[blockName]][1])

                if self.skipNextCommand:
                    self.skipNextCommand = False
                    continue

                nextCommand = None if idx[blockName] + 1 > len(commands) - 1 else commands[idx[blockName] + 1]
                forceExit = self.ExecuteCommand(commands[idx[blockName]][1], nextCommand)

                if self.returnCalled:
                    break
                elif forceExit:
                    break
                elif idx[blockName] >= len(commands) - 1:
                    idx[blockName] = -1
                    break

            if not self.returnCalled and blockIsLoop:
                self.executionStack.append(currentBlock)
                self.returnCalled = False
            elif forceExit and not idx[blockName] >= len(commands) - 1:
                self.executionStack.append(currentBlock)
                self.executionStack[-1], self.executionStack[-2] = self.executionStack[-2], self.executionStack[-1]

    def ExecuteCommand(self, command, nextCommand):
        CMD = command[0]

        if CMD == EXIT: exit(1)
        
        elif CMD == PRINT:
            cell = self.GetCell(self.GetArgument(1, command))
            print(str(cell.value).replace("\\n", "\n"), end = "", flush = True)
    
        elif CMD == INPUT:
            cell = self.GetCell(self.GetArgument(1, command))
            value = input()
            self.ChangeValue(cell, value, False)

        elif CMD == ASSIGN_VALUE:
            value = self.GetArgument(2, command)
            cell = self.GetCell(self.GetArgument(1, command))
            self.ChangeValue(cell, value)
            
        ... # the implementation of the remaining commands goes here

    def Parse(self):
        lines = self.code.split('\n')
        result = {}
        block = None
        line_number = 1

        for line in lines:
            if line.startswith(START_BLOCK):
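                if block is not None:
                    # The previous block was never closed, so discard it
                    del result[block]
                words = line.split()
                if len(words) != 2:
                    block = None
                    line_number += 1  # keep line numbers in sync before skipping
                    continue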
                block = words[1]
                result[block] = []
            elif line.startswith(END_BLOCK):
                block = None
            elif block is not None and line.strip():
                parsed_line = (line_number, re.findall(r'(?:"[^"]*"|[^"\s]+)', line))
                if '$' in parsed_line[1]:
                    parsed_line = (parsed_line[0], parsed_line[1][:parsed_line[1].index('$')])
                if parsed_line[1]:
                    result[block].append(parsed_line)

            line_number += 1

        self.blocks = result
    
    def GetCell(self, name: str) -> Cell:
        cell = Cell.GetCellByName(name)
        if cell != None:
            return cell
        CellNotFound(self.currentLine, self.currentCommand, name)
    
    def GetArgument(self, index: int, command: list[str]) -> str:
        if index <= len(command) - 1:
            return command[index]
        InvalidSyntax(self.currentLine, self.currentCommand)
    
    def ChangeValue(self, cell: Cell, value: str, lookAtQuotes = True, mode = "set") -> None:
        # Normalize non-string values so the type checks below work on text
        if not isinstance(value, str):
            value = str(value)

        if cell.dataType == INT:
            if Cell.isInt(value):
                if mode == "set":
                    cell.value = int(value)
                elif mode == "add":
                    cell.value += int(value)
            else:
                IncorrectValue(self.currentLine, self.currentCommand, "int")
        
        elif cell.dataType == FLOAT:
            if Cell.isFloat(value):
                if mode == "set":
                    cell.value = float(value)
                elif mode == "add":
                    cell.value += float(value)
            else:
                IncorrectValue(self.currentLine, self.currentCommand, "float")
            
        elif cell.dataType == STRING:
            if Cell.isString(value) or not lookAtQuotes:
                if mode == "set":
                    cell.value = value[1:-1] if lookAtQuotes else value
                elif mode == "add":
                    cell.value += value[1:-1] if lookAtQuotes else value
            else:
                IncorrectValue(self.currentLine, self.currentCommand, "string")

In __init__ I create everything needed and call the Parse method. This method converts the source code into a dictionary of blocks, each holding the full list of that block's commands. The Interpret method works on the last block in self.executionStack, which stores the blocks the program still has to execute. If the block was called as a loop, the program keeps executing it until command 42 (return) is called inside the block. If the block was called for normal execution, the program removes it from the stack once all of its commands have run.
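The difference between a loop call and a normal call can be shown with a much simplified sketch. It is not the interpreter's code: commands are modeled here as plain Python functions that return True to signal "return" (command 42), and the names run, Command and is_loop are assumptions made for the illustration.

```python
from typing import Callable

# Assumption for this sketch: a command is a callable that returns True
# when it acts as "return" (command 42), and False otherwise.
Command = Callable[[], bool]


def run(blocks: dict[str, list[Command]], block_id: str, is_loop: bool) -> None:
    """Execute one block, repeating it if it was called as a loop."""
    stack: list[tuple[str, bool]] = [(block_id, is_loop)]

    while stack:
        current_id, current_is_loop = stack.pop()
        return_called = False

        for command in blocks[current_id]:
            if command():
                return_called = True
                break

        # A loop block goes back on the stack until "return" is called;
        # a normally called block is simply dropped after one pass.
        if current_is_loop and not return_called:
            stack.append((current_id, current_is_loop))


counter = {"n": 0}

def increment() -> bool:
    counter["n"] += 1
    return False

def return_after_three() -> bool:
    return counter["n"] >= 3

run({"1": [increment, return_after_three]}, "1", is_loop=True)
print(counter["n"])  # 3
```

Calling the same block with is_loop=False would run it once and stop, leaving the counter at 1.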

Conclusion

Overall, I'm glad I was able to write my own language. Although it did not give me much new experience, I fulfilled an old dream. Thank you for reading my first article. I wish everyone a nice day!

The source code of the language, if anyone needs it, is on GitHub: AlmazCode/C42 (C42: Esoteric programming language inspired by ASM).
