Creating your own snake dialect, or DSL, in Python

Let's say we have an execution kernel and a lot of users who know Python at the “learn it completely in a week” level. They want to solve problems in their own domain using the kernel's services with minimal effort.

We, as kernel developers, want, on the one hand, to hide all the “dirty laundry” behind some kind of interface, and on the other hand, to simplify user interaction with the kernel as much as possible.

As one of the possible solutions, I suggest creating your own dialect of Python scripts designed for a specific subject area. A sort of “poor man’s” DSL, with Python syntax, but with a runtime tailored for the tasks being performed.

I am Dmitry Pervushin, a developer in the department that builds back-office software for the retail outlets of the Magnit chain. Our main stack is Python and Firebird, plus a little HTML/JS and a few other technologies. My team develops applications for data collection terminals, reports, and automated workstations for retail outlets. This time I want to tell you how to embed custom scripts into a Python application.

Available palette

At a glance, there are at least three options:

  • regular import;

  • dependency injection via a factory method or class;

  • dependency injection directly into the namespace.

Regular import

The script takes objects from the kernel

If we don’t invent anything, but use the kernel as a regular Python package, we’ll get something like:

# Import the required objects from the kernel into the script
from exec_core import reader, writer

# Use the kernel objects to implement the business logic
x = reader()
writer(x / 0.87)

There are three big disadvantages to this approach:

  • We are tied to a specific implementation of the service, which is only permissible if it is unique and unchangeable.

  • The user is forced to repeat the import spell every time.

  • The user must know the design of the kernel and constantly use this knowledge, which leads to difficulties when refactoring the kernel.

Dependency injection via factory method or class

The kernel supplies the script with interface implementations

You can define a “magic” function or class to which the kernel supplies implementations of abstract interfaces. The result looks something like this:

def entry_point(reader, writer):
    """The kernel imports the script and calls this function with concrete service implementations."""
    x = reader()
    writer(x / 0.87)

Here we got rid of both the import and the binding to a specific implementation: reader can read from anywhere without requiring changes to the script. Two disadvantages are gone, but others remain:

  • The kernel must actively use introspection to call the user function correctly.

  • The user must know the magic function signatures.
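
To make the introspection point concrete, here is a minimal sketch of how a kernel might call such an entry point. The SERVICES registry and the call_entry_point helper are names invented for this illustration, and reader is a stub that returns a fixed number.

```python
import inspect

# Hypothetical registry of services the kernel is able to supply.
SERVICES = {
    "reader": lambda: 8.7,   # stub: a real reader would fetch data from somewhere
    "writer": print,
}


def call_entry_point(func, services=SERVICES):
    """Call func, passing only the services named in its signature."""
    params = inspect.signature(func).parameters
    missing = [name for name in params if name not in services]
    if missing:
        raise TypeError(f"unknown services requested: {missing}")
    return func(**{name: services[name] for name in params})


def entry_point(reader, writer):
    """What a user script would define."""
    x = reader()
    writer(x / 0.87)
    return x


result = call_entry_point(entry_point)
```

The kernel has to inspect the signature on every call, and a typo in a parameter name only shows up at run time, which is exactly the price mentioned above.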

Dependency injection via the global namespace

The kernel supplies the implementations, the script uses them through the namespace

If dependency injection is done through the global namespace, we get the following script:

"""On import, the kernel injects the services into the global variables;
no action is required on the script's side."""

x = reader()
writer(x / 0.87)

Almost perfect. The list of available services can be obtained with the standard function dir, and their descriptions with help. In addition, the kernel can replace these functions with its own implementations, for example to open the help in a browser rather than print it to the console.
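
As an illustration, here is a sketch of a script that discovers its services with dir and asks for help on each of them. The kernel_help function and the collected list are invented for this example; a name placed in the script's namespace simply shadows the built-in of the same name.

```python
def reader():
    """Read the next input value and return it as a float."""
    return 1.0


def writer(value):
    """Send a value to the output."""
    print(value)


collected = []  # kept in a list only to make the result easy to inspect


def kernel_help(obj):
    """Hypothetical kernel replacement for help(): report just the docstring."""
    collected.append(f"{obj.__name__}: {obj.__doc__}")


# "help" in the namespace takes precedence over the built-in help
runtime = {"reader": reader, "writer": writer, "help": kernel_help}

script = """
names = dir()  # at the top level of a script this lists the global names
services = [name for name in names if not name.startswith("_")]
for name in services:
    help(globals()[name])
"""
exec(compile(script, "<demo>", "exec", dont_inherit=True), runtime)
```

After the run, runtime["services"] holds the public names the script saw, and collected holds one line of documentation per service.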

Engine implementation

Skeleton

The heart of the scripting engine is the built-in functions compile and exec. First, you need to turn the script's source text into a “code object”, that is, the script compiled to bytecode. We do this with the call compiled = compile(body, script_file_name, "exec", dont_inherit=True), where:

  • body – a string with the source text of the script.

  • script_file_name – path to the script file. If you specify a real file, then as a bonus you will receive clear stack traces and debugging support – breakpoints, step-by-step execution.

  • "exec" – the compilation mode for “a sequence of statements”. The alternative, "eval" (“a single expression”), is not suitable in this case, although it may be useful to someone.

  • dont_inherit – do not inherit “future settings” (from __future__ import …) from the kernel module. Let's leave this at the discretion of the script author.
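
The parameters above can be tried out in isolation. The following is a small sketch; the file name demo_script.py is made up and does not have to exist.

```python
body = "x = 6 * 7\n"

# "exec" mode: the source is a sequence of statements
compiled = compile(body, "demo_script.py", "exec", dont_inherit=True)
namespace = {}
exec(compiled, namespace)          # the assignment lands in the namespace dict

# "eval" mode: the source is a single expression that produces a value
expression = compile("6 * 7", "<expression>", "eval", dont_inherit=True)
value = eval(expression)

# "eval" mode refuses statements such as an assignment
try:
    compile("x = 1", "<expression>", "eval", dont_inherit=True)
    statement_rejected = False
except SyntaxError:
    statement_rejected = True
```

The file name passed to compile is stored in compiled.co_filename, which is where tracebacks and debuggers pick it up.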

We have the code; now we need an environment in which to execute it. The environment must be a regular dictionary; no other types are allowed. We fill it with the kernel methods available to the script and with the source coordinates:

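import os

runtime = {
    # Source coordinates: where the script came from and its module name
    "__file__": script_file_name,
    "__name__": os.path.splitext(os.path.basename(script_file_name))[0],
    # Kernel services available to the script
    "reader": lambda: float(input("Enter a number, please:")),
    "writer": print,
}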

If you don't specify the key "__builtins__", a reference to the builtins module is automatically added under that key.

And the final chord is execution: exec(compiled, runtime)
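
Put together, the skeleton can be run as follows. This is a sketch with assumptions: the script body is inlined instead of being read from a file, the path scripts/price.py is invented, and reader and writer are replaced by stubs so that the example does not wait for console input.

```python
import os

script_file_name = "scripts/price.py"      # hypothetical path, used only for naming
body = "x = reader()\nwriter(x / 0.87)\n"  # what the user's script would contain

output = []
runtime = {
    "__file__": script_file_name,
    "__name__": os.path.splitext(os.path.basename(script_file_name))[0],
    "reader": lambda: 8.7,       # stub instead of input()
    "writer": output.append,     # stub instead of print
}
had_builtins_before = "__builtins__" in runtime

compiled = compile(body, script_file_name, "exec", dont_inherit=True)
exec(compiled, runtime)

# exec adds the missing "__builtins__" key on its own
has_builtins_after = "__builtins__" in runtime
```

After the call, everything the script assigned at the top level (here x) is also available in runtime, next to the injected services.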

In principle, this minimal skeleton is already quite functional, but for real use it should be supplemented with:

  • logging;

  • error handling;

  • import support for scripts.

Logging and error handling are trivial, but import deserves a couple of comments.

Let's put it all together. Well, approximately.
import os
import sys
from logging import getLogger


# AbstractRTL is the project's base class for the script runtime; it is defined elsewhere.
def script_exec(script, rtl: AbstractRTL):
    """A piece from a real project. Executes user code
    that uses kernel services."""
    wd_old = os.getcwd()
    wd = os.path.dirname(script.name) or wd_old
    if wd != wd_old:
        os.chdir(wd)
    # Let the script import modules located next to it
    sys.path.insert(0, wd)
    getLogger(__name__).info(f"Translation of script {script.name} started")
    try:
        body = script.read()
        compiled = compile(body, script.name, "exec", dont_inherit=True)

        # Expose every public attribute of the runtime object to the script
        global_dict = {
            k: getattr(rtl, k)
            for k in (set(AbstractRTL.__dict__) | set(rtl.__class__.__dict__) | set(rtl.__dict__))
            if not k.startswith("_")
        }
        global_dict.update(__file__=script.name, __name__=os.path.splitext(os.path.basename(script.name))[0])
        exec(compiled, global_dict)
        # Finalization hook provided by the runtime
        global_dict["end"]()
    except Exception:
        getLogger(__name__).exception(f"Translation of script {script.name} failed")
        raise
    else:
        getLogger(__name__).info(f"Translation of script {script.name} completed successfully")
    finally:
        # Restore the import path and the working directory
        sys.path.pop(0)
        os.chdir(wd_old)
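
The way this function builds the script's namespace from a runtime object can be tried separately. Below is a sketch under assumptions: the real AbstractRTL is not shown in the article, so the class here, its ConsoleRTL subclass, and the build_namespace helper are invented stand-ins.

```python
class AbstractRTL:
    """Hypothetical base class listing the services a script may rely on."""

    def reader(self):
        raise NotImplementedError

    def writer(self, value):
        raise NotImplementedError

    def end(self):
        """Called by the kernel after the script has finished."""


class ConsoleRTL(AbstractRTL):
    def __init__(self):
        self.results = []   # instance attributes become visible to scripts too

    def reader(self):
        return 2.0          # stub value instead of real input

    def writer(self, value):
        self.results.append(value)

    def _internal(self):
        """Names starting with an underscore are not exported."""


def build_namespace(rtl):
    """Collect public names from the base class, the concrete class and the instance."""
    names = set(AbstractRTL.__dict__) | set(type(rtl).__dict__) | set(vars(rtl))
    return {k: getattr(rtl, k) for k in names if not k.startswith("_")}


rtl = ConsoleRTL()
namespace = build_namespace(rtl)
exec(compile("writer(reader() * 2)", "<demo>", "exec", dont_inherit=True), namespace)
namespace["end"]()
```

Because getattr is called on the instance, the script receives bound methods, so reader and writer work without the script ever seeing self.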

Import in scripts

For scripts stored as files, you need to add the script's directory to the list of import paths, sys.path, and change the current directory.

For scripts loaded from a database, the network or a random number generator, you will have to change the import mechanisms. Working with these mechanisms is a separate exciting adventure, so those interested can look into the importlib package of the standard library.
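
To give a taste of that adventure, here is a minimal sketch of a finder and loader pair that imports modules from a dictionary. The dictionary stands in for a database; SOURCES, DictLoader, DictFinder and the module name db_helpers are all invented for this example.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical storage: module name -> source text (stands in for a database table).
SOURCES = {"db_helpers": "def double(x):\n    return 2 * x\n"}


class DictLoader(importlib.abc.Loader):
    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # None means: create the module object in the default way

    def exec_module(self, module):
        code = compile(self.source, f"<db:{module.__name__}>", "exec", dont_inherit=True)
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname not in SOURCES:
            return None  # not ours, let the other finders try
        return importlib.util.spec_from_loader(fullname, DictLoader(SOURCES[fullname]))


# Appended last, so regular modules found on sys.path keep priority
sys.meta_path.append(DictFinder())

import db_helpers

result = db_helpers.double(21)
```

A production version would also need to think about packages, caching and reloading, which this sketch leaves out.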

Safety

Due to the extremely dynamic nature of Python, it is almost impossible to build a completely impenetrable sandbox for untrusted code using namespace manipulation alone. Therefore, if you are concerned about protecting against threats from user code, you will have to use more complex techniques that go far beyond the scope of this article.
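
A small illustration of why namespace manipulation is not enough: even with all built-ins taken away, a script can still reach the class hierarchy through any literal. This is only a sketch of one well-known loophole, not a catalogue of them.

```python
restricted = {"__builtins__": {}}   # an attempt to take every built-in away

# Plain names such as open really are gone...
try:
    exec("open('secret.txt')", restricted)
    name_blocked = False
except NameError:
    name_blocked = True

# ...but attribute access needs no built-ins: from an empty tuple the script
# walks up to object and lists every class currently loaded in the interpreter
exec("classes = ().__class__.__base__.__subclasses__()", restricted)
reachable = restricted["classes"]
```

The list in reachable contains classes from modules the kernel has already imported, so hiding names from the script does not hide the objects behind them.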

Conclusion

Using the technique shown, you can significantly simplify and speed up writing scripts for your application, but you can also complicate users' lives: functionality appears in scripts “magically”, out of nowhere, which surprises and frightens them. To make working with scripts more comfortable, do not forget to document the available features. For example, use the familiar Python combination of dir + help + docstrings: users will definitely thank you for it.

That's all. I wish everyone fewer bugs and more satisfied users.
