How to write a database backend
Hello, dear Python backend developers! I have another article about django for you. And so it will go on until django gets proper support for asynchrony (just kidding).
Contrary to the popular belief that django is a batteries-included framework that is hard to customize, it is actually very extensible. The need to support different database vendors, support for so-called multi-db (using several databases at the same time), and plain common sense (in places) have made its ORM one of the most extensible ORMs around.
In this article I will explain how a database backend works: it is the component responsible for supporting a specific database and a specific driver for it. I will do this with a rather exotic example: we will add support for an asynchronous driver, psycopg3. Yes, an asynchronous one. Did you think django couldn't do that? Read on and judge for yourself.
Intrigue is not my forte, so let me clear up the mystery around asynchrony right away: I use the same approach as sqlalchemy, namely greenlets. I have already written about this on Habr; you can also read about it here. It is ancient magic that has been put to work in production.
Thanks to this approach, writing a backend for an asynchronous driver is not very different from writing one for a synchronous driver. I chose psycopg3 as the driver: its API is friendlier and close to DB-API 2. The notorious asyncpg does not follow DB-API at all, although it wins on performance. By the way, the greenlet approach also makes it possible to give an asynchronous driver a DB-API 2 compatible interface, which is exactly what I did.
There are, of course, fundamental differences between an asynchronous backend and a synchronous one; more precisely, there is one difference. In the synchronous case, each thread handles only one request at a time. A connection to the database is assigned to that thread, and the thread uses it. In the asynchronous case, many requests are processed concurrently, and there is no way to know in advance how many. We cannot count on giving each handler its own connection: there simply will not be enough of them. Instead, we need a connection pool that lends connections out for temporary use (for the duration of one transaction).
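To make the pooling idea concrete, here is a minimal sketch assuming an asyncio application: many concurrent handlers share a small pool, and each handler holds a connection only while its "transaction" runs. FakeConnection, ConnectionPool, handle_request and the pool size are hypothetical stand-ins, not psycopg3's API or the backend's actual pool.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a real async database connection."""

    def __init__(self, number):
        self.number = number


class ConnectionPool:
    """Lends out a fixed number of connections; callers wait when all are busy."""

    def __init__(self, size):
        self._free = asyncio.Queue()
        for i in range(size):
            self._free.put_nowait(FakeConnection(i))
        self.in_use = 0
        self.max_in_use = 0

    async def acquire(self):
        conn = await self._free.get()  # suspends if the pool is empty
        self.in_use += 1
        self.max_in_use = max(self.max_in_use, self.in_use)
        return conn

    def release(self, conn):
        self.in_use -= 1
        self._free.put_nowait(conn)


async def handle_request(pool, request_id, results):
    conn = await pool.acquire()
    try:
        # Pretend to run one transaction on the borrowed connection.
        await asyncio.sleep(0.01)
        results.append((request_id, conn.number))
    finally:
        pool.release(conn)  # always give the connection back


async def main():
    pool = ConnectionPool(size=2)
    results = []
    # Ten concurrent requests, but only two connections.
    await asyncio.gather(*(handle_request(pool, i, results) for i in range(10)))
    return pool, results


pool, results = asyncio.run(main())
print(len(results), pool.max_in_use)  # 10 2
```

The number of open connections is bounded by the pool size rather than by the number of concurrent requests; in a real backend the borrowing would span one transaction instead of a sleep.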
This is easy to achieve. For example, how is a connection usually used with django backends? Like this:
```python
from django.db import connection

with connection.cursor() as cursor:
    ...  # use the cursor
```
For an asynchronous backend you can do the same, but return the connection to the pool when the cursor is closed. Always use a context manager so that the connection is guaranteed to be returned.
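Here is a minimal synchronous sketch of that idea, assuming a simple queue-based pool; sqlite3 stands in for the real driver, and PooledCursor and make_pool are hypothetical names, not the repository's code. Closing the cursor, explicitly or by leaving the with block, puts the connection back, and a second close is a no-op.

```python
import queue
import sqlite3


def make_pool(size):
    """Hypothetical pool: a thread-safe queue of open connections."""
    pool = queue.Queue()
    for _ in range(size):
        pool.put(sqlite3.connect(":memory:"))
    return pool


class PooledCursor:
    """Cursor that borrows a connection and returns it to the pool on close()."""

    def __init__(self, pool):
        self._pool = pool
        self._conn = pool.get()  # blocks if every connection is busy
        self._cursor = self._conn.cursor()

    def execute(self, sql, params=()):
        self._cursor.execute(sql, params)
        return self

    def fetchall(self):
        return self._cursor.fetchall()

    def close(self):
        if self._conn is None:  # already closed: nothing to return
            return
        self._cursor.close()
        self._pool.put(self._conn)
        self._conn = self._cursor = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs even if the block raised


pool = make_pool(size=1)
with PooledCursor(pool) as cursor:
    rows = cursor.execute("SELECT 1 + 1").fetchall()
print(rows, pool.qsize())  # [(2,)] 1
```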
By the way, django's naming is genuinely confusing here. For example, how do you get the default connection? You might think it goes like this:
```python
from django.db import connection, connections

connection              # <- this is a proxy object...
connections['default']  # <- ...that points to this one
```
Not at all: this is a database backend (a DatabaseWrapper), not a connection. The connection itself lives in the attribute of the same name, connections['default'].connection.
So don't get confused. There is also the question of autocommit. If we always take a connection only temporarily and know exactly when we give it back, why do we need autocommit at all? We can always commit ourselves. Overall, I think autocommit is one of django's bad ideas.
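A minimal sketch of the alternative to autocommit, assuming the scope of a transaction equals the time a connection is borrowed: the helper begins a transaction on checkout, commits if the block succeeds, rolls back if it raises, and always returns the connection. sqlite3 stands in for the driver (isolation_level=None turns off the module's implicit transactions, so the helper issues BEGIN, COMMIT and ROLLBACK itself); transaction and make_pool are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager


def make_pool(size=1):
    """Hypothetical pool; isolation_level=None lets us control BEGIN/COMMIT."""
    pool = queue.Queue()
    for _ in range(size):
        pool.put(sqlite3.connect(":memory:", isolation_level=None))
    return pool


@contextmanager
def transaction(pool):
    """Borrow a connection for exactly one transaction."""
    conn = pool.get()
    try:
        conn.execute("BEGIN")
        try:
            yield conn
        except BaseException:
            conn.execute("ROLLBACK")  # the block failed: undo its work
            raise
        else:
            conn.execute("COMMIT")  # the block succeeded: make it durable
    finally:
        pool.put(conn)  # the connection always goes back to the pool


pool = make_pool()
with transaction(pool) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")

try:
    with transaction(pool) as conn:
        conn.execute("INSERT INTO t VALUES (2)")
        raise ValueError("boom")
except ValueError:
    pass

with transaction(pool) as conn:
    rows = conn.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]
```

Because the commit point is the end of the borrowing, there is no moment when a statement would need to be committed on its own.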
We could probably talk about this for a long time, so here is the repository with the code. The pgbackend directory contains, as you can probably guess, the backend. The proj directory contains a test django project, and kitchen contains a test django app.
The code works: you can run it and check for yourself, and the instructions are in the README. Sorry that django is listed in the dependencies as a git repository: you need the branch with psycopg3 support, and it hasn't been merged into main yet (but it will be soon!).
So, in case you didn't guess, the pgbackend directory is our backend. More or less experienced djangonauts know that the place to look is the base module, and it is there. If you open it, you will see that it is almost empty: it sets ops_class, whose operations class points to the module with the “compilers”, and the connection is created in the constructor. That's all.
Fine: if base.py is almost empty, let's look for the interesting parts elsewhere.
I mentioned compilers. In case you were wondering, they “compile” a query into SQL. Compilers are one of the main ways to customize a backend. In fact, between you and me, the whole backend is essentially one compiler method: execute_sql. So replacing the backend is fairly easy: you only need to replace this single method. Strictly speaking, there is not one compiler but one for each type of query: SELECT, INSERT, UPDATE and DELETE. But these are details; overall, as I said, the backend is easy to extend. And for the fact that you really only need to override one method, django developers deserve some karma points.
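To illustrate why overriding one method is enough, here is a toy model. It is not django's actual compiler classes (the real SQLCompiler.execute_sql takes more arguments), only the same split of responsibilities: as_sql() builds the SQL and execute_sql() runs it, so a backend that only changes where connections come from overrides execute_sql alone. The class names, the pool and the table are hypothetical, and sqlite3 stands in for the driver.

```python
import queue
import sqlite3


class SQLCompiler:
    """Toy compiler: as_sql() builds the query, execute_sql() runs it."""

    def __init__(self, table, connection=None):
        self.table = table
        self.connection = connection

    def as_sql(self):
        return f"SELECT x FROM {self.table} ORDER BY x", ()

    def execute_sql(self):
        sql, params = self.as_sql()
        cursor = self.connection.cursor()
        try:
            cursor.execute(sql, params)
            return cursor.fetchall()
        finally:
            cursor.close()


class PooledSQLCompiler(SQLCompiler):
    """Only execution changes: borrow a connection for this one query."""

    def __init__(self, table, pool):
        super().__init__(table)
        self.pool = pool

    def execute_sql(self):
        conn = self.pool.get()
        self.connection = conn
        try:
            return super().execute_sql()  # SQL generation is inherited as-is
        finally:
            self.connection = None
            self.pool.put(conn)


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2);")
pool = queue.Queue()
pool.put(conn)
rows = PooledSQLCompiler("t", pool).execute_sql()
print(rows, pool.qsize())  # [(1,), (2,)] 1
```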
Since I've started talking about compilers, let me describe the changes they needed. django has a feature called QuerySet.iterator(), which supports server-side cursors. It lets you fetch the results of a query (and execute it) in parts, in chunks. This can be useful, for example, if the query result does not fit in memory. It hardly makes sense to use it for pagination, given that it occupies an entire connection for an indefinite time.
So I decided that this feature isn't needed and chose not to support it. When execute_sql() completes, I simply close the cursor and return the connection to the pool. Dear developers, if you know why server-side cursors are needed, please write in this issue. I already have doubts; maybe I will bring them back.
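The trade-off behind that decision can be sketched as follows, assuming a queue-based pool with sqlite3 standing in for the driver (fetchmany() only mimics chunked fetching; real PostgreSQL server-side cursors are named cursors on the server). fetch_all() gives the connection back before it returns, while iter_chunks() keeps it checked out until the caller has consumed every chunk. Both function names are hypothetical.

```python
import queue
import sqlite3

SQL = "SELECT x FROM t ORDER BY x"


def make_pool():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE t (x INTEGER);"
        "INSERT INTO t VALUES (1), (2), (3), (4), (5);"
    )
    pool = queue.Queue()
    pool.put(conn)
    return pool


def fetch_all(pool, sql):
    """Run the query, read every row, and release the connection immediately."""
    conn = pool.get()
    try:
        return conn.execute(sql).fetchall()
    finally:
        pool.put(conn)


def iter_chunks(pool, sql, chunk_size=2):
    """Yield rows chunk by chunk; the connection stays borrowed meanwhile."""
    conn = pool.get()
    try:
        cursor = conn.execute(sql)
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        pool.put(conn)  # runs only when the generator finishes or is closed


pool = make_pool()
all_rows = fetch_all(pool, SQL)
print(all_rows, pool.qsize())  # [(1,), (2,), (3,), (4,), (5,)] 1

chunks = iter_chunks(pool, SQL)
first = next(chunks)
print(first, pool.qsize())  # [(1,), (2,)] 0  <- connection still held
rest = list(chunks)
print(rest, pool.qsize())  # [[(3,), (4,)], [(5,)]] 1
```

If a caller abandons the generator halfway, the connection only comes back when the generator is closed or garbage-collected, which is exactly the problem of occupying a connection for an indefinite time.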
In the repository you can also find a wrapper around the cursor: I use it to turn an asynchronous cursor into a synchronous one. After this transformation, my cursor is fully compliant with DB-API 2.
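In the backend this conversion is done with greenlets. The sketch below swaps in a different, stdlib-only technique to show the same idea of a synchronous, DB-API-style facade over an asynchronous cursor: the event loop runs in a background thread, and each blocking method submits a coroutine with asyncio.run_coroutine_threadsafe() and waits for its result. FakeAsyncCursor is a hypothetical stand-in for the driver's async cursor, and SyncCursor is not the repository's wrapper.

```python
import asyncio
import threading


class FakeAsyncCursor:
    """Hypothetical async cursor; a real driver would talk to the server here."""

    def __init__(self):
        self._rows = []

    async def execute(self, sql, params=None):
        await asyncio.sleep(0)  # pretend to wait for network I/O
        # Toy behaviour: the "result" is a single row echoing the parameters.
        self._rows = [tuple(params or ())]

    async def fetchall(self):
        rows, self._rows = self._rows, []
        return rows

    async def close(self):
        self._rows = []


class SyncCursor:
    """Blocking, DB-API-style wrapper around an async cursor."""

    def __init__(self, async_cursor, loop):
        self._cursor = async_cursor
        self._loop = loop

    def _run(self, coro):
        # Schedule the coroutine on the loop thread and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def execute(self, sql, params=None):
        self._run(self._cursor.execute(sql, params))

    def fetchall(self):
        return self._run(self._cursor.fetchall())

    def close(self):
        self._run(self._cursor.close())


loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

cursor = SyncCursor(FakeAsyncCursor(), loop)
cursor.execute("SELECT %s, %s", (1, 2))
rows = cursor.fetchall()
cursor.close()
print(rows)  # [(1, 2)]

loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

This version only works when the blocking calls come from a thread other than the loop's own; calling _run() from inside the loop would deadlock. Greenlets avoid that by suspending the synchronous code inside the loop's own thread.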
Transactions in django are not really designed to be reimplemented in a backend: they are just a regular atomic function and an Atomic class. Still, this was not a big obstacle: take a look at the patches module to see why.
If you dig around, you will probably find more interesting bits. The rest is boring coding; what more can I say?
As for the backend itself, it is a real one, intended for production. It is not tested yet, that's true, but that is the next step. By the way, I can use any tests from the django test suite to test my backend, and maybe I'll add some of my own. After that, I can publish a beta version. I would only ask the django developers not to break it in the meantime. No, that would be asking too much.