Asynchronous Django: it will happen. Exclusively for Habr

Hello, readers of the Django hub. This article is about the web framework for perfectionists with deadlines and whether asynchrony can be added to it. Some of you know that the Django Software Foundation is already making an effort in this direction. For example, there is DEP-09, which roughly outlines the boundaries of the future changes. However, some transformations are too large in the opinion of its authors, and the DEP states explicitly that they are out of its scope. One such task is making the Django ORM asynchronous. Given that the ORM is, by any measure, more than 50% of Django and, in my opinion, its main part, DEP-09 looks to me like a set of incomprehensible half measures.

I have an alternative, somewhat more radical proposal for adding asynchrony, one that does not need a DEP. In short, I want to make a new version that will replace Django. That is not an end in itself: the main thing I want is to release an asynchronous version of Django, but since the synchronous mode will be supported as well, traditional Django is unlikely to be needed much. Besides, if you had to choose between a synchronous-only and an asynchronous-only version, only the asynchronous one has a clear advantage, in my opinion. Note also that I only want to port the Django ORM, not the whole of Django, that is, exactly the part that is outside the scope of DEP-09. So, welcome.

The proposed approach

Among Python programmers there is a well-known (though I have my doubts about that) principle called “No I/O”, better known as Sans-I/O. It has been used to write various network protocol libraries (HTTP, HTTP/2); you can read about them here: sans.io… “No I/O” means that a library does not perform I/O at all. We assume that the data can be transported by any means (carrier pigeons!) and that we know nothing about the transport. The data format does not change because of that, and the network protocol itself can be implemented this way by at least 95%.

Interestingly, this approach makes it possible, among other things, to have a single implementation for synchronous and asynchronous code. It is incomplete, though: we still need a high-level shell that actually performs the input/output. Thus, splitting a library or framework into two parts (No I/O and I/O) not only makes it easier for other libraries to reuse the code, but also lets the library or framework itself cover the asynchronous use case. For those who prefer video, here’s a link
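To make the split concrete, here is a minimal sketch of my own, not taken from any real library. The names LineParser, read_lines_sync, read_lines_async and fake_socket are hypothetical. The parser contains all the protocol logic and never touches a socket; the two thin shells only move bytes.

```python
import asyncio


class LineParser:
    """Pure protocol logic: turns raw byte chunks into complete lines. No I/O."""

    def __init__(self):
        self._buffer = b""

    def receive_data(self, chunk):
        # Keep the unfinished tail in the buffer, return only complete lines.
        self._buffer += chunk
        *complete, self._buffer = self._buffer.split(b"\n")
        return [line.decode() for line in complete]


def read_lines_sync(chunks):
    """Synchronous shell: `chunks` is any blocking iterable of bytes."""
    parser = LineParser()
    lines = []
    for chunk in chunks:
        lines.extend(parser.receive_data(chunk))
    return lines


async def read_lines_async(chunks):
    """Asynchronous shell: `chunks` is any async iterable of bytes."""
    parser = LineParser()
    lines = []
    async for chunk in chunks:
        lines.extend(parser.receive_data(chunk))
    return lines


async def fake_socket():
    # Stand-in for a real asynchronous transport (hypothetical).
    for chunk in (b"hel", b"lo\nwor", b"ld\n"):
        await asyncio.sleep(0)
        yield chunk
```

Both shells reuse the same LineParser, which is the point: only the few lines that wait for data differ between the synchronous and asynchronous variants.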

How can this be applied to Django? Let me remind you that I will only talk about porting the Django ORM. On the whole, the ORM is the only major part of Django that does not depend on the rest of it.

So how do you do without I/O if you need to access the database? Simple: you do not access it. We need a separate thin layer that does it instead and acts as the driver. This layer is separated from Django (the ORM), while Django itself assumes that, given SQL, rows of data can be obtained somehow, and it already knows how to process them. You could say that the API is built on callbacks: a universal interface that does not depend on whether our code is synchronous or asynchronous.
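A rough illustration of the callback idea, under my own assumptions: SelectQuery, sync_driver, async_driver and FAKE_TABLE are hypothetical names, and the "database" is just a list in memory. The ORM side produces SQL and a callback for the rows; the driver decides how to obtain the rows.

```python
import asyncio


class SelectQuery:
    """ORM side: knows the SQL and how to process rows, never opens a connection."""

    def __init__(self, table, columns):
        self.table = table
        self.columns = columns

    def as_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.table}"

    def handle_rows(self, rows):
        # The callback: invoked by whichever driver executed the SQL.
        return [dict(zip(self.columns, row)) for row in rows]


# Stand-in for the database contents (hypothetical).
FAKE_TABLE = [(1, "Ann"), (2, "Bob")]


def sync_driver(sql, callback):
    # A real driver would run `sql` over a blocking connection here.
    return callback(FAKE_TABLE)


async def async_driver(sql, callback):
    # A real driver would await the network here; sleep(0) stands in for that.
    await asyncio.sleep(0)
    return callback(FAKE_TABLE)
```

The callback itself is an ordinary function, so the same SelectQuery works with either driver.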

To back this up, I wrote a proof of concept that makes Django querysets awaitable, so that you can write `await MyModel.objects.all()`. Taking into account things like prefetch_related, this is not entirely trivial. Here it is: https://github.com/pwtail/django/pull/2/files, and it works! More on that below.

Compatibility

Actually, at first I only wanted to make the asynchronous version. The idea of using the “No I/O” approach came later, when I saw that it allows a single code base for synchronous and asynchronous code and, moreover, that the result would be compatible with traditional Django. So what is compatibility, and what will it look like?

Oddly enough, in the context of moving a project to async, compatibility is usually understood as more than API compatibility between the new synchronous version and the old, synchronous-only one. The synchronous and asynchronous versions are also expected to come from the same repository; anything less is considered insufficient. The proposed approach provides compatibility in exactly this sense. The synchronous version is also compatible with the old, synchronous-only one. There may be breaking changes; I think that depends on how long the project takes to develop, and that is more or less normal. But there is one more kind of compatibility that I want to talk about.

This is compatibility at the database level and at the model level, in other words, in how Python objects correspond to database entities (tables, columns, indexes). Nothing here depends on asynchrony, and this is the kind of compatibility that I definitely do not plan to break without a good reason. It means, for example, that the Django admin from django/django should work fine with my version, although it will not be part of my repository.

Motivation and goals

For me, this project is an exercise, a kind of term paper. The goal of the project is clearly defined and achievable, and the result is easy to assess. I am not going to expand the scope of the project: it is only about asynchrony.

In terms of purpose, this is primarily an asynchronous version of Django with the same API as traditional Django. The fact that a synchronous version will be provided at the same time is a consequence of the approach used and a nice bonus.

Proof of concept

So, we have arrived at the idea itself and at an example of its implementation.

As we know, the database driver is what deals with I/O, so we will take it out of the ORM. Let me remind you that in Django the database backend handles the integration with a specific database, and the driver itself is only a small part of that. Of course, we only want to get rid of the driver and keep the database backend. The driver has a simple interface: for a given SQL statement we receive rows of data, possibly in chunks (and possibly using server-side cursors). All this is best looked at with an example, so let’s take my pull request.

This is a pull request against Django 3.2 that makes querysets awaitable. The main interface of a queryset is an iterator that returns objects. Alas, it now becomes less self-sufficient: it can return objects only if we pass it the rows we received from the database. For example, it could have a method .send_rows(rows) that is an iterator and returns objects. If we receive rows from the database in chunks, the method has to be called many times. You may notice that the queryset interface in this case resembles the generator interface, which also has a send method. But no, this is not a generator, just a perfectly ordinary object.
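Here is a small sketch of what such a send_rows interface could look like. It is an illustration under my own assumptions, not the code from the pull request: PersonIterable and fetch_all are hypothetical names, and the standard sqlite3 module plays the role of the driver that delivers rows in chunks.

```python
import sqlite3
from collections import namedtuple

# Stand-in for a model class.
Person = namedtuple("Person", "id name")


class PersonIterable:
    """An ordinary object (not a generator) that turns raw rows into objects."""

    sql = "SELECT id, name FROM person ORDER BY id"

    def send_rows(self, rows):
        # Called once per chunk; yields one object per row.
        for row in rows:
            yield Person(*row)


def fetch_all(iterable, connection, chunk_size=2):
    """Driver side: performs the I/O and feeds the rows in, chunk by chunk."""
    cursor = connection.execute(iterable.sql)
    objects = []
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        objects.extend(iterable.send_rows(chunk))
    return objects


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO person (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)]
)
people = fetch_all(PersonIterable(), connection)
```

With a chunk size of 2 and three rows, send_rows is called twice, and the caller still ends up with one flat list of objects.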

The changes mainly concern django.db.models.query.ModelIterable and similar classes.

You can also look at the code of the drivers, synchronous and asynchronous; I put them in the module driver.py.

This is how the __iter__ and __await__ interfaces now look:

def __iter__(self):
    yield from driver.execute(self.queryset)

async def _await(self):
    return await async_driver.execute(self)

# __await__ is a wrapper around the method above
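For readers who have not written such a wrapper before, here is a generic, self-contained sketch of the pattern. It is not the code from the pull request: AwaitableResult is a hypothetical class, and asyncio.sleep(0) stands in for the asynchronous driver call.

```python
import asyncio


class AwaitableResult:
    """An object that supports both `for x in obj` and `await obj`."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        # Synchronous path.
        return iter(self._items)

    async def _await(self):
        # Asynchronous path; a real implementation would await the driver here.
        await asyncio.sleep(0)
        return list(self._items)

    def __await__(self):
        # Delegate to the coroutine: __await__ must return an iterator.
        return self._await().__await__()


async def main():
    return await AwaitableResult([1, 2, 3])
```

Calling asyncio.run(main()) returns the list, while list(AwaitableResult([1, 2])) goes through __iter__ without any event loop.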

Well, I can’t help but write about prefetch_related. There, queries are executed one after another, and each of them requires interaction with the driver. Here I decided to use real generators to keep my code changes to a minimum. The result exceeded expectations: it was only necessary to replace return with yield from in a few places. For example, the main method _fetch_all() looks like so… Sometimes generators make life a lot easier.
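The generator idea can be sketched roughly as follows. This is my own simplified illustration, not the code from the pull request: fetch_books_with_tags, run_sync, run_async and the DATA dictionary are hypothetical. The ORM-side generator yields SQL and receives rows back, and the two drivers differ only in how they wait for those rows.

```python
import asyncio

# Stand-in for the database: maps SQL text to result rows (hypothetical).
DATA = {
    "SELECT id, title FROM book": [(1, "Dune"), (2, "Emma")],
    "SELECT book_id, name FROM tag WHERE book_id IN (1, 2)": [
        (1, "sci-fi"),
        (2, "classic"),
    ],
}


def fetch_books_with_tags():
    """ORM-side logic: yields SQL, receives rows, returns objects. No I/O."""
    books = yield "SELECT id, title FROM book"
    ids = ", ".join(str(book_id) for book_id, _ in books)
    # The second query depends on the result of the first, as in prefetch_related.
    tags = yield f"SELECT book_id, name FROM tag WHERE book_id IN ({ids})"
    tags_by_book = {}
    for book_id, name in tags:
        tags_by_book.setdefault(book_id, []).append(name)
    return [
        {"title": title, "tags": tags_by_book.get(book_id, [])}
        for book_id, title in books
    ]


def run_sync(gen):
    """Synchronous driver: answers every yielded query with rows."""
    rows = None
    try:
        while True:
            sql = gen.send(rows)
            rows = DATA[sql]  # a blocking database call would go here
    except StopIteration as stop:
        return stop.value


async def run_async(gen):
    """Asynchronous driver: same loop, but waiting for rows is awaited."""
    rows = None
    try:
        while True:
            sql = gen.send(rows)
            await asyncio.sleep(0)  # an awaited database call would go here
            rows = DATA[sql]
    except StopIteration as stop:
        return stop.value
```

A nested helper that needs its own queries can be called with yield from, which is why replacing return with yield from in a few places is enough to thread the driver through several levels of calls.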

Asynchronous API

Above I wrote that Django will remain exactly the same, only asynchronous. On the whole, that is true. But some may object that this is impossible because of syntax. For example, accessing the database requires writing await, because it is an asynchronous operation, while Django has so-called lazy attributes (usually related entities reached through a foreign key, unless they were fetched in advance via prefetch_related). That is true, and some API change really is unavoidable.

But there is a simple solution: let every database query, or the absence of one, be explicit. By default, attribute access should never lead to a database query: the data must be fetched in advance (for example, using select_related or prefetch_related). If we access an entity linked through a foreign key and want to query the database for it, we cannot use the same API:

obj.related_obj  # Exception: not in the cache

The object was not requested in advance, and fetching it now requires an asynchronous call, which attribute access is not. But we can change the API a little for such cases, for example by using the letter R (R for relation):

await obj.R("related_obj")
await obj.R("related_objects").all()

Or a different letter. Or another API, like R.related_obj, or something else entirely. And if we use the old API (obj.related_obj), it means that the object is already in the cache thanks to prefetch_related or something similar.
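To show that this behaviour is easy to express in plain Python, here is a toy sketch. Everything in it is hypothetical (AsyncModel, NotInCache, the relations tuple, the _load method); it only covers the single-object case `await obj.R("related_obj")`, and the database query is faked.

```python
import asyncio


class NotInCache(AttributeError):
    """Raised when a relation is read as an attribute but was never fetched."""


class AsyncModel:
    relations = ()  # names of related fields, set by subclasses

    def __init__(self, **cache):
        self._cache = dict(cache)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in type(self).relations:
            try:
                return self._cache[name]
            except KeyError:
                raise NotInCache(f"{name} was not fetched in advance") from None
        raise AttributeError(name)

    async def R(self, name):
        # The explicit, awaitable way to follow a relation.
        if name not in self._cache:
            self._cache[name] = await self._load(name)
        return self._cache[name]

    async def _load(self, name):
        # Stand-in for an asynchronous database query.
        await asyncio.sleep(0)
        return f"<{name} loaded from the database>"


class Book(AsyncModel):
    relations = ("author",)
```

Plain attribute access never performs I/O here: it either returns the cached object or raises, and only the awaited R call can reach the database.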

By the way, I believe such a feature would be reasonable in a synchronous context too: the developer usually knows whether the data should come from the cache or whether a database query should be made. And if the developer believes that the data will come from the cache but it is not actually there, then an exception would be the most suitable outcome. But what is done is done, and compatibility comes first.

Asynchronous API Part 2: Models

The above was not the only problem that may arise, but the main one, in my opinion. Here, for comparison, is another one.

Its essence is as follows: we are making an asynchronous version so that we can write asynchronous services. Most likely, they will need to access the same data (relational tables) as synchronous services. In this case, we would not want to duplicate the declaration of models.

I will try to describe one of the possible solutions. If you noticed, in django, the entire interface is hung on the model: one way or another, we do everything either through the model class (MyModel.objects), or through an instance. The solution that suggests itself: let’s make an asynchronous model class – for use in an asynchronous context, while the synchronous and asynchronous classes will correspond to the same table.

As for the fields in these model classes (the ones that define the database schema), they must of course be the same, and there is no point in defining them in two different places. This is a problem that will certainly find a solution: probably any developer can suggest a way to avoid copy-paste when declaring two classes that must have the same fields.

But there is probably no point in reusing any methods between these classes (synchronous and asynchronous). They should not inherit from each other either.

In one class the save method will be synchronous, in the other asynchronous:

async def save(self, **keywords): ...

This concludes the description. Most likely, we will have two base model classes, a synchronous one and an asynchronous one. A synchronous class cannot be used in an asynchronous context and vice versa (everything will fail very quickly with an exception).
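One possible shape for this, purely as a sketch under my own assumptions: the fields are declared once in a plain class, and two unrelated base classes provide a synchronous and an asynchronous save. All names here (PersonFields, SyncModel, AsyncModel, build_insert, EXECUTED) are hypothetical, and the "driver" only records the SQL it was asked to run.

```python
import asyncio

# Stand-in for the database: records every statement "executed".
EXECUTED = []


class PersonFields:
    """Declared once; stands in for the schema-defining fields."""

    table = "person"
    field_names = ("name", "age")


def build_insert(model):
    # I/O-free part shared by both worlds: building the SQL and parameters.
    columns = ", ".join(model.field_names)
    placeholders = ", ".join("?" for _ in model.field_names)
    params = tuple(model.values[name] for name in model.field_names)
    sql = f"INSERT INTO {model.table} ({columns}) VALUES ({placeholders})"
    return sql, params


class SyncModel:
    def __init__(self, **values):
        self.values = values

    def save(self, **keywords):
        EXECUTED.append(build_insert(self))  # blocking driver call goes here


class AsyncModel:
    def __init__(self, **values):
        self.values = values

    async def save(self, **keywords):
        await asyncio.sleep(0)  # awaited driver call goes here
        EXECUTED.append(build_insert(self))


# Two model classes for the same table; neither inherits from the other.
class Person(PersonFields, SyncModel):
    pass


class AsyncPerson(PersonFields, AsyncModel):
    pass
```

Person(name="Ann", age=30).save() runs synchronously, while the asynchronous twin is used as `await AsyncPerson(name="Bob", age=41).save()`; both write to the same table.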

Timeline

The asynchronous version comes first. It starts in alpha: there will be bugs, but the API will be rock-stable. The version will then move towards beta and release.

The synchronous version will come out at some point later. It has to meet certain quality and compatibility requirements. Besides, there is no rush to release it: traditional Django exists, so use that. In the more or less distant future, the synchronous and asynchronous versions will be provided (and will evolve) together.

As for the timeline itself, the first version can be expected in a few months.

A few more details: of course, the project will be a separate package on PyPI with a different name (not django). I haven’t decided on it yet: maybe “remake”? It will have a very similar structure, so a typical import would be `remake.db.models`. In my opinion, that does not look very good; something better is needed.

That turned out to be a lot of text. I look forward to your reaction, gentlemen.
