Introduction to Optimizing DB Queries on Django with Silk

Is it worth using django in 2024? I think so. DRF is very convenient, the development speed is very high (especially if you use generic views, django-filters), a huge number of ready-made batteries makes life much easier and the built-in admin panel is well suited for most sites. Fully asynchronous frameworks (or switching to another language) will not give much benefit if your service works a lot in the DB – you will run into its performance and the limitations of the number of connections to the DB pool. Below I will briefly go over the main points and give links to documentation and ready-made batteries.


Django projects may start to slow down as the load increases. As a rule, this is due to the increase in load on the database and there may be several problems here:

N+1

When working with related tables, you must explicitly specify them when composing a query using select_related for foreign key and prefect_related for many to many. Otherwise, to load one page you will have tens or hundreds of requests. To understand that this is happening, you can use silk. This library logs all requests that occur when loading a page, you can see how many there were, how long they took to execute, and the plan (explain) for each request. If there are a lot of requests for a page, you most likely forgot to do select/prefetch. Here is an example of a very sad situation:

And here are the same pages after adding select/prefetch related:

If you need to additionally filter many to many tables, you need to use prefech objects. For example, you want to do something like this:

book = Book.objects.get(pk=1)

shops = book.shop_set.filter(city_id=1)

This can be done like this:

book = Book.objects.prefetch_related(Prefetch("shop_set", queryset=Shop.objects.filter(city_id=1), to_attr="city_shop_set")).get(pk=1)

shops = book.city_shop_set.all()

Indexes

In the SQL tab you can see which queries took a long time to execute and their plan. It may be necessary create some indexes. (please note if there is full scan)

You can also explicitly specify which columns you want using only/defer so as not to drive unnecessary data.

Some queries may take a long time to execute due to some aggregations, such as slow counts. You can cache them or store their values ​​in the table itself and update them periodically or by signals. Or some denormalization is required to avoid subquery.

silk also allows profile python code of your views:

Replication

Django supports replication bd. As the load increases, it makes sense to direct select queries, which are usually much more numerous, to read-only replicas. If your project has a large geography, you can also use sharding or tenants. There are also ready-made ones for django batteries.

Recursion

Recursive queries to the database can be written using common table expressions. Or, if these are some trees to use MPTT

Asynchronous view

Usually some requests to external services are made inside celery, but modern versions of django support asynchronous view. It is worth paying attention to them. For asynchronous http requests, you can use aiohttp or httpx inside such views.

Serializers

Try to avoid to_representation and SerializerMethodField and especially making any queries inside it.

Caching

Usually, you can safely cache many pages, at least for a short time, do not forget to invalidate the cache and when caching a page, pay attention to cookies or auth headers if it is different for different users. Database queries can also be cached, including automatically.

graphql

Instead of making lots of rest requests per page to get all the nested data, you can switch to graphql and query it all at once.

And finally, some things are not done very well in relational DBs. For example, if you need faceted search, you should pay attention to elastic/open search

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *