Synchronizing applications using Advisory Locks (postgresql). What is it, why, and the nuances of working with pgBouncer


In today’s world, a single back-end application usually has more than one instance running, if only for reliability reasons. And this means that in order to synchronize their actions, you need to come up with something, some external solution: mutexes or, for example, internal channels inside the application are no longer enough.

Fortunately, in many cases the project already has some kind of database that can be used for this purpose. The DBMS itself manages the locks, and many problems are solved by themselves, “under the hood”. For example, if two instances try to update the same row in a table, that row will not turn into a mess. The DBMS will automatically take the required lock, and the one who came second will simply wait until this lock is removed.

The problem is that such automation with forced locks is not suitable for all cases. For example, you process files in bulk, assuming that no file will be processed simultaneously by two applications at once, but at the same time you do not want to create a full-fledged table in the database for synchronization. In Karuna’s projects, such tasks arise quite regularly.

To solve such problems, PostgreSQL has so-called optional locks (advisory locks), i.e. locks that are taken based on the application logic, and not on the automatic storage / issuance of data in the database.

Session locks

I must say right away that by session here we mean a tcp connection to PostgreSQL, not to pgBouncer, this is important. I’ll talk about this in more detail below.

So, session advisory locks. If two programs in different sessions simultaneously do

-- Берём лок, используя некое число (BigInt)
SELECT pg_advisory_lock(123);

then the first will continue to work, and the second will hang, waiting for the first to release the lock using the command

-- Снимаем лок (с тем же числом)
SELECT pg_advisory_unlock(123);

What is that number in brackets? Well, it’s any number you like – the main thing is that in the same situation, two programs use the same thing. In fact, in practice, you often want to set a lock not on a number, but on a certain string. For example, when processing files, this might be the name of a file. But the function requires exactly bigint, so they often use such a creepy-looking method: the string is converted to md5, and then md5 is represented as a hexadecimal number that is cast to bigint. As a result, you can lock on a very specific file name, given as a string

SELECT pg_advisory_lock(
    ('x' || md5('/path/to/my/file.txt'))::bit(64)::bigint  
);

There is another useful feature. Let’s say you need to do some heavy operation (for example, an analytical report once a day at 00:00) exactly once. Then it makes no sense for you to suspend the second instance by waiting. You just need to check: if there is no lock, then take it, and if it already exists, then just exit. For this, the function is usually used pg_try_advisory_lock

SELECT pg_try_advisory_lock(
    ('x' || md5('create complicated report'))::bit(64)::bigint
);

This function will return false if the lock is already taken in another session. And if it is not taken, then it takes a lock and returns true. It shoots exactly the samepg_advisory_unlock)

Sessions and pgbouncer

If you are new to Postgresql, then it would be better for you to immediately learn a very bad thing about Postgres: for each connection, it creates a real OS process. Therefore, in any more or less loaded system, one or another connection pooler is used. One of the most common options is pgBouncer. PgBouncer keeps several persistent connections (sessions) to the DBMS, and the client connected to the pooler receives one of them, then another – one that is no longer used by anyone.

PgBouncer can work in two modes: session and transactional. Sessional the mode is logically almost the same as a direct connection to the database, since while the application is connected to pgBouncer, pgBouncer provides it with the same connection to Postgres. And when disconnected, the connection is returned to the pool.

AT transactional In the same pgbouncer mode, a different session will be slipped into each transaction.

Those. you do a SELECT – you get connection number 1.
Then you do BEGIN; multiple requests; COMMIT; – get a connection, let’s say 15.
Then you make another select – you get connection number 12, or maybe number 1 – as you’re lucky. What is free at the moment is what you will get.

Transactional mode is very efficient and is used in highload because connections are not idle, but are immediately returned to the pool.

However, it is not possible to work with pg_advisory_lock() in this mode.: you just don’t know which connection your request will fall on – the same or another. But the lock only works on different sessions (connections). Moreover, in a dev environment, when testing functionality, this can accidentally work correctly. But when rolling out to production and a real load, another probability will work, and very strange, poorly diagnosed, buggy behavior will appear.

Transactional locks

However, advisory locks also come in another form – for working with transactions (that is, they are suitable for working with pgbouncer in any mode). They are taken inside the transaction with pg_advisory_xact_lock()/pg_try_advisory_xact_lock()and released automatically when a transaction is committed or rolled back.

Example:

Starting a transaction

BEGIN;

Trying to take a lock

SELECT pg_try_advisory_xact_lock(
    ('x' || md5('create daily 00:00 report'))::bit(64)::bigint
);

If this request returns false, then we complete the transaction and do nothing else. If it returns true, then we do something useful, for example, a request for a long-computed daily report for 10 joins and 100500 conditions, which we would not like to execute twice at the same time.

SELECT ...
  JOIN ...
  JOIN ...
  ....
WHERE
   ...

We release the transactional advisory lock by simply completing the transaction:

COMMIT;

Such *_advisory_xact_lock work well with pgBouncer running in transactional mode, but here, too, one should not forget about one nuance.

You should not use such a lock, for example, for hourly video processing, if there are a lot of processors: all this time the transaction will hang, which means that the connection to the database will also. In this case, several of these vids will simply gobble up all connections from pgBouncer to the database, and everything else will freeze waiting for connections.

More nuances

Official documentation:

general description

feature list

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *