Guido van Rossum. Reflections on the asyncio.Semaphore class

There is one very peculiar fast food restaurant in Silicon Valley. It works around the clock and accepts one guest at a time, because there is only one table. But the hamburgers are amazing. When you arrive, you wait until the table is free. Then you are invited, and because this is America, you are asked an endless plethora of questions about how to cook and serve a hamburger.

But let’s not talk about culinary delights today. In such restaurants, we are more interested in the queuing system. If you are lucky enough to come to a restaurant when the table is free and there is no queue, you will immediately be seated at a table. Otherwise, you will be handed a buzzer (they have a great variety of such “tweeters”!). With such a buzzer, you can safely wander around the area until he gives a signal. The one who serves the guests makes sure that this is done in order of arrival. When your turn comes, he will send you a signal, you will return to the restaurant, and a place will be found for you. Continuation – to the start of our course on Fullstack Python Development.

If by that time you have already changed your mind about ordering, the buzzer can be returned. This is a common thing: they will take him away without blinking an eye. Even if the signal has already worked, the next guest will be invited to the table. Unless, of course, you’re the only one in line. A polite guest will not take the buzzer far, and an honest manager will not seat other people at a table out of turn, even if you are late.

All this is a description of the work of the class lock. Guest arrival corresponds to method call acquire()and its departure – call the method release(). To cancel your order means to cancel acquire() in the process of waiting. This can be done before or after the beep sounds, that is, the operations are canceled before or after the call is activated from the Lock side (but before the return from the method acquire()).

The restaurant business can be expanded: hire more chefs and put more tables. But the manager will remain alone, and the essence of his work will not change. But since guests can now be hosted at the same time, instead of just blocking with a class lock now you have to use the class Semaphore.

It turns out that implementing the synchronization primitives is very tricky. In the case of a library asyncio this is somewhat surprising, because only one task can be performed at a time, and task switching occurs only in await. Since last year fairness of distribution resources (equity), and correctness, semantics and performance of this method are considered very critically. The last three complaints were received only last month. To me, the last man standing on the battlefield with marks эксперту по asynciohad to hastily comprehend the idea of ​​semaphores.

Comparison with the restaurant was very helpful. For example, it shows the difference between the number of free tables and the number of guests who can immediately find a table. This difference is equal to the number of people who still returned the buzzers to the manager.

There is one problem with fairness. If the task asks releaseand then immediately tries to query acquire, it hangs other tasks. As if a departing guest suddenly turns around and sits down at the table again, ahead of people in line.

There is also such a bug in the library – saving the wrong state Lock when canceling a call acquire(). It was as if the guest returned the buzzer that went off, but refused to sit down at the table, which put the manager into a stupor.

Comparison with a restaurant does not help everywhere: with the sequence of events when canceling at asyncio it’s Complicated. In Python 3.11, we have additionally loaded the cancel operation with two new asynchronous context managers:

  • Class task group manages a group of related tasks. If one of these tasks fails, the others are canceled and the context manager waits for all tasks to exit.
  • Function timeout() sets the wait time. After this time, the task is cancelled.

And here is the main difficulty of cancellation:

  • Futura (class object [Future](https://docs.python.org/3.12/library/asyncio-future.html#asyncio.Future)) can be canceled while waiting. Then the operation await crashes and raises an exception CanceledError.
  • But if future expectation returns CancelledError, it is impossible to believe that the future itself is canceled! In this case, the future can already be marked as having a result (which no longer allows it to be canceled), and the task can be marked as ready for execution (runnable), however, another task (also ready for execution) will run first and cancel this one. Thanks to user Cyker Way for pointing out this edge case.

It is useful to imagine that a Future can have 4 states:

  • expectation;
  • ready, there is a result;
  • done, there is an exception;
  • done but cancelled.

From waiting state Future can go to any other state, after which the state remains unchanged. (insert some pretty state diagram here :-)).

Semaphore (Semaphore) serves pending tasks in first-in-first-out (FIFO) basis. It does not have an exception state, but it does have three other states:

  • waiting: a guest with a buzzer that has not yet worked;
  • there is a result: a signal came to the guest’s buzzer;
  • Cancellation: The guest turned off the buzzer before receiving the signal.

For the sake of fairness, the function acquire() must add a new Future object to the end of the queue for each semaphore lock it encounters. In this case, the leftmost (oldest) future should be marked as containing the result when calling the function release()until the queue is empty. A fairness error occurs if acquire() chooses the shortest path for non-zero level semaphore (number of free tables). This should not be done when there are futures in the queue. In other words, we sometimes mistakenly seated new arrivals at empty tables while others waited in line.

What do you think is the cause of the undo bug? I mean a scenario where the future has a ready result (the buzzer went off), but the task waiting for that future is canceled (the guest refused to sit down).

I had a hard time imagining the state of a semaphore with a variable level and a queue of pending FIFO futures. It was not easy to define the function locked(). If level was public, you would have to tinker with its semantics. In the end, I came up with the following definitions:

  • W is a list of pending futures: [f для f в очереди queue, у которых не вызвана f.done()]
  • R is a list of ready-made futures that have the result: [f для f в queue, у которых вызвана f.done(), но не f.cancelled()]
  • C – list of canceled futures: [f для f в queue, у которых вызвана f.cancelled()]

And here are some invariants based on them:

  • set(W + R + C) == set(queue) – all pending, ready and canceled futures.
  • level >= len(R) – we should have no less free tables than there are buzzers in the hands of the guests.
  • definition locked() how (len(W) > 0 or len(R) > 0 or level == 0) – in order to immediately seat a guest at the table, it is necessary that there are no guests with buzzers who are waiting for a signal; there were no guests with sounded buzzers; at least one table was free.

At the end of the article I give you a link to current implementation code such a semaphore.

And we will teach you how to work with Python so that you can upgrade your career or become a sought-after IT specialist:

To view all courses, click on the banner:

Brief catalog of courses

Data Science and Machine Learning

Python, web development

Mobile development

Java and C#

From basics to depth

As well as

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *