Common pitfalls of Python developers in interviews

Hello everyone! Today I would like to talk about some of the difficulties and misconceptions that many job seekers run into. Our company is growing actively, and I often conduct or take part in interviews. Along the way I have identified several issues that put many candidates in a difficult position, so let's look at them together. I'll cover Python-specific questions, but overall the article applies to any technical interview. Experienced developers will find no revelations here, but for those just starting out it should make it easier to pick topics for the next few days of study.

The difference between processes and threads in Linux

Well, you know, it is a typical and, on the whole, simple question, asked purely to check understanding, without digging into details and subtleties. Of course, most applicants will tell you that threads are more lightweight, that context switches between them are faster, and that in general they live inside a process. All of this is correct and wonderful, as long as we are not talking about Linux. In the Linux kernel, threads are implemented in the same way as ordinary processes: a thread is simply a process that shares some resources with other processes.

There are two system calls that can be used to create processes in Linux:

  • clone() is the main function for creating child processes. The developer uses flags to indicate which structures of the parent process should be shared with the child. It is mainly used to create threads (which share an address space, file descriptors, and signal handlers).
  • fork() is used to create processes (which get their own address space), but under the hood it calls clone() with a specific set of flags.

I would draw attention to the following: when you fork() a process, you do not immediately get a copy of the parent's memory. Both processes initially run on a single in-memory instance, so even if duplicating the memory outright would have exhausted RAM, everything keeps working. The kernel marks the parent's memory pages as read-only, and an attempt to write to them (by either the child or the parent) raises a fault, which the kernel handles by creating a full copy of the affected page. This mechanism is called Copy-on-Write.
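The effect of Copy-on-Write is easy to observe from Python with os.fork() (POSIX only). This is a minimal sketch of my own, not code from the article:

```python
import os

# Right after fork() the child sees the same data as the parent, but a
# write in the child does not leak back into the parent: the kernel
# copies the affected pages lazily on the first write (Copy-on-Write).
data = ["parent"]

pid = os.fork()
if pid == 0:
    # Child process: this append triggers a copy of the touched pages.
    data.append("child")
    os._exit(0)

# Parent process: wait for the child, then check our own copy.
os.waitpid(pid, 0)
print(data)  # ['parent'] -- untouched by the child's write
```

The same reasoning explains why fork() of a multi-gigabyte process returns almost instantly: no pages are copied until someone writes to them.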

I think “Linux System Programming” by Robert Love is a great book on how Linux works under the hood.

Event Loop issues

Asynchronous services and workers in Python or Go are ubiquitous in our company. Therefore, we consider it important to have a common understanding of asynchrony and of how the Event Loop works. Many candidates already answer questions about the advantages of the asynchronous approach quite well and correctly describe the Event Loop as a kind of endless loop that checks whether a certain event has arrived from the operating system (for example, data becoming available on a socket). But the glue is missing: how exactly does the program get this information from the operating system?

Of course, the simplest mechanism to remember is select(). With its help, you build a list of file descriptors that you plan to monitor. The client code then has to check every passed descriptor for events (and their number is limited to 1024), which makes it slow and inconvenient.

An answer about select() is more than enough, but if you also mention poll() or epoll() and describe the problems they solve, it will be a big plus for your answer. To avoid unnecessary worry: we do not ask for C code or a detailed specification, only a basic understanding of what is going on. You can read about the differences between select, poll and epoll in this article.
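To make the "glue" concrete, the stdlib selectors module (which picks epoll on Linux and falls back to poll or select elsewhere) lets us sketch one iteration of such a loop. The callback wiring below is my own illustrative choice, not a prescribed API usage:

```python
import selectors
import socket

# DefaultSelector chooses the best readiness mechanism the OS offers
# (epoll on Linux, kqueue on BSD/macOS, otherwise poll/select).
sel = selectors.DefaultSelector()

# A socketpair stands in for a real network connection.
reader, writer = socket.socketpair()
reader.setblocking(False)

received = []

def on_readable(sock):
    received.append(sock.recv(1024))

# Register the descriptor and attach a callback as its `data`.
sel.register(reader, selectors.EVENT_READ, data=on_readable)

writer.send(b"ping")

# One iteration of the "endless loop": block until the OS reports that
# some registered descriptor is ready, then run its callback.
for key, events in sel.select(timeout=1):
    key.data(key.fileobj)

sel.close()
reader.close()
writer.close()

print(received)  # [b'ping']
```

Wrapping that for loop in `while True:` and registering more descriptors is essentially the skeleton every event loop is built on.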

I also advise you to look at David Beazley's talks on asynchrony in Python.

The GIL protects, but not you

Another common misconception is that the GIL was designed to protect developers from concurrent data access issues. This is not the case. The GIL will, of course, prevent you from parallelizing your program with threads (but not with processes). In simple terms, the GIL is a lock that must be held for any Python activity (it does not matter whether Python bytecode is being executed or a Python C API call is being made). So the GIL protects the interpreter's internal structures from inconsistent states, but you, just as in any other language, still have to use synchronization primitives.

It is also said that the GIL is only needed for the GC to work correctly. The GC does indeed need it, but that is not the whole story.

From the point of view of execution, even the simplest function will be broken down into several steps:

import dis

def sum_2(a, b):
    return a + b

dis.dis(sum_2)


  4           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

From the processor's point of view, none of these operations is atomic: Python executes many processor instructions for each bytecode instruction. During that time, other threads must not be allowed to change the state of the stack or make any other memory modification, as that would lead to a segmentation fault or incorrect behavior. Therefore, the interpreter holds the global lock across each bytecode instruction. However, a context switch can happen between individual instructions, and there the GIL does not save us at all. You can read more about bytecode and how to work with it in the documentation for the dis module.

On the topic of GIL security, see a simple example:

import threading

a = 0
def x():
    global a
    for i in range(100000):
        a += 1

threads = []

for j in range(10):
    thread = threading.Thread(target=x)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

assert a == 1000000

On my machine, the assertion fails consistently. If it happens to pass for you, run the script several times or add more threads. With a small number of threads you get a floating problem (the error sometimes appears and sometimes does not). So, in addition to producing incorrect data, such situations are hard to reproduce, which is a problem in itself. This brings us to the next topic: synchronization primitives.

And again I cannot help but refer you to David Beazley.

Synchronization primitives

In general, synchronization primitives are not the most Python-specific question, but they show a general understanding of the problem and how deeply you have dug in this direction. The topic of multithreading, at least with us, is asked as a bonus and will only count in your favor (if you answer). It is fine if you have not encountered it yet; this question is not tied to a specific language.

Many novice Pythonistas, as I wrote above, hope for the miraculous power of the GIL and so never look into synchronization primitives. In vain: they come in handy when performing background operations and tasks. The topic of synchronization primitives is large and well studied; in particular, I recommend reading about it in the book “Core Python Applications Programming” by Wesley J. Chun.

Since we have already seen an example where the GIL did not help us when working with threads, let's consider the simplest way to protect ourselves from such a problem.

import threading
lock = threading.Lock()

a = 0
def x():
    global a
    lock.acquire()
    try:
        for i in range(100000):
            a += 1
    finally:
        lock.release()

threads = []

for j in range(10):
    thread = threading.Thread(target=x)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

assert a == 1000000
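As a usage note, the same protection is usually written with the lock as a context manager: a `with` block acquires the lock and guarantees its release even if the body raises, replacing the explicit try/finally above. A sketch of the same program in that style:

```python
import threading

lock = threading.Lock()
a = 0

def x():
    global a
    # `with lock:` acquires on entry and releases on exit, even if the
    # body raises an exception -- equivalent to acquire()/try/finally.
    with lock:
        for i in range(100000):
            a += 1

threads = [threading.Thread(target=x) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

assert a == 1000000
```

The context-manager form is shorter and makes it impossible to forget the release, which is why it is the idiomatic choice in modern Python code.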

Careless retries

You can never rely on the infrastructure always working reliably. In interviews, we often ask candidates to design a simple microservice that interacts with others (for example, over HTTP). The question of service resilience sometimes confuses candidates. I would like to point out a few issues that candidates overlook when proposing to retry over HTTP.

The first problem: the downstream service may simply be down for a long time, in which case repeated requests in real time are pointless.

A crudely implemented retry can finish off a service that has already started to slow down under load: the last thing it needs is extra load, which can grow significantly due to repeated requests. It is always interesting for us to discuss ways of saving state and re-sending requests once the service is healthy again.
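One common mitigation is exponential backoff with jitter and a capped number of attempts: spacing retries out (and randomizing them) keeps a herd of clients from hammering a struggling service in lockstep. The function names and parameters below are illustrative, not from the article:

```python
import random
import time

def retry(func, attempts=5, base_delay=0.1, max_delay=2.0):
    """Call func(), retrying on any exception with backoff and jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Give up after the final attempt instead of retrying forever.
            if attempt == attempts - 1:
                raise
            # Exponential backoff with full jitter: sleep a random time
            # in [0, min(max_delay, base_delay * 2**attempt)].
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))

# A stand-in for a flaky HTTP call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"

print(retry(flaky))  # ok
```

For a service that is down for hours, no in-process retry policy helps; that is where persisting the work (a queue, an outbox table) and replaying it later comes in.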

Alternatively, you can try to change the protocol from HTTP to something with guaranteed delivery (AMQP, etc.).

The service mesh can also take over the retry task. You can read more in this article.

Overall, as I said, there are no surprises here, but this article can help you figure out which topics to brush up on, not only for interviews, but also for a deeper understanding of what is going on under the hood.
