Have you ever looked at goroutines this way?

Behind the scenes of Go runtime


Don't worry about understanding the image above right now, we'll start with the very basics.

Goroutines are distributed across threads, which are managed by the Go Scheduler under the hood. We know the following about goroutines:

  • Goroutines are not necessarily faster than threads in terms of execution speed, since they require threads to execute them.

  • The key advantage of goroutines lies in the details: cheaper context switching, a much smaller memory footprint, and the low cost of creating and "deleting" them.

You've probably heard of Go's scheduler before, but how much do we know about how it works? How does it connect goroutines to threads?

Let's look at the operations that the scheduler performs one by one.

1. M:N Scheduler

The Go team has made concurrency easy for us. Just think: all you need to do to start a goroutine is put the go keyword in front of a function call.

go doWork()

But behind this simple step there is a much more complex system.

Go simply doesn't give us direct access to threads. Threads are managed by the scheduler, a key part of the Go runtime.

Go Scheduler


So what about M:N?

This means that the Go scheduler's job is to map M goroutines onto N kernel threads, forming an M:N model. You can have more OS threads than cores, and more goroutines than threads.

Before we dive deeper into how the scheduler works, let's clarify two terms that are often confused: concurrency and parallelism.

  • Concurrency: several tasks make progress during overlapping time periods, but not necessarily at the same instant.

  • Parallelism: This means that multiple tasks are running at the same time, most likely using more than one CPU core.

Concurrency vs. Parallelism


Let's look at how the Go scheduler manages threads.

2. PMG model

Before we unravel the inner workings, let's understand what P, M and G are.

G (goroutine)

A goroutine is the smallest execution unit in Go and operates like a lightweight thread.

Inside the Go runtime, a goroutine is represented by the g struct. As soon as it is created, it is placed in the local run queue of a logical processor P, which in turn hands it to an OS thread (M) for execution.

A goroutine can be in three (main) states:

  • Waiting: the goroutine is idle. For example, it is blocked on a channel operation or a lock, or has been stopped by a system call.

  • Runnable: the goroutine is ready to run but is not running yet. It is waiting for its turn on a thread (M).

  • Running: the goroutine is executing on a thread (M). It keeps running until its work is done, the scheduler preempts it, or something else blocks it.

Goroutine states

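The state transitions can be seen in a few lines (the sleeps and channel names are illustrative): a goroutine blocks on a channel receive (Waiting), and the send makes it Runnable again so the scheduler can run it.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan string)
	done := make(chan struct{})

	go func() {
		msg := <-ch // blocks here: the goroutine parks in the Waiting state
		fmt.Println("received:", msg) // Running again once rescheduled
		close(done)
	}()

	time.Sleep(50 * time.Millisecond) // give it time to reach the receive
	ch <- "wake up"                   // Waiting -> Runnable -> Running
	<-done                            // wait for the goroutine to finish
}
```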

Goroutines are NOT used once and then thrown away.

When a new goroutine is started, the Go scheduler first checks the pool of goroutines to reuse one; if there are none, it creates a new one. The goroutine is then added to a processor's (P) run queue.

P (Logical Processor)

In the context of the Go scheduler, "processor" means a logical entity, not a physical CPU.

By default, the number of P is set to the number of available cores on the host; you can change this value with runtime.GOMAXPROCS(int).

runtime.GOMAXPROCS(0) // get the number of available logical cores on the host
// Output: 8 (depends on the CPU)

If you plan to change this value, it is best to do it once at application startup. Changing it at runtime triggers a stop-the-world (STW) pause: the entire application stops while the number of processors is changed.

Each P contains its own list of runnable goroutines, called a Local Run Queue, which can be up to 256 goroutines in size.

Scheduler -> Logical Processor P

If a P's local queue reaches its maximum size (256) and overflows, goroutines go to the global Run Queue, but we'll come back to that in a moment.

“So what does the number of P tell us?”

The number of P tells us how many goroutines can execute in parallel.

M (OS Thread)

A typical Go application can use up to 10,000 threads.

And yes, we are talking about threads, not goroutines. If you go beyond this limit, there is a risk that the application will crash.

“When is a thread created?”

Think about this situation: a goroutine is in the runnable state and needs a thread.

What happens if all threads are already blocked, perhaps by system calls or non-preemptible operations? In this case, the scheduler steps in and creates a new thread for this goroutine.

(Note that a thread that is simply busy with a heavy computation or a long-running task is not considered blocked.)

If you want to change the default thread limit, you can use runtime/debug.SetMaxThreads(). It sets the maximum number of OS threads the application can use.

It is also worth remembering that threads are reused, since creating and destroying a thread are resource-intensive operations.

3. How MPG works

Let's understand step by step how M, P and G work together.

I won't go into all the little details, but in general terms it works like this:

How the Go scheduler works

  1. Initiating a goroutine: with go func(), the Go runtime creates a new goroutine or reuses one from the pool.

  2. Queue positioning: the goroutine looks for a spot in a logical processor's (P) local queue; if the local queues are all full, it is placed in the global queue.

  3. Linking to a thread: this is where the thread (M) comes into play. The thread takes a P and starts executing goroutines from its local queue. Once a thread starts executing a goroutine, the processor (P) that held that goroutine in its queue is bound to the thread (M) and becomes unavailable to other threads.

  4. “Work stealing”: if a processor's (P) local queue is empty, the thread (M) tries to steal half of the runnable goroutines from another processor's (P) local queue. If nothing is found, the thread (M) checks the global queue and then the Net Poller (there is a diagram of the work-stealing process below).

  5. Resource allocation: After thread M selects goroutine G, it provides it with all the necessary resources.

“What about blocked threads?”

If a goroutine makes a system call that takes a long time (such as reading a file), its thread M will be stuck waiting.

But the scheduler has no sympathy for threads that just sit and wait. It unbinds the busy thread M from its processor P and hands another runnable goroutine from P's queue to a new or existing free thread M, which is then associated with that processor.

Blocked Threads


Work stealing process

When thread M has completed all its tasks and has nothing else to do, this does not mean that it will be idle.

The thread will look for goroutines in the local queues of other logical processors and take half of their goroutines:

Borrowing process


  1. Every 61st tick, thread M checks the global queue first; if it finds a runnable goroutine there, it takes it and no stealing is needed.

  2. Thread M checks for runnable goroutines in the local queue associated with its processor P.

  3. If the thread doesn't find anything in the local queue, it checks the global queue again.

  4. The thread checks the network poller for any waiting goroutines working on the network.

  5. If the thread still has not found anything after checking the network poller, it goes into an active spinning search mode.

  6. In this state, the thread will try to borrow goroutines from the local queues of other processors.

  7. After all these steps, if the thread still has not found a job, it stops actively searching.

  8. Now, if new goroutines appear and there is a free processor without a thread associated with it, another thread can be attracted to work.

Please note that the global queue is checked twice: once every 61 ticks and if the local queue is empty.
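The search order above can be sketched as a toy model in Go. This is illustrative code, not the actual runtime source: the name findRunnable echoes the real scheduler function, but the logic is heavily simplified (the real steal loop, for instance, picks victims in randomized order and retries several times).

```go
package main

import "fmt"

type G struct{ id int }

// P holds a local run queue (capped at 256 in the real runtime).
type P struct{ runq []*G }

var globalQ []*G

// pop removes and returns the first goroutine of a queue, or nil if empty.
func pop(q *[]*G) *G {
	if len(*q) == 0 {
		return nil
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// stealHalf moves half of the victim's local queue to the thief
// and returns one goroutine to run right away.
func stealHalf(victim, thief *P) *G {
	n := (len(victim.runq) + 1) / 2
	thief.runq = append(thief.runq, victim.runq[:n]...)
	victim.runq = victim.runq[n:]
	return pop(&thief.runq)
}

// findRunnable returns the next goroutine for p and where it came from.
func findRunnable(p *P, others []*P, tick int) (*G, string) {
	if tick%61 == 0 { // fairness: poll the global queue every 61st tick
		if g := pop(&globalQ); g != nil {
			return g, "global (61-tick check)"
		}
	}
	if g := pop(&p.runq); g != nil { // local run queue
		return g, "local"
	}
	if g := pop(&globalQ); g != nil { // global queue again
		return g, "global"
	}
	// (the real scheduler would also check the network poller here)
	for _, v := range others { // try to steal half from another P
		if len(v.runq) > 0 {
			return stealHalf(v, p), "work stealing"
		}
	}
	return nil, "none" // nothing found: stop the active search
}

func main() {
	p0 := &P{}
	p1 := &P{runq: []*G{{1}, {2}, {3}, {4}}}
	globalQ = []*G{{5}}
	for tick := 1; tick <= 4; tick++ {
		if g, src := findRunnable(p0, []*P{p1}, tick); g != nil {
			fmt.Printf("tick %d: run g%d (%s)\n", tick, g.id, src)
		}
	}
}
```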

“If M is bound to its P, how can it take goroutines from another processor? Does a thread switch from one processor to another?”

The answer is no.

Even when M takes goroutines from another, unrelated processor P, it executes them using its own processor. So while the thread runs someone else's goroutines, it does not lose the link to its own processor.

“Why 61?”

When designing algorithms, and hashing algorithms in particular, it is common practice to use prime numbers, which are divisible only by 1 and themselves.

This can reduce the chance of patterns emerging that ultimately lead to collisions or other unwanted behavior.

If the global queue were checked more often, it would cost too many resources; less often, and goroutines might wait too long to run.

Network Poller

We did not cover the topic of network poller, but we mentioned it in the work stealing diagram.

Like the Go scheduler, the Network Poller is a Go runtime component and is used to execute network-related requests, such as network I/O operations.

There are 2 types of system calls:

  • Network-related system calls: when a goroutine performs network I/O, instead of blocking the thread it is handed to the network poller. The netpoller waits for the operation to complete asynchronously, and once it does, the goroutine becomes runnable again and can be picked up by a thread.

  • Other system calls: if a call is potentially blocking and cannot go through the network poller, the goroutine fully occupies its OS thread; that thread is blocked, and the Go runtime executes the remaining goroutines on other free threads.

Translator's note: the network poller is the part of the Go runtime that multiplexes network I/O operations. On Linux, for example, it is built on top of epoll.
