Writing an asynchronous web client

What prompted me to write this material was the lack of good articles on coroutines in C++ on the Russian-language Internet, strange as that may sound. Seriously: C++20 has been around for several years now, yet almost every article about coroutines you can find on the RuNet falls into one of two types. Either the review starts from the very depths and details, retelling cppreference, until the author runs out of steam and it all comes down to "well, the rest is obvious, just go apply it in your code," which is reminiscent of the famous picture with the owl:

Or the article discusses coroutines using generators as the example, and that is all. Let's be honest: generators are great, but in my many years as a developer I have needed something like a generator maybe once, whereas asynchronous I/O comes up in almost every project. So I am far more interested in implementing asynchronous I/O with coroutines than in generators, and I had to figure it out myself.

Synchronicity and asynchrony

(If you already know all this, feel free to skip ahead.)

When people talk about "asynchrony," they usually mean asynchronous I/O: reading from or writing to the file system, exchanging data with external devices (for example, over a serial port), or sending and receiving information over the network, such as requests from clients or calls to microservices and database servers. What is the difference between synchronous and asynchronous I/O? With the synchronous approach, when we call, say, the read() method to receive data from a network socket, the calling thread blocks until the data arrives or a timeout occurs. The asynchronous approach is more like creating a job and asking to be told when it completes: asynchronous calls do not block the thread, leaving it free to do other things.

Why does this matter? Imagine you are writing a very simple network service: a client connects, sends you a request, you forward this request to a database server, receive the response, and hand it back to the client. In a synchronous style the algorithm might look something like this (pseudocode):

while (!terminated)
{
  var client = server.accept(); // accept an incoming connection
  var request = socket.read(...);
  var result = db_client.do_query(transform_request(request));
  socket.write(transform_result(result));
  socket.close();
}

Everything seems fine: the code is simple, logical, and easy to read. It will even work quite well. But what happens when you get more clients and requests arrive more and more often? What if two clients connect at the same time? Then clients have to wait in line while requests from previous clients are processed. Worse, a client you could have served much faster without even touching the database (say, from a cache) will also have to wait for the processing of other clients' earlier requests.

To improve the situation, you can process each incoming connection in a separate thread. This helps for a while, allowing you to handle dozens and perhaps even hundreds of simultaneous connections. But when the number of requests becomes very large, thousands and tens of thousands, your server will sooner or later choke: the number of processor cores is limited, and the operating system has to constantly switch context between these hundreds and thousands of threads, which is an expensive and slow operation. In addition, each thread needs memory allocated for its stack, and you will end up with thousands of threads that spend most of their time idle anyway, waiting for something to process.

And this is where asynchronous techniques come to the rescue. Now we don't have to call read() in each thread and wait. Instead, we can have one thread (or several, depending on the number of processor cores) in which we call an operating system API function in a loop, telling it: "Here is a list of file descriptors, for example sockets. As soon as something happens on any of them, for example some data arrives, tell me immediately." Such functions could be select() or poll(), the more efficient epoll family, or the newest mechanism, io_uring (this is about Linux; other systems have similar mechanisms). Then, depending on which descriptor fired and what exactly happened, we quickly process the event and wait for the next one. Thus, with just one or a few threads we can efficiently serve thousands and tens of thousands of connections and events.
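The kernel-polling idea can be sketched in a few lines. This is a hedged illustration, not part of the web client we build below: wait_readable() is a helper name invented here, and a real event loop would watch many descriptors at once and dispatch each ready one to its handler.

```cpp
#include <poll.h>     // poll(), pollfd (POSIX)
#include <unistd.h>   // pipe(), write()

// Ask the OS to tell us when fd has data to read, giving up after timeout_ms.
// A real event loop would pass an array of many pollfd structures and then
// dispatch each ready descriptor to its handler.
bool wait_readable(int fd, int timeout_ms)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) != 0;
}
```

With epoll or io_uring the registration model differs (descriptors are registered once instead of being passed on every call), but the control flow of the loop stays the same.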

But here a problem arises: the structure and readability of the code. With the synchronous approach everything is extremely simple and clear – take the data from here, put it there, and so on. With the asynchronous approach everything becomes much more complicated and confusing: you send a request in one place and process the response in a completely different one.

To improve the situation a little, you can use callbacks, adding some wrappers and abstractions over the event loop. For example, a read_async() function can take a callback as one of its arguments. In C this can simply be a function pointer; in C++ it can be a std::function, which allows the use of std::bind and lambdas. This is better, but still more complex and less readable than the synchronous approach. As the logic and algorithms grow, there is a risk of falling into the so-called "callback hell," where one callback invokes a second, that one a third, and so on, and the code becomes confusing because the logic is smeared across many small functions or heaps of nested lambdas. There is also the risk of simply forgetting, through inattention, to invoke something in some case.
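As an illustration of that style (the names here are invented for the sketch, not a real API), here is a callback-taking wrapper. To keep it self-contained and runnable it "completes" immediately; a real implementation would stash the callback and invoke it from the event loop later.

```cpp
#include <functional>
#include <string>

// Hypothetical async read: registers a completion callback instead of
// blocking and returning a value.
void read_async(const std::string& fake_reply,
                std::function<void(std::string)> on_done)
{
    // A real implementation would return here immediately and call on_done
    // from the event loop once data arrives.
    on_done(fake_reply);
}
```

Note how each subsequent step has to be nested inside the previous callback; this is exactly where "callback hell" comes from.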

And this is where coroutines come to our aid. Essentially, they combine the strengths of both approaches: we write code that looks synchronous, but actually runs asynchronously. Let's look again at the artificial example above (pseudocode):

var client = await server.accept(); // accept an incoming connection
var request = await socket.read(...);
var result = await db_client.do_query(transform_request(request));
await socket.write(transform_result(result));
socket.close();

The compiler will work its special magic here. Roughly speaking, it will turn this one function into several. When server.accept() is called on the first line, the thread of execution is released, and everything that follows becomes a callback: code that will run once the result of the accept operation is in our hands. When that callback fires, the read() on the second line executes, then execution is released again and the remaining instructions become the next callback. And so on.

In reality the compiler does not generate many separate functions; it generates one function with a state machine (a finite state machine) inside, so that each time our function is called, the execution point jumps to a different place in the code depending on what stage we are at. The underlying trick is reminiscent of Duff's Device, named after Tom Duff, who came up with it in the early 1980s to optimize copying of data arrays of various lengths on weak hardware. Those optimizations are no longer needed, but much the same switch-based mechanism is used to build the state machines when compiling coroutines.
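To make the state-machine idea concrete, here is a tiny hand-rolled "resumable function" (a sketch invented for this article, not actual compiler output): each call jumps into the switch at the saved state, does one step, records where to resume, and returns. This is essentially what the compiled coroutine does at every suspension point.

```cpp
// A hand-written stackless "coroutine": produces 1, 2, 3 across three calls,
// resuming from where it left off each time.
struct StepMachine {
    int state = 0;  // where to resume from on the next call
    int x = 0;      // a "local variable" that must survive between calls

    int step() {
        switch (state) {
        case 0:
            x = 1;
            state = 1;      // remember where to resume
            return x;       // "suspend"
        case 1:
            x += 1;
            state = 2;
            return x;       // "suspend" again
        case 2:
            x += 1;
            state = 3;      // finished
            return x;
        default:
            return -1;      // already finished
        }
    }
};
```

Notice that x had to be moved out of the function and into the structure, which is precisely the question the next paragraph raises.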

One important question remains: what about the stack? We define local variables in our function and pass arguments into it when calling, and when we exit the function, its stack frame can (and will) be overwritten by other function calls, which means data can be lost between calls. What to do?

Simon Tatham, author of the legendary PuTTY, in his article about implementing coroutines in C, and Adam Dunkels, author of the C library protothreads (which is very cool, by the way!), offer a radical solution to this problem: abandon local variables altogether. Instead you can use static variables, global variables, or memory allocated on the heap, but local variables are off limits, because there is no guarantee the data survives between calls.

Naturally, we won't go to such extremes; in the modern world there are two much more convenient ways not to lose context when transitioning between states: stackful and stackless coroutines.

The idea of stackful coroutines is quite simple. The stack is a memory area whose address is stored in a processor register and is incremented/decremented as functions are called and exited. Technically, before calling a function we can allocate a block of memory and point the stack pointer register at that new block. When exiting the function we restore the original stack pointer, and the next time we run the function we switch the pointer back to that memory block. A well-known example of a stackful coroutine implementation is the Boost.Coroutine2 library. This approach is not very portable, since stack pointer registers differ across architectures, it requires low-level tricks, and it is not very convenient (coroutines are driven by macros), but it has one advantage: it needs no compiler support, so you can write coroutine code even with ancient versions of C++ (C++11, for example).

Stackless coroutines, as the name suggests, differ in that they do not use the stack. Instead the compiler analyzes the function and, very simply put, gathers all its local variables into one data structure (the so-called coroutine frame), then allocates a block of memory for this structure on the heap before the function is called. From then on, all operations with local variables happen in that memory block.

But it seems we have dwelt on theory long enough. Let's move on to practice.

Writing an asynchronous web client, first without coroutines

To better see how an implementation with callbacks differs from one with coroutines, and how to move from the first to the second, we will first write the web client using callbacks. The client is very simple: it can make an HTTP GET request to a given URL and return the HTTP code plus a string buffer with the data received from the server. For this we will use the popular CURL library.

// the example can be compiled like this:
// clang++ -std=c++20 -stdlib=libc++ -lcurl -lstdc++ -Wall -Wextra -Wpedantic ./curl_async.cpp
// or
// g++ ./curl_async.cpp --std=c++20 -lcurl -Wall -Wextra -Wpedantic

#include <atomic>
#include <chrono>
#include <functional>
#include <iostream>
#include <list>
#include <mutex>
#include <string>
#include <thread>

#include <curl/curl.h>

class WebClient
{
    public:
    WebClient();
    ~WebClient();

    struct Result
    {
        int code;
        std::string data;
    };

    using CallbackFn = std::function<void(Result result)>;
    void runLoop();
    void stopLoop();
    void performRequest(const std::string& url, CallbackFn cb);

    private:
    struct Request
    {
        CallbackFn callback;
        std::string buffer;
    };

    static size_t writeToBuffer(char* ptr, size_t size, size_t nmemb, void* userdata)
    {
        auto r = reinterpret_cast<Request*>(userdata);
        r->buffer.append(ptr, size * nmemb);
        return size * nmemb;
    }

    CURLM* m_multiHandle;
    std::atomic_bool m_break{false};
};

WebClient::WebClient()
{
    m_multiHandle = curl_multi_init();
}
WebClient::~WebClient()
{
    curl_multi_cleanup(m_multiHandle);
}

void WebClient::performRequest(const std::string& url, CallbackFn cb)
{
    Request* requestPtr = new Request{std::move(cb), {}};
    CURL* handle = curl_easy_init();
    curl_easy_setopt(handle, CURLOPT_URL, url.c_str());
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, &WebClient::writeToBuffer);
    curl_easy_setopt(handle, CURLOPT_WRITEDATA, requestPtr);
    curl_easy_setopt(handle, CURLOPT_PRIVATE, requestPtr);
    curl_multi_add_handle(m_multiHandle, handle);
}

void WebClient::stopLoop()
{
    m_break = true;
    curl_multi_wakeup(m_multiHandle);
}

void WebClient::runLoop()
{
  int msgs_left;
  int still_running = 1;
 
  while (!m_break) {
    curl_multi_perform(m_multiHandle, &still_running);
    curl_multi_poll(m_multiHandle, nullptr, 0, 1000, nullptr);

    CURLMsg* msg;
    while (!m_break && (msg = curl_multi_info_read(m_multiHandle, &msgs_left)))
    {
        if (msg->msg == CURLMSG_DONE)
        {
            CURL* handle = msg->easy_handle;
            long code;
            Request* requestPtr;
            curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE, &code);
            curl_easy_getinfo(handle, CURLINFO_PRIVATE, &requestPtr);
            
            requestPtr->callback({static_cast<int>(code), std::move(requestPtr->buffer)});
            curl_multi_remove_handle(m_multiHandle, handle);
            curl_easy_cleanup(handle);
            delete requestPtr;
        }
    }
  }
}

In principle, you don't have to delve deeply into all this: to follow the coroutine part, it is enough to see the public interface, namely the performRequest() method, and be ready to use it. But for those who are interested and not familiar with CURL, here is a brief explanation of how it works internally.

How it works

When performRequest() is called, we create a Request structure in which we store a string buffer (std::string) for the result and the callback (std::function) that must be invoked when the request completes. We then use curl_easy_init() to create a CURL* handle for our request and set various options on it: CURLOPT_URL (the address to fetch), CURLOPT_PRIVATE (user data: we put a pointer to our Request there, so that later we can find which request this was and what to do with the result), CURLOPT_WRITEFUNCTION (the function that will append received data to our buffer) and CURLOPT_WRITEDATA (the pointer that will be passed into that function). After this, using curl_multi_add_handle(), we add our newly created request handle (the CURL*) to the curl_multi machinery, which will do all the asynchronous magic.

The runLoop() function is our event loop, where that asynchronous magic happens. First curl_multi_perform() is called, in which the curl multi machinery does its thing: establishes connections, sends data if there is something to send, and so on. Next, curl_multi_poll() waits for events on any of the observed sockets, the ones added earlier via curl_multi_add_handle(). When events occur, we pull them one by one with curl_multi_info_read() and see what happened: if a request has completed (CURLMSG_DONE), we take the pointer to our Request from CURLINFO_PRIVATE, invoke the callback, then remove the handle from the multi handle, free the easy handle, and delete the Request. Their job is done.

All of this can be used something like this:

int main(void)
{
  WebClient client;
  std::thread worker(std::bind(&WebClient::runLoop, &client));
  
  client.performRequest("https://postman-echo.com/get", [](WebClient::Result res)
  {
      std::cout << "Req0 Code: " << res.code << std::endl;
      std::cout << "Req0 Data: '" << res.data << "'" << std::endl << std::endl;
  });
  
  client.performRequest("http://www.gstatic.com/generate_204", [&](WebClient::Result res1)
  {
      std::cout << "Req1 Code: " << res1.code << std::endl;
      std::cout << "Req1 Data: '" << res1.data << "'" << std::endl << std::endl;
      client.performRequest("http://httpbin.org/user-agent", [](WebClient::Result res2)
      {
          std::cout << "Req1-2 Code: " << res2.code << std::endl;
          std::cout << "Req1-2 Data: '" << res2.data << "'" << std::endl << std::endl;
      });
  });
  
  client.performRequest("http://httpbin.org/ip", [](WebClient::Result res)
  {
      std::cout << "Req2 Code: " << res.code << std::endl;
      std::cout << "Req2 Data: '" << res.data << "'" << std::endl << std::endl;
  });
  
  std::cin.get();
  client.stopLoop();
  worker.join();

  return 0;
}

We launched an event loop and can now throw tasks into it. They can run either in parallel or sequentially: in the example above, Req1-2 starts only after we receive the result of Req1. The event loop does not even have to run in a separate thread; you can queue a task first and then run the loop right there.

If the mechanics of working with CURL are not entirely clear, don't worry. When we move to coroutines, we won't change this code; we will just add a few bells and whistles on top that reuse the performRequest() method.

If someone wants to use this code in production, keep in mind that it is deliberately simplified and needs work. For example, there is no RAII, almost no error handling, and no timeouts; if you stop the event loop before all requests complete, you get a memory leak; and you can also hit a race condition, because the CURL API is not thread-safe (the fix is not to call curl_multi_add_handle() directly from performRequest(), but to add new handles to a temporary list and let the event loop thread call curl_multi_add_handle() for them at a convenient moment). In our case none of this matters much, so the code is kept as simple as possible, even if it is not entirely correct or efficient.

And now we move on to the most interesting part. Coroutines.

Writing an asynchronous web client, now with coroutines

Unlike many other languages such as C# or JavaScript, where the runtime takes a huge amount of headache off the programmer when working with coroutines, in C++ everything has to be done manually. Moreover, what we got in C++20 are the most basic, low-level primitives; they are intended not so much for everyday use as for library developers. But we are not looking for easy ways and are ready for difficulties, right?

From the compiler's point of view, a coroutine is a function that contains at least one of three operators: co_yield, co_await or co_return. Note that a coroutine can be, for example, a lambda, but main(), constructors, destructors, constexpr functions and exception handlers cannot be coroutines, and there are a number of other restrictions. Now about the operators. co_yield is for generators and is not related to our topic; we will talk about co_return later; co_await is what interests us most. co_await does literally what we need: it pauses the execution of the function, to be resumed some time later from that point. The example above might look something like this with co_await:

Result r1 = co_await client.performRequestAsync("https://postman-echo.com/get");
std::cout << "Req1 ready: " << r1.code << " - " << r1.data << std::endl;
Result r2 = co_await client.performRequestAsync("http://httpbin.org/user-agent");
std::cout << "Req2 ready: " << r2.code << " - " << r2.data << std::endl;

As a result, the first line runs first, then the thread is released; when the result of the request arrives in the event loop, lines 2 and 3 are executed, after which the coroutine falls asleep again until the next result arrives, and when it wakes up it continues from line 4.

The co_await operator is applied to something called an Awaitable. An Awaitable can be anything that has these three methods: await_ready, await_suspend, await_resume:

template <typename T>
struct Awaitable {
  bool await_ready() const noexcept { 
    // do we even need to suspend, or is everything already done
    // and we can continue right away?
  }
  void await_suspend(std::coroutine_handle<> handle) noexcept
  {
    // here we can start some process; when it completes,
    // we will call handle.resume()
  }
  T await_resume() const noexcept {
    // here we return the result of the operation to the caller,
    // or void if we don't want to return anything
  }
};

await_ready() is called first, when co_await is encountered. If it returns false, await_suspend will be called; if it returns true, everything is already done, no suspension is needed, and we proceed straight to await_resume in the same thread. Most often you will return false here, but there are cases where true makes sense; I will give an example later.

await_suspend() is called when the coroutine actually needs to be "paused". This is where we can start some kind of asynchronous process. A coroutine_handle is passed as an argument, and it is very important to us: when our asynchronous process finishes, we must call handle.resume() on it to resume execution of the coroutine.

A simple example, even without I/O: we perform many quick small operations, but in one of the many if…else branches we must do something long and resource-intensive. To avoid blocking the thread for a long time, in await_suspend() we can launch a std::thread with the logic we need and call handle.resume() at the end of it, so that execution of the coroutine continues on that new thread.

await_suspend() can return:
1) void – control returns to whoever called us, and the coroutine remains suspended;
2) bool – true behaves the same as void, while false means the coroutine is not suspended and execution continues;
3) a coroutine_handle of some other coroutine – that coroutine will be resumed.

await_resume() is called after the previous step (when someone calls handle.resume()). From here we simply return the result of the operation, or void if there is nothing to return.

You may ask: "Wait, handle.resume() doesn't take any arguments, so how do we get the result of the operation into await_resume()?" Simple: we save it right inside our Awaitable 🙂

Closer to the point, let's write the Awaitable that will launch a request in our web client:

struct RequestAwaitable {
  RequestAwaitable(WebClient& client_, std::string url_) : client(client_), url(std::move(url_)) {}

  bool await_ready() const noexcept { return false; }
  void await_suspend(std::coroutine_handle<> handle) noexcept
  {
    client.performRequest(std::move(url), [handle, this](WebClient::Result res)
    {
      result = std::move(res);
      handle.resume();
    });
  }
  WebClient::Result await_resume() noexcept { return std::move(result); }

  WebClient& client;
  std::string url;
  WebClient::Result result;
};

RequestAwaitable WebClient::performRequestAsync(std::string url)
{
  return RequestAwaitable(*this, std::move(url));
}

await_ready() always returns false (we haven't even started the request yet, so we are definitely not ready); in await_suspend() we call our already familiar performRequest(), whose callback saves the result into a member of the structure and calls handle.resume(); in await_resume() we simply return what we got. As a bonus, just below we declare a helper method performRequestAsync(), which constructs our RequestAwaitable properly, putting into it everything needed to use it (the arguments, a reference to the client, and so on).

In fact, if there is no goal of keeping a callback-based API, you can even get rid of performRequest() with its std::function, do what it did directly in await_suspend(), and store in Request not a callback but the coroutine_handle to resume. Let's leave that refactoring as homework for the curious 🙂

About exceptions

You can throw exceptions from await_resume(). For example, if an error occurred while executing the request, await_resume is where you check for it and throw. In that case, after resuming, the coroutine continues its execution with your exception active (and you catch it somewhere up the chain later).

You can also store the result of the operation not as a Result, but as a std::future. In await_suspend() (or right in the constructor) we create a std::promise, call promise.get_future() and save the future. When the operation completes, we call promise.set_value(), and in await_resume() we return future.get(). Besides promise.set_value() there is also promise.set_exception(), so this way you can capture an exception raised somewhere during the operation and rethrow it when the coroutine wakes up.

One more thing I know for sure: you can also throw exceptions from await_suspend(). In that case the coroutine immediately continues its execution with your exception active.

And now about await_ready() == true

Let's imagine that we rewrote our coroutine a little and did something like this:

auto req1 = client.performRequestAsync("https://postman-echo.com/get");
auto req2 = client.performRequestAsync("http://httpbin.org/user-agent");
co_await req1;
co_await req2;

It will work the same way as the previous implementation: the first request is launched first, and the second is launched only after the first completes.

But… we can move the request-launching logic (client.performRequest(…)) from await_suspend() into our Awaitable's constructor. On completion we save the result into the result field, and to be able to tell whether the result is there yet, we add a std::atomic_bool flag, or store the result as a std::optional (it may need a mutex). If there is no result yet, await_suspend() stores the coroutine_handle in a member of the structure, or stores a std::function that will call handle.resume() to continue execution when the operation ends.

As a result, our asynchronous operation starts at the moment the Awaitable is constructed, and by the time we reach co_await it may already have finished, with the result in hand. In that case we can return true from await_ready(), meaning there is no point in suspending the coroutine to wait and we can continue immediately.

And the code above now behaves differently: the requests start executing in parallel at the time of the performRequestAsync() calls, and in co_await we simply wait until each of them completes.

So it seems that's all: now we can co_await performRequestAsync() and everything will work? Ha, no. I was lying a little. For a function to be a coroutine, it does need to contain one of the co_yield/co_await/co_return operators, but something else matters too: the return type of the function must be a type with a nested promise_type declared, and the promise_type itself must have a specific set of methods. Here is how cppreference suggests doing it:

struct promise;
 
struct coroutine : std::coroutine_handle<promise>
{
    using promise_type = ::promise;
};
 
struct promise
{
    coroutine get_return_object() { return {coroutine::from_promise(*this)}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
};

Let's figure out what kind of promise this is. It must be said right away that this promise has nothing to do with the already familiar std::promise; they are completely different types with nothing in common. Now for the methods:

initial_suspend() – can return std::suspend_never or std::suspend_always. Determines whether the coroutine starts out suspended (suspend_always), or runs after creation until the first suspension point (co_await) and only sleeps then (suspend_never). In my use cases I needed std::suspend_never everywhere, because before going to sleep you need to schedule something first; where suspend_always is needed in the real world I have not fully figured out, perhaps for generators or lazy evaluation.

get_return_object() – determines what exactly is returned to the caller when the coroutine first suspends (right after creation if initial_suspend returns suspend_always, or at the first co_await/co_return/co_yield otherwise). Often people don't bother and return the coroutine handle itself, but you can also return something else.

final_suspend() – also returns std::suspend_never or std::suspend_always. Determines whether the coroutine is suspended once more when its execution has finished (reached the end of the function or hit co_return). With std::suspend_never it is simply over and we no longer use it; this is the simplest and most common case. With std::suspend_always it is suspended again, giving us the opportunity for some more black magic, but be careful: in that case resources are not released automatically, and there is a risk of a leak if you forget to clean up yourself (at the end I will give a link to an article describing such use cases).

return_void() – fires when execution reaches the end of the function, or when a bare co_return; with no value is executed. If we want to write co_return something;, that is, return a value from the coroutine, then instead we must declare a method void return_value(T value), which stores the returned value somewhere in the promise.

unhandled_exception() – defines what to do if an unhandled exception escapes the coroutine. Here we can call std::abort, as usually happens with unhandled exceptions, or log something nicely.

However, enough theory. Now, taking the above into account, let's write the type (call it Task) that our coroutine will "return". Everything as simple as possible: the coroutine's result is void, it does not sleep after creation, and it does not sleep after completion:

struct promise;
struct Task : std::coroutine_handle<promise>
{
    using promise_type = ::promise;
};
struct promise
{
    Task get_return_object() { return {Task::from_promise(*this)}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
};

Outrageously simple. Done, now we can launch our coroutine. main() cannot be a coroutine, so let's add an intermediate function:

Task doSomething(WebClient& client)
{
    auto r1 = co_await client.performRequestAsync("https://postman-echo.com/get");
    std::cout << "Req1 ready: " << r1.code << " - " << r1.data << std::endl;
    
    auto r2 = co_await client.performRequestAsync("http://httpbin.org/user-agent");
    std::cout << "Req2 ready: " << r2.code << " - " << r2.data << std::endl;
}

int main(void)
{
    WebClient client;
    std::thread worker(std::bind(&WebClient::runLoop, &client));

    doSomething(client);
     
    std::cin.get();
    client.stopLoop();
    worker.join();
}

Note that our Task type turned out to be quite universal: nothing in it is specific to what we do. So nothing prevents you from using it in many places in the application, as a kind of library building block.

We compile, run… and everything works! Now we can create such coroutines and use co_await inside them wherever and however we like.

Lyrical digression about safety

As my father told me in my youth: "Son, when you're having fun, don't forget to protect yourself." Coroutine code looks like regular synchronous code, but it is important not to forget that it is in fact still asynchronous, so you can hit many of the pitfalls of the world of asynchrony and multithreading.

The simplest example is function arguments. If we received them by value, they go into the coroutine frame, and we can keep working with them after the coroutine resumes. But if something was passed by reference to a temporary object, then after the coroutine resumes that object will most likely no longer exist and you get a dangling reference. The solution is to pass arguments by value, or to copy them into local variables at the very beginning (and if your initial_suspend is suspend_always, even that will not help, since the coroutine body runs later).

Another important point is iterators. If, for example, you iterate over some container with a co_await inside the loop, then between suspension and resumption someone may manage to do something to that container that invalidates the iterators, and you get a segfault with special effects. You can read more about this in the article "Disadvantages of C++ coroutines".

In principle, Awaitables and this simple Task are already enough to accomplish a great deal and to start using coroutines in your projects.

But the attentive reader will probably wonder: okay, how do I run a coroutine from a coroutine? For example, can I co_await one coroutine from inside another? How would that even work: after co_await, the second coroutine runs to its end, but how do we then resume execution of the first one from that point? And here we come close to the rabbit hole. What is described above, and what many people want to do, is called nested coroutines. The problem is that there are almost no decent articles about them with working examples; the best you usually find are implementation fragments of varying quality and crookedness in numerous Stack Overflow questions. In the end I was lucky enough to find an excellent article on this topic, where the author examines the issue in detail and provides working examples. Here it is: Yet Another C++ Coroutine Tutorial (and if anything happens to it, a copy remains in the Web Archive). It once again covers the basic theory and practice of coroutines, and the part that interests us is the “Awaitable Coroutines” section.

A full retelling/translation of what is described there would take a separate article, but in short, it uses two awaitables. The first, returned by the Task’s operator co_await, saves the handle of the outer (calling) coroutine in the inner coroutine’s promise and returns the inner coroutine’s handle from await_suspend, which starts it. The second is returned from the promise’s final_suspend(): its await_suspend returns the saved handle of the outer coroutine, so when the inner coroutine finishes, the outer one is resumed exactly where it left off. And because final_suspend actually suspends, some things have to be cleaned up manually (the Task destroys the coroutine frame in its destructor). It sounds complicated, but in the end everything works as it should: you can call one coroutine from another, and after the inner one completes, the outer one resumes. Seriously, read it, it’s very interesting, but for the impatient, here is the code from there:

Nested coroutines with the ability to return values, working code (author – theshoemaker.de)
#include <coroutine>
#include <type_traits>
#include <utility>

template <typename Result = void>
class [[nodiscard]] Task {
public:
    struct FinalAwaiter {
        bool await_ready() const noexcept { return false; }

        template <typename P>
        auto await_suspend(std::coroutine_handle<P> handle) noexcept
        {
            return handle.promise().continuation;
        }

        void await_resume() const noexcept { }
    };

    struct Promise {
        std::coroutine_handle<> continuation;
        Result result;

        Task get_return_object()
        {
            return Task { std::coroutine_handle<Promise>::from_promise(*this) };
        }

        void unhandled_exception() noexcept { }

        void return_value(Result&& res) noexcept { result = std::move(res); }

        std::suspend_always initial_suspend() noexcept { return {}; }
        FinalAwaiter final_suspend() noexcept { return {}; }
    };
    using promise_type = Promise;

    Task() = default;

    ~Task()
    {
        if (handle_) {
            handle_.destroy();
        }
    }

    struct Awaiter {
        std::coroutine_handle<Promise> handle;

        bool await_ready() const noexcept { return !handle || handle.done(); }

        auto await_suspend(std::coroutine_handle<> calling) noexcept
        {
            handle.promise().continuation = calling;
            return handle;
        }

        template <typename T = Result>
        requires(std::is_same_v<T, void>)
        void await_resume() noexcept { }

        template <typename T = Result>
        requires(!std::is_same_v<T, void>)
        T await_resume() noexcept { return std::move(handle.promise().result); }
    };

    auto operator co_await() noexcept { return Awaiter { handle_ }; }

private:
    explicit Task(std::coroutine_handle<Promise> handle)
        : handle_(handle)
    {
    }

    std::coroutine_handle<Promise> handle_;
};

template <>
struct Task<void>::Promise {
    std::coroutine_handle<> continuation;

    Task get_return_object()
    {
        return Task { std::coroutine_handle<Promise>::from_promise(*this) };
    }

    void unhandled_exception() noexcept { }

    void return_void() noexcept { }

    std::suspend_always initial_suspend() noexcept { return {}; }
    FinalAwaiter final_suspend() noexcept { return {}; }
};

Moreover, thanks to its template nature and the separate specialization for void, such a Task can either return values from coroutines (or chains of nested coroutines) or return nothing at all, as you wish.

I suspect that after all this, many readers will have a quite predictable reaction: “WTF? Seriously? Why is it SO difficult? Why all these hoops just to run the simplest coroutines?!” The frustration is understandable. As I said above, what made it into the C++20 standard is only the lowest-level coroutine primitives. They are intended not so much for everyday use as for library developers. Accordingly, to use them conveniently, you have to feel like a library developer yourself: create various Awaitables at the points where the world of coroutines meets the world of threads or low-level calls, and declare classes like the simple Task described above or the more complex Task with nesting support (as I already said, these classes are quite universal – you don’t need dozens of them for different uses, a couple will cover all occasions). After that, writing coroutine code in the rest of your application is a pleasure, without much headache. Or you can use ready-made libraries, for example cppcoro – today the most mature and popular library for writing coroutine code in C++. In it you will find not only coroutine bricks and primitives such as the Task described above, but also various interesting wrappers, for example async_mutex, as well as numerous Awaitables for working with files, sockets, and so on.

List of interesting links for those who want to know more

  1. Coroutines (cppreference.com)

  2. Yet Another C++ Coroutine Tutorial

  3. Disadvantages of C++ coroutines

  4. Pavel Novikov – learning to prepare C++ coroutines in practice (video)

  5. C++20 Practical coroutines

  6. C++20 Coroutines: sketching a minimal async framework

  7. My tutorial and take on C++20 coroutines
