How Async actually works

Since the original article is quite voluminous, I took the liberty of breaking it into several independent parts that are easier to translate and understand.

Disclaimer: I am not a professional translator; I prepared this translation mainly for myself and my colleagues. I would be grateful for any corrections and help with the translation. The article is very interesting, so let’s make it available in Russian.

  1. Part 1: At the very beginning…

  2. Part 2: Event-Based Asynchronous Pattern (EAP)

  3. Part 3: The advent of Tasks (Task-based Asynchronous Pattern, TAP)

  4. Part 4: …and ValueTasks

  5. Part 5: C# Iterators to the rescue

    1. Async/await: Internals

      1. Compiler conversions

      2. SynchronizationContext and ConfigureAwait

      3. Fields in the State Machine

  6. Conclusion

C# iterators to the rescue

A glimmer of hope for such a solution came a few years before Task appeared, in C# 2.0, when support for iterators was added.

“Iterators?” you ask. “You mean for IEnumerable<T>?” Exactly. Iterators let you write a single method that the compiler then uses to implement an IEnumerable<T> and/or an IEnumerator<T>. For example, if I wanted to create an enumerable that yields the Fibonacci sequence, I could write something like this:

public static IEnumerable<int> Fib()
{
    int prev = 0, next = 1;
    yield return prev;
    yield return next;

    while (true)
    {
        int sum = prev + next;
        yield return sum;
        prev = next;
        next = sum;
    }
}

I can then enumerate it with a foreach loop:

foreach (int i in Fib())
{
    if (i > 100) break;
    Console.Write($"{i} ");
}

I can compose it with other IEnumerable<T>s using combinators like those in System.Linq.Enumerable:

foreach (int i in Fib().Take(12))
{
    Console.Write($"{i} ");
}

Or I can enumerate it manually via an IEnumerator<T>:

using IEnumerator<int> e = Fib().GetEnumerator();
while (e.MoveNext())
{
    int i = e.Current;
    if (i > 100) break;
    Console.Write($"{i} ");
}

All of the above produce the same output:

0 1 1 2 3 5 8 13 21 34 55 89

The interesting thing about this is that in order to achieve the above, we need to be able to enter and exit the Fib method multiple times. We call MoveNext, which enters the method; the method executes until it encounters a yield return, at which point the call to MoveNext needs to return true and the subsequent access to Current needs to return the yielded value. Then we call MoveNext again, and we need to be able to pick up in Fib right where we left off, with all of the state from the previous invocation intact. Iterators are effectively coroutines provided by the C# language/compiler, with the compiler expanding my Fib iterator into a full-blown state machine:

public static IEnumerable<int> Fib() => new <Fib>d__0(-2);

[CompilerGenerated]
private sealed class <Fib>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
    private int <>1__state;
    private int <>2__current;
    private int <>l__initialThreadId;
    private int <prev>5__2;
    private int <next>5__3;
    private int <sum>5__4;

    int IEnumerator<int>.Current => <>2__current;
    object IEnumerator.Current => <>2__current;

    public <Fib>d__0(int <>1__state)
    {
        this.<>1__state = <>1__state;
        <>l__initialThreadId = Environment.CurrentManagedThreadId;
    }

    private bool MoveNext()
    {
        switch (<>1__state)
        {
            default:
                return false;
            case 0:
                <>1__state = -1;
                <prev>5__2 = 0;
                <next>5__3 = 1;
                <>2__current = <prev>5__2;
                <>1__state = 1;
                return true;
            case 1:
                <>1__state = -1;
                <>2__current = <next>5__3;
                <>1__state = 2;
                return true;
            case 2:
                <>1__state = -1;
                break;
            case 3:
                <>1__state = -1;
                <prev>5__2 = <next>5__3;
                <next>5__3 = <sum>5__4;
                break;
        }
        <sum>5__4 = <prev>5__2 + <next>5__3;
        <>2__current = <sum>5__4;
        <>1__state = 3;
        return true;
    }

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        if (<>1__state == -2 &&
            <>l__initialThreadId == Environment.CurrentManagedThreadId)
        {
            <>1__state = 0;
            return this;
        }
        return new <Fib>d__0(0);
    }

    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
    void IEnumerator.Reset() => throw new NotSupportedException();
    void IDisposable.Dispose() { }
}

All of the logic for Fib is now inside the MoveNext method, but as part of a jump table that lets the implementation branch to where it last left off, which is tracked in the state field of the generated enumerator type. And the variables I wrote as locals, such as prev, next, and sum, have been “lifted” into fields on the enumerator so that they persist across invocations of MoveNext.

(Note that the previous code snippet, which shows how the C# compiler emits the implementation, won’t compile as is. The C# compiler synthesizes “unspeakable” names, meaning it names the types and members it creates in a way that is valid in IL but invalid in C#, so as not to conflict with any user-named types and members. I’ve kept everything named as the compiler does, but if you want to experiment with compiling it, you can rename things to use valid C# names instead.)
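
To watch an iterator being entered and exited without reading generated code, a small experiment can help. The following is only a sketch: the Steps iterator and the IteratorTrace class are hypothetical names invented for this illustration, and the sample assumes C# 8 or later for the using declaration. The iterator records a message each time its body runs, and the caller records a message each time MoveNext hands control back.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorTrace
{
    // Hypothetical iterator for this demonstration; it logs whenever its body runs.
    static IEnumerable<string> Steps(List<string> log)
    {
        log.Add("iterator: started");
        yield return "first";
        log.Add("iterator: resumed after first");
        yield return "second";
        log.Add("iterator: resumed after second");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        IEnumerable<string> steps = Steps(log);   // no iterator code has run yet
        log.Add("caller: got the enumerable");

        using IEnumerator<string> e = steps.GetEnumerator();
        while (e.MoveNext())                      // each call re-enters Steps
        {
            log.Add($"caller: Current = {e.Current}");
        }
        log.Add("caller: done");
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run()) Console.WriteLine(line);
    }
}
```

The caller and iterator messages alternate: nothing in Steps runs until the first MoveNext, and each later MoveNext resumes right after the previous yield return, which is the behavior the state field and the lifted locals exist to support.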

In my previous example, the last enumeration form I showed involved manually using the IEnumerator. At this level, we manually called MoveNext(), deciding when the right time would be to re-enter the coroutine. But… what if, instead of calling it that way, I could have the next call to MoveNext be part of the continuation work that happens when the asynchronous operation completes? What if I could yield return something that represents an asynchronous operation, and the consuming code would attach a continuation to that object, and that continuation would then execute MoveNext? With this approach, I could write a helper method like this:

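static Task IterateAsync(IEnumerable<Task> tasks)
{
    var tcs = new TaskCompletionSource();

    IEnumerator<Task> e = tasks.GetEnumerator();

    // Advances the iterator by one step; invoked again each time a yielded Task completes.
    void Process()
    {
        try
        {
            if (e.MoveNext())
            {
                // Resume the iterator once the Task it just yielded has completed.
                e.Current.ContinueWith(t => Process());
                return;
            }
        }
        catch (Exception ex)
        {
            // An exception thrown inside the iterator faults the returned Task.
            tcs.SetException(ex);
            return;
        }
        // MoveNext returned false: the iterator ran to completion.
        tcs.SetResult();
    }
    Process();

    return tcs.Task;
}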

Now this is getting interesting. We are handed an enumerable of tasks that we can iterate through. Each time we MoveNext to the next Task and get one, we hook up a continuation to that Task; when that Task completes, it just turns around and calls right back into the same logic that does a MoveNext, gets the next Task, and so on. This builds on the idea of Task as a single representation for any asynchronous operation, so the enumerable we are given can be a sequence of any asynchronous operations. Where might such a sequence come from? From an iterator, of course. Remember our earlier CopyStreamToStream example and how terrible the APM-based implementation was? Consider this instead:

static Task CopyStreamToStreamAsync(Stream source, Stream destination)
{
    return IterateAsync(Impl(source, destination));

    static IEnumerable<Task> Impl(Stream source, Stream destination)
    {
        var buffer = new byte[0x1000];
        while (true)
        {
            Task<int> read = source.ReadAsync(buffer, 0, buffer.Length);
            yield return read;
            int numRead = read.Result;
            if (numRead <= 0)
            {
                break;
            }

            Task write = destination.WriteAsync(buffer, 0, numRead);
            yield return write;
            write.Wait();
        }
    }
}

Wow, that’s almost legible. We call the IterateAsync helper, and the enumerable we pass to it is produced by an iterator that handles the entire control flow of the copy. It calls Stream.ReadAsync and then yield returns that Task; that yielded Task is what IterateAsync gets from Current after calling MoveNext, and IterateAsync attaches to it a continuation that, when the Task completes, simply calls MoveNext again and ends up back in the iterator right after the yield. At that point, the Impl logic gets the result of the read, calls WriteAsync, and yields the Task it produces. And so on.
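
The same hand-off can be tried on something smaller than a stream copy. The sketch below repeats the IterateAsync helper so that it is self-contained and drives a hypothetical Work iterator (a name made up for this example) that yields two delay-based tasks. It assumes .NET 5 or later, where the non-generic TaskCompletionSource is available, and it blocks on the result with Wait only to keep the demonstration short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IterateAsyncDemo
{
    // The helper from the article, repeated here so the sample compiles on its own.
    static Task IterateAsync(IEnumerable<Task> tasks)
    {
        var tcs = new TaskCompletionSource();
        IEnumerator<Task> e = tasks.GetEnumerator();

        void Process()
        {
            try
            {
                if (e.MoveNext())
                {
                    e.Current.ContinueWith(_ => Process());
                    return;
                }
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
                return;
            }
            tcs.SetResult();
        }
        Process();
        return tcs.Task;
    }

    // Hypothetical iterator: yields two tasks and logs what it sees between them.
    static IEnumerable<Task> Work(List<string> log)
    {
        log.Add("before first delay");
        Task<int> first = Task.Delay(50).ContinueWith(_ => 42);
        yield return first;                       // resumed only after first completes
        log.Add($"first result: {first.Result}"); // already completed, so this does not block

        yield return Task.Delay(50);
        log.Add("after second delay");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        IterateAsync(Work(log)).Wait();           // blocking only for the demo
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run()) Console.WriteLine(line);
    }
}
```

Each line after a yield return runs inside a continuation of the task that was just yielded, so reading first.Result there is safe: the task has already finished by the time the iterator is resumed.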

And that, my friends, is the beginning of async/await in C# and .NET. Approximately 95% of the iterator and async/await support logic in the C# compiler is shared. Different syntax, different types, but basically it’s the same conversion. Take a closer look at yield returns and you can almost see awaits instead.
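
To make the resemblance concrete, here is a sketch of how the same copy loop might look with async/await. It is an illustration rather than a listing from this part of the article; the AwaitCopyDemo class and the RoundTrip helper are hypothetical names added only so the sample can be run. Each yield return of a task, together with the Result or Wait call that followed it, collapses into a single await.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AwaitCopyDemo
{
    // Same control flow as the iterator-based Impl, with await in place of yield return.
    public static async Task CopyStreamToStreamAsync(Stream source, Stream destination)
    {
        var buffer = new byte[0x1000];
        while (true)
        {
            int numRead = await source.ReadAsync(buffer, 0, buffer.Length);
            if (numRead <= 0)
            {
                break;
            }

            await destination.WriteAsync(buffer, 0, numRead);
        }
    }

    // Hypothetical helper for trying the method out with in-memory streams.
    public static string RoundTrip(string text)
    {
        using var source = new MemoryStream(Encoding.UTF8.GetBytes(text));
        using var destination = new MemoryStream();
        CopyStreamToStreamAsync(source, destination).GetAwaiter().GetResult();
        return Encoding.UTF8.GetString(destination.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("hello, iterators"));
    }
}
```

No IterateAsync helper is needed in this version, because the compiler-generated state machine takes over the job of hooking up the continuation that calls MoveNext.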

In fact, some enterprising developers used iterators in just this way for asynchronous programming before async/await arrived. A similar transformation was prototyped in the experimental Axum programming language, which served as a key inspiration for C#’s async support. Axum provided an async keyword that could be placed on a method, just as async can now in C#. Task was not yet ubiquitous, so inside async methods the Axum compiler heuristically matched synchronous method calls with their APM counterparts. For example, if it saw you calling stream.Read, it would find and use the corresponding stream.BeginRead and stream.EndRead methods, synthesizing the appropriate delegate to pass to the Begin method, while also generating a complete APM implementation for the async method being defined so that it was composable. It even integrated with SynchronizationContext! Although Axum was eventually shelved, it served as an awesome and motivating prototype for what would eventually become async/await in C#.
