Reducing allocation from closures

Recently, I had the task of reducing the allocation in a very hot place in the code. A trivial thing happens there: Tasks are launched in which objects are processed by a known set of handlers. Armed with a profiler, I was surprised to find that a lot of memory (and a lot of GC time) is spent deleting closure objects.

What is a closure in C#?

Closures are a very handy feature that helps you write more concise code in C#. Under the hood, a closure is more or less an ordinary class that “captures” references to the variables involved in the closure.

You have probably seen how many IDEs honestly point out that a variable capture occurs at the place where the closure is used:

[Image: closure notice in the Rider IDE]

What happens “under the hood”? Inside the containing class, the compiler generates a nested class that represents the closure. The class gets a deliberately unpronounceable name (in my case it is called DisplayClass4_0) and is marked with the CompilerGenerated attribute.

This generated class gets one field per variable captured by the closure, plus a method whose reference is what actually gets passed to Task.Run.

In the decompiled code (I use dotPeek) it looks something like this:

[Image: decompiled code with closures in C#, viewed in dotPeek]
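In rough C# terms, the generated code looks approximately like the sketch below. The names DisplayClass, value and Lambda are illustrative stand-ins for the compiler-generated ones (the real class is called something like DisplayClass4_0):

// What you write:
//     int value = 42;
//     Task.Run(() => Console.WriteLine(value));
//
// Roughly what the compiler generates instead (illustrative sketch):
[CompilerGenerated]
private sealed class DisplayClass {
  public int value;                          // the captured variable becomes a field

  internal void Lambda() => Console.WriteLine(value);
}

// ...and the call site is rewritten into something like:
//   var closure = new DisplayClass { value = value }; // heap allocation for the closure
//   Task.Run(new Action(closure.Lambda));             // heap allocation for the delegate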

Why does it happen this way? Because closures are not a mechanism of the .NET platform but of the C# language. They are, if you like, syntactic sugar that makes the language expressive and pleasant to use; at a lower level, though, any syntactic sugar needs a concrete implementation, and this is it. Sergey Teplyakov has written a lot more about closures; his explanations are relatively easy to follow and cover the finer points of working with them.

Allocation caused by closures

Notice that every time the closure is used, “under the hood” an instance of the closure class is created, the captured values are copied into its fields, and a reference to the closure class’s method is passed to the method we need, which then runs the logic written in the closure. Remember that this class instance lives on the heap, from where it will later have to be collected by the GC.

At first glance this does not look scary at all: in the vast majority of cases the instance lands in Gen0, from where it is collected quickly, and the closure class itself is extremely lightweight and takes up little space.

However, if the place where the closure is used is hot (called frequently), the GC may not keep up with collecting the closure instances. In the most unfortunate scenarios they “survive” into Gen2, followed by a stop-the-world pause for a thorough heap cleanup.
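To make this concrete, a hypothetical hot path might look like the snippet below; Handle, items and tasks are made-up names, and every iteration allocates a fresh closure instance plus an Action delegate:

// Hypothetical hot path: `item` is captured by the lambda, so every
// iteration allocates a compiler-generated closure object and an Action.
foreach (var item in items) {
  tasks.Add(Task.Run(() => Handle(item)));
}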

Moreover, we should not forget that not all runtime implementations behave the same way. The Unity game engine, for example, ships a non-generational GC, which forces developers to watch very closely who allocates what and in what quantities.

Implementing your own closure

To reduce the load on the GC, in some scenarios you can try writing your own closure implementation. It should be easy, since we now know how a closure works.

// Requires: using System; using System.Runtime.CompilerServices;
private sealed class Closure<T> {
  private readonly Action<T> _action;  // the work to run against the stored value
  private readonly Action _closure;    // delegate created once and reused for every call
  private T _value;                    // the "captured" value, written before each use

  public Closure(Action<T> action) {
    _action = action;
    _closure = Execute;
  }

  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  public void Clear() => _value = default;

  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  public Action Prepare(T value) {
    _value = value;
    return _closure;
  }

  private void Execute() => _action(_value);
}

If we know how many of these closures we need, we can easily preallocate all of them. At the point of use we simply write the argument value into the class field (just as the compiler would have done for us “under the hood”) and pass the previously created delegate that points to the method of that same class. A sketch of the preallocation is shown below, followed by the usage loop.
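The preallocation might look something like this sketch; the _closures, _tasks and _objects fields, the MyObject type and the ProcessObject handler are assumptions made for illustration:

// One closure and one task slot per object, allocated once and reused on every run.
_closures = new Closure<MyObject>[_objects.Length];
_tasks = new Task[_objects.Length];

for (var i = 0; i < _closures.Length; i++) {
  // ProcessObject is the Action<MyObject> that does the actual work.
  _closures[i] = new Closure<MyObject>(ProcessObject);
}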

for (var i = 0; i < _objects.Length; i++) {
    _tasks[i] = Task.Run(_closures[i].Prepare(_objects[i]));
}

Task.WaitAll(_tasks);

foreach (var closure in _closures) {
    closure.Clear();
}

With our own closure class, the decompiled code shows a much clearer picture with no “magic under the hood”. As a bonus, we no longer create a new Action every time we pass the reference to the closure method to the method we need.

[Image: decompiled code with the hand-written closure class in C#]

The downside of this approach is that the closure instance has to be cleared of the value that was passed to it. Otherwise the closure keeps a reference to the “captured” value forever, which is a potential memory leak.

Another disadvantage shows up with multithreading. With your own closure implementation you must make sure that each thread (or task) works with its own closure instance and value, so that concurrent calls do not overwrite each other; see the sketch below. How to do that elegantly and without much ceremony is a separate question.
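A minimal illustration of the pitfall, with made-up names: if a single closure instance is shared between two concurrent tasks, the second Prepare call can overwrite the value before the first task has even run.

// Anti-example (hypothetical): sharing one instance across tasks.
var shared = new Closure<int>(x => Console.WriteLine(x));

var t1 = Task.Run(shared.Prepare(1));
var t2 = Task.Run(shared.Prepare(2)); // may overwrite the value t1 was meant to see

// The code above avoids this by giving every task its own Closure<T> instance.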

Allocation reduction with closures

What concrete numbers can we get by replacing the standard closure mechanism with our own hand-rolled one? Measurements were made with the well-known microbenchmarking framework BenchmarkDotNet. The benchmark code is here.

| Method      | Runtime              | Mean     | Error     | StdDev    | Ratio | Gen 0    | Gen 1   | Gen 2   | Allocated |
|-------------|----------------------|----------|-----------|-----------|-------|----------|---------|---------|-----------|
| AutoClosure | .NET 6.0             | 6.419 ms | 0.0541 ms | 0.0507 ms | 1.00  | 203.1250 | -       | -       | 1,708 KB  |
| SelfClosure | .NET 6.0             | 5.922 ms | 0.0521 ms | 0.0487 ms | 0.92  | 93.7500  | -       | -       | 773 KB    |
| AutoClosure | .NET Core 3.1        | 7.122 ms | 0.0155 ms | 0.0145 ms | 1.00  | 203.1250 | -       | -       | 1,708 KB  |
| SelfClosure | .NET Core 3.1        | 6.695 ms | 0.0117 ms | 0.0109 ms | 0.94  | 93.7500  | -       | -       | 774 KB    |
| AutoClosure | .NET Framework 4.6.1 | 8.151 ms | 0.0191 ms | 0.0179 ms | 1.00  | 296.8750 | 62.5000 | 15.6250 | 1,895 KB  |
| SelfClosure | .NET Framework 4.6.1 | 8.348 ms | 0.0189 ms | 0.0177 ms | 1.02  | 140.6250 | 15.6250 | -       | 954 KB    |

It is nice that the speed has remained about the same, which suggests the implementation was done more or less correctly.

The “Allocated” column cheerfully reports that allocation is almost halved. But why is there any allocation left at all? If you look at the benchmark code, you will notice that I am trying to minimize allocation while running Tasks, which is a fairly common use case for closures. The numbers in the Allocated column therefore include the platform’s own cost of creating the Tasks.

P.S. I have started writing on Telegram and, unexpectedly, on Zen. Check it out if you’re interested.
