Reducing allocations caused by closures

Recently I was tasked with reducing allocations in a very hot spot in the code. Something trivial happens there: Tasks are launched in which objects are processed by a known set of handlers. Armed with a profiler, I was surprised to find that a lot of memory (and a lot of GC time) was being spent on closure objects.
What is a closure in C#?
A closure is a very cool feature that helps you write more concise code in C#. Under the hood, a closure is more or less a normal class that “captures” the variables used inside the lambda by storing them in its fields.
You have probably seen that many IDEs helpfully point out the places where a lambda captures variables.
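Here is a minimal sketch of such a capture site. It is not taken from the original project: the names CaptureExample, Run and Process are made up for illustration. The lambda passed to Task.Run uses the locals results, items, factor and index, so the compiler has to capture all of them.

```csharp
using System;
using System.Threading.Tasks;

public static class CaptureExample
{
    // Hypothetical handler, stands in for the real processing logic.
    public static int Process(int item, int factor) => item * factor;

    public static int[] Run(int[] items, int factor)
    {
        var results = new int[items.Length];
        var tasks = new Task[items.Length];
        for (var i = 0; i < items.Length; i++)
        {
            // Copy the loop variable: capturing `i` directly would share
            // one variable between all lambdas.
            var index = i;
            // The lambda captures results, items, factor and index.
            tasks[i] = Task.Run(() => { results[index] = Process(items[index], factor); });
        }
        Task.WaitAll(tasks);
        return results;
    }
}
```

An IDE would typically flag the lambda on the Task.Run line as the place where the capture happens.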

What happens “under the hood”? Inside the containing class, the compiler generates a nested class that represents the closure. The class gets a deliberately tricky name (in my case it is called DisplayClass4_0) and is marked with the CompilerGenerated attribute.
In this generated class, a field is created for each variable captured by the closure. A method containing the body of the lambda is also created, and a delegate pointing to it is passed to the Task.Run method.
All of this is visible in the decompiled code (I use dotPeek).
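As a rough, hand-written approximation of what a decompiler shows, the lowering might look like the sketch below. The real generated names are not valid C# identifiers and the exact shape depends on the compiler version; DisplayClass, Body and Lowered are illustrative names, and the doubling logic is just a placeholder for the lambda body.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class Lowered
{
    // Stand-in for the compiler-generated closure class.
    [CompilerGenerated]
    private sealed class DisplayClass
    {
        // One field per captured variable.
        public int[] results;
        public int item;
        public int index;

        // The body of the original lambda.
        internal void Body() => results[index] = item * 2;
    }

    public static int[] Run(int[] items)
    {
        var results = new int[items.Length];
        var tasks = new Task[items.Length];
        for (var i = 0; i < items.Length; i++)
        {
            // Allocation 1: the closure object, filled with captured values.
            var closure = new DisplayClass { results = results, item = items[i], index = i };
            // Allocation 2: a delegate that points at the closure's method.
            tasks[i] = Task.Run(new Action(closure.Body));
        }
        Task.WaitAll(tasks);
        return results;
    }
}
```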

Why does it happen this way? Because a closure is not a mechanism of the .NET platform but of the C# language. If you like, it is syntactic sugar that makes the language beautiful and expressive. At a lower level, however, any syntactic sugar needs an implementation, and this is it. Sergey Teplyakov has written more about closures. He covers a lot of ground and explains it in a fairly accessible way, including the subtleties of working with closures.
Allocations caused by closures
Notice that every time the code that creates the closure runs, an instance of the closure class is created “under the hood”, the captured values are placed in its fields, and a delegate to the closure-class method is passed to the method we need, where the logic specified in the closure is executed. Remember that the class instance lives on the heap, from where the GC will later remove it.
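One way to see this cost without a profiler is to compare the bytes allocated by a loop that creates a capturing lambda on every iteration with a loop that reuses a single delegate. The sketch below assumes a runtime where GC.GetAllocatedBytesForCurrentThread is available; AllocationDemo, WithCapture, WithoutCapture, Use and Noop are hypothetical names, and the exact byte counts depend on the runtime and platform.

```csharp
using System;

public static class AllocationDemo
{
    private static void Use(int value) { }
    private static void Noop() { }

    // A new closure object and a new delegate on every iteration.
    public static long WithCapture(int count)
    {
        var sink = new Action[count]; // keeps the delegates reachable
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < count; i++)
        {
            var value = i;              // captured local
            sink[i] = () => Use(value); // allocates closure + delegate
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(sink);
        return after - before;
    }

    // One delegate created up front and reused in the loop.
    public static long WithoutCapture(int count)
    {
        var sink = new Action[count];
        Action reused = Noop;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < count; i++)
        {
            sink[i] = reused; // no allocation here
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(sink);
        return after - before;
    }

    public static void Main()
    {
        Console.WriteLine($"capturing lambda: {WithCapture(100_000)} bytes");
        Console.WriteLine($"reused delegate:  {WithoutCapture(100_000)} bytes");
    }
}
```

The first number should grow with the iteration count, while the second should stay close to zero.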
It seems that this is not at all scary, since in the vast majority of cases we are talking about placing an instance in Gen0, from where it will be quickly deleted. Moreover, the closure class itself is extremely lightweight and does not take up much space.
However, if the place where the closure is used is hot (called frequently), the GC may not reclaim closure instances fast enough. In the most unfortunate scenarios, instances can “survive” all the way to Gen2, which is followed by a stop-the-world pause for a thorough heap cleanup.
Moreover, we should not forget that not all platform implementations work the same way. For example, the Unity game engine has a special single-generation GC. This requires developers to be very careful about who allocates what and in what quantities.
Implementing your own closure
To reduce the load on the GC, in some scenarios you can try to write your own implementation of a closure. It seems easy, since we know how a closure works.
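    // Needs: using System; using System.Runtime.CompilerServices;
    private sealed class Closure<T> {
        private readonly Action<T> _action;
        private readonly Action _closure; // delegate to Execute, created once
        private T _value;                 // the "captured" value

        public Closure(Action<T> action) {
            _action = action;
            _closure = Execute;
        }

        // Drops the stored value so the closure does not keep it alive
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public void Clear() => _value = default;

        // Stores the value and returns the preallocated delegate
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public Action Prepare(T value) {
            _value = value;
            return _closure;
        }

        private void Execute() => _action(_value);
    }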
If we know in advance how many closures we need, we can easily preallocate all of them. At the point of use, we simply write the argument value into the class field (just as the compiler would have done for us “under the hood”) and, as the method, use the previously created delegate to the method of the same class.
    for (var i = 0; i < _objects.Length; i++) {
        _tasks[i] = Task.Run(_closures[i].Prepare(_objects[i]));
    }
    Task.WaitAll(_tasks);
    foreach (var closure in _closures) {
        closure.Clear();
    }
With our own closure class, the decompiled code shows a much clearer picture without any “magic under the hood”. As a bonus, we no longer create a new Action every time we pass the closure method to the method we need.

The downside of this approach is that the closure instance has to be cleared of the value previously passed to it. Otherwise we get a potential memory leak, because the closure will keep a reference to the “captured” value forever.
Another disadvantage is multithreading. With your own closure implementation, you have to make sure that each closure instance, together with the value stored in it, is used by only one task at a time; otherwise one thread can overwrite the value before another thread has executed the closure. How to do this elegantly and without much effort is a completely different question.
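The hazard can be replayed deterministically on a single thread. The sketch below repeats the Closure<T> class in a public, trimmed form so that it compiles on its own; ReuseHazard, SharedInstance and OneInstancePerCall are hypothetical names. Preparing the same instance twice before running it is what a race between two threads could produce.

```csharp
using System;
using System.Collections.Generic;

public sealed class Closure<T>
{
    private readonly Action<T> _action;
    private readonly Action _closure;
    private T _value;

    public Closure(Action<T> action) { _action = action; _closure = Execute; }
    public Action Prepare(T value) { _value = value; return _closure; }
    public void Clear() => _value = default;
    private void Execute() => _action(_value);
}

public static class ReuseHazard
{
    // One instance prepared twice: the second value overwrites the first.
    public static List<int> SharedInstance()
    {
        var seen = new List<int>();
        var closure = new Closure<int>(seen.Add);
        Action first = closure.Prepare(1);
        Action second = closure.Prepare(2); // same delegate, value is now 2
        first();
        second();
        return seen; // contains 2, 2
    }

    // One instance per pending call: each call sees its own value.
    public static List<int> OneInstancePerCall()
    {
        var seen = new List<int>();
        var a = new Closure<int>(seen.Add);
        var b = new Closure<int>(seen.Add);
        Action first = a.Prepare(1);
        Action second = b.Prepare(2);
        first();
        second();
        return seen; // contains 1, 2
    }
}
```

This is why the loop above uses a separate preallocated closure for every task instead of sharing one instance.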
Allocation reduction in numbers
What concrete numbers do we get when we replace the standard closure mechanism with our own reinvented wheel? Measurements were made with BenchmarkDotNet, the well-known microbenchmarking framework. The benchmark code is here.
Method | Runtime | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
AutoClosure | .NET 6.0 | 6.419 ms | 0.0541 ms | 0.0507 ms | 1.00 | 203.1250 | – | – | 1,708 KB |
SelfClosure | .NET 6.0 | 5.922 ms | 0.0521 ms | 0.0487 ms | 0.92 | 93.7500 | – | – | 773 KB |
AutoClosure | .NET Core 3.1 | 7.122 ms | 0.0155 ms | 0.0145 ms | 1.00 | 203.1250 | – | – | 1,708 KB |
SelfClosure | .NET Core 3.1 | 6.695 ms | 0.0117 ms | 0.0109 ms | 0.94 | 93.7500 | – | – | 774 KB |
AutoClosure | .NET Framework 4.6.1 | 8.151 ms | 0.0191 ms | 0.0179 ms | 1.00 | 296.8750 | 62.5000 | 15.6250 | 1,895 KB |
SelfClosure | .NET Framework 4.6.1 | 8.348 ms | 0.0189 ms | 0.0177 ms | 1.02 | 140.6250 | 15.6250 | – | 954 KB |
It’s nice that the speed has remained about the same (the ratio stays between 0.92 and 1.02). This suggests that it was done more or less correctly.
The “Allocated” column cheerfully reports that allocation has roughly halved. But why is there any allocation left at all? If you look at the benchmark code, you’ll notice that I’m trying to minimize allocation when launching Tasks, which is a fairly common use case for closures. The numbers in the Allocated column therefore include what the platform itself spends on creating Tasks.
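The remaining cost can be observed in isolation with a sketch like the one below, which launches Tasks with a single delegate created in advance, so that no closure or delegate is allocated inside the measured loop. TaskCost, BytesForTaskRun and Noop are hypothetical names, GC.GetAllocatedBytesForCurrentThread is assumed to be available on the runtime in use, and only allocations made on the calling thread are counted.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskCost
{
    private static void Noop() { }

    // Bytes allocated on the calling thread by `count` Task.Run calls
    // that all receive the same preallocated delegate.
    public static long BytesForTaskRun(int count)
    {
        Action noop = Noop;          // created once, outside the measurement
        var tasks = new Task[count];
        Task.Run(noop).Wait();       // warm-up call

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < count; i++)
        {
            tasks[i] = Task.Run(noop); // still allocates a Task object
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        Task.WaitAll(tasks);
        return after - before;
    }

    public static void Main()
    {
        Console.WriteLine($"Task.Run x 10000: {BytesForTaskRun(10_000)} bytes");
    }
}
```

Whatever this prints is overhead that a custom closure cannot remove, because it belongs to the Task machinery itself.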