Optimizing a poorly written mapper

What am I talking about?

This article is based on a real case from my work. It shows the consequences of carelessly writing a simple mapper for a single endpoint of a REST web API service. I ran into it while fixing that endpoint during a refactoring project that also moved the service from on-premises servers to a major cloud provider.

Since the code for this project cannot be distributed, I created a very similar mapper for weather forecasts (inspired by the standard template).
Full code available at GitHub.

How it all began

One day, a tester came to the developers with questions about an endpoint that took “forever” to respond. This was strange, because nobody had noticed it before, but there was one catch. A couple of days earlier, a long-awaited event had happened: we were given a full backup of the test database, which we of course quickly loaded into our test environments in place of the generated data. Over the next few days, about a dozen more “forever” endpoints were discovered. This article fixes only one of them, which took about 30 seconds on a low-power test environment and about 16 seconds on my PC. After reading the call chain and a short debugging session, I found that the mapper accounted for all of that time except for roughly 100 ms. In other words, 16 seconds were spent building a new collection of fairly simple objects from more complex source collections, which in most cases contained between 5,000 and 25,000 records.

What is this mapper for?

The mapper combines data from two input arrays, Temperature[] Temperatures and string[] Places, into a third array and returns it. The resulting array must be the same size as the Temperatures array and contain the values and dates from it. The data for the State and Season fields must come from the Places array. Each date in the Temperatures array is unique. The Places array may be shorter than, equal to, or longer than the Temperatures array.

The Places array contains strings that consist of one, two, or three segments separated by ';'. The first segment is a date in the format month/day/year followed by a constant time of 00:00:00; it is always present, and each date is unique within the array. The second segment is the two-letter state abbreviation; it is optional. The third segment is the season abbreviation; it is also optional and may be omitted together with its separator.

State and season abbreviations must be converted to full state and season names, and the conversion must be case-insensitive. The arrays are matched by date. If the Places array has no entry for a given date, or the entry lacks the second and/or third segment, the State and/or Season fields must be null.
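
To make the contract concrete, here is a minimal sketch of the three shapes a Places entry can take. The sample strings and the Describe helper are made up for illustration and are not part of the original project.

```csharp
#nullable enable
using System;

public static class PlacesFormatDemo
{
    // Hypothetical helper: splits one Places entry into (date, state code, season code).
    // Missing optional segments come back as null, as the contract requires.
    public static (string Date, string? State, string? Season) Describe(string entry)
    {
        var segments = entry.Split(';');
        return (
            segments[0],
            segments.Length > 1 ? segments[1] : null,
            segments.Length > 2 ? segments[2] : null);
    }

    public static void Main()
    {
        // Illustrative sample entries (assumed data, not from the real service).
        string[] samples =
        {
            "01/15/2024 00:00:00",        // date only
            "01/16/2024 00:00:00;WA",     // date + state
            "01/17/2024 00:00:00;or;WI",  // date + state + season, mixed case
        };

        foreach (var s in samples)
        {
            var (date, state, season) = Describe(s);
            Console.WriteLine($"{date} | {state ?? "null"} | {season ?? "null"}");
        }
    }
}
```

Split is used here only for readability; the optimized versions later in the article avoid it.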

Results

BenchmarkDotNet v0.13.11, Windows 11, AMD Ryzen 7 6800H.
.NET SDK 8.0.204
[Host] : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
DefaultJob: .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2

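| Method                    | N     | Mean          | Error       | StdDev      | Median        | Ratio    | RatioSD | Gen0         | Gen1       | Gen2     | Allocated   | Alloc Ratio |
|---------------------------|-------|---------------|-------------|-------------|---------------|----------|---------|--------------|------------|----------|-------------|-------------|
| MapOriginal               | 10000 | 13,117.831 ms | 120.0636 ms | 112.3075 ms | 13,117.128 ms | 1,241.72 | 58.01   | 1568000.0000 | 25000.0000 | -        | 12511.64 MB | 2,437.84    |
| MapOptimized              | 10000 | 12.582 ms     | 0.5073 ms   | 1.4957 ms   | 13.328 ms     | 1.00     | 0.00    | 875.0000     | 796.8750   | 484.3750 | 5.13 MB     | 1.00        |
| MapOptimizedStruct        | 10000 | 5.494 ms      | 0.1072 ms   | 0.1002 ms   | 5.515 ms      | 0.52     | 0.02    | 570.3125     | 515.6250   | 390.6250 | 4.16 MB     | 0.81        |
| MapOptimizedStructMarshal | 10000 | 5.175 ms      | 0.0451 ms   | 0.0400 ms   | 5.183 ms      | 0.49     | 0.02    | 632.8125     | 585.9375   | 468.7500 | 3.39 MB     | 0.66        |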

Why is the mapper so slow?

The entry point is a small Map method.
The main logic lives in the MapSingle method, and that is where most of the resources are wasted.

public static class Original
{
    private static readonly DateTimeFormatInfo DateTimeFormatInfo = new()
    {
        ShortDatePattern = "MM/dd/yyyy HH:mm:ss",
        LongDatePattern = "MM/dd/yyyy HH:mm:ss",
    };
    
    // The "entry point"
    public static Output[] Map(string json)
    {
        var data = JsonSerializer.Deserialize<Input>(json)!;
        return data.Temperatures.Select(t => t.MapSingle(data.Places)).ToArray();
    }

    // The main mapper
    private static Output MapSingle(this Temperature src, string[] places)
    {
        var data = GetData(src, places);
        var state = data?.Length > 1 ? data[1] : null;
        var season = data?.Length > 2 ? data[2].ToLower() : null;

        var mapSingle = new Output
        {
            Date = src.Date,
            Value = double.Parse(src.Value.ToString("0.##0")),
            State = GetState(state),
            Season = GetSeason(season)
        };
        return mapSingle;

        string[]? GetData(Temperature temperature, string[] strings)
        {
            // On closer reading, this is a foreach loop nested inside the .Select loop:
            // a linear search for every element, and the main waste of computing resources.
            foreach (var str in strings)
            {
                // Split allocates a new array and new strings on every iteration,
                // but in most iterations only the first segment is used.
                var segments = str.Split(';');

                // To pass the tests, pass DateTimeFormatInfo as the second argument to DateTime.TryParse() here.
                if (DateTime.TryParse(segments[0], out var result))
                {
                    if (DateTime.Equals(temperature.Date, result))
                    {
                        return segments;
                    }
                }
            }

            return null;
        }

        string? GetState(string? s1)
        {
            return s1 switch
            {
                "WA" => "Washington",
                "OR" => "Oregon",
                "NE" => "New York",
                "AL" => "Alaska",
                "CO" => "Colorado",
                _ => null
            };
        }

        string? GetSeason(string? s2)
        {
            return s2 switch
            {
                "wi" => "Winter",
                "sp" => "Spring",
                "su" => "Summer",
                "fall" => "Autumn",
                _ => null
            };
        }
    }
}
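
A rough count shows where the time goes. With N temperatures and M places, the nested loop runs Split and TryParse up to N × M times; for N = M = 10,000 that is up to 10^8 iterations when nothing matches, and about half of that when every date is found. The sketch below only counts loop iterations for a linear search versus a dictionary lookup. The data is synthetic, and the class and method names are illustrative assumptions rather than code from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupCostDemo
{
    // Counts how many elements a nested linear search inspects in total.
    public static long CountLinearSteps(int[] keys, int[] haystack)
    {
        long steps = 0;
        foreach (var key in keys)
        {
            foreach (var item in haystack)
            {
                steps++;
                if (item == key) break;
            }
        }
        return steps;
    }

    // With a dictionary built once up front, each key costs a single hashed lookup.
    public static long CountDictionaryLookups(int[] keys, int[] haystack)
    {
        var index = haystack.ToDictionary(x => x, x => x);
        long lookups = 0;
        foreach (var key in keys)
        {
            lookups++;
            index.TryGetValue(key, out _);
        }
        return lookups;
    }

    public static void Main()
    {
        // Synthetic data: every key is present exactly once.
        var data = Enumerable.Range(0, 10_000).ToArray();
        Console.WriteLine(CountLinearSteps(data, data));       // 50005000
        Console.WriteLine(CountDictionaryLookups(data, data)); // 10000
    }
}
```

In the real mapper each of those linear steps also pays for a Split and a date parse, which is why the allocation figures in the benchmark are so large.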

How to improve

We can improve performance significantly just by replacing the linear search over the array with a dictionary lookup, or more precisely, a FrozenDictionary lookup. FrozenDictionary is an immutable dictionary optimized for reads, introduced in .NET 8.
But why stop at this one change, when the code is still far from being in decent shape?
For example, every call to .Split() creates a new array and new strings that the garbage collector later has to clean up. It can be replaced with .AsSpan(), which slices the original string without creating objects on the heap.
The span changes alone will not dramatically speed up this particular method, but they improve the overall performance of the service by reducing the load on the GC.
Together with the dictionary lookup, they cut the execution time of the method dramatically.

public static class Optimized
{
    private static readonly DateTimeFormatInfo DateTimeFormatInfo = new()
    {
        ShortDatePattern = "MM/dd/yyyy",
        LongDatePattern = "MM/dd/yyyy"
    };

    public static Output[] Map(string json)
    {
        var data = JsonSerializer.Deserialize<Input>(json)!;
        
        // The biggest performance gain is here: the linear search over the array
        // has been replaced with a FrozenDictionary lookup.
        var places = data.Places.ToFrozenDictionary(GetDate, GetSeasonState);

        return data.Temperatures.Select(t =>
        {
            places.TryGetValue(DateOnly.FromDateTime(t.Date), out var result);
            return new Output
            {
                Date = t.Date,
                Value = double.Round(t.Value, 3),
                Season = result.Season,
                State = result.State
            };
        }).ToArray();
    }

    private static DateOnly GetDate(string s)
    {
        return DateOnly.Parse(s.AsSpan(0, 10), DateTimeFormatInfo);
    }

    private static (string? Season, string? State) GetSeasonState(string s)
    {
        return (GetSeason(s), GetState(s));

        string? GetState(string str)
        {
            if (str.Length < 21)
                return null;

            // The contract is fixed, so we can slice the string with hardcoded span offsets.
            // Slicing avoids the intermediate heap allocations, because a span is only a view over the original string.
            // It will not dramatically speed up this specific method, but it improves the overall performance of the service.
            var state = str.AsSpan(20, 2);
            return state switch
            {
                { } when state.Equals("WA", StringComparison.OrdinalIgnoreCase) => "Washington",
                { } when state.Equals("OR", StringComparison.OrdinalIgnoreCase) => "Oregon",
                { } when state.Equals("CO", StringComparison.OrdinalIgnoreCase) => "Colorado",
                { } when state.Equals("AL", StringComparison.OrdinalIgnoreCase) => "Alaska",
                { } when state.Equals("NE", StringComparison.OrdinalIgnoreCase) => "New York",
                _ => null
            };
        }

        string? GetSeason(string str)
        {
            if (str.Length < 24)
                return null;

            var season = str.AsSpan()[23..];
            return season switch
            {
                { } when season.Equals("wi", StringComparison.OrdinalIgnoreCase) => "Winter",
                { } when season.Equals("sp", StringComparison.OrdinalIgnoreCase) => "Spring",
                { } when season.Equals("su", StringComparison.OrdinalIgnoreCase) => "Summer",
                { } when season.Equals("fall", StringComparison.OrdinalIgnoreCase) => "Autumn",
                _ => null
            };
        }
    }
}
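
The hardcoded offsets 10, 20, and 23 follow directly from the fixed-width date segment. A small sketch, using a made-up sample entry and a hypothetical Slice helper, shows where each segment lands. It assumes zero-padded months and days, as the "MM/dd/yyyy" pattern implies.

```csharp
using System;

public static class OffsetDemo
{
    // Hypothetical helper: slices a full three-segment entry at the fixed offsets.
    // ToString() is called only so the result can be returned and printed;
    // the mapper itself compares the spans without materializing strings.
    public static (string Date, string State, string Season) Slice(string entry)
    {
        // "MM/dd/yyyy" occupies indexes 0..9 and " 00:00:00" runs to index 18.
        // The first ';' sits at index 19, the state at 20..21,
        // the second ';' at index 22, and the season starts at index 23.
        return (
            entry.AsSpan(0, 10).ToString(),
            entry.AsSpan(20, 2).ToString(),
            entry.AsSpan(23).ToString());
    }

    public static void Main()
    {
        // Illustrative sample entry (assumed data).
        var (date, state, season) = Slice("01/17/2024 00:00:00;OR;wi");
        Console.WriteLine($"{date} | {state} | {season}"); // 01/17/2024 | OR | wi
    }
}
```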

More speed at the cost of more complexity

We can make the mapper even faster without diving into unsafe code and without making it unreadable for most programmers.
We can replace classes with structs for the anemic models or, to be precise, use readonly record structs instead of records.

// Before
public record Output
{
    public DateTime Date { get; init; }
    public double Value { get; init; }
    public string? State { get; init; }
    public string? Season { get; init; }
}

public record Input(Temperature[] Temperatures, string[] Places);

public record Temperature(DateTime Date, double Value);

// After
public readonly record struct OutputSt
{
    public DateTime Date { get; init; }
    public double Value { get; init; }
    public string? State { get; init; }
    public string? Season { get; init; }
}

public readonly record struct InputSt(TemperatureSt[] Temperatures, string[] Places);

public readonly record struct TemperatureSt(DateTime Date, double Value);

Using structs is not as simple as using classes, because they are different in nature. We must remember that a struct is copied when it is passed as an argument or returned from a method.
For small structs, comparable in size to primitives, this is perfectly fine, but for larger ones it degrades performance.
We can avoid the copy by passing the struct with ref/in/out.
Another difference is the memory layout of arrays (and array-based collections).
An array of classes stores only references to instances that live elsewhere on the heap, while an array of structs stores the structs themselves inline.
I think that an array of structs causes less memory fragmentation (although it is more likely to be allocated on the LOH), which reduces the time needed to access an array element.
These small changes cut the execution time of the method roughly in half.

public static class OptimizedStruct
{
    private static readonly DateTimeFormatInfo DateTimeFormatInfo = new()
    {
        ShortDatePattern = "MM/dd/yyyy",
        LongDatePattern = "MM/dd/yyyy"
    };
    
    public static OutputSt[] Map(string json)
    {
        var data = JsonSerializer.Deserialize<InputSt>(json);
        var places = data.Places.ToFrozenDictionary(GetDate, GetSeasonState);

        return data.Temperatures.Select(t =>
        {
            places.TryGetValue(DateOnly.FromDateTime(t.Date), out var result);
            return new OutputSt
            {
                Date = t.Date,
                Value = double.Round(t.Value, 3),
                Season = result.Season,
                State = result.State
            };
        }).ToArray();
    }

    private static DateOnly GetDate(string s)
    {
        return DateOnly.Parse(s.AsSpan(0, 10), DateTimeFormatInfo);
    }

    private static (string? Season, string? State) GetSeasonState(string s)
    {
        return (GetSeason(s), GetState(s));
        
        string? GetState(string str)
        {
            if (str.Length < 21)
                return null;

            var state = str.AsSpan(20, 2);
            return state switch
            {
                { } when state.Equals("WA", StringComparison.OrdinalIgnoreCase) => "Washington",
                { } when state.Equals("OR", StringComparison.OrdinalIgnoreCase) => "Oregon",
                { } when state.Equals("CO", StringComparison.OrdinalIgnoreCase) => "Colorado",
                { } when state.Equals("AL", StringComparison.OrdinalIgnoreCase) => "Alaska",
                { } when state.Equals("NE", StringComparison.OrdinalIgnoreCase) => "New York",
                _ => null
            };
        }

        string? GetSeason(string str)
        {
            if (str.Length < 24)
                return null;

            var season = str.AsSpan(23, str.Length - 23);
            return season switch
            {
                { } when season.Equals("wi", StringComparison.OrdinalIgnoreCase) => "Winter",
                { } when season.Equals("sp", StringComparison.OrdinalIgnoreCase) => "Spring",
                { } when season.Equals("su", StringComparison.OrdinalIgnoreCase) => "Summer",
                { } when season.Equals("fall", StringComparison.OrdinalIgnoreCase) => "Autumn",
                _ => null
            };
        }
    }
}
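
The copy semantics of structs are easiest to see in isolation. The following sketch uses a deliberately mutable Counter struct, which is a hypothetical type invented for this example, to contrast passing by value, by ref, and by in.

```csharp
using System;

// Hypothetical mutable struct, used only to make the copying visible.
public struct Counter
{
    public int Value;
}

public static class StructCopyDemo
{
    // Receives a copy: the caller's struct is not affected.
    public static void IncrementCopy(Counter c) => c.Value++;

    // Receives a reference: the caller's struct is modified in place.
    public static void IncrementByRef(ref Counter c) => c.Value++;

    // 'in' passes a read-only reference: no copy is made and mutation is not allowed.
    public static int Read(in Counter c) => c.Value;

    public static void Main()
    {
        var counter = new Counter { Value = 1 };

        IncrementCopy(counter);
        Console.WriteLine(counter.Value); // 1

        IncrementByRef(ref counter);
        Console.WriteLine(counter.Value); // 2

        Console.WriteLine(Read(in counter)); // 2
    }
}
```

The models in this article are readonly record structs, so they cannot be mutated at all; for them the benefit of ref and in is only the avoided copy.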

A little more speed

We can improve performance a little more by using the GetValueRefOrNullRef method of the static CollectionsMarshal class to find an element and get a reference to it.
This requires replacing the FrozenDictionary with a regular Dictionary.
The change gives roughly a 6% improvement over the previous version.

public static class OptimizedStructMarshal
{
    private static readonly DateTimeFormatInfo DateTimeFormatInfo = new()
    {
        ShortDatePattern = "MM/dd/yyyy",
        LongDatePattern = "MM/dd/yyyy"
    };
    
    public static OutputSt[] Map(string json)
    {
        var data = JsonSerializer.Deserialize<InputSt>(json);
        var places = data.Places.ToDictionary(GetDate, GetSeasonState);

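        return data.Temperatures.Select(t =>
        {
            // New way to find the element: get a reference to the stored value instead of a copy.
            ref var result = ref CollectionsMarshal.GetValueRefOrNullRef(places, DateOnly.FromDateTime(t.Date));

            // When the key is missing, the method returns a null reference, and dereferencing it throws.
            // Unsafe.IsNullRef lives in System.Runtime.CompilerServices.
            var found = !Unsafe.IsNullRef(ref result);
            return new OutputSt
            {
                Date = t.Date,
                Value = double.Round(t.Value, 3),
                Season = found ? result.Season : null,
                State = found ? result.State : null
            };
        }).ToArray();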
    }
    
    private static DateOnly GetDate(string s)
    {
        return DateOnly.Parse(s.AsSpan(0, 10), DateTimeFormatInfo);
    }

    private static (string? Season, string? State) GetSeasonState(string s)
    {
        return (GetSeason(s), GetState(s));
        
        string? GetState(string str)
        {
            if (str.Length < 21)
                return null;

            var state = str.AsSpan(20, 2);
            return state switch
            {
                { } when state.Equals("WA", StringComparison.OrdinalIgnoreCase) => "Washington",
                { } when state.Equals("OR", StringComparison.OrdinalIgnoreCase) => "Oregon",
                { } when state.Equals("CO", StringComparison.OrdinalIgnoreCase) => "Colorado",
                { } when state.Equals("AL", StringComparison.OrdinalIgnoreCase) => "Alaska",
                { } when state.Equals("NE", StringComparison.OrdinalIgnoreCase) => "New York",
                _ => null
            };
        }

        string? GetSeason(string str)
        {
            if (str.Length < 24)
                return null;

            var season = str.AsSpan(23, str.Length - 23);
            return season switch
            {
                { } when season.Equals("wi", StringComparison.OrdinalIgnoreCase) => "Winter",
                { } when season.Equals("sp", StringComparison.OrdinalIgnoreCase) => "Spring",
                { } when season.Equals("su", StringComparison.OrdinalIgnoreCase) => "Summer",
                { } when season.Equals("fall", StringComparison.OrdinalIgnoreCase) => "Autumn",
                _ => null
            };
        }
    }
}
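
One thing to keep in mind with GetValueRefOrNullRef is that it returns a null reference when the key is absent, and reading through that reference throws a NullReferenceException. Since the Places array may lack entries for some dates, a guard is needed. The sketch below shows one way to do this with Unsafe.IsNullRef; the Find helper and the sample dictionary are illustrative assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class NullRefGuardDemo
{
    // Hypothetical helper: returns the stored value, or null when the key is absent.
    public static string? Find(Dictionary<int, string> map, int key)
    {
        ref var value = ref CollectionsMarshal.GetValueRefOrNullRef(map, key);

        // Check the reference itself before reading through it.
        return Unsafe.IsNullRef(ref value) ? null : value;
    }

    public static void Main()
    {
        // Illustrative sample data.
        var map = new Dictionary<int, string> { [1] = "Winter" };

        Console.WriteLine(Find(map, 1) ?? "null"); // Winter
        Console.WriteLine(Find(map, 2) ?? "null"); // null
    }
}
```

The dictionary must not be modified while such a reference is in use, which is not a concern here because the mapper only reads from it.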

PS: The article was written to popularize my profile on LinkedIn. Code available at GitHub.
PPS: Thanks to DeepL for the reverse translation of my text into Russian.
