Estimating the complexity of algorithms in C# in terms of time and memory, with examples

We continue our series on performance and code optimization. Today we'll look at how and why to evaluate the complexity of algorithms, and show clearly how this complexity affects code performance.

In the last article, we looked at what code benchmarking is and how to use it to evaluate the performance of code in .NET. In this article, we'll talk about assessing the complexity of algorithms. And to make everything concrete, we'll evaluate the algorithms used to solve a problem given at Google interviews.

There are a lot of tasks of this kind on platforms like LeetCode, CodeWars and others. Their value is not in memorizing sorting algorithms that you will never write yourself in practice, but in recognizing typical problems that you may encounter in real development work.

Algorithm complexity assessment

Why evaluate the complexity of algorithms at all and what methods of evaluation are there?

Understanding the complexity of algorithms is important for the following reasons:

  • Without this knowledge, you will not be able to distinguish sub-optimal code from optimal code and optimize it;

  • Every medium or large project will sooner or later operate with a large amount of data. It is important that your algorithms provide for this and do not become a time bomb;

  • If you don't understand how to estimate the complexity of algorithms, chances are you'll be writing low-performance code.

Most often, people talk about evaluating an algorithm by time (time complexity): how long it will take to run. Execution time depends on the number of elementary operations the computer performs.

For each algorithm, several complexity estimates can be made:

Big O (O(f)) — gives an upper bound on the complexity of an algorithm: it describes how the running time grows as the amount of input data grows. In simple terms, this is the maximum running time of the algorithm when working with large volumes of data.

Omega (Ω(f)) – gives a lower bound on complexity: how long the algorithm will take in the best case.

Theta (Θ(f)) gives a “tight” complexity estimate, one in which the running time in both the worst and the best case is proportional to the same function. It is not applicable to every algorithm.

IT companies usually focus on Big O (Big-O notation), since most of the time it is enough to know how quickly the number of operations grows as the input data grows in the worst case. The other estimates are rarely used.

The function in parentheses after the O bounds the complexity; here n is the size of the input data. For example, O(n) means that the complexity of the algorithm grows linearly: the execution time increases in direct proportion to the size of the input data.

If you plot the most common algorithm complexities on a graph, they look like this:

If we roughly divide complexities into zones, then complexities of the form O(log n), O(1) or O(C) fall into the “Excellent” zone. Such algorithms execute very quickly, almost instantly, regardless of the volume of data.

Algorithms of O(n) complexity can be classified as “Medium” – algorithms whose complexity grows predictably and linearly. For example, if your algorithm processes 100 elements in 10 seconds, then it will process 1000 in approximately 100 seconds. Not the best result, but predictable.

Algorithms from the red zone, with complexity O(n²) and higher, are hard to call high-performance. BUT! Here everything depends heavily on the volume of the input data. If you are sure that you will always have a small amount of data (for example, 100 elements) and that it is processed in a time acceptable to you, then everything is fine and such algorithms can be used. But if you cannot count on the data volume staying small (there may be 10,000 elements instead of 100), it is better to think about optimizing the algorithm.

Assessing complexity in terms of memory

The complexity of algorithms should be assessed not only by execution time, but also by memory consumption – this is often forgotten when studying the topic.

For example, to speed up an algorithm you can create an intermediate data structure, such as an array or a stack, and cache intermediate results in it. This costs additional memory but can significantly speed up the calculations.
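
As a toy illustration of this trade-off (a minimal sketch with made-up names, not code from the problem discussed later): precomputing an array of prefix sums costs O(n) extra memory, but afterwards any range-sum query is answered in O(1) instead of O(n).

private static long[] BuildPrefixSums(int[] data)
{
    // O(n) additional memory: one extra cell per input element (plus one).
    var prefix = new long[data.Length + 1];
    for (int i = 0; i < data.Length; i++)
    {
        prefix[i + 1] = prefix[i] + data[i]; // single pass over the data: O(n) time
    }
    return prefix;
}

private static long RangeSum(long[] prefix, int from, int to)
{
    // Sum of data[from..to] in O(1): no loop over the elements is needed.
    return prefix[to + 1] - prefix[from];
}

Without the cached prefix sums, every range-sum query would require its own loop over the elements, i.e. O(n) per query.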

Memory complexity is also called space complexity and is written in the same notation as time complexity – big O. For example, a memory complexity of O(n²) means that in the worst case the algorithm needs no more memory than an amount proportional to n².

It is important to note that when assessing the memory complexity of algorithms, a simplified memory model, the so-called RAM machine, is used: we assume that any memory cell can be read or written in a single operation. This puts computation and memory access on an equal footing and simplifies the analysis. The model is closest to working with RAM and ignores, for example, processor registers, disk I/O, the garbage collector and much more.

A little practice: back-of-the-envelope rules for calculating complexity

We provide examples in C#, although pseudocode could be used here. We hope they will be clear.

Example 1:

Let's start with a simple algorithm for assigning a variable:

private void Alg1(int[] data, int target)
{
    var a = data[target];
}

What is its complexity in terms of time and memory?

On the one hand, we might be misled by the data array of unknown size that is passed in. But it would be incorrect to take it into account when assessing the complexity of the algorithm itself.

Rule 1: External data is not taken into account in the complexity of the algorithm.

It turns out that our algorithm consists of only one line:

var a = data[target];

Accessing an array element by index is a well-known operation with O(1) or O(C) complexity. Accordingly, the entire algorithm will take us O(1) in time.
Additional memory is allocated only for one variable. This means that the amount of data we pass in (whether 1,000 or 10,000 elements) will not affect the result. Accordingly, our memory complexity is also O(1) or O(C). Such algorithms are called in-place algorithms: they may use additional memory, but its size does not depend on the amount of input data.

To simplify the writing below, I will write O(C) instead of O(1) everywhere; C in this case is a constant. It can be 1, 2 or even 100 – for modern computers this number is not important, since both 1 and 100 operations are performed in almost the same time.

Example 2:

Let's consider the second algorithm, very similar to the first:

private void Alg2(int[] data, int target)
{
    var a = data[target];
    var b = data[target + 1];
}

Does the size of the input data array affect the number of operations in it? No.

What about allocated memory? – Also no.

The complexity of this algorithm in terms of execution time could be estimated as O(2*C) – since we perform twice as many operations as in the previous example, 2 assignments instead of 1. But we have a rule for this too:

Rule 2: omit constant factors if they do not dramatically affect the result.

If we take this rule into account, the complexity of this algorithm will be the same as in the first example – O(C) in time and O(C) in memory.

Example 3:

In the third example, we will add a data processing loop to our algorithm:

private int Alg3(int[] data)
{
    var sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        sum += data[i];
    }

    return sum;
}

As we can see, the number of operations the loop performs depends directly on the amount of input data: the more elements in data, the more iterations are required to obtain the final result.

It would seem that if we took into account every line of code of our algorithm, we would get something like this:

private int Alg3(int[] data)
{
    var sum = 0;                             // O(C)
    for (int i = 0; i < data.Length; i++)    // O(n)
    {
        sum += data[i];                      // O(C)
    }
    return sum;
}

And then the final complexity of the algorithm will be O(C)+O(n). But here a new rule intervenes again:

Rule 3: Omit evaluation elements that are less than the maximum complexity of the algorithm.

Let me explain: if you have O(C) + O(n), the resulting complexity will be O(n), because O(n) always grows faster than O(C).

Another example is O(n) + O(n²). Here n² always grows faster than n, so we discard O(n) and only O(n²) remains.

So, the time complexity of our third example is O(n). The memory complexity is unchanged – O(C).

Example 4:

In the fourth example, we sum the products of all possible pairs of values from the array:

private int Alg4(int[] data)
{
    var sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        for (int j = 0; j < data.Length; j++)
        {
            sum += data[i] * data[j];
        }
    }
    return sum;
}

To process them we now need two loops, and both depend on the size of the input array data.

Rule 4: Nested complexities are multiplied.

The complexity of the outer loop in our example is O(n), and the complexity of the inner loop is O(n). According to the rule, these two complexities are multiplied, so the overall complexity of the algorithm is O(n²). The memory complexity is unchanged – O(C).

A tricky example:

Let's take a two-dimensional array as input and calculate the sum of its values.

private int Alg4(int[][] data)
{
    var sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        for (int j = 0; j < data[i].Length; j++)
        {
            sum += data[i][j];
        }
    }

    return sum;
}

Again we see nested loops. If the input array contains N * M elements, the complexity is O(N * M), and it may seem that this is equivalent to O(n²). But in fact it is not: here the input itself consists of N * M elements, so the running time is simply proportional to the size of the input, which makes the algorithm linear – O(n).

Example 5:

What do we see here? The loop has a complexity already familiar to us – O(n). But inside it, the Alg4() function from the previous example is called:

private int Alg5(int[] data)
{
    var sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        sum += Alg4(data);
    }
    return sum;
}

If we recall that its complexity is O(n²) and apply Rule 4, we find that the complexity of this algorithm is O(n³), despite its visual minimalism. The memory complexity is unchanged – O(C).

Rule 5: Include estimates of all nested function calls in your overall algorithm complexity estimate.

This is why it is important to understand the complexity of syntactic-sugar methods such as LINQ, of the core collections and of the data types – so that you can estimate how a method will behave as the volume of data grows. If you use them in code without understanding their internals, you risk ending up with a very high-complexity algorithm that will choke as the incoming data grows.

Here is an example of a minimalistic algorithm that looks nice and compact (it in no way pretends to be reference code), but becomes a time bomb when working with large amounts of data:

private List<int> Alg6(int[] data)
{
    List<int> dups = new List<int>();
    for (var i = 0; i < data.Length; i++)
    {
        var currentItem = data[i];
        var newArr = data.Skip(i + 1).ToArray();
        var duplicates = newArr.Where(x => newArr.Contains(currentItem));
        dups.AddRange(duplicates);
    }

    return dups;
}

What do we see here? Loop = O(n), Where = O(n), Contains = O(n), since newArr is an array.

The total complexity of such an algorithm will be O(n³) in execution time.

ToArray() allocates additional memory on each iteration for a copy of the array, which means the memory complexity is O(n²).
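
For comparison, here is one possible way to collect duplicates without the nested Skip/ToArray/Contains passes (a sketch under my own assumptions, not a drop-in replacement for Alg6 and not claiming identical output): a HashSet of already-seen values brings the time down to roughly O(n) at the cost of O(n) additional memory.

private List<int> Alg6WithHashSet(int[] data)
{
    var seen = new HashSet<int>(); // O(n) additional memory
    var dups = new List<int>();
    foreach (var item in data)
    {
        // Add returns false if the value is already in the set: an O(C) check.
        if (!seen.Add(item))
        {
            dups.Add(item);
        }
    }
    return dups;
}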

Problem from Google

For our final assessment, let's take a problem that is given during an interview at Google.

A detailed analysis of the problem and of the solution algorithms can be found in this video from Sasha Lukin. I recommend watching it before continuing with this article.

Briefly, the goal of the algorithm is to find any two numbers in a sorted array that add up to the desired target number.
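
For example (an illustrative input of my own), for the sorted array { 1, 3, 5, 7, 9 } and target = 12, a valid answer would be the pair (3, 9).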

Solution 1: Complete array traversal

public static int[] FindPairWithFullWalkthrough(int[] data, int target)
{
    for (int i = 0; i < data.Length; i++)
    {
        for (int j = i + 1; j < data.Length; j++)
        {
            if (data[i] + data[j] == target)
                return new[] { data[i], data[j] };
        }
    }

    return new int[0];
}

Time complexity: O(n²)

Memory complexity: O(C)

This is the brute-force solution. It is not the most optimal one, since the running time grows quickly as the number of elements grows, but we do not consume much additional memory.

Solution 2: use HashSet

public static int[] FindPairUsingHashSet(int[] data, int target)
{
    HashSet<int> set = new HashSet<int>();
    for (int i = 0; i < data.Length; i++)
    {
        int numberToFind = target - data[i];
        if (set.Contains(numberToFind))
            return new[] { data[i], numberToFind };
        set.Add(data[i]);
    }
    return new int[0];
}

We go through the array and add the elements we have already checked to the HashSet. If the HashSet already contains the element needed to complete the sum, we are done and can return the result. Adding to and looking up in a HashSet take O(C) time on average.

Time complexity: O(n)

Memory complexity: O(n)

This is just an example of how you can gain performance by allocating additional memory for intermediate structures.

Solution 3: use binary search

public static int[] FindPairUsingBinarySearch(int[] data, int target)
{
    for (int i = 0; i < data.Length; i++)
    {
        int numberToFind = target - data[i];
        int left = i + 1;
        int right = data.Length - 1;
        while (left <= right)
        {
            int mid = left + (right - left) / 2;
            if (data[mid] == numberToFind)
            {
                return new[] { data[i], data[mid] };
            }
            if (numberToFind < data[mid])
            {
                right = mid - 1;
            }
            else
            {
                left = mid + 1;
            }
        }
    }
    return new int[0];
}

The binary search algorithm has a well-known complexity – O(log(n)). The O(n) comes from the outer for loop, and everything inside the while loop is the binary search itself. According to Rule 4, the complexities are multiplied.

Time complexity: O(n*log(n))

Memory complexity: O(C)

Solution 4: use the two pointer method

public static int[] FindPairUsingTwoPointersMethod(int[] data, int target)
{
    int left = 0;
    int right = data.Length - 1;
    while (left < right)
    {
        int sum = data[left] + data[right];
        if (sum == target) return new[] { data[left], data[right] };
        if (sum < target)
        {
            left++;
        }
        else
        {
            right--;
        }
    }

    return new int[0];
}

We move the left and right pointers toward the center until they meet or we find a pair of values that suits us.

Time complexity: O(n)

Memory complexity: O(C)

This is the most optimal of all the solutions: it uses no additional memory and performs the fewest operations.

Benchmarking solutions

Now, knowing the complexity of all four solution options, let's try to benchmark this code and see how the algorithms will behave on different data sets. The information from our previous article will help with this.
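
As a rough sketch of how such a benchmark might be set up with BenchmarkDotNet (the class name, parameter values and input generation here are illustrative assumptions, not the exact setup behind the measurements below):

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // also report allocations and GC collections
public class FindPairBenchmarks
{
    // The four FindPair* methods shown above are assumed to be accessible
    // from this class (e.g. defined alongside it or imported via using static).
    private int[] _data = Array.Empty<int>();
    private int _target;

    [Params(10, 1_000, 10_000)] // data set sizes to compare
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(1, N).ToArray(); // sorted input, as the problem requires
        _target = 2 * N - 1;                      // the matching pair (N-1, N) sits at the end
    }

    [Benchmark(Baseline = true)]
    public int[] FullWalkthrough() => FindPairWithFullWalkthrough(_data, _target);

    [Benchmark]
    public int[] WithHashSet() => FindPairUsingHashSet(_data, _target);

    [Benchmark]
    public int[] WithBinarySearch() => FindPairUsingBinarySearch(_data, _target);

    [Benchmark]
    public int[] WithTwoPointers() => FindPairUsingTwoPointersMethod(_data, _target);
}

// Entry point: BenchmarkRunner.Run<FindPairBenchmarks>();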

The results will be as follows:

What do we see here?

As the baseline we take the full pass over the array – FindPairWithFullWalkthrough. On 10 elements it runs in about 20 nanoseconds on average and takes second place in performance.

Only our most optimal solution, FindPairUsingTwoPointersMethod, runs faster on a small amount of data.

The HashSet option took 8 times longer on the small data set and required allocating additional memory, which the Garbage Collector then has to reclaim.

At 1,000 elements, the full-pass option (FindPairWithFullWalkthrough) began to lag noticeably behind the other algorithms. The reason is its O(n²) complexity, which grows much faster than the others.

At 10,000 elements, the full-pass algorithm took 9.7 seconds to complete the calculations, while the rest did it in 0.1 seconds or less. And the most optimal solution finished in 3 milliseconds.

Why did binary search overtake the HashSet? After all, in theory O(n * log(n)) should be slower than O(n). The point is that on real computers, unlike theoretical ones, memory allocation and release are not free – the Garbage Collector kicks in every now and then. This is confirmed by the standard deviation (StdDev) values in the HashSet benchmark.

Conclusion

We figured out how to evaluate the complexity of algorithms, and also learned how to use BenchmarkDotNet to trace the relationship between the complexity of algorithms and code execution time. This will allow you to roughly assess whether you have written good code or not, even before benchmarking.
