Complexity of algorithms and common mistakes in Python

Hi all! I will tell you what the complexity of algorithms is and where it comes from, I will analyze typical misconceptions and the most common mistakes made by beginners. The material is intended primarily for beginner Python developers, as well as for those who use Python as their first programming language.

What is the complexity of algorithms?

When they talk about the complexity of algorithms, they most often give an example of a graph on which functions are drawn, and they say that this algorithm is equal in complexity to this function, and that algorithm is equal to another function, and so on. Typical examples:

  • Log(N) – binary search in an already sorted array;

  • N — code operation in one cycle;

  • N*Log(N) — sorting;

  • N**2 is a loop nested within another loop.

Note: assembler so close to machine code that in fact one machine instruction corresponds to one assembler operator (1:1). In this way, the actual complexity of the algorithm can be estimated fairly accurately.

Obviously, the more complex the algorithm, the faster the function increases. But what does this mean if you dig deeper? Let's use the example of C++ to consider some algorithm with O(N) complexity, for example, creating an array, and see how this operation will look in disassembler.

You can read more about how much each operation costs in the article “Complexity of algorithms and operations using Python as an example

Note: C/C++ is much closer to hardware than high-level Python, so it is much easier to disassemble. Clearly, a list in Python works differently from the way an array works in C++, but the principles of how they work are exactly the same.

To create an array of 7 elements in C++:

int arr[] = {1, 2, 3, 4, 5, 6, 7};

you will need to do 7 operations in assembler:

mov     DWORD PTR [rbp-32], 1
mov     DWORD PTR [rbp-28], 2
mov     DWORD PTR [rbp-24], 3
mov     DWORD PTR [rbp-20], 4
mov     DWORD PTR [rbp-16], 5
mov     DWORD PTR [rbp-12], 6
mov     DWORD PTR [rbp-8], 7

This is exactly what the complexity function of the O(N) algorithm shows – how many elements need to be processed, so many operations will be performed. For O(N*N) it’s the same, but the number of operations will already be squared. In Python, creating a list of 7 elements also requires at least seven operations:

l = [1, 2, 3, 4, 5, 6, 7]

Another example: to write one element of an array in C++:

arr[0] = 111;

Only one operation in assembler is required:

mov     DWORD PTR [rbp-32], 111

This is why this operation has O(1) complexity. Same thing for Python:

l[0] = 111

This operation also has O(1) complexity by the same logic.

Now that it is clear where the complexity of algorithms comes from, we can move on to common mistakes.

Example 1, working with the string type

Let's consider the simplest illustrative example, which arises from a misunderstanding of the definition of the complexity of an algorithm. Let's say we need to collect a string in a loop, for this we can write the following code:

line = ""
for i in range(10_000):
    line += "i"

It would seem like a simple and working code that gives the correct result – in this case, a string of 10 thousand “i” characters. Execution time of this code 5 ms.

At first glance, this is an algorithm with O(N) complexity, but if you look at the last line of code, you can see that the type string does not actually add a character to the current value, but recreates the value with a new character, since string is an immutable type. In other words, this operation alone will have O(N) complexity, because to copy a value to a new string, you need to copy every element of the old string plus the new character. In addition, this operation is also performed in a loop, that is, the final complexity of this algorithm will be equal to O(N*N).

Note: given that the variable line grows sequentially from 0 to N, it would be correct to say that the complexity is O(N*M), where M < N. But for simplicity, we will assume that the complexity of this algorithm is O(N*N). In general, this won't be a big mistake.

This algorithm can be improved by using an operation whose complexity is O(1), for example append:

word = []
for i in range(10_000):
    word.append("i")
line = "".join(word)

Now the algorithm has worked in 2 mswhich is 2.5 times faster than the previous version, and now its complexity is O(N).

It is worth noting that the append operation in Python is actually done in amortized O(1), as in other high level languages ​​such as Java or C#. In other words, it is extremely rare, but this operation can be completed in O(N) time. Under the hood of the list is an array into which the elements are saved. If the list size is not enough, then before saving the new element it will be increased twice, in this case it will be O(N).

Let's also consider an example with an assignment operation, which actually already has O(1) complexity:

N = 10_000
word = [None] * N
for i in range(N):
    word[i] = "i"
line = "".join(word)

Execution time reduced to 1.35 mswhich is slightly faster than with append. Both of these algorithms will be executed in approximately the same time; in practice, the option with assignment will often run just a little faster.

Example 2, array conversions

Let's take a very large list and look for elements in it:

arr = list(range(10_000_000))
matcher = [500_000, 100_000, 1000_000_000] * 10

In our case, the array will consist of 10 million elements, and we will look for 30 numbers in it. In its simplest form, the algorithm will look like this: in a loop we go through each of the 30 elements from matcher and look for him in arr.

for i in matcher:
    if i in arr:
        pass

This algorithm has complexity O(N*N): searching in a loop is O(N), and searching in a list is also O(N). Duration of its work 1.2 seconds.

This code can be improved because we know that set search has O(1) complexity. How can a novice programmer rewrite an algorithm? For example, like this:

for i in matcher:
    if i in set(arr):
        pass

Yes, with pure intentions, a beginner or a programmer who does not know the complexity of algorithms can write like this. This example also has complexity O(N*N), searching through a set is O(1), but converting a list to a set is O(N). Despite the complexity O(N*N), as in the previous example, the algorithm now runs in 15.2 seconds. The reason is that searching through the list in the worst case gives O(N), but the transformation will always be O(N).

It would be correct to write this:

arr_s = set(arr)
for i in matcher:
    if i in arr_s:
        pass

Now the code is executed 0.5 seconds. This error is surprisingly common, it is not noticeable, but it can slow down the algorithm and increase memory consumption. It is important to understand that unnecessary transformation of an array of elements consumes unnecessary resources.

Typical examples with conversion errors:

  • set(list(map(str, arr))) – you need to immediately reduce it to a set: set(map(str, arr))

  • list(sorted(arr))sorted and so it returns a list

And other errors, there can be many variations.

Example 3, errors when declaring variables

It is not without reason that they say that it is advisable to declare variables at the beginning of a method. This is, of course, a recommendation, but it is useful for beginners. Often an inexperienced programmer can make a similar mistake: let's say we have some method that performs a division operation:

def divider(a: int, b: int):
    res = a / b
    return res

Let's assume that a QA engineer (aka tester) on the team found an error in this code: he noticed that it was impossible to divide by zero, filed a bug, and the novice programmer corrected the code:

def divider(a: int, b: int):
    try:
        res = a / b
    except ZeroDivisionError:
        print("don't divide on zero!")
    return res

Yes, this is a fairly common situation when a bug is masked by another error. In this case, division by zero will be processed correctly, but an error will immediately appear: UnboundLocalError: local variable 'res' referenced before assignmentthat is, accessing a non-existent variable, since it will not be initialized in the block try/except due to division error.

The easiest way to avoid this is to declare variables in advance:

def divider(a: int, b: int):
    res = None
    try:
        res = a / b
    except ZeroDivisionError:
        print("don't divide on zero!")
    return res

Variables are often declared in a segment ifcode like this is not good form:

def divider(a: int, b: int, flag: bool):
    if flag:
        res = a / b
    return res

It’s possible to declare a variable like this in Python, but it won’t work in Java or C#. Such code in these programming languages ​​will cause an error at compile time.

Example 4, input parameters

There is a well-known recommendation: do not change the input parameters. Why do they say this, let's see.

def func(l: list):
    l[0] = 10
l = [1, 2, 3]
func(l)

It would seem like a simple code, but what will be in the list after it is executed? Here's what:

print(l)

[10, 2, 3]

How did this happen? After all, the method does not return anything; we do not redefine the list in any way in the code, only inside the method. Let's look at another example:

def func2(a: int):
    a = 10
a = 0
func2(a)

What will the variable be equal to:

print(a)

0

In the second case, the variable has not changed, why does this happen? The fact is that objects in Python are divided into those that are passed by reference – list, set, dictand those that are passed by value – int, float, str, tuple.

Let's look at another example:

l1 = [1, 2]
l2 = l1
l1.append(3)

In this case, both lists will be equal, even though we only changed one of them:

print(l1, l2)

([1, 2, 3], [1, 2, 3])

When passing arguments to a method, or if you need to preserve the original state of the list to avoid errors, reference types must be copied explicitly, for example:

l2 = l1[:]
l2 = list(l1)
l2 = l1.copy()

In this case, the references will be to different memory areas, which will solve the problem.

Example 5, default value

This is probably one of the most common mistakes new programmers make. Let's look at an example:

def func3(val: int, l: list = []):
    l.append(val)
    print(l)

It would seem like a common method, but let's call it several times in a row and see what happens:

func3(1)

[1]

func3(2)

[1, 2]

Yes, this is the catch: a new empty list is not created by default. That is why during subsequent calls, in which the programmer expects an empty array, as laid down by the logic of the algorithm, elements from previous calls to the method remain in the array. That's how Python works: it always takes the same list reference in a case like this.

If you want an empty list by default, you should write it like this:

def func3(val: int, l: Optional[list] = None):
    if not l:
        l = []
    l.append(val)
    print(l)

The same rule applies to the set (set) and dictionary (dict), they also cannot be made empty in the default parameters.

Conclusion

In simple words and with examples, I explained what the complexity of an algorithm is, showed where this concept comes from, and analyzed typical errors that arise from a lack of understanding of complexity and are the most common among beginners.

Don't be afraid to make mistakes if you're a beginner developer. Only those who do nothing make no mistakes.

Thank you for your attention.

The source code repository is on GitHub.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *