Python – list

An article from my Telegram channel about programming.

Collection comprehensions (often called collection generators) are a short (relative to the for loop) way to create collections based on other collections.

Comprehensions allow us to:

  • Create collections briefly and simply (when the logic is simple).

  • Save time (a comprehension is faster than the equivalent for loop).

  • Stay close to a functional style, since a comprehension produces a new collection rather than modifying an existing one.

One important caveat up front: it is not worth “hiding” important business rules inside complex comprehensions. Here is why:

  • Not everyone can quickly understand what is happening in a comprehension, and not everyone knows how to write them well. Back when I was solving tasks on sites like CodeWars, I also thought “How cool that you can cram so much into one line,” and even strived for it. I don't think so anymore.

  • When the business logic grows more complex, there is a high chance that the person making the change will try to preserve the expression instead of rewriting it as explicit steps with a for loop. As a result, the expression becomes even more complex or, at best, the new logic settles next to it, so that code which should run together starts to blur across the codebase.

# A functional-style expression taken from production code
# that to this day seems to me a perfect example of business logic
# mixed with an attempt to make the code simpler (shorter).
# The attempt, in my opinion, did not succeed :)

discounts = (
	load_result.map(self._collect_discounts)
	.rescue(lambda ex: Success([None] * len(self._products)))
	.unwrap()
)
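To make the trade-off concrete without the original production context, here is a minimal sketch using plain dictionaries. The order data, the discount rules, and the helper names is_eligible and discount_for are all invented for this illustration; they are not taken from the code above.

```python
# Hypothetical order data, invented for this illustration.
orders = [
    {"id": 1, "total": 120.0, "vip": True, "country": "DE"},
    {"id": 2, "total": 40.0, "vip": False, "country": "DE"},
    {"id": 3, "total": 300.0, "vip": False, "country": "US"},
    {"id": 4, "total": 90.0, "vip": True, "country": "US"},
]

# Dense version: eligibility and pricing rules are packed into one expression.
discounts_dense = {
    o["id"]: 0.15 if o["vip"] else 0.10 if o["total"] > 100 else 0.0
    for o in orders
    if o["country"] == "DE" or o["total"] > 250
}


def is_eligible(order: dict) -> bool:
    """Hypothetical rule: domestic orders, or large foreign ones."""
    return order["country"] == "DE" or order["total"] > 250


def discount_for(order: dict) -> float:
    """Hypothetical rule: VIPs get 15%, orders over 100 get 10%."""
    if order["vip"]:
        return 0.15
    if order["total"] > 100:
        return 0.10
    return 0.0


# Staged version: each business rule has a name and a place to grow.
discounts_staged = {}
for order in orders:
    if not is_eligible(order):
        continue
    discounts_staged[order["id"]] = discount_for(order)

print(discounts_dense == discounts_staged)  # -> True
```

Both versions produce the same dictionary. The staged one is longer, but a new rule can be added to a named function instead of being squeezed into the expression.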

Let's move on to examples:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Creating the list with a standard for loop
numbers_squares = []
for number in numbers:
    numbers_squares.append(number ** 2)

# Creating the list with a list comprehension
numbers_squares_ = [number ** 2 for number in numbers]

print(numbers_squares)  # -> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(numbers_squares_)  # -> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Indeed, it is compact. What if the logic is more complicated?

students = {
    1: {
        "age": 27,
        "first_name": "Mark",
        "last_name": "Loginov",
        "subject_average_score": {
            "history": 4.5,
            "mathematics": 3.4,
        },
    },
    2: {
        "age": 32,
        "first_name": "Igor",
        "last_name": "Petrov",
        "subject_average_score": {
            "history": 4.2,
            "literature": 5,
        },
    },
}

students_ = {
    student_id: {
        "full_name": f"{student_data['first_name']} {student_data['last_name']}",
        "subject": [
            subject
            for subject in student_data["subject_average_score"].keys()
        ],
    }
    for student_id, student_data in students.items()
}

print(students_)
# -> {
#     1: {'full_name': 'Mark Loginov', 'subject': ['history', 'mathematics']},
#     2: {'full_name': 'Igor Petrov', 'subject': ['history', 'literature']},
# }

# Given the information about the students, we can extract a digest and
# reduce the data to: who the student is and which classes they attend.
numbers = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]],
    [[22, 23, 24], [25, 26, 27]],
]

numbers_split = [
    number for numbers_level_two in numbers
    for numbers_level_three in numbers_level_two
    for number in numbers_level_three
]

print(numbers_split)  # -> [1, 2, 3, 4, 5, 6, 7, 8, ..., 27]

# If we know the nesting depth of our structure, we can turn it
# into a flat sequence.
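The order of the for clauses in a flattening comprehension can be confusing at first: they read left to right in the same order as the equivalent nested loops read top to bottom. A small sketch of that equivalence follows; the nested data is made up for the illustration, and the itertools variant is shown only as a possible alternative.

```python
from itertools import chain

# Made-up data with a known nesting depth of three.
nested = [
    [[1, 2], [3]],
    [[4, 5, 6]],
]

# Comprehension: the outermost for clause comes first.
flat_comprehension = [
    number
    for level_two in nested
    for level_three in level_two
    for number in level_three
]

# The same logic as explicit nested loops, in the same order.
flat_loop = []
for level_two in nested:
    for level_three in level_two:
        for number in level_three:
            flat_loop.append(number)

# chain.from_iterable removes exactly one level of nesting per call.
flat_chain = list(chain.from_iterable(chain.from_iterable(nested)))

print(flat_comprehension)  # -> [1, 2, 3, 4, 5, 6]
```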
peoples = [
    {"name": "", "age": 29},
    {"name": "Igor", "age": 27},
    {"name": "Petr", "age": 31},
    {"name": "Liza", "age": 20},
]
filtered_peoples_names = [
    people["name"] if people.get("name") else "Unknown Person"
    for people in peoples
    if people["age"] < 30
]

print(filtered_peoples_names)  # -> ['Unknown Person', 'Igor', 'Liza']

# Also note that the if at the end filters the data, while the
# if/else at the beginning chooses what to do with each selected
# object. In this example we kept only the people younger than 30
# (everyone aged 30 or older was dropped), and for those without
# a name we substituted "Unknown Person".
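The two kinds of if are easy to mix up, so here is a minimal sketch that separates them. The values list is invented for the illustration.

```python
values = [-2, -1, 0, 1, 2]

# Trailing if: a filter. Items that fail the test are dropped.
only_positive = [v for v in values if v > 0]

# Leading if/else: a conditional expression. Every item is kept,
# but the value that goes into the result is chosen per item.
clamped = [v if v > 0 else 0 for v in values]

# Both together: the filter decides which items survive,
# the conditional expression decides what is stored for each.
labels = ["even" if v % 2 == 0 else "odd" for v in values if v != 0]

print(only_positive)  # -> [1, 2]
print(clamped)        # -> [0, 0, 0, 1, 2]
print(labels)         # -> ['even', 'odd', 'odd', 'even']
```

Note that the trailing if cannot take an else: it only decides whether an item is included.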

Let's say we've figured out the basic syntax of list comprehensions, but what about speed?

import timeit


def for_() -> list[int]:
    numbers_squares = []
    for i in range(100):
        numbers_squares.append(i)
    return numbers_squares

def list_comprehension() -> list[int]:
    return [i for i in range(100)]


# python 3.10
print(min(timeit.repeat(list_comprehension, number=100000)))  # -> ~0.1477
print(min(timeit.repeat(for_, number=100000)))  # -> ~0.2755

# python 3.12
print(min(timeit.repeat(list_comprehension, number=100000)))  # ~0.0841
print(min(timeit.repeat(for_, number=100000)))  # ~0.1155

Notice how much slower the for loop is than the list comprehension:

  • Python 3.10: ~87%

  • Python 3.12: ~37%
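The percentages appear to be computed from the best timings as slow / fast - 1. Below is a small sketch that reproduces this arithmetic and reruns the measurement; the helper name slowdown_percent is my own, and the timings you get will depend on your machine and Python version.

```python
import timeit


def for_() -> list[int]:
    numbers_squares = []
    for i in range(100):
        numbers_squares.append(i)
    return numbers_squares


def list_comprehension() -> list[int]:
    return [i for i in range(100)]


def slowdown_percent(slow: float, fast: float) -> float:
    """How much longer `slow` takes than `fast`, in percent."""
    return (slow / fast - 1) * 100


# Reproducing the figures from the timings quoted in the article.
print(round(slowdown_percent(0.2755, 0.1477)))  # -> 87  (Python 3.10)
print(round(slowdown_percent(0.1155, 0.0841)))  # -> 37  (Python 3.12)

# Measuring on the current interpreter (fewer runs to keep it quick).
loop_time = min(timeit.repeat(for_, number=10_000))
comprehension_time = min(timeit.repeat(list_comprehension, number=10_000))
print(f"for loop is slower by ~{slowdown_percent(loop_time, comprehension_time):.0f}%")
```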

If you are wondering why the gap between the two approaches shrank so much from one version to the next: Python 3.12 is significantly faster than Python 3.10 in such cases. Compared with 3.12, the same code on 3.10 takes longer by:

  • list_comprehension: ~75%

  • for: ~139%

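Let's increase the volume of generated data by 10 times and take the measurements again. The for loop is slower than the list comprehension by:

  • Python 3.10: ~40%

  • Python 3.12: ~18%

And Python 3.10 is slower than Python 3.12 by:

  • list_comprehension: ~82%

  • for: ~116%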

What causes the increase in speed?

  • With a loop, every iteration has to look up the append attribute on the list (__getattribute__) and then call the append method.

  • A list built in a loop does not know in advance how many objects it will hold. So on a large data set it gets reallocated repeatedly: a larger block of memory is allocated and the contents are moved to the new location. Each such move is a fairly expensive O(n) operation. Note that in CPython a list comprehension also adds one element at a time, so it pays this cost too; the main saving comes from the first point.
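Both points can be observed directly. The sketch below is specific to CPython and relies on implementation details: caching the bound append method removes the per-iteration attribute lookup, and sys.getsizeof reveals when the list's allocated memory changes. The function names are my own, and on recent Python versions the timing difference between the two loops may be small.

```python
import sys
import timeit


def loop_plain() -> list[int]:
    result = []
    for i in range(100):
        result.append(i)  # attribute lookup + method call on every iteration
    return result


def loop_cached_append() -> list[int]:
    result = []
    append = result.append  # look the bound method up once
    for i in range(100):
        append(i)
    return result


def count_resizes(n: int) -> int:
    """Count how often the list's allocated size changes while appending."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            resizes += 1
            last_size = size
    return resizes


print(min(timeit.repeat(loop_plain, number=10_000)))
print(min(timeit.repeat(loop_cached_append, number=10_000)))
print(count_resizes(100))  # far fewer than 100 on CPython
```

The resize count is much smaller than the number of appends because CPython over-allocates each time the list grows, which keeps the amortized cost of append constant.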

If you're interested in a deeper dive into what's going on “under the hood”, I suggest running code similar to this in your interpreter:

import dis


def for_() -> list[int]:
    numbers_squares = []
    for _ in range(100):
        numbers_squares.append(_)
    return numbers_squares

def list_comprehension() -> list[int]:
    return [_ for _ in range(100)]


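# dis.dis prints the disassembly itself and returns None,
# so wrapping it in print() would only add a stray "None".
dis.dis(for_)
dis.dis(list_comprehension)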

You will get the disassembled Python bytecode in a readable, assembler-like form (it reflects the instructions executed by the Python virtual machine, not by the processor).

Once you have data from both examples, you can compare how these approaches differ.
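One way to compare the two listings without reading them line by line is to collect the opcode names each function uses. The sketch below assumes CPython; opcode names differ between versions, and the helper opnames is my own. It recurses into nested code objects because, before Python 3.12, a comprehension is compiled into a separate code object.

```python
import dis
import types


def for_() -> list[int]:
    numbers_squares = []
    for i in range(100):
        numbers_squares.append(i)
    return numbers_squares


def list_comprehension() -> list[int]:
    return [i for i in range(100)]


def opnames(code: types.CodeType) -> set[str]:
    """Collect opcode names from a code object and any nested code objects."""
    names = {instruction.opname for instruction in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


loop_ops = opnames(for_.__code__)
comprehension_ops = opnames(list_comprehension.__code__)

# The comprehension appends with a dedicated LIST_APPEND instruction,
# while the loop loads the append method and makes a regular call.
print("LIST_APPEND" in comprehension_ops)
print("LIST_APPEND" in loop_ops)
print(sorted(loop_ops - comprehension_ops))
```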

In conclusion, I would like to mention that besides list comprehensions there are also:

  • Set comprehensions

  • Dictionary comprehensions

  • Generator expressions (a “generator generator” 🙂)

They all follow the same syntax rules. The one difference is that a generator expression produces its items lazily instead of building a collection up front.

print({number for number in [1, 2, 1]})  # -> {1, 2}
print({name: value for name, value in zip(["one", "two"], [1, 2])})  # -> {'one': 1, 'two': 2}
print((x for x in range(10)))  # -> <generator object <genexpr> at xxx>
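The last line prints a generator object rather than values because a generator expression computes its items only when something iterates over it. A minimal sketch of what that means in practice:

```python
# A generator expression: nothing is computed yet.
squares = (x * x for x in range(5))

print(sum(squares))   # -> 30
print(list(squares))  # -> []  (the generator is already exhausted)

# A list comprehension builds the whole list and can be reused.
squares_list = [x * x for x in range(5)]

print(sum(squares_list))  # -> 30
print(len(squares_list))  # -> 5
```

A generator expression is a good fit when the result is consumed once, for example inside sum(), and the full collection never needs to exist in memory.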
