Studying known sorting algorithms

The purpose of this laboratory work is to look at algorithms with different asymptotics, learn to analyze the running time of algorithms and enable different degrees of optimization.

Bubble sort:

Algorithm:

We take the very first element of the array and compare it with the second. If the first is greater than the second, we swap them with the first; if not, we do nothing. Then we take the second element of the array and compare it with the next one – the third. If the second is greater than the third, we swap them; if not, we do nothing. We go like this until the penultimate element, compare it with the last one and put the largest of them at the end of the array. That's it, we found the largest number in the array and put it in its place. We return to the beginning of the algorithm and do everything again in exactly the same way, starting with the first and second elements. Only now we give ourselves the task of not checking the last element – we know that now the largest element is at the end of the array. When we finish the next pass, we reduce the value of the final position to which we check, and start again from the beginning. We do this until we have one element left.

Asymptotics: O(N^2)

Sorting code with graph output:

import random
import time
import matplotlib
matplotlib.use('TkAgg')  # Используйте соответствующий бэкенд
import matplotlib.pyplot as plt
from collections import defaultdict

# Пузырьковая сортировка
def sort_array(arr):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[j] <= arr[i]:
                continue
            arr[i], arr[j] = arr[j], arr[i]

# Заполнение массива случайными значениями
def fill_array(arr):
    for i in range(len(arr)):
        arr[i] = random.randint(-len(arr), len(arr))

# Подсчет времени выполнения функции
def running_time(action, arr):
    start_time = time.time()
    action(arr)
    return (time.time() - start_time) * 1000  # Время в миллисекундах

def display_statistics():
    results = defaultdict(int)

    # Сбор результатов
    for i in range(500, 5050, 50):
        ar = [0] * i
        fill_array(ar)
        results[i] = running_time(sort_array, ar)

    # Обработка данных для графика
    sizes = list(results.keys())
    times = list(results.values())

    # Построение основного графика
    plt.figure(figsize=(12, 6))

    # График с обычным масштабом
    plt.subplot(1, 2, 1)  # 1 строка, 2 колонки, 1-й график
    plt.plot(sizes, times, label="Sort Productivity", marker="o")
    plt.title('Время сортировки в зависимости от размера массива')
    plt.xlabel('Размер массива')
    plt.ylabel('Время сортировки, мс')
    plt.legend()
    plt.grid()

    # Построение графика с двойным логарифмическим масштабом
    plt.subplot(1, 2, 2)  # 1 строка, 2 колонки, 2-й график
    plt.plot(sizes, times, label="Sort Productivity (log-log)", marker="o")
    plt.title('Время сортировки (лог-лог масштаб)')
    plt.xlabel('Размер массива (лог)')
    plt.ylabel('Время сортировки (лог)')
    plt.xscale('log')
    plt.yscale('log')
    plt.legend()
    plt.grid()

    plt.tight_layout()  # Компактное размещение графиков
    plt.show()

if __name__ == "__main__":
    display_statistics()

Insertion sort O(N2):

Algorithm: We go through the array from left to right and process each element in turn. To the left of the next element we increase the sorted part of the array; to the right, as the process progresses, the unsorted part slowly evaporates. In the sorted part of the array, the insertion point for the next element is searched. The element itself is sent to the buffer, as a result of which a free cell appears in the array – this allows you to move the elements and free the insertion point.

Asymptotics: O(N^2)

Sorting code with graph output:

import random
import time
import matplotlib
matplotlib.use('TkAgg')  # Используйте соответствующий бэкенд
import matplotlib.pyplot as plt
from collections import defaultdict

# Сортировка вставками
def sort_array(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

# Заполнение массива случайными значениями
def fill_array(arr):
    for i in range(len(arr)):
        arr[i] = random.randint(-len(arr), len(arr))

# Подсчет времени выполнения функции
def running_time(action, arr):
    start_time = time.time()
    action(arr)
    return (time.time() - start_time) * 1000  # Время в миллисекундах

def display_statistics():
    results = defaultdict(int)

    # Сбор результатов
    for i in range(500, 5050, 50):
        ar = [0] * i
        fill_array(ar)
        results[i] = running_time(sort_array, ar)

    # Обработка данных для графика
    sizes = list(results.keys())
    times = list(results.values())

    # Построение основного графика
    plt.figure(figsize=(12, 6))

    # График с обычным масштабом
    plt.subplot(1, 2, 1)  # 1 строка, 2 колонки, 1-й график
    plt.plot(sizes, times, label="Sort Productivity", marker="o")
    plt.title('Время сортировки в зависимости от размера массива')
    plt.xlabel('Размер массива')
    plt.ylabel('Время сортировки, мс')
    plt.legend()
    plt.grid()

    # Построение графика с двойным логарифмическим масштабом
    plt.subplot(1, 2, 2)  # 1 строка, 2 колонки, 2-й график
    plt.plot(sizes, times, label="Sort Productivity (log-log)", marker="o")
    plt.title('Время сортировки (лог-лог масштаб)')
    plt.xlabel('Размер массива (лог)')
    plt.ylabel('Время сортировки (лог)')
    plt.xscale('log')
    plt.yscale('log')
    plt.legend()
    plt.grid()

    plt.tight_layout()  # Компактное размещение графиков
    plt.show()

if __name__ == "__main__":
    display_statistics()

Shaker sort O(N2):

Algorithm: Shaker sort differs from bubble sort in that it is bidirectional: the algorithm does not move strictly from left to right, but first from left to right, then from right to left.

Asymptotics: O(N^2)

Sorting code with graph output:

import random
import time
import matplotlib
matplotlib.use('TkAgg')  # Используйте соответствующий бэкенд
import matplotlib.pyplot as plt
from collections import defaultdict

# Сортировка шейкером
def sort_array(arr):
    n = len(arr)
    swapped = True
    start = 0
    end = n - 1

    while swapped:
        swapped = False
        # Проход вперед
        for i in range(start, end):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
        end -= 1

        if not swapped:
            break

        swapped = False
        # Проход назад
        for i in range(end, start, -1):
            if arr[i] < arr[i - 1]:
                arr[i], arr[i - 1] = arr[i - 1], arr[i]
                swapped = True
        start += 1

# Заполнение массива случайными значениями
def fill_array(arr):
    for i in range(len(arr)):
        arr[i] = random.randint(-len(arr), len(arr))

# Подсчет времени выполнения функции
def running_time(action, arr):
    start_time = time.time()
    action(arr)
    return (time.time() - start_time) * 1000  # Время в миллисекундах

def display_statistics():
    results = defaultdict(int)

    # Сбор результатов
    for i in range(500, 5050, 50):
        ar = [0] * i
        fill_array(ar)
        results[i] = running_time(sort_array, ar)

    # Обработка данных для графика
    sizes = list(results.keys())
    times = list(results.values())

    # Построение основного графика
    plt.figure(figsize=(12, 6))

    # График с обычным масштабом
    plt.subplot(1, 2, 1)  # 1 строка, 2 колонки, 1-й график
    plt.plot(sizes, times, label="Sort Productivity", marker="o")
    plt.title('Время сортировки в зависимости от размера массива')
    plt.xlabel('Размер массива')
    plt.ylabel('Время сортировки, мс')
    plt.legend()
    plt.grid()

    # Построение графика с двойным логарифмическим масштабом
    plt.subplot(1, 2, 2)  # 1 строка, 2 колонки, 2-й график
    plt.plot(sizes, times, label="Sort Productivity (log-log)", marker="o")
    plt.title('Время сортировки')
    plt.xlabel('Размер массива (лог)')
    plt.ylabel('Время сортировки (лог)')
    plt.xscale('log')
    plt.yscale('log')
    plt.legend()
    plt.grid()

    plt.tight_layout()  # Компактное размещение графиков
    plt.show()

if __name__ == "__main__":
    display_statistics()

Heap sort:

Algorithm: Heapsort algorithm in ascending order:

  1. Construct max-heap from input data.

  2. At this stage, the largest element is stored at the root of the heap. Replace it with the last element of the heap and then reduce its size by 1. Finally, convert the resulting tree to a max-heap with a new root.

  3. Repeat the above steps as long as the heap size is greater than 1.

How to build a heap?

The heapify procedure (heapify procedure) can be applied to a node only if its child nodes have already been converted. So the conversion must be done from bottom to top. Let's understand with an example:

Input data: 4, 10, 3, 5, 1

4(0)

/\

10(1) 3(2)

/\

5(3) 1(4)

The numbers in parentheses represent indices in the array representation of the data.

Applying heapify to index 1:

4(0)

/\

10(1) 3(2)

/\

5(3) 1(4)

Applying heapify to index 0:

10(0)

/\

5(1) 3(2)

/\

4(3) 1(4)

The heapify routine calls itself recursively to create the heap from top to bottom.

Asymptotics: O(N*log(N))

Sorting code with graph output:

import random
import time
import matplotlib
matplotlib.use('TkAgg')  # Используйте соответствующий бэкенд
import matplotlib.pyplot as plt
from collections import defaultdict

# Сортировка кучей
def heapify(arr, n, i):
    largest = i  # Инициализируем максимальный элемент как корень
    left = 2 * i + 1  # Левый дочерний элемент
    right = 2 * i + 2  # Правый дочерний элемент

    # Если левый дочерний элемент больше корня
    if left < n and arr[left] > arr[largest]:
        largest = left

    # Если правый дочерний элемент больше текущего максимума
    if right < n and arr[right] > arr[largest]:
        largest = right

    # Если находимся не в правильной позиции, меняем местами
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # Меняем местами
        heapify(arr, n, largest)  # Рекурсивно преобразуем подкучу в кучу

def sort_array(arr):
    n = len(arr)

    # Строим кучу (перегружаем по убыванию)
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

    # Один за другим извлекаем элементы из кучи
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # Меняем местами
        heapify(arr, i, 0)

# Заполнение массива случайными значениями
def fill_array(arr):
    for i in range(len(arr)):
        arr[i] = random.randint(-len(arr), len(arr))

# Подсчет времени выполнения функции
def running_time(action, arr):
    start_time = time.time()
    action(arr)
    return (time.time() - start_time) * 1000  # Время в миллисекундах

def display_statistics():
    results = defaultdict(int)

    # Сбор результатов
    for i in range(500, 5050, 50):
        ar = [0] * i
        fill_array(ar)
        results[i] = running_time(sort_array, ar)

    # Обработка данных для графика
    sizes = list(results.keys())
    times = list(results.values())

    # Построение основного графика
    plt.figure(figsize=(12, 6))

    # График с обычным масштабом
    plt.subplot(1, 2, 1)  # 1 строка, 2 колонки, 1-й график
    plt.plot(sizes, times, label="Sort Productivity", marker="o")
    plt.title('Время сортировки в зависимости от размера массива')
    plt.xlabel('Размер массива')
    plt.ylabel('Время сортировки, мс')
    plt.legend()
    plt.grid()

    # Построение графика с двойным логарифмическим масштабом
    plt.subplot(1, 2, 2)  # 1 строка, 2 колонки, 2-й график
    plt.plot(sizes, times, label="Sort Productivity (log-log)", marker="o")
    plt.title('Время сортировки ')
    plt.xlabel('Размер массива (лог)')
    plt.ylabel('Время сортировки (лог)')
    plt.xscale('log')
    plt.yscale('log')
    plt.legend()
    plt.grid()

    plt.tight_layout()  # Компактное размещение графиков
    plt.show()

if __name__ == "__main__":
    display_statistics()

Hoare sort:

At each step of the algorithm, the “middle” element is first selected, then the elements of the array are rearranged so that the array is divided into two parts. The first part contains elements less than the “average” and, possibly, equal to it. The second part contains elements greater than “average” and, possibly, equal to it. After such division of the array, all that remains is to sort its parts separately, with which we proceed in the same way (we divide into two parts). And so on until these parts turn out to consist of one element, and an array of one element is always sorted. In the case when the array contains only identical elements, the “middle” element is not selected and sorting is not performed.

The array is divided into two parts as follows. Place one cursor on the left border of the array, and the second on the right border. Then we move the cursors towards each other until they intersect. When moving the cursors, we compare the values ​​of the current elements with the “average”. We find the left current element, larger than the “middle”, and the right current element, less than the “average” (that is, elements that are “out of place”). We exchange these elements.

Choosing the “average” is not an easy task, since it requires, without sorting, to find the element with a value as close as possible to the average. Here, of course, you can simply select an arbitrary element (usually the element in the middle of the sorted subarray is selected), but let’s go a little further: out of three elements (the leftmost, the rightmost and the one in the middle), we choose the middle one.

Asymptotics: O(N*log(N))

Sorting code with graph output:

Dependence on initial data.

Let's consider the time of each sort and evaluate the dependence of time on the number of elements in the array in the best, arbitrary and worst cases.

Bubble sort:

Insertion sort:

Shaker sorting:

Heap sort:

Hoare sorting:

Merge sort:

Dependency on data type.

Let's try to sort arrays of other data types and compare the running time.

Small arrays.

Let's look at such a non-obvious thing as the operating time of the six previous sorts on small arrays (approximately 1 – 1000 elements). The point is that the asymptotics begins to work for very large N – it is not without reason that in all calculations and estimates we discarded all terms except those indicated in the asymptotics itself. And at small N, these terms and coefficients appear, which can greatly affect the final operating time.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *