Parallel std::thread array sorting method

“The future belongs to parallel computing”. MaksSun

Introduction

The golden days have come to an end, when developers could do nothing, and the software worked every year faster and faster.

Parallelization at the data level. The principle of “Divide and Conquer”.

Direct sequential sorting algorithms are complex enough to be parallelized. Therefore, they resort to the “divide and conquer” strategy.

The principle of “divide and conquer” is one of the fundamental strategies in the development of parallel algorithms. It consists in dividing the problem into smaller subtasks, the solution of which occurs independently, and then combining the results of these subtasks to obtain the final result.

The principle of “divide and conquer” is widely used in parallel computing to increase the performance and efficiency of tasks. It is based on the idea that a complex task can be efficiently solved by breaking it down into several simpler subtasks that can be executed in parallel.

The divide-and-conquer parallel parallelization process includes the following steps.

  1. Separation. The original task is divided into several smaller and independent subtasks. This can be done by data partitioning, search space decomposition, or other methods, depending on the specific task.

  2. domination. Each subtask is solved independently and in parallel. Each subtask can run on its own processor, core, or node in a compute cluster.

  3. An association. The results of the subtasks are combined to obtain the final result. This step may include merging data, combining results, or other operations, depending on the nature of the task.

Divide and Conquer allows you to parallelize tasks that have a well-defined structure and can be broken down into independent subtasks. It enables the efficient use of parallel computing resources and speeds up the execution of tasks, especially for large amounts of data or computationally complex algorithms.

The essence of the Divide and Conquer method

Software implementation

Stage 1

The fastest sorting algorithm (that I know) is “Quicksort” use it for local sorting.

// Функция разделяй и властвуй для сортировки слиянием с использованием потоков
template <class vec, class typ>
void parallelDivideAndConquerMergeSort(vec& array)
{
	typ size = array.size();
	typ numThreads = std::thread::hardware_concurrency();
	typ chunkSize = size / numThreads;

	std::vector<std::thread> threads;
	for (typ i = 0; i < numThreads; ++i) {
		typ start = i * chunkSize;
		typ end = (i == numThreads - 1) ? size - 1 : start + chunkSize - 1;
		threads.emplace_back(quickSort<vec, typ>, std::ref(array), start, end);
	}

	for (auto& thread : threads) {
		thread.join();
	}

	typ step = chunkSize;
	while (step < size) {
		for (typ i = 0; i < size - step; i += 2 * step) {
			typ left = i;
			typ mid = i + step;
			typ right = std::min<typ>(i + 2 * step, size);

			std::inplace_merge(array.begin() + left, array.begin() + mid, array.begin() + right);
		}
		step *= 2;
	}
}

Stage 2

For parallelization we use threads (std::thread).

// Функция разделяй и властвуй для быстрой сортировки с использованием потоков
template <class vec, class typ>
void parallelDivideAndConquerMergeSort(vec& array) {
    typ size = array.size();
    typ numThreads = std::thread::hardware_concurrency();
    typ chunkSize = size / numThreads;

    std::vector<std::thread> threads;
    for (typ i = 0; i < numThreads; ++i) {
        typ start = i * chunkSize;
        typ end = (i == numThreads - 1) ? size - 1 : start + chunkSize - 1;
        threads.emplace_back(quickSort<vec, typ>, std::ref(array), start, end);
    }

    for (auto& thread : threads) {
        thread.join();
    }

    typ step = chunkSize;
    while (step < size) {
        for (typ i = 0; i < size - step; i += 2 * step) {
            typ left = i;
            typ mid = i + step - 1;
            typ right = std::min<typ>(i + 2 * step - 1, size - 1);

            merge<vec, typ>(array, left, mid, right);
        }
        step *= 2;
    }
}

Stage 3

main function

int main() {
	SetConsoleCP(1251);
	SetConsoleOutputCP(1251);

	for (size_t i = 1; i <= 10; ++i)
	{
		size_t size_vector = i * (size_t)1e7;
		std::vector<int> arr(size_vector);
		std::iota(arr.begin(), arr.end(), 0);
		std::mt19937_64 urng{ 121216 };
		std::shuffle(arr.begin(), arr.end(), urng);

		auto start = std::chrono::steady_clock::now();
		//quickSort(arr, (size_t) 0, size_vector);
		parallelDivideAndConquerMergeSort<std::vector<int>, long long>(arr);
		auto end = std::chrono::steady_clock::now();

		std::cout << "Время выполнения: "
			<< std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
			<< " миллисекунд" << std::endl;

		if (std::is_sorted(arr.begin(), arr.end()))
			std::cout << "Массив отсортирован";
		else
			std::cout << "Не отсортирован массив";
		std::cout << std::endl;

	}

	return 1;
}

Computational experiments

The calculations were carried out on computer system No. 1 (VS No. 1): a personal computer with a six-core AMD Ryzen 5 PRO 4400 GHz processor with Radeon Graphics 3.70 GHz, 32 GB of RAM, Windows 10×64 operating system, 128 GB SSD.

Execution time of the sequential Quicksort algorithm.

Table 1 shows the results of sequential quick sorting (Quick sort) by the “divide and conquer” principle, obtained on computer system No. 1 (6 cores) using the technology of threads (std::thread).

Table 1.

Table 1.

Table 2 shows the results of parallel quick sorting (Quick sort) by the “divide and conquer” principle, obtained on computer system No. 1 (6 cores) using the technology of threads (std::thread).

Conclusion

Acceleration reached up to 3 times inclusive.

All code

#include <iostream>
#include <vector>
#include <random>
#include <numeric>
#include <chrono>
#include <thread>
#include <Windows.h>

// Функция для разделения массива на подмассивы с помощью опорного элемента
template <class vec, class typ>
typ partition(vec& arr, typ low, typ high) {
	using VectorType = typename std::remove_reference<decltype(arr)>::type::value_type;
	VectorType pivot = arr[high]; // Выбираем последний элемент в качестве опорного
	typ i = low - 1; // Индекс меньшего элемента

	for (typ j = low; j <= high - 1; ++j) {
		// Если текущий элемент меньше или равен опорному
		if (arr[j] <= pivot) {
			++i; // Увеличиваем индекс меньшего элемента
			std::swap(arr[i], arr[j]);
		}
	}

	std::swap(arr[i + 1], arr[high]);
	return i + 1;
}

// Рекурсивная функция для выполнения быстрой сортировки
template <class vec, class typ>
void quickSort(vec& arr, typ low, typ high) {
	if (low < high) {
		// Индекс опорного элемента после разделения
		typ pivotIndex = partition<vec, typ>(arr, low, high);

		// Рекурсивно сортируем элементы до и после опорного элемента
		quickSort<vec, typ>(arr, low, pivotIndex - 1);
		quickSort<vec, typ>(arr, pivotIndex + 1, high);
	}
}

// Функция разделяй и властвуй для сортировки слиянием с использованием потоков
template <class vec, class typ>
void parallelDivideAndConquerMergeSort(vec& array)
{
	typ size = array.size();
	typ numThreads = std::thread::hardware_concurrency();
	typ chunkSize = size / numThreads;

	std::vector<std::thread> threads;
	for (typ i = 0; i < numThreads; ++i) {
		typ start = i * chunkSize;
		typ end = (i == numThreads - 1) ? size - 1 : start + chunkSize - 1;
		threads.emplace_back(quickSort<vec, typ>, std::ref(array), start, end);
	}

	for (auto& thread : threads) {
		thread.join();
	}

	typ step = chunkSize;
	while (step < size) {
		for (typ i = 0; i < size - step; i += 2 * step) {
			typ left = i;
			typ mid = i + step;
			typ right = std::min<typ>(i + 2 * step, size);

			std::inplace_merge(array.begin() + left, array.begin() + mid, array.begin() + right);
		}
		step *= 2;
	}
}

int main() {
	SetConsoleCP(1251);
	SetConsoleOutputCP(1251);

	for (size_t i = 1; i <= 10; ++i)
	{
		size_t size_vector = i * (size_t)1e7;
		std::vector<int> arr(size_vector);
		std::iota(arr.begin(), arr.end(), 0);
		std::mt19937_64 urng{ 121216 };
		std::shuffle(arr.begin(), arr.end(), urng);

		auto start = std::chrono::steady_clock::now();
		//quickSort(arr, (size_t) 0, size_vector);
		parallelDivideAndConquerMergeSort<std::vector<int>, long long>(arr);
		auto end = std::chrono::steady_clock::now();

		std::cout << "Время выполнения: "
			<< std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
			<< " миллисекунд" << std::endl;

		if (std::is_sorted(arr.begin(), arr.end()))
			std::cout << "Массив отсортирован";
		else
			std::cout << "Не отсортирован массив";
		std::cout << std::endl;

	}

	return 0;
}

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *