Hybrid Sorting

As everyone already knows, sorting can be based on exchanges, inserts, selection, merging, and distribution.

But if different methods are combined in the algorithm, then it belongs to the class of hybrid sorts.

EDISON Software - web-development

This article was written with the support of EDISON.

We are engaged completion and maintenance of sites on 1C-Bitrix, and Android and iOS mobile app development.

We love the theory of algorithms! 😉

Let’s quickly recall what classes sorting algorithms have and what are the features of each of them.

Exchange Sorts

The elements of the array are compared in pairs with each other and exchanges are made for disordered pairs.

The most effective representative of this class is the legendary quick sort.

Insertion Sorts

Elements from the unsorted part of the array are inserted into their places in the sorted area.

Of this class is most often used simple insertion sorting. Although this algorithm has an average complexity of O (n2), this sorting very quickly works with almost ordered arrays – on them complexity reaches O (n) In addition, this sorting is one of the best options for processing small arrays.

Sorting using a binary search tree also applies to this class.

Sort by selection

In the unordered area, the minimum / maximum element is selected, which is transferred to the end / beginning of the unsorted part of the array.

Sorting with a simple choice is very slow (average O (n2)), but in this class there is a difficult heap sort (she pyramidal sorting), which has a time complexity of O (n log n) – and, which is very valuable, this sorting has no degenerate cases, whatever the input data. By the way, this sorting does not have the best cases for incoming data either.

Merge Sorts

The sorted areas are taken in the array and merged, that is, the smaller sorted subarrays are combined into a larger sorted subarray.

If two subarrays are sorted, then combining them is an easily implemented and fast-time operation. The flip side of the coin is that the merge almost always requires the cost of additional memory O (n) – although an extremely small number of particularly sophisticated merge sort options are known, where the memory overhead is O (1).

Sort By Distribution

The elements of the array are distributed and redistributed into classes until the array accepts a sorted state.

Elements are scattered in groups or depending on their value (the so-called counting sort) or depending on the value of individual digits (this is already bitwise sorting)

Nuclear sorting also applies to this class.

A feature of sorting by distribution is that they either do not use pairwise comparisons of elements between themselves or such comparisons are present to an insignificant extent. Therefore, sorting by distribution is often ahead of speed, for example, quick sorting. On the other hand, sorting by distribution often requires a lot of additional memory, since groups of constantly redistributed elements need to be stored somewhere.






Disputes about what sorting is very frequent the best, but the fact of the matter is that there is no and cannot be an ideal algorithm for all occasions. For example, quick sorting is really very fast (but not the fastest) in most situations, but it also comes across degenerate cases in which a crash occurs. Sorting by simple inserts is slow, but for almost ordered arrays it will easily bypass other algorithms. Heap sorting works quite quickly with any incoming data, but not as fast as other sortings under certain conditions and there is no way to speed up the pyramid. Merge sorting is slower than quick sorting, however if there are sorted subarrays in the array, then it is faster to merge them than sorting by quick sorting. If the array has many repeating elements or we sort the rows, then most likely sorting by distribution is the best option. Each method is especially good in its most favorable situation.

Nevertheless, programmers continue to invent the fastest sortings in the world, synthesizing the most effective methods from different classes. Let’s see how well they succeed.

Since many non-trivial algorithms are mentioned in the article, I only briefly cover the basic principles of their work, without overloading the article with animations and detailed explanations. In the future, there will be separate articles, where there will be “cartoons” for each algorithm and detailed subtle nuances.






Insert + Merge

A purely empirical conclusion is that fusion and / or insertion are most often used in hybrids. In most sortings, either one or the other method is found, or both together. And there is a logical explanation for this.

Sort inventors often strive to create parallel algorithms that simultaneously order different parts of an array. The best way to deal with several sorted subarrays is to merge them – this will be the fastest.

Modern algorithms often use recursion. During a recursive descent, the array is usually divided into two parts; at the lowest level, the array is ordered. When returning to higher levels of recursion, the question arises of combining subarrays sorted at lower levels.

As for the inserts, in hybrid algorithms at certain stages often approximately ordered subarrays are obtained, which are best led to the final ordering with the help of inserts.

This group contains hybrid sorts, in which there is both merging and inserts, and these methods are used very differently.

Merge-insertion sort
Ford Johnson Algorithm :: Ford-Johnson algorithm

Merge + Insert


A very ancient way, already in 1959. It is described in detail in the immortal work of Donald Knuth, “The Art of Programming,” Volume 3, “Sorting and Searching,” Chapter 5, “Sorting,” Section 5.3, “Optimal Sorting,” subsection, “Sorting with a minimum number of comparisons,” and part “Sorting by Inserts and Merging” .

Sorting now has no practical value, but it is interesting to those who love the theory of algorithms. The problem of finding a way to sort with the least number of comparisons is considered. n elements. A non-trivial heuristic modification of sorting by an insert is proposed (you will not see such an insert anywhere else) using Jacobstal numbers, in order to minimize the number of comparisons. To date, it is also known that this is not the best option and you can even more deftly dodge and get even fewer comparisons. In general, the standard academic sorting is not of practical use, but for connoisseurs of the genre it is a pleasure to disassemble such tricks with an algebraic bias.

Tim Sort :: Timsort

Insert + Merge


Posted by Tim Peters 15 years ago and now

This sorting on Habré is remembered very often.
Thesis: in an array, almost ordered small subarrays are searched for which insertion sorting is used. These subarrays are then merged using merge.

The TimSort merge is the most interesting part: the classic upward merge is further optimized for different situations. For example, it is known that merging is more efficient if the connected subarrays are approximately the same in size. In TimSort, if the sizes are very different, then after additional actions there is an adjustment (we can say that part of the elements will “flow” from the larger subarray to a smaller one, after which the merge will continue in the standard mode). Various insidious situations are also provided – for example, if in one subarray all elements will be less than in another. In this case, the comparison of elements from both subarrays will be idle. The modified merge procedure will “notice” such an undesirable development of events in time, and if it is “convinced” of a pessimistic option using binary search, it will switch to a more optimal processing option.

On average, this sorting works a little slower than QuickSort, but if the incoming array contains a sufficient number of ordered subsequences of elements, then the speed increases significantly and TimSort goes ahead of the rest.

Block merge sort :: Block merge sort
Wiki-sort :: Wiki-sort
Holy Grail Sort :: Grailsort

Insert + merge + buckets


Block merge sorting animation from Wikipedia.

This is a very fresh (2008) and at the same time very promising algorithm. The fact is that the relatively significant problem of merging is the cost of additional memory. Usually, where the merging is, there is the memory complexity O (n)

But WikiSort is designed in such a way that merging occurs without the use of additional memory – it is a very rare instance among merge sorts in this regard. In addition, the algorithm is stable. Well, if the usual merge sort has the best algorithmic speed O (n log n), then in wiki sorting this indicator is O (n) Until recently, it was believed that merge sorting with such a set of characteristics was impossible in principle, but Chinese programmers surprised everyone.

The algorithm is very complicated to explain in a couple of sentences. But someday I’ll write a separate habrast about him.

Initially, the algorithm was namelessly called Block Merge Sort, however, with the light hand of Tim Peters, who studied the sorting in detail (to determine whether some of its ideas should be transferred to TimSort), the name WikiSort stuck to it.

The prematurely departed habruiser Mrrl independently worked for several years on merge sorting, which would be simultaneously fast with any incoming data, economical in memory, and stable. His creative searches were successful and he later called the developed algorithm a sorting of the Holy Grail (since it satisfies all the high requirements of “perfect sorting”). Most ideas of this algorithm are similar to those implemented in WikiSort, although these sorts are not identical and are developed independently of each other.

Hash table sort :: Hash table sort

Distribution + Insert + Merge

The array is recursively divided in half until the number of elements in the resulting subarrays reaches a certain threshold value. At the lowest level of recursion, an approximate distribution occurs (using a hash table) and the subarray is sorted by inserts. Then there is a recursive return to higher levels, the sorted halves are combined by merging.

I talked a little more about this algorithm a month ago.






Quick sort as primary

After merging and inserting, the third place in the hybrid hit chart is firmly held by everyone’s favorite quick sort.

This is a very efficient algorithm, but there are degenerate cases for it as well. Some inventors are trying to make QuickSort completely invulnerable to any bad incoming data and suggest complementing it with strong ideas from other sorts.

Introspective sort :: Introsort, Introspective sort
std :: sort

Fast + heap + inserts


Heap sorting works somewhat slower than quick sorting, but at the same time, unlike QuickSort, it does not have any degenerate cases – the average, best, and worst algorithmic time complexity is O (n log n)

Therefore, David Müsser proposed to be safe during quick sorting – if there is a too high level of nesting, then this is regarded as an attack on the system, which slipped a “bad” array. Switching to sorting by a heap takes place, which is not megabyte, but also not slow to cope with any incoming data.

C ++ has an algorithm called std :: sort, which is an implementation of introspective sorting. A small addition – if at the next level of recursion number of elements of subarray ≤ 16then insertion sorting is applied to the subarray.

Multikey sort :: Multikey sort
Bitwise Quick Sort :: Radix quick sort

Fast + ranks

Quick sorting, only the values ​​of the elements of the array are not compared with each other, but their individual bits (first, we order the higher bits in this way, move from them to the lower ones).

Or so – this is bitwise sorting by high order, ordering inside the next bit is carried out according to the quick sort algorithm.

Scatter Sort :: Spreadsort

Fast + merge + buckets + discharges

Gestalt from quicksort, merge sort, bucket sort, and bitwise sort.

In a nutshell do not explain. We will analyze this algorithm in detail in one of the following articles.






Other hybrids

Tree-counting sort

Counting + tree

The algorithm proposed by the user AlexanderUsatov. Counting sort, the number of keys counted is stored in a balanced tree.

J-sort :: J-sort

Heap + inserts

I already wrote about this sorting 5 years ago. Everything is quite simple – first in the array you need to build a non-increasing heap once, and then do exactly the opposite – once to build a non-decreasing heap. As a result of the first operation, the minimum will be in the first place of the array, and small elements as a whole will significantly move to the beginning. In the second case, the maximum will be in last place, and large elements migrate towards the end of the array. In general, we get an almost sorted array with which we do what? That’s right – sort out the inserts.






References

Merge-insertion, Block merge, Tim, Introspective, Spread, Multikey

Grail

Grail, Hash Table, Counting / Tree, J

Series Articles:

  • Excel application AlgoLab.xlsm
  • Exchange Sorts
  • Insertion Sorts
  • Sort by selection
  • Merge Sorts
  • Sort by distribution
  • Hybrid Sorting
    • Hybrid insert and merge sorts
    • QSort in hybrid sorts

Of all the sortings that are presented here, in the AlgoLab excel application for Jsort only, animation is currently implemented.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *