Experimental ternary tree sorting

I was once intrigued by the following question: why do all the best comparison-based sorts have asymptotic complexity O(N log N) in the worst case? And why is the logarithm binary? Is it possible to create a sort with a better worst-case bound? I decided to conduct a rather interesting experiment.

Wikipedia says that the binary base of the logarithm comes from the fact that a comparison of two elements has one of two possible outcomes. If the algorithm performs k comparisons, there are at most 2^k possible combinations of answers to them. The number of permutations of n elements is n!. For the set of answer combinations to map surjectively onto the set of all permutations, the number of comparisons must be at least log₂(n!), since comparison is the only permitted operation. That is what the online encyclopedia says, almost verbatim.

However, a comparison can actually have three outcomes: greater than, less than, or equal to. Then k comparisons should allow 3^k combinations of answers. So maybe the logarithm is binary simply because we only ever compare two elements at a time (pairwise)? Not quite! As my experiments showed, this condition is necessary but not sufficient. We also need to abandon the binary tree in favor of a ternary one!
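
To make the two bounds concrete, here is a small sketch (not from the original article) that computes the information-theoretic lower bounds log₂(n!) and log₃(n!) for a few array sizes; the class and method names are my own.

```java
// Sketch: compare the information-theoretic lower bounds log2(n!) and log3(n!)
// on the number of comparisons needed to sort n elements.
public class ComparisonBounds
{
  // Computes log_base(n!) by summing log_base(i) for i = 2..n,
  // avoiding overflow from computing n! directly.
  static double logFactorial(int n, double base)
  {
    double sum = 0.0;
    for (int i = 2; i <= n; i++)
      sum += Math.log(i) / Math.log(base);
    return sum;
  }

  public static void main(String[] args)
  {
    for (int n : new int[]{10, 100, 1000})
    {
      long binary  = (long) Math.ceil(logFactorial(n, 2.0));
      long ternary = (long) Math.ceil(logFactorial(n, 3.0));
      System.out.println(n + " elements: >= " + binary
          + " binary comparisons, >= " + ternary + " ternary comparisons");
    }
  }
}
```

The two bounds differ only by the constant factor log₂(3) ≈ 1.585, which is exactly why the base is usually dropped inside O(...).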

What if we compared three elements at once? Imagine that we had a hypothetical processor instruction that compares three elements in one go and returns a code from which we can tell how these elements are mutually ordered:

cmp3 	eax, ebx, ecx, edx	; compares the three numbers in ebx, ecx, edx
            				; and writes the result code to eax

For simplicity, let’s assume that we always want to sort in ascending order. What would such an instruction give us? And how, pray tell, would we sort with it?

We don’t have such an instruction yet. But we can emulate it on top of the classic pairwise comparison. Let us adapt some suitable sorting algorithm with complexity O(N log₂ N) to it, and then check: does the number of comparison operations really follow O(N log₃ N)? Strictly speaking, it is incorrect to write the base of the logarithm inside O(...), since all logarithm bases differ only by a constant factor. But I hope the picky reader will forgive me this deliberate liberty.
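
One possible software emulation of cmp3 is sketched below. The result encoding (two bits per pairwise outcome) is my own illustrative assumption; a real instruction could choose any encoding.

```java
// Sketch of a software cmp3: compares three values pairwise and packs the
// three outcomes into one result code. The encoding is an assumption for
// illustration, not something fixed by the article.
public class Cmp3
{
  // Bits 0-1, 2-3, 4-5 hold sign(a-b)+1, sign(a-c)+1, sign(b-c)+1,
  // i.e. 0, 1, 2 for "less", "equal", "greater" respectively.
  public static int cmp3(int a, int b, int c)
  {
    int ab = Integer.signum(Integer.compare(a, b)) + 1;
    int ac = Integer.signum(Integer.compare(a, c)) + 1;
    int bc = Integer.signum(Integer.compare(b, c)) + 1;
    return ab | (ac << 2) | (bc << 4);
  }
}
```

From this single code the caller can recover the full mutual ordering of the three elements, e.g. `cmp3(1, 2, 3)` yields 0 (every pair ordered ascending).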

To be honest, at first I racked my brain over how to sort with such an operation. While looking through candidates among existing algorithms, I came across a series of first-class articles by @valemak (which I recommend to anyone interested in sorting algorithms). In particular, I paid attention to ascending heap sort. I recommend reading that article first to refresh your memory of the algorithm; it also has good animations.

However, as it turned out later, it would be much more convenient if the processor supported, for example, the following instructions:

max4 	eax, ebx, ecx, esi, edi		; compares the 4 numbers pointed to
					; by registers ebx, ecx, esi, edi
					; and writes the address of the maximum to eax

max3 	eax, ebx, esi, edi		; compares the 3 numbers pointed to
					; by registers ebx, esi, edi
					; and writes the address of the maximum to eax
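
These instructions are easy to emulate in software. A minimal sketch (the class and method names are my own, for illustration): finding the maximum of k elements always costs k-1 pairwise comparisons.

```java
// Sketch: software emulation of max3/max4 over array indices.
// Finding the max of k elements takes k-1 pairwise comparisons.
public class MaxEmulation
{
  // Index of the larger of two elements; 1 comparison.
  static int max2(int[] a, int i, int j)
  {
    return a[i] >= a[j] ? i : j;
  }

  // Index of the maximum of three elements; 2 comparisons.
  static int max3(int[] a, int i, int j, int k)
  {
    return max2(a, max2(a, i, j), k);
  }

  // Index of the maximum of four elements; 3 comparisons.
  static int max4(int[] a, int i, int j, int k, int l)
  {
    return max2(a, max2(a, i, j), max2(a, k, l));
  }
}
```

Returning the index (rather than the value) mirrors the hypothetical instructions above, which write the address of the maximum.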

Here is how ascending heap sort works in a nutshell. The array is treated as a tree, so we need no additional memory: the indices of a node’s children and of its parent can always be computed with simple formulas, and the root is the first element of the array. Our tree is “virtual” and, moreover, balanced “out of the box”. I declared the tree ternary rather than binary, so that each parent has three children: the children of node i sit at indices 3i+1, 3i+2, 3i+3, and the parent of node c is at (c-1)/3.

Next, a sorting tree is built following a simple rule: a parent is always larger than its children. Once construction is complete, the largest element is at the root. We move it to the end of the array by swapping it with the last element, after which the former last element sits at the root, and we must sift it down to restore the heap invariant. After each such extraction of the root, the size of the heap shrinks by one, and the maximal elements accumulate at the end of the actual array. We repeat until only one element remains in the heap. Done.

The sifting itself in my algorithm works as follows: we compare the parent with all of its direct children at once. That is, at this step we compare four elements “as if with a single instruction”! We select the maximum, put it in the parent’s place, and descend to the children of that maximum element (if it has any). If the parent is larger than all its children, we do nothing and the sifting is complete.

Yes, that’s exactly what I mean! It’s better to see it once:

I hope you can see this animation. You can start and pause it whenever it suits you.

And here is the implementation of the algorithm in Java:
public class TernaryAccentSort
{
  private static final int ROOT = 0;
  private int LAST;

  public void sort(TernaryArray array)
  {
    LAST = array.size()-1;
    heapify(array);

    while (LAST - ROOT + 1 > 4)
    {
      array.swap(ROOT, LAST--);
      sift(array, ROOT);
    }
    // final stage: the last 4 elements (assumes the array holds at least 4 elements)
    array.swap(0, 3);
    array.compareAndOrder(0, 1, 2);
  }



  /**
   * Builds the sorting tree
   *
   * @param array the array being sorted
   */
  private void heapify(TernaryArray array)
  {
    int node = fixTail(array);

    while ( node != ROOT )
    {
      int rt = node; // node always points to someone's right child
      int md = node-1;
      int lf = node-2;
      int pr = getParent(node); // the parent of this triple

      // get the index of the maximum element of the four
      int mx = array.max(lf, md, rt, pr);

      if ( mx != pr ) // if the maximum is not the parent
      {
        array.swap(pr, mx);
        sift(array, mx);
      }

      // move on to the next triple of elements
      node -= 3;
    }
  }

  /**
   * Adjusts the tail of the tree: usually 1 or 2 leaves of the parent
   * that closes the array. Puts the maximum of those leaves in the parent's place.
   *
   * @param array the array
   * @return the node from which heapify can continue
   */
  private int fixTail(TernaryArray array)
  {
    int node = LAST;

    // handle the edge of the tree
    if ( isLeftChild(node) ) // the last parent has no middle or right child
    {
      int pr = getParent(node);
      int lf = node;

      if ( array.compare(pr, lf) < 0 )
        array.swap(pr, lf); // a parent must always be larger than its children

      node--;
    }
    else if ( isMiddleChild(node) ) // the last parent lacks only the right child
    {
      int pr = getParent(node);
      int lf = node-1;
      int md = node;

      // put the maximum of the triple (parent, left child, middle child) in the parent's place
      array.compareAndOrder(lf, md, pr);

      node -= 2;
    }
    // on exit, node always points to the right child of the second-to-last parent
    return node;
  }


  /**
   * Sifts down the sorting subtree to preserve the invariant:
   * a parent is always larger than its children. Called when
   * node violates this condition.
   *
   * @param array the tree being sorted
   * @param node the element violating the invariant
   */
  private void sift(TernaryArray array, int node)
  {
    // compare the root with its children; move the root where it belongs
    int lf, rt, md, mx, pr = node;

    while ( hasLeftChild(pr) )
    {
      lf = getLeftChild(pr);
      md = getMiddleChild(pr);
      rt = getRightChild(pr);

      // which of these elements is the maximum?
      if ( hasRightChild(pr) )
        mx = array.max(lf, md, rt, pr);

      else if ( hasMiddleChild(pr) )
        mx = array.max(lf, md, pr);

      else
        mx  = array.max(lf, pr);

      if ( mx != pr ) // if the maximum is not the parent
      {
        array.swap(pr, mx);
        pr = mx;
      }
      else
        break;
    }
  }





  private int getLeftChild(int node)
  {
    return node*3 + 1;
  }

  private int getMiddleChild(int node)
  {
    return node*3 + 2;
  }

  private int getRightChild(int node)
  {
    return node*3 + 3;
  }

  private int getParent(int child)
  {
    return (child-1) / 3;
  }

  private boolean isLeftChild(int node)
  {
    return (node % 3) == 1;
  }

  private boolean isMiddleChild(int node)
  {
    return (node % 3) == 2;
  }

//  private boolean isRightChild(int node)
//  {
//    return (node % 3) == 0;
//  }

  private boolean hasLeftChild(int node)
  {
    return getLeftChild(node) <= LAST;
  }

  private boolean hasMiddleChild(int node)
  {
    return getMiddleChild(node) <= LAST;
  }

  private boolean hasRightChild(int node)
  {
    return getRightChild(node) <= LAST;
  }
}

Simple, isn’t it? I built a comparison counter into the array class, so (without any fussing over time measurements) I can count exactly how many comparisons and swaps are actually performed. And the running time depends on precisely that.
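
The TernaryArray class itself is not listed in the article, so here is a minimal sketch of what it might look like, with the comparison counter mentioned above. Every signature is my reconstruction from how TernaryAccentSort uses the class.

```java
// Sketch: a minimal TernaryArray with a built-in comparison counter.
// All signatures are reconstructed from their use in TernaryAccentSort.
public class TernaryArray
{
  private final int[] data;
  private long comparisons = 0;

  public TernaryArray(int[] data) { this.data = data; }

  public int size() { return data.length; }

  public long comparisons() { return comparisons; }

  public void swap(int i, int j)
  {
    int t = data[i]; data[i] = data[j]; data[j] = t;
  }

  /** Classic pairwise comparison: the sign of data[i] - data[j]. */
  public int compare(int i, int j)
  {
    comparisons++;
    return Integer.compare(data[i], data[j]);
  }

  /** Index of the maximum of two elements. */
  public int max(int i, int j)
  {
    return compare(i, j) >= 0 ? i : j;
  }

  /** Index of the maximum of three elements (emulated "max3"). */
  public int max(int i, int j, int k)
  {
    return max(max(i, j), k);
  }

  /** Index of the maximum of four elements (emulated "max4"). */
  public int max(int i, int j, int k, int l)
  {
    return max(max(i, j), max(k, l));
  }

  /** Orders three elements in place so that data[i] <= data[j] <= data[k]. */
  public void compareAndOrder(int i, int j, int k)
  {
    if (compare(i, j) > 0) swap(i, j);
    if (compare(j, k) > 0) swap(j, k);
    if (compare(i, j) > 0) swap(i, j);
  }
}
```

With such a wrapper, every emulated "hardware" operation is accounted for in terms of the underlying pairwise comparisons, which is exactly what the measurements count.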

Complete project code with measurements and comparisons.

Next, I compared the performance of my algorithm with the original ascending heap sort. Here is a graphical representation of the measurement results:

As the graphs show, we really do bring the sorting complexity closer to the ternary logarithm (see the “New comparisons” series), which was expected; I simply wanted to see it in practice. Note that both algorithms perform slightly fewer exchanges and comparisons than one would expect from such algorithms: heap sort is at its fastest on completely unordered arrays. And, by the way, this implementation performs fewer exchanges than the original algorithm, although its compound comparison operation is much more expensive (at least when we are not simply comparing native integers).

If the required instruction were implemented in processor hardware, a ternary tree would greatly outperform a binary one on large arrays. Then again: if you are an engineer at, say, Oracle, building your own processors designed for DBMS workloads (Sun Microsystems used to make them for Oracle), implementing a couple of new instructions might well make sense for you. The ternary tree might also be useful to developers of specialized FPGA/ASIC boards, if sorting is among their planned tasks.

What do you think: is it worth adding such instructions to classic general-purpose processors, and what interface should they have?

Clearly, deep down at the hardware level a compound comparison instruction would still consist of binary pairwise comparisons. Such a trick does not reduce the actual number of pairwise comparisons performed, and may even increase it (my experiments showed this, although it all depends on the specific algorithm). But here the “parallelism” of those pairwise comparisons can come into play: in one nominal processor cycle we could perform an operation that would cost us more in a classic execution.

In the next article, I will offer readers a hybrid sorting algorithm that approaches the O(N log N) bound from a slightly different angle. No, I did not invent a new algorithm. I will simply suggest an optimal combination of existing ones, and also point out a funny phenomenon that I jokingly call the “mass defect”.
