Invert Sort

A programmer from India clearly shows Zig-Zag, Zig-Zig and Zig used in the SplaySort algorithm:

This season we are exploring a variety of heaps and how they can be used for sorting. However, this time we will take a step away from the main theme. Today’s structure – splay tree – is not a bunch. But we need it to mentally prepare for the study of the next pile – next week there will be a lecture on sorting by a Cartesian tree.

Binary search tree sort

Splay tree is an improved binary search tree. First, let’s remember how to sort using the usual “unimproved” binary tree.

As you well know, in a binary tree, any left child is less than the parent, any right child is no less (i.e. greater or equal) than the parent.

Sorting is generally straightforward:

  • Stage 1. Based on the array, build a binary search tree. We put the first element of the array to the root, first compare the remaining elements with the root, then, depending on the comparison, move down the left or right branches (and along the way we compare the array element with the existing tree nodes). In the end, the next element reaches the end of a branch and becomes a node itself.
  • Stage 2. We go around the constructed binary search tree from minimum to maximum. The fact is that the strict organization of this tree (the left child is less than the parent, and the right one is greater than or equal than the parent) allows you to bypass this tree from smaller to larger elements in a simple recursive way. Which in turn produces tree nodes in ascending order.

The construction of the tree is quite tolerable in terms of algorithmic complexity – on average, inserting a new node costs O (log n)so the time complexity of the first stage is O (n log n).

But in the second stage, not everything is so rosy. A recursive tree walk can easily become a long journey through an extremely winding maze, and the time complexity often degrades to O (n2).

Splay tree

To solve this problem, only some 35-37 years ago, two Scientist scientists Robert Tarjan and Daniel Slitor developed this tree structure. They suggested that for any operations (insert, search, delete) with any tree node, immediately rebalance the tree, making the node the root of the entire structure.


In the left photo is Robert Tarjan (first row, second right) in the company of the Matrix Architect and inventor of Pascal. In the right photo is Daniel Slitor.
Clicking on the image will open a full-format version.

In Russian, the unsuccessful name “expanding tree” stuck, less often – “oblique tree”. Although if it were simply literally translated, then the “expanded tree” (or, even better, “turned”) sounds good and more accurately reflects the essence of this algorithmic structure. But that is.

In order to push the node to the root, special simple operations are used, the so-called turns:

Zig Turn

If the element you want to make a root is on the second level of nesting, then everything is extremely simple.

Denote this element as X, and its parent (also the root of the tree) – as P.
A, B, C – these are subtrees. How many nodes are there in these subtrees, only the roots of these subtrees, which have a parent-child relationship with X and P. If any of the subtrees are missing (i.e., if X and P there are no descendants), this does not affect the order of actions.

To make X the root of the tree, you need to change the relationship “parent-child” between X and Pand also outweigh the subtree B – it was a right descendant Xbecame a left descendant P (with A and C nothing to do at all). And it’s all! As a result of these simple manipulations, the structure, as it was a binary search tree, remained so – the principle “the left child is less than the parent, the right child is greater or equal than the parent” will not be violated.

In the previous picture X was a left descendant for P. If X was a right descendant, you just need to mirror the situation:

If X has not a second level of difficulty, but a third, it’s all more difficult, but not by much.

If X has a parent Pwhich in turn has a parent G, then we have only two different situations:
1) if X is an on the same side a descendant for Pthat and P for G;
2) if X and P are versatile descendants for their parents.

First, consider the first case.

ZigZig Turn

Let be X Is the left child for P, and P – left child for G (option when X and P are simultaneously right descendants is solved similarly).

As you can see, the relationship between X, P and Gand also correctly outweigh the subtrees B and C. And our binary search tree will not suffer from this.

ZigZag Turn

If X right descendant, and P left (or X left and P right, not the essence), then I hope you already understand everything from this picture:

If X in the tree is still at a lower level of nesting, then to raise it you need to apply ZigZig or ZigZag (if necessary, do it several times) to that parent up the branch, which is at the third level of nesting.

Invert Sort :: Splay sort


Actually, here we perform the same points as in tree sorting – first we build a binary sort tree, then we go around it. But every time we insert a new knot into the tree – using ridges and zags, we make it the root.

At the first stage of winning, this does not give, the insertion of one node (taking into account the fact that you have to zigzag to make it a root) costs on average O (log n) – and thus the complexity of this stage O (n log n).

However, as a result, amazing transformations take place with the tree – it straightens and is kept in such a straightened state all the time. And this decisively accelerates the second stage (tree traversal), since the resulting final ladder is processed with complexity O (n)

Thus, the overall complexity of the algorithm (worst, medium, best) in time is obtained O (n log n).

In practice, this sorting is slower than MergeSort, QuickSort and other HeapSort, but it clearly demonstrates how you can speed up operations with binary search tree.

The moral of this fable is this: if you have to deal with a binary search tree, work with it as with a splay tree, i.e. in any action with any node in the tree, make this node the root.

C code

not to look nervous

/*
 *------------------------------------------------------------
 *
 *      File..........: $RCSfile: splaysort.c,v $
 *      Revision......: $Revision: 1.2 $
 *      Author........: Gary Eddy (gary@cs.mu.OZ.AU)
 *			Alistair Moffat (alistair@cs.mu.OZ.AU)
 *      Date..........: $Date: 1995/09/07 06:19:17 $
 *
 *	Description:
 *		Sorts by repeated insertion into a splay tree.
 *		Insertions are done top down
 *
 *------------------------------------------------------------
 */

#include	
#include	
#include	
#include	

char	*sort_name = "Splaysort";
char	*sort_version = "$Revision: 1.2 $";

/*
** Define DATAPTR for the 12 byte per record version of the program.
** Otherwise only 8 extra bytes per record are used and data
** references are done by indexing the data array.
** Different compiler/architecture combinations can cause wild
** variation in the ratio of speeds between these variations.
*/

#define DATAPTR

#ifdef DATAPTR
#define DATA(p) ((p)->data)
#else
#define DATA(p) (A+size*(p-T))
#endif

/*
** With the fast copy option enabled a more intelligent copy is used
** depending on the size of the items being copied.
** This approach adopted from the nqsort program of Bentley and McIlroy,
** see Software Practice and Experience, v23, n11, November 1993, 1249-65.
*/

#define FASTCOPY

#ifdef FASTCOPY
#define COPYCODE(TYPE, parmi, parmj, n) { 
        long i = (n) / sizeof (TYPE); 
        register TYPE *pi = (TYPE *) (parmi); 
        register TYPE *pj = (TYPE *) (parmj); 
        do { 
                *pi++ = *pj++;                    
        } while (--i > 0);      
}

void
copyfunc(char *d, char *s, int size, int copytype)
{
        if(copytype <= 1)
                COPYCODE(long, d, s, size)
        else
                COPYCODE(char, d, s, size)
	return;
}

#define COPY(d,s,size) 
        if (copytype == 0) { 
                *(long *)(d) = *(long *)(s); 
        } else 
                copyfunc(d, s, size, copytype)
#else
#define COPY(d,s,size)	memcpy(d,s,size)
#endif

typedef struct  node_rec node;

struct	node_rec {
	node	*left, *rght;
#ifdef DATAPTR
	char	*data;
#endif
};

/*
 *	sort():
 *		The entire sort code 
 *
 *	Accesses outside variables:	none
 *
 *	Return value...:		none
 */

void
sort(void *A, size_t n, size_t size, int (*cmp)(const void *, const void *))
{
	register node *next, *curr, *prnt;
	char 	*item;
	node	*l, *r, *ch;
	node	root, *T, *new, *end, *move;
#ifndef DATAPTR
	char	*p;
#endif

/*
** Determine which copying method will be used.
*/
#ifdef FASTCOPY
	int	copytype=((char *)A - (char *)0) % sizeof(long) ||
		size % sizeof(long) ? 2 : size == sizeof(long) ? 0 : 1;
#endif


	if(n < 2)
		return;
	if((T = new = (node *) malloc(sizeof(*T)*n)) == NULL) {
		fprintf(stderr, "Couldn't allocate space for structuren");
	}
	/* 
	** Do the first insertion manually to avoid the empty tree
	** case in the main loop.
	*/
	curr = new++;
	item = A;
	curr->left = curr->rght = NULL;
#ifdef DATAPTR
	curr->data = item;
#endif
	item += size;
	end = T+n;
	/*
	** For each item move down the tree dividing it into
	** two subtrees, one containing items less than the new
	** element and the other those which are greater.
	** The pointers l and r refer to the greatest element in the
	** smaller subtree and the smallest element in the large
	** subtree respectively. During the splitting of the tree
	** l and r retain children that may be incorrect in the final
	** tree but these false links are cut at the end of the
	** insertion.
	*/

	for( ; newleft) == NULL) {
					r = r->left = curr;
					break;
				}
				/* Left zig-zig */
				else if(cmp(item, DATA(ch)) < 0) {
					curr->left = ch->rght;
					ch->rght = curr;
					r = r->left = ch;
					curr = ch->left;
				}
				/* Left zig-zag */
				else {
					r = r->left = curr;
					l = l->rght = ch;
					curr = ch->rght;
				}
			}
			else {
				/* Right root case */
				if((ch = curr->rght) == NULL) {
					l = l->rght = curr;
					break;
				}
				/* Right zig-zag */
				else if (cmp(item, DATA(ch)) < 0) {
					l = l->rght = curr;
					r = r->left = ch;
					curr = ch->left;
				}
				/* Right zig-zig */
				else {
					curr->rght = ch->left;
					ch->left = curr;
					l = l->rght = ch;
					curr = ch->rght;
				}
			}
		}
		/* Cut false links */
		r->left = l->rght = NULL;
		new->rght = root.left;
		new->left = root.rght;
#ifdef DATAPTR
		new->data = item;
#endif
		curr = new++;
		item += size;
	}
	/* Now copy all of the data back into the input array.
	** Uses an iterative destructive inorder traversal.
	** Last item inserted is the current root.
	*/
	prnt = NULL;
	move = T;
	while (1) {
		if ((next = curr->left) != NULL) {
			/* left subtree exists */
			curr->left = prnt;
			prnt = curr;
			curr = next;
		}
		else {
			next = curr->rght;
			curr->rght = move++;
			if (next != NULL) {
				/* and arrange for a visit */
				if((curr = next->left) != NULL) {
					next->left = prnt;
					prnt = next;
					continue;
				}
				else {
					curr = next;
					continue;
				}
			}
			/* no right subtree either, turn around*/
			if (prnt != NULL) {
				curr = prnt;
				prnt = prnt->left;
				curr->left = NULL;
				continue;
			}
			/* nothing left to be done */
			break;
		}
	}
#ifndef DATAPTR
	/*
	** Change the goes-to array in rght to a comes_from in left.
	** Note the kludge on pointers, where left points into the 
	** character array containing the elements.
	*/
	for(next = T, p = A; next < end; p += size, next++)
		next->rght->left = (node *)p;
	/* and use the `comes from' array to unscramble the permutation */
	item = (char *)malloc(size);
        for (next=T; nextleft == NULL)
                        continue;
                curr = next;
                final = datacurr = DATA(curr);
                COPY(item, datacurr, size);
                while ((char *)(curr->left) != final) {
                        dataleftcurr = (char *)(curr->left);
                        COPY(datacurr, dataleftcurr, size);
                        prnt = curr;
                        curr = T + (((char *)(curr->left)-A)/size);
                        prnt->left = NULL;
                        datacurr = dataleftcurr;
                }
                COPY(datacurr, item, size);
                curr->left = NULL;
        }
#else
	/* Change the goes-to array in rght to a comes_from in left */
	for(next = T; next < end; next++)
		next->rght->left = next;
	/* and use the `comes from' array to unscramble the permutation */
	/*
	** This 12 byte version uses the presence of the data pointer
	** by making it the flag for already placed items. This means
	** that left pointers can point to nodes, eliminating the
	** calculation to find the next node.
	*/
	item = (char *)malloc(size);
	for (next=T; nextleft == next)
			continue;
		final = datacurr = DATA(next);
		curr = next->left;
		COPY(item, datacurr, size);
		while ((dataleftcurr = DATA(curr)) != final) {
			COPY(datacurr, dataleftcurr, size);
			prnt = curr;
			curr = curr->left;
			DATA(prnt) = (char *) NULL;
			datacurr = dataleftcurr;
		}
		COPY(datacurr, item, size);
		DATA(curr) = (char *) NULL;
	}
#endif
	free(item);
	free(T);
} /* sort() */

Next Series Trailer

And we return to the heaps. The next little thing will be more interesting than the one that we examined today. This hybrid is a tree-like data structure that is both a heap and a binary search tree.

References

Binary search tree, Splay tree

Splay trees

Series Articles:

  • Excel application AlgoLab.xlsm
  • Exchange Sorts
  • Insertion Sorts
    • Librarian Sort
    • Solitaire Sort
    • Sort “Tower of Hanoi”
    • Invert Sort
    • Young table sorting
  • Sort by selection
  • Merge Sorts
  • Sort by distribution
  • Hybrid Sorting

Sort added to AlgoLab. So, if you update the Excel file with macros, you can personally upset the binary search tree.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *