A heap is a special kind of tree that's often used for sorting, to the point that it has its own name: the heapsort (instant demonstration opportunity). Even though as a description it's completely useless, "heap" is still a pretty great name for a data structure. A heap is also neat in that it can be implemented just using an array, instead of building a graph of interrelated node objects. After juggling references in our first two data structures, this should be a fun change of pace.
Note: if it wasn't obvious already, the fact that I've written "this should be a fun change of pace" about using a JavaScript array is a huge nerd alert for this blog.
So how does a heap work? Like any balanced tree, a heap is governed by rules, but in this case they're extremely simple. In fact, there's only one rule: each parent node's key must be greater than or equal to the keys of its child nodes (piece of cake). Because the rule only concerns children (regardless of which side they're on) and their immediate parent, our balancing operations are simplified to swaps between a child and a parent. Unlike AA trees or Red-Black trees, there aren't any rotations or branch shifts here.
These swaps are made easier by the fact that heaps are stored using plain arrays, with the position of each child determined via the array index. Basically, the first item in the array is the head node (with the greatest value), with its children stored in the next two slots, and their children in the next four, and so on. So for any given parent node at index n, its children are located at indexes 2n + 1 and 2n + 2. And for any child node c, its parent is located at (c - 1) / 2 rounded down to the nearest integer. You can simplify the math slightly if we start our array at 1 instead of 0, which the example below does. Here's the code for finding the children or parent of a given heap index in a 1-indexed array.
var leftOf = function(index) {
return (index * 2);
};
var rightOf = function(index) {
return (index * 2) + 1;
};
var parentOf = function(index) {
return Math.floor(index / 2);
};
How elegant is that--creating a tree out of a flat list and a little math? Just add some bounds checks to keep things safe (or, in JavaScript and other managed languages with dynamic arrays, check for undefined) and our data structure is literally complete. Now we just need to be able to insert and delete. To insert, we just push a value onto the end of the array (automatically making it the rightmost child on the bottom level) and then recursively "sift up" to balance.
Heap.prototype.insert = function(value) {
//push() returns the new length of the array
var index = this.push(value) - 1;
this.siftUp(index);
};
Heap.prototype.siftUp = function(index) {
//don't go past the root node
if (index <= 1) return;
var childKey = this[index];
var parentIndex = this.parentOf(index);
var parentKey = this[parentIndex];
//if the value is greater than its parent, swap
if (childKey > parentKey) {
var swap = this[parentIndex];
this[parentIndex] = this[index];
this[index] = swap;
}
this.siftUp(parentIndex); //recurse upward
return this;
};
Note: in this post, I'm just going to include real code from my implementation, since this has taken long enough to get out already and I don't feel like rewriting all the samples. It's still pretty easy to read. Just remember that in all these code blocks, this refers to the Heap object that we're traversing.
Inserting and siftUp() are pretty easy. Removal is more difficult, because it rebalances using a "sift down," and that process is a little weird. Sifting downward means that we find the parent of the last node (which, conveniently, is exactly in the middle of our array), and then (this is the weird part) we work our way backwards up the array linearly, while simultaneously checking for rule violations and recursing downward along the tree when they're found.
Like I said, it's strange. It may be easier to simply read the code.
Heap.prototype.remove = function(key) {
var index = 1;
//find the item to be deleted
for (; index < this.length; index++) {
var needle = this[index];
if (key == needle) break;
}
//swap it with the end-most child, just like an AA tree, then delete
this[index] = this[this.length - 1];
delete this[this.length - 1];
this.siftDown(index);
};
//The "termination" argument sets an internal endpoint for the sift.
//That will be important when we start doing heapsort.
Heap.prototype.siftDown = function(termination) {
//if no termination is provided, the heap spans the whole array
if (!termination || termination > this.length) termination = this.length - 1;
//start at the last non-leaf node, in the middle of the array
for (root = Math.floor(this.length / 2); root > 0; root--) {
var index = root;
while (index < termination) {
var children = [this.rightOf(index), this.leftOf(index)];
var swapIndex = index, swapKey = null;
//Find the bigger of the two children, since
//there's no point in swapping with the smaller child
for (var i = 0; i < 2; i++) {
var childIndex = children[i];
var childKey = this[childIndex];
if (!childKey || childIndex > termination) continue;
var swapKey = this[swapIndex];
if (childKey > swapKey) {
swapIndex = childIndex;
}
}
//if the child is larger, swap with the parent and then
//continue downward
if (swapIndex != index) {
var swap = this[index];
this[index] = this[swapIndex];
this[swapIndex] = swap;
index = swapIndex;
} else {
break;
}
}
}
return this;
};
I'll be honest, siftDown() was the most complicated part of this. This implementation is apparently more efficient than sifting upward, but it's much more difficult to understand, and it took me a long time to get right. But with a functioning downward sift, we can finally recreate the famed heapsort:
var heapSort = function(heap) {
//support using heapsort on Arrays as well
if (!(heap instanceof Heap)) {
var values = heap;
heap = new Heap();
heap.push.apply(heap, values);
}
//start with the heap taking up the whole array
var endAt = heap.length - 1;
while (endAt > 0) {
//construct the heap
heap.siftDown(endAt);
//move the largest value to the end
var swap = heap[endAt];
heap[endAt] = heap[1];
heap[1] = swap;
//shrink the heap and repeat
endAt--;
}
return heap.toArray();
};
It's surprisingly simple, right? Basically, we split the array into two parts--the working heap in front and the sorted values in back. We also keep track of how "long" the heap portion of the array is. Then we sort the heap, swap its last value with its first value (which moves the new largest value to the front of the sorted list), shrink the "heap length" value by one, and repeat until we reach the start of the array.
Besides getting the downward sift working properly, the next hardest part was re-learning to work with arrays as flat chunks of memory. I implemented my Heap objects as descendents of the native JavaScript Array object, because that seemed like a natural thing to do, but it means you can't call any of the methods the Heap inherits from Array: they'll return an Array instead of a Heap, and future calls to Heap instance methods will fail. So in my first heapsort, which used Array.slice() and Array.concat() to break apart and recombine both portions of the array, it kept breaking on the second pass when my heap stopped being a Heap. Not to mention it was more expensive, since I was constantly constructing new Array objects during each loop. By working strictly with array indexes, as if I were using a pointer, I didn't need to construct anything except for the temporary swap variable.
The cool thing about learning how heaps and heapsorts work is that it made me think about new ways to solve other problems, like (to pick an example) shuffling a deck of cards. Instead of a solution where I would keep track of which cards have been picked and then skip them as the shuffle proceeds, a heap-ish shuffle uses the back end of the deck as storage, and just keeps pulling random cards out of the shrinking front portion. It's much faster and less likely to blow up if your random number generator is not very random and/or lucky.
var deck = [];
for (var i = 1; i <= 52; i++) deck.push(i);
//deck = [1, 2, 3, 4 ...]
var shuffle = function(arr) {
var output = [], offset = arr.length - 1;
for (; offset >= 0; offset--) {
var index = Math.floor(Math.random() * offset);
output[offset] = arr[index];
var swap = arr[offset];
arr[offset] = arr[index];
arr[index] = swap;
}
return output;
}
shuffle(deck).sort();
//returns a randomized array
In the input box below, I've loaded the text of this post as a series of comma-separated values. When you press the "Run" button, those values will be pulled into an array and run through a heapsort, with the results displayed below. You can edit the input box between runs if you want to provide your own data (remember that spaces are significant). Note that although I think this heapsort should be fast for an all-JavaScript sort, it is slower than the browser's built-in Array.sort() method, probably because the latter is implemented in native code.
As always, feel free to read the commented Heap library code and the source for the example.