Heaps for Priority Queues and O(N Log2 N) Sorting

We will next study heaps (this lecture) and AVL trees (the next lecture).
Both are special kinds of binary trees with interesting order/structure
properties that differ from those of binary search trees. Both use similar
rules for adding/removing values and then restoring their properties.

Heaps (this term has another important use in Computer Science: the memory
used for allocating the storage for objects when the "new" operator is used)
are almost the perfect data structure for storing priority queues. By using
heaps we can add a value in O(Log2 N) and also remove the highest priority
value in O(Log2 N). The reason for these complexity classes: heaps always
stay perfectly balanced, unlike BSTs, whose height can vary from Log2 N to N.
In the discussion below I will characterize Min-Heaps and the algorithms that
process them. We could instead use Max-Heaps, which are appropriate for
storing priority queues: their operations follow almost identical (but
flipped) rules.

------------------------------------------------------------------------------
Order and Structure Properties:

Recall that BSTs had a special order property (FOR EVERY NODE, only smaller
values are in its left subtree and only larger values are in its right
subtree) and no structure property (BSTs can take on any shape, from
well-balanced to pathologically unbalanced, depending on the order in which
values are added to the BST). Min-Heaps have both a special order property
and a special structure property.

Order: Every node must be less than or equal to all nodes in its left and
right subtrees. This is the same as saying something simpler: every node must
be less than or equal to its children (because each child will in turn be
less than or equal to its two children, so each node is less than or equal to
its grandchildren, ...). We can also state the same thing as "every node must
be greater than or equal to its parent." Note: unlike BSTs, it is reasonable
to allow heaps to store multiple copies of the same value, which is why
"equal" is allowed in the order definition above.

Structure: All depths must be filled, except possibly the deepest. If the
deepest depth is not filled, then all the nodes at that depth must appear as
far to the left as possible. This property ensures that every heap with N
values has the SAME STRUCTURE as every other heap with N values: only the
values inside the nodes are different. This is a bit more like the structure
of linked lists than of BSTs: every N element linked list has the same
structure; only its values are different.

Note that, like BSTs, there can be multiple legal heaps with the same values:
but they all have the same structure (by the structure property); it is only
the values stored in the nodes that might be different (except the minimum
value must always be at the root). For example, the following 3 trees all
represent min-heaps with the same 4 values.

      1           1           1
     / \         / \         / \
    2   3       3   2       2   4
   /           /           /
  4           4           3

Which values are in which nodes will depend on the order in which the values
are added to the heap, just as the structure of a BST depends on the order in
which values are added to the BST. But there are more constraints given the
structure property of heaps, so not every binary tree structure can represent
a heap: no pathological ones, and not even "right-heavy" trees.
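To make the order property concrete, here is a minimal sketch (my
illustration, not code from this lecture) that checks whether a linked binary
tree satisfies the Min-Heap order property; the TN node struct is a
hypothetical stand-in for whatever node type you use.

  // Hypothetical node type, for illustration only.
  struct TN {
      int value;
      TN* left;
      TN* right;
      TN(int v, TN* l = nullptr, TN* r = nullptr)
        : value(v), left(l), right(r) {}
  };

  // Checks "every node <= its children", which (as argued above) is
  // equivalent to "every node <= all values in its subtrees".
  bool satisfies_min_order(TN* t) {
      if (t == nullptr)
          return true;              // an empty tree is trivially ordered
      if (t->left != nullptr && t->value > t->left->value)
          return false;
      if (t->right != nullptr && t->value > t->right->value)
          return false;
      return satisfies_min_order(t->left) && satisfies_min_order(t->right);
  }

Note this function checks only the ORDER property; checking the STRUCTURE
property would require also examining the shape of the tree.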
The pictures accompanying this lecture illustrate Min-Heaps, including how
values are added and how the minimum value is removed (two fast operations;
for Max-Heaps, related to priority queues, removing the maximum value is
fast). We will examine the algorithms to add values to and remove the minimum
value from a heap (and how to restore the heap properties for both). Then we
will change our perspective and learn how to store heaps compactly in arrays
(doable because of the structure property). Finally we will learn an
"offline" algorithm to build a heap with N values in O(N) time instead of
O(N Log2 N): this bigger bound is achieved by an "online" algorithm that
simply adds N values to the heap, each taking at worst O(Log2 N) time.

------------------------------------------------------------------------------
Insertion into a Heap:

To add a value to a min-heap, first we must place it in a new node in the
tree according to the heap STRUCTURE property, so we put it directly to the
right of the rightmost node at the deepest depth (or, if that depth is
filled, as the leftmost child of the deepest leftmost node: at one greater
depth). By adding the node in this way, we now have a tree (with one more
node) that satisfies the heap STRUCTURE property, but not necessarily the
ORDER property - yet. The added value may violate the order property: it may
be smaller than its parent; in fact it may be very small, and belong near the
top of the heap.

To restore the order property, we compare that node's value with its parent's
value: we stop if these values are in the correct order (for a min-heap, if
the parent's value <= the child's value); if they are in the wrong order (the
parent's value > the child's value) we swap the values in these nodes. Note
that putting a smaller value in the parent will not affect the order property
in the other subtree of the parent: the values in that subtree were all >=
the original parent's value, and the parent's value has now been replaced
with a yet smaller value, so all the values in the other subtree are still >=
their parent's new value. Note also that swapping values does not change the
STRUCTURE of the heap at all, so the structure property is still satisfied.

Then we compare the parent (which now has a new value) with its parent
according to the same rules, and again swap if necessary. We continue
comparing and swapping until (a) the value settles into its correct place: we
find a parent's value <= the child's value, or (b) the added value is now at
the root of the tree, where there is no parent's value to compare/swap with.
The tree will then satisfy the heap ORDER property (and because swapping does
not change its structure, it also satisfies the STRUCTURE property). So both
properties are restored after the new value is added.

We call this the "percolate up" operation, because the value added at the
bottom of the tree percolates up to its correct position (according to the
ORDER property), while not changing the STRUCTURE property. Because of the
heap structure property, heaps are perfectly balanced trees, so the height of
a heap is always O(Log2 N) and therefore the add operation is O(Log2 N):
comparing/swapping from a leaf all the way to the root in the worst case
requires O(Log2 N) comparisons and swaps (each comparison and swap operation
takes O(1) time).
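Here is a minimal sketch of percolate up and add (my illustration, not the
Programming Assignment code), using the array representation of heaps
described later in this lecture, where the parent of the node at index i is
at index (i-1)/2.

  #include <vector>
  #include <utility>   // std::swap

  // Restore the ORDER property by swapping the value at index i with its
  // parent until the parent is <= it (or it reaches the root).
  void percolate_up(std::vector<int>& heap, int i) {
      while (i != 0 && heap[(i-1)/2] > heap[i]) {
          std::swap(heap[(i-1)/2], heap[i]);  // parent/child out of order
          i = (i-1)/2;                        // continue from the parent
      }
  }

  // Add: append at the next free index (satisfying the STRUCTURE property),
  // then percolate up (restoring the ORDER property): O(Log2 N).
  void heap_add(std::vector<int>& heap, int value) {
      heap.push_back(value);
      percolate_up(heap, (int)heap.size() - 1);
  }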
------------------------------------------------------------------------------
Deletion (and Return) of the Minimum Value in a Heap:

We can also remove the smallest value in a Min-Heap efficiently, returning it
as a value from the remove operation (e.g., dequeue from a priority queue
implemented as a Max-Heap). To do so, we save the value from the root (it is
guaranteed to be the smallest in the tree by the order property), which will
be returned by the "remove" method when it finishes. Then we take the
farthest-right node at the deepest depth and remove that node from the
structure of the tree, but first place its value into the root node (whose
value we have already saved, for returning from this method). By removing the
node at the bottom-right of the tree, we now have a tree with one fewer node
that still satisfies the heap STRUCTURE property. But it is very likely that
the value now at the top doesn't belong there: we took it from the bottom,
which tends to hold the biggest values, yet the smallest value belongs at the
root of a heap.

To restore the order property, if the root is > either/both of its children,
we swap its value with the value of its SMALLEST child. Again, if the
left/right child holds the smaller value, then as a parent it will still be
<= all the nodes in the right/left subtree. Note that in doing this
comparing/swapping, we might compare the parent with just its left child (if
it has only one child), or compare it to both children (if it has two
children). Because of the STRUCTURE property of heaps, a node will NEVER have
JUST A RIGHT CHILD. We continue the process, swapping the value downward,
until it settles into its correct place: we find it is <= its children, or it
is in a leaf and there are no children's values to compare it to. Again, at
most we do O(Log2 N) operations (although here each operation might be two
comparisons and a swap), one operation per depth in the tree (each comparison
and swap operation takes O(1) time).

We call this the "percolate down" operation. Actually, because we are
comparing a value against up to two children (for percolate up we always
compare it to just its one parent), we might have to do twice the work when
comparing values, so it might take longer to percolate a value down than to
percolate a value up. But a factor of two (a constant) disappears in our
complexity class analysis, so we can ignore the difference and just count
swaps, not comparisons, in which case both do at most O(Log2 N).

------------------------------------------------------------------------------
Compactly Storing Heaps in Arrays:

Finally, we will learn that we can store a heap compactly in an array, with
each node storing no explicit pointers to its parent or to its left or right
children. We start by storing the root at position 0. If a parent is stored
at index i, then its left child is stored at index 2i+1 and its right child
is stored at index 2i+2. For any child stored at index i, its parent is
stored at index (i-1)/2 (remember integer division truncates: note that
(2*i+1-1)/2 and (2*i+2-1)/2 both result in the value i). Storing a heap of N
values requires an array of length N. The picture below shows at which index
(the number shown) each value in a heap of size 15 is stored (indexes 0-14).

                     0
                /         \
              1             2
            /   \         /   \
          3       4     5       6
         / \     / \   / \     / \
        7   8   9  10 11  12  13  14

Also look at the pictures accompanying this lecture to see how to store a
heap in an array.
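Here is a minimal sketch of remove-the-minimum (my illustration), using the
index formulas above: the children of index i are at indexes 2i+1 and 2i+2.
It assumes the heap is not empty.

  #include <vector>
  #include <utility>   // std::swap

  int remove_min(std::vector<int>& heap) {
      int min_value = heap[0];  // the root is smallest, by the order property
      heap[0] = heap.back();    // move the bottom-right value to the root...
      heap.pop_back();          // ...so the STRUCTURE property still holds
      int i = 0, n = (int)heap.size();
      for (;;) {                // percolate down from the root
          int smallest = i, l = 2*i + 1, r = 2*i + 2;
          if (l < n && heap[l] < heap[smallest]) smallest = l;
          if (r < n && heap[r] < heap[smallest]) smallest = r;
          if (smallest == i)    // <= its children (or a leaf): settled
              break;
          std::swap(heap[i], heap[smallest]);
          i = smallest;
      }
      return min_value;
  }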
Notice that the order of values in the array is equivalent to a breadth-first
traversal of the tree. There will be no "holes" in the array: its values are
contiguous (all the indexes from 0 to N-1 will be filled with data) because
of the heap STRUCTURE property. Verify the numeric computations of
parent/children for a few nodes.

It is useful to hide these computations in functions: write the functions
int left(int i), int right(int i), and int parent(int i), which each take the
INDEX of any heap node and compute the INDEX of its left child, its right
child, or its parent, respectively. A bool in_heap(int i) function would
compute whether or not index i is in the heap: for percolate down we might
ask whether in_heap(left(n)) or in_heap(right(n)) to determine whether n has
a left or right child, respectively.

Note that an N node heap occupies an array of length N (in indexes 0 to N-1).
If we need to add another node, it must go at index N; if instead we need to
remove a node, the node at index N-1 is removed and its value put at index 0
(which is then percolated down). So adding and removing values always occurs
at the end of the array (whose size might need to be doubled to add another
value, with array implementations of data types).

Max-Heaps are good for implementing priority queues, where we must be able to
enqueue any value but dequeue only the highest priority value. As we have
seen, we can use priority queues to sort a list of N values by (1) adding
each to the pq/heap in O(Log2 N), which is O(N Log2 N), and then (2) removing
N values from the pq/heap (they come out biggest to smallest), each in
O(Log2 N), which is also O(N Log2 N). So doing N adds and N removes is
O(N Log2 N), and we have discovered an O(N Log2 N) sorting algorithm, in a
lower complexity class than any of the O(N^2) sorts; but the O(N^2)
algorithms are still easier to write, with simple nested "for" loops. I
expect you to be able to draw pictures of heaps (both as trees and arrays)
and update them according to the algorithms discussed above; you will also
write the code for Max-Heaps in C++ in Programming Assignment #3, where you
will use this data structure to implement a lower-complexity-class priority
queue.

Why not store all binary trees in arrays using this mapping? Well, some BSTs
have a very pathological structure, making their storage in arrays very
inefficient. We can store a heap with N values in an array of length N: the
values are contiguous with no holes. But a binary tree with N values can
require an array with 2^N - 1 values. For example, the following pathological
tree requires an array with 31 values: 1 is stored in index 0, 2 is stored in
index 2, 3 is stored in index 6, 4 is stored in index 14, and 5 is stored in
index 30. The array needs 31 indexes (0-30) but contains only 5 values.

  1
   \
    2
     \
      3
       \
        4
         \
          5

So, this method of storage is poor, unless the binary trees are close to
perfectly balanced. Heaps are always perfectly balanced because of their
STRUCTURE property, so this method is perfect for storing heaps.
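Putting the two prior sketches together, here is how sorting with a heap
might look (again my illustration, using the hypothetical heap_add and
remove_min sketches above, not the Programming Assignment code); with a
Min-Heap the values come out smallest to biggest rather than biggest to
smallest.

  #include <vector>

  std::vector<int> heap_sort(const std::vector<int>& values) {
      std::vector<int> heap, sorted;
      for (int v : values)          // N adds, each O(Log2 N)
          heap_add(heap, v);
      while (!heap.empty())         // N removes, each O(Log2 N)
          sorted.push_back(remove_min(heap));
      return sorted;                // ascending order: O(N Log2 N) overall
  }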
------------------------------------------------------------------------------
Building Heaps Offline: Some Interesting Mathematics Related to Algorithms

"Online" algorithms receive their inputs one at a time and have to completely
update their data structure before the next value is received and processed.
Building a heap by adding one value at a time is an example of an online
algorithm: we start with an empty heap, and after we add each value we get a
new heap with one more value in it, until we have the final heap with all the
required values. "Offline" algorithms receive ALL THEIR INPUTS before they
are required to process any of them. In an offline heap algorithm, we can use
our knowledge of all the values before even starting to build a heap from
them.

We can write a surprisingly simple and efficient (O(N)) algorithm to build a
heap of N values, if we can get all the values that will be added to the heap
BEFORE we start building it. This is another interesting result of the heap
order/structure properties. We previously saw an offline algorithm for
building a balanced BST from a (sorted) array of values in O(N). Here we will
examine how to build a heap in O(N), if we know all the values that must be
added to the heap before we start.

We will first analyze this algorithm using "h", the height of the heap, as
our metric (for heaps with as many nodes as possible: perfect trees), not N,
the size (number of nodes) of the heap. We will count the number of
comparisons needed to build heaps of different heights using an offline
algorithm.

Let's start with the smallest height, h=0. A heap of height 0 has only one
node in it, so it takes no comparisons to build (that node is the root). We
can build a heap of height h=1 (3 nodes) by putting a ("a" stands for any
value) on top of two h=0 heaps ("b" and "c", standing for any values) and
then doing one pair of comparisons (b compared to c, and a compared to the
smaller) and at most one swap of a with its smallest child (if a is not
already less than both its children). Then we have a heap storing these three
values, built with at most 2 comparisons and a swap: just a constant number
of operations. Such a tree can look like

     a       b       b       c       c
    / \     / \     / \     / \     / \
   b   c   a   c   c   a   a   b   b   a

In fact, if we have two heaps of height h, then we can efficiently build a
new heap of height h+1 by putting the new value x on top of these two heaps

            x
           / \
          /   \
         /h+1  \
        +       +
       / \     / \
      / h \   / h \
     +-----+ +-----+

and then percolating the value x down into the heap where it belongs.

Let Ph be a heap that is a perfect tree of height h; perfect means every
depth is filled. Height(Ph) is h, and Size(Ph) is the number of nodes in this
perfect tree, which we've computed as 2^(h+1) - 1. Let C(h) be the number of
pairs of comparisons needed to build such a heap with the algorithm outlined
above (it is easier to count pairs and multiply by two at the end; of course
multiplying by two doesn't change the complexity class, so we will never even
bother with this "correction").

According to the algorithm, we will build a heap Ph by first building two
heaps Ph-1 and then putting one value on the top and percolating it down at
most h times. Recursively, we will build each heap Ph-1 by first building two
heaps Ph-2 and then putting one value on the top and percolating it down at
most h-1 times. Eventually we get to building heaps with 1 value each, which
as a base case cost 0 comparisons. Thus the number of comparison pairs is
just two times the number needed to build the smaller (by one depth) heaps,
plus the maximum number of comparison pairs (against each pair of children)
needed to percolate the value down to its correct node. We can write this
relationship with the following recurrence equations.

  C(0) = 0
  C(h) = 2*C(h-1) + h

We could write these equations as a trivial C++ function to compute C(h) as

  int C(int h) {
    if (h == 0)
      return 0;
    else
      return 2*C(h-1) + h;
  }
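This recurrence is easy to tabulate with a small sketch (mine, included just
for checking the table below); it computes Size(Ph) and C(h) iteratively, row
by row.

  #include <cstdio>

  int main() {
      long long size = 1, c = 0;      // the h = 0 row: 1 node, 0 pairs
      for (int h = 0; h <= 8; ++h) {
          std::printf("%3d | %9lld | %9lld\n", h, size, c);
          size = 2*size + 1;          // Size(Ph+1) = 2*Size(Ph) + 1
          c    = 2*c + (h+1);         // C(h+1)     = 2*C(h) + (h+1)
      }
  }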
We can summarize this information as follows.

    h  | Size(Ph) | C(h)
  -----+----------+-------------------------
    0  |      1   |    0
    1  |      3   |    1 = 2*  0 + 1
    2  |      7   |    4 = 2*  1 + 2
    3  |     15   |   11 = 2*  4 + 3
    4  |     31   |   26 = 2* 11 + 4
    5  |     63   |   57 = 2* 26 + 5
    6  |    127   |  120 = 2* 57 + 6
    7  |    255   |  247 = 2*120 + 7
    8  |    511   |  502 = 2*247 + 8
   ... |    ...   |  ...

We see that as h gets bigger, Size(Ph) is about the same as C(h): they seem
to grow at about the same rate (compute their ratio). To a good
approximation, increasing the height of Ph by 1 a bit more than doubles the
number of nodes in the tree and requires a bit more than double the number of
comparison pairs. In fact, the ratio C(h)/C(h-1) approaches 2 (from above) as
h goes to infinity (do the divisions above).

With higher mathematics, we can solve these recurrence equations and write
the solution exactly as

  C(h) = Size(Ph) - (h+1) = 2^(h+1) - 1 - (h+1) = 2^(h+1) - h - 2

  (recall Size(Ph) = 2^(h+1) - 1)

Try computing a few values of this function and compare them against the
values in the table, computed by the recurrence equations. If we assume that
this formula is true for C(h), we can show that it is true for C(h+1) as
well.

  C(h+1) = 2*C(h) + (h+1)
         = 2*(2^(h+1) - h - 2) + (h+1)
         = 2^(h+2) - 2h - 4 + h + 1
         = 2^(h+2) - (h+1) - 2
         = (2^(h+2) - 1) - (h+2)
         = Size(Ph+1) - (h+2)

which is the formula above, with h replaced by h+1.

Finally, N (the number of nodes in Ph) = 2^(h+1) - 1. We can discard the -1
as insignificant for large N, to simplify things a bit, and get

  N ~ 2^(h+1)        (approximate because we discard the -1 term)
  Log2 N ~ h+1
  Log2 N - 1 ~ h

We can rewrite our solution C(h) = 2^(h+1) - h - 2, substituting Log2 N for
h+1 and Log2 N - 1 for h, as

  C(N) ~ 2^(Log2 N) - (Log2 N - 1) - 2
       ~ N - Log2 N - 1

which means C(N) is O(N), because we can drop the Log2 N and constant terms.
Thus, the time the algorithm requires is some constant times N.

The algorithm to build small heaps and combine them into bigger and bigger
heaps turns out to be trivial to write when the heap is stored in an array.
We just percolate each value in the array down, while traversing the array
from the rear to the front. By the time we percolate a value down, it is on
top of two heaps (its children, stored later in the array, are themselves
already roots of heaps). To be concrete, suppose we are to build a Min-Heap
from the following 7 random values: 4, 7, 3, 5, 2, 6, 1 (7 is the number of
values in a height 2 perfect tree). I will now arrange these values into a
tree and its underlying array to illustrate how the algorithm works.

To start, just put these values, in whatever order they are, into the array
representing the heap. It is NOT a heap yet, because even though it satisfies
the structure property, it doesn't satisfy the order property.

          4
        /   \
       7     3
      / \   / \
     5   2 6   1

      0   1   2   3   4   5   6
    +---+---+---+---+---+---+---+
    | 4 | 7 | 3 | 5 | 2 | 6 | 1 |
    +---+---+---+---+---+---+---+

Now, for the values at the deepest depth: 1, 6, 2, and 5 (in that order:
indexes 6, 5, 4, and 3), percolate them down to become heaps. They are
already leaves, so percolate down immediately stops before comparing their
values to any children's values, because they have no children! Thus, the
data structure remains unchanged, with a total of 0 comparison pairs.
          4
        /   \
       7     3
      / \   / \
     5   2 6   1

      0   1   2   3   4   5   6
    +---+---+---+---+---+---+---+
    | 4 | 7 | 3 | 5 | 2 | 6 | 1 |
    +---+---+---+---+---+---+---+

Now, for the values at the next higher depth: 3 and 7 (in that order: indexes
2 and 1), percolate them down into their left or right subheaps to become
heaps of height 1. Each requires one comparison pair: one comparison to find
the smaller child and one to decide whether to swap (1 and 3 are swapped; 2
and 7 are swapped). So we have a running total of 2 comparison pairs. Notice
that 2 and 1 are now both roots of heaps.

          4
        /   \
       2     1
      / \   / \
     5   7 6   3

      0   1   2   3   4   5   6
    +---+---+---+---+---+---+---+
    | 4 | 2 | 1 | 5 | 7 | 6 | 3 |
    +---+---+---+---+---+---+---+

Now, for the value at the next higher depth (the top): 4 (at the root of the
tree, in index 0), percolate it down into its left or right subheap to become
one final heap. It requires two comparison pairs, each time finding the
smaller of its children and deciding whether to swap (4 and 1, and then 4 and
3, are swapped). So we have a running total of 4 comparison pairs.

          1
        /   \
       2     3
      / \   / \
     5   7 6   4

      0   1   2   3   4   5   6
    +---+---+---+---+---+---+---+
    | 1 | 2 | 3 | 5 | 7 | 6 | 4 |
    +---+---+---+---+---+---+---+

The result is now one heap, satisfying both the order and structure
properties. Notice that we percolated down the values stored at indexes 6,
then 5, then 4, then 3, then 2, then 1, then 0. We built heaps from the
bottom to the top, right to left at each depth. So, we can compactly state
the offline algorithm to build a heap of N values: percolate down every index
in the array, starting at the last index (here 6) and going backwards to the
first index (always 0). Simple code and a beautiful result. I invite you to
generate your own random arrays of 7 or 15 values and to construct heaps by
hand, simulating this algorithm.
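The offline algorithm is short when written against the array representation;
here is a minimal sketch (mine, not the course's official code). The
percolate_down function is the same loop used inside the remove_min sketch
earlier, factored out so it can start at any index.

  #include <vector>
  #include <utility>   // std::swap

  void percolate_down(std::vector<int>& heap, int i) {
      int n = (int)heap.size();
      for (;;) {
          int smallest = i, l = 2*i + 1, r = 2*i + 2;
          if (l < n && heap[l] < heap[smallest]) smallest = l;
          if (r < n && heap[r] < heap[smallest]) smallest = r;
          if (smallest == i)
              return;               // settled into its correct place
          std::swap(heap[i], heap[smallest]);
          i = smallest;
      }
  }

  // Percolate down every index, from the last to the first; by the time
  // index i is processed, its children are already roots of heaps.
  void build_heap(std::vector<int>& values) {
      for (int i = (int)values.size() - 1; i >= 0; --i)
          percolate_down(values, i);
  }

Calling build_heap on the array {4, 7, 3, 5, 2, 6, 1} produces
{1, 2, 3, 5, 7, 6, 4}, matching the hand simulation above.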
Another way to analyze this algorithm is as follows. Imagine Ph is a perfect
heap with nodes at every depth from the root to its height; it has N nodes,
and each node takes some number of comparison pairs to percolate down to its
correct location in Ph.

       +              +
      / \            / \
     /   \          /   \
    +-----+        /     \
       Ph         +-------+
                     Ph+1

If we instead built Ph+1, the new leaves (there are N+1 of them) would
require 0 comparisons to percolate down. Each of the N internal nodes would
require at most 1 more comparison pair to percolate down than it needed in
Ph. So, doubling the number of nodes (from N to 2N+1) requires doubling the
number of comparison pairs (it was about N, but is now about 2N), which is
the signature of a linear process.

Finally, using the algorithm above, note that about 1/2 the values in a
perfect tree are leaves, so they need 0 comparison pairs; about 1/4 of the
values in the tree (the ones above each pair of leaves) need 1 comparison
pair; about 1/8 of the values in the tree (the ones above those) need 2
comparison pairs. Thus, at each depth, going upward, the nodes need 1 more
comparison pair, but there are 1/2 as many nodes at that depth as at the one
below it. For the example above, 4 nodes require no comparison pairs, 2 nodes
require one comparison pair, and 1 node requires 2 comparison pairs. In a
tree with about 1,000,000 values, about 500,000 would require no comparison
pairs, about 250,000 would require 1 comparison pair, about 125,000 would
require 2 comparison pairs, ..., 2 would require about 18 comparison pairs,
and 1 would require about 19 comparison pairs. Although the number of
comparison pairs goes up by 1 each time, the number of nodes requiring that
many comparison pairs goes down by a factor of 1/2.

The end result is that very few nodes require lots of comparison pairs and
very many nodes require few comparison pairs, and the total number of
comparison pairs in an N node heap is bounded by O(N).

In fact, we can prove this directly. The number of comparison pairs (grouped
together by depth d) is given by the following formula, which captures the
idea that as the depth increases, the number of comparisons (h-d) goes down
but the number of nodes requiring that many comparisons (2^d) goes up.

  sum(from d=0, to d=h, of the expression (h-d)*2^d)

For example, for the height 2 tree (7 nodes) in the illustration above, we
have

  for d = 0, 2*1   (2 comparison pairs for 1 node at depth 0)
  for d = 1, 1*2   (1 comparison pair for 2 nodes at depth 1)
  for d = 2, 0*4   (0 comparison pairs for 4 nodes at depth 2)

For the height 5 tree (63 nodes), we would have

  for d = 0, 5*1   (5 comparison pairs for 1 node at depth 0)
  for d = 1, 4*2   (4 comparison pairs for 2 nodes at depth 1)
  for d = 2, 3*4   (3 comparison pairs for 4 nodes at depth 2)
  for d = 3, 2*8   (2 comparison pairs for 8 nodes at depth 3)
  for d = 4, 1*16  (1 comparison pair for 16 nodes at depth 4)
  for d = 5, 0*32  (0 comparison pairs for 32 nodes at depth 5)

Now let us solve the formula. We can rewrite it as follows:

  multiplying by 1 = 2^h/2^h                                          (1)
  factoring out 2^h, which does not depend on d                       (2)
  changing * 2^(d-h) in the numerator to 2^(h-d) in the denominator   (3)
  changing the summation variable from d to i, where i = h-d          (4)
  (so d going 0 to h is like i going h to 0, which we reverse to 0 to h)

  sum(from d=0, to d=h, of the expression 2^h*(h-d)*2^d/2^h)      =  (1)
  2^h * sum(from d=0, to d=h, of the expression (h-d)*2^d/2^h)    =  (2)
  2^h * sum(from d=0, to d=h, of the expression (h-d)*2^(d-h))    =
  2^h * sum(from d=0, to d=h, of the expression (h-d)/2^(h-d))    =  (3)
  2^h * sum(from i=0, to i=h, of the expression i/2^i)               (4)

This is an interesting sum: its numerator increases slowly, and its
denominator increases quickly. Let us first note that

  sum(from i=0, to i=h, of the expression i/2^i)

is equal to (because when i = 0 the term is 0)

  sum(from i=1, to i=h, of the expression i/2^i)

which is < the following infinite sum (because the infinite sum has even more
positive terms)

  sum(from i=1, to i=infinity, of the expression i/2^i)

Now, we should know the following infinite sums (the top one; each successive
one subtracts the leading term (on the left) from each side of the =).

  1/2 + 1/4 + 1/8 + 1/16 + 1/32 + 1/64 + .... = 1
        1/4 + 1/8 + 1/16 + 1/32 + 1/64 + .... = 1/2
              1/8 + 1/16 + 1/32 + 1/64 + .... = 1/4
                    1/16 + 1/32 + 1/64 + .... = 1/8
                           1/32 + 1/64 + .... = 1/16
                                  1/64 + .... = 1/32
                                         .... = ....

If we add on the left and the right we have

  1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + ... = 1 + 1/2 + 1/4 + 1/8 + ...
  1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + ... = 2

Notice the sum on the left side is
sum(from i=1, to i=infinity, of the expression i/2^i). So,

  2^h * sum(from i=0, to i=h, of the expression i/2^i) < 2^h * 2 = 2^(h+1)

Finally, Ph has 2^(h+1) - 1 nodes, so 2^(h+1) can also be written as N+1.
So the number of comparison pairs needed to build Ph (an N node heap) is
O(N).

So, sorting with a heap is still O(N Log2 N), but building the heap should be
faster. The complexity to build a heap (offline) and remove all its values is
now O(N) + O(N Log2 N), which is still O(N Log2 N), but the algorithm should
run faster.
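If you want to check the bound numerically, this small sketch (mine) sums the
comparison pairs depth by depth and compares the total with 2^(h+1) = N+1;
the totals also match the C(h) column of the table computed earlier.

  #include <cstdio>

  int main() {
      for (int h = 0; h <= 10; ++h) {
          long long pairs = 0;
          for (int d = 0; d <= h; ++d)
              pairs += (long long)(h - d) << d;  // (h-d)*2^d pairs at depth d
          long long N = (1LL << (h+1)) - 1;      // nodes in a perfect tree
          std::printf("h=%2d  pairs=%6lld  N+1=%6lld\n", h, pairs, N + 1);
      }
  }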
------------------------------------------------------------------------------
Generalizing Heaps: d-heaps

We can generalize binary heaps to d-heaps, where d represents the maximum
number of children of any node. So, the heaps representing the binary trees
that we have just studied can also be called 2-heaps. The order property in
d-heaps remains the same, and the structure property does too, although we
need to apply it with more children. A 3-heap with 9 nodes must have the
following structure.

             A
           / | \
          /  |  \
         B   C   D
        /|\  |\
       E F G H I

Note that an array storing this heap would look as follows.

      0   1   2   3   4   5   6   7   8
    +---+---+---+---+---+---+---+---+---+
    | A | B | C | D | E | F | G | H | I |
    +---+---+---+---+---+---+---+---+---+

with the access functions

  1stChild(i) = 3i+1
  2ndChild(i) = 3i+2
  3rdChild(i) = 3i+3
  parent(i)   = (i-1)/3, using truncating division

The storage scheme and access functions easily generalize to 4-heaps,
5-heaps, etc. In fact, given a d-heap we can write just two functions that
each take d as a parameter (see the sketch below):

  nthChild(n,i,d) = d*i+n, where 1 <= n <= d: the nth child of i in a d-heap
  parent(i,d)     = (i-1)/d: the parent of i in a d-heap, using truncating
                    division

The percolate up operation is exactly the same: switching a child and its
parent. Its complexity is O(Logd N), aka Log base d of N. The percolate down
operation is a bit more complicated, because in a Min-d-Heap each node must
be swapped with the minimum of its children (if it is bigger than any child).
This requires d operations at each node (to find the minimum child), so at
most O(d Logd N); but since d is a constant for any given heap, the
complexity class is still logarithmic, with base d.

How much difference is there between logs of different bases? We can compute
some numbers easily for Log4 N because Log4 N = .5 Log2 N. So Log2 1,000 ~ 10
and Log4 1,000 ~ 5; Log2 10^6 ~ 20 and Log4 10^6 ~ 10. But remember that each
node percolated down in a 4-heap would require 3 comparisons instead of 1 to
determine which of its children is smallest. So, for percolating down the
time would probably be about the same, but percolate up would be twice as
fast.

Think of the case where d is huge: imagine d = 1,000,000. If our priority
queue had fewer than a million values, the result would be a tree of height
1. Doing a percolate up would be trivial, requiring just one comparison; but
doing a percolate down would be very expensive, requiring up to a million
comparisons to find the smallest value at depth 1. With this d, each depth
percolated down always requires up to 1,000,000 comparisons.
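Here is a small sketch (mine) of the generalized access functions; d is the
maximum number of children and n selects which child (1 <= n <= d).

  // The nth child of the node at index i in a d-heap.
  int nth_child(int n, int i, int d) { return d*i + n; }

  // The parent of the node at index i in a d-heap (truncating division).
  int parent(int i, int d) { return (i - 1)/d; }

For example, in the 3-heap above, nth_child(1,0,3) == 1 (A's 1st child, B),
nth_child(2,1,3) == 5 (B's 2nd child, F), and parent(7,3) == 2 (H's parent,
C).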
------------------------------------------------------------------------------
Merging and Changing Priority:

Finally, there is one operation common to priority queues that heaps do not
do optimally: merging two priority queues into one. A simple way to do this
for any heap implementation is to just add every value from one heap to the
other: this would be O(N Log2 2N) = O(N + N Log2 N) = O(N Log2 N) operations
(assuming each heap had N values). Of course O(N Log2 N) is a good complexity
class, but other more complicated priority queue data structures have a
better complexity class for this operation (and similar complexity classes
for add and remove-min).

Another way to merge is to put both heaps into one array and then use the
offline technique above to build a heap from their contents (see the sketch
at the end of these notes). This is O(2N) = O(N) operations: putting the two
heaps in an array big enough to hold both is O(2N) = O(N), and doing the
offline heap construction algorithm is also O(2N) = O(N), so the resulting
complexity is O(N), an improvement over enqueuing all the values from one
priority queue into the other, if the sizes are approximately the same. But
(adding one more wrinkle) if the priority queues have vastly different sizes
(one M, one N, with M << N), it is probably better to enqueue all the values
from the smaller queue into the bigger one, requiring O(M Log (N+M)), which
is better than O(N+M) when M << N: with M << N, O(M Log (N+M)) is about
O(M Log N), and O(N+M) is about O(N). For M = 10 and N = 10^6 the difference
is about 200 compared to about 10^6.

There are more advanced/complicated data structures for implementing priority
queues that allow merge operations to run more quickly (while still quickly
adding and removing the min): leftist, skew, binomial, Fibonacci, etc. heaps.
The key is to create order and structure properties that constrain the data
enough (but not too much) to allow all the operations to work, and work
quickly. You can read about these data structures online or in data
structures textbooks.

Another useful operation is changing/updating the priority of a value in a
priority queue. In a heap we would have to percolate such a value up or down,
depending on how its priority changed. That is just O(Log N) operations. The
hard part is finding the index of the value in the heap to percolate, because
we would have to scan the heap (as an array) trying to find the value: that
is O(N), so it dominates the O(Log N) percolation. The order property of a
heap does not help us "find" a value that is stored in the heap: if the value
is not at the root, we might have to look at nodes in both its left and right
subtrees (heaps are different from BSTs). If we instead used a map (from
values to indexes in the heap) to tell us where each value in the heap was
stored, that could speed up the process: if we stored the map as a BST,
finding the index of a value is likely O(Log N), assuming the tree is well
balanced; later we will see that when storing a map in a hash table, finding
the index of a value has an even better complexity class: O(1). In both cases
finding and percolating the value would be O(Log N). In some quarters I have
students modify their HeapPriorityQueue implementations to allow updating the
value in a node and re-percolating it to the correct position in the tree.
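Here is what the array-based merge described above might look like (my
sketch, reusing the hypothetical build_heap function from the offline
construction section).

  #include <vector>

  std::vector<int> merge_heaps(const std::vector<int>& a,
                               const std::vector<int>& b) {
      std::vector<int> merged(a);                       // copy one heap: O(N)
      merged.insert(merged.end(), b.begin(), b.end());  // append the other: O(M)
      build_heap(merged);                               // offline rebuild: O(N+M)
      return merged;
  }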