Binary search trees

An extended binary tree is obtained from a normal binary tree (each node has 0, 1, or 2 sons and is depicted as a "circle node") by replacing null pointers with pointers to square nodes.

The square nodes are the external (or leaf) nodes of the extended binary tree. The original (circle) nodes are the internal nodes of the extended binary tree.

I = internal path length = ∑_{internal node i} length( path from Root to Key_i )
    = total number of key comparisons during the insertion process that created tree T

If all n keys (internal nodes) are equally likely to be looked for, the average number of comparisons during a successful search is 1 + I/n.

E = external path length = ∑_{external node j} length( path from Root to external node j )

If all n+1 intervals between keys (one per external node) are equally likely to be looked for, the average number of comparisons during an unsuccessful search is E/(n+1).

Theorem.   E = I + 2n.

Easy proof by induction (left for the reader).
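The definitions above can be checked on a small example. The following is a minimal sketch, not part of the original notes: the names Node, insert, path_lengths and build are illustrative, and the keys are assumed to be distinct. It builds an unbalanced binary search tree, counts the key comparisons made during insertion, and computes I and E by treating every null pointer as an external node.

```python
class Node:
    """Internal (circle) node; a None child stands for an external (square) node."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST; return (root, key comparisons made)."""
    if root is None:
        return Node(key), 0
    node, comparisons = root, 0
    while True:
        comparisons += 1
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root, comparisons
        node = child


def path_lengths(node, depth=0):
    """Return (I, E) for the subtree at node, with depths measured from the root."""
    if node is None:
        return 0, depth  # an external node contributes its depth to E only
    left_i, left_e = path_lengths(node.left, depth + 1)
    right_i, right_e = path_lengths(node.right, depth + 1)
    return depth + left_i + right_i, left_e + right_e


def build(keys):
    """Insert keys in the given order; return (root, total comparisons)."""
    root, total = None, 0
    for key in keys:
        root, used = insert(root, key)
        total += used
    return root, total


keys = [4, 2, 6, 1, 3, 5, 7]
root, comparisons = build(keys)
n = len(keys)
I, E = path_lengths(root)
print(I, E, comparisons)        # 10 24 10
print(E == I + 2 * n)           # True
print(1 + I / n, E / (n + 1))   # average comparisons: successful, unsuccessful
```

For this insertion order the tree is perfectly balanced, so all 8 external nodes sit at depth 3 and an unsuccessful search always costs 3 comparisons.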

Average value of I

What is the average value of the internal path length?

The worst case (a degenerate tree in which every internal node has exactly one son) is O(n^2), and the best case (a balanced tree) is O(n log n).
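To make the two extremes concrete, here is a small sketch with hypothetical function names. It assumes the worst case is a degenerate tree with one node at each depth 0, 1, ..., n-1, and that the best case is a tree filled level by level, in which the k-th node (k = 1..n) has depth floor(log2 k).

```python
def worst_case_ipl(n):
    """I for a degenerate tree: node depths are 0, 1, ..., n-1."""
    return n * (n - 1) // 2


def best_case_ipl(n):
    """I for a tree filled level by level: the k-th node has depth floor(log2 k)."""
    return sum(k.bit_length() - 1 for k in range(1, n + 1))


for n in (7, 100, 1000):
    print(n, best_case_ipl(n), worst_case_ipl(n))
```

The worst case is exactly n(n-1)/2, while the best case grows like n log2 n.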

Let x_n = the average value of I for trees of n nodes, assuming that all n! permutations of the input values used to create the tree are equally likely.

Let i be the rank of the first key inserted (it is the i-th smallest of the n keys). This key is stored in node Root, the root of the created tree, which has two subtrees, Left and Right, with internal path lengths (as measured from the sons of Root) I_Left and I_Right.

I = I_Left + I_Right + n - 1,
because each of the n-1 non-root nodes needs one more edge when its path length is measured from Root.

The average value of I_Left = x_{i-1} (Left holds the i-1 smaller keys).
The average value of I_Right = x_{n-i} (Right holds the n-i larger keys).

x_0 = 0

For n ≥ 1, averaging over the n equally likely values of i,
x_n = (1/n) ( ∑_{i=1}^{n} x_{i-1} + ∑_{i=1}^{n} x_{n-i} ) + n - 1
    = (2/n) ∑_{i=0}^{n-1} x_i + n - 1

This recurrence is very similar to the Quicksort recurrence.

Solution:   x_n = 2(n+1)H_n - 4n     ~ 2n ln n,
where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number.
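One way to arrive at this closed form is sketched below. This is the standard "multiply by n and subtract" technique, offered as an illustration and not necessarily the derivation these notes have in mind. It uses x_0 = 0 and the partial-fraction identity 2(n-1)/(n(n+1)) = 4/(n+1) - 2/n.

```latex
\begin{align*}
n\,x_n &= 2\sum_{i=0}^{n-1} x_i + n(n-1) \\
(n-1)\,x_{n-1} &= 2\sum_{i=0}^{n-2} x_i + (n-1)(n-2) \\
n\,x_n - (n-1)\,x_{n-1} &= 2x_{n-1} + 2(n-1) \\
\frac{x_n}{n+1} &= \frac{x_{n-1}}{n} + \frac{2(n-1)}{n(n+1)}
                 = \frac{x_{n-1}}{n} + \frac{4}{n+1} - \frac{2}{n} \\
\frac{x_n}{n+1} &= \sum_{k=1}^{n}\left(\frac{4}{k+1} - \frac{2}{k}\right)
                 = 4\,(H_{n+1} - 1) - 2H_n
                 = 2H_n - \frac{4n}{n+1} \\
x_n &= 2(n+1)H_n - 4n
\end{align*}
```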

As a result, since E = I + 2n holds for every tree, the average value of E = x_n + 2n = 2(n+1)H_n - 2n.
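The recurrence and its closed form can be cross-checked with exact rational arithmetic. The sketch below uses illustrative function names and assumes distinct keys 0..n-1. It evaluates the recurrence directly, compares it with 2(n+1)H_n - 4n, and for small n also averages I over all n! insertion orders by brute force.

```python
from fractions import Fraction
from itertools import permutations


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def average_ipl_by_recurrence(max_n):
    """Return [x_0, ..., x_max_n] from x_n = (2/n) * sum(x_0..x_{n-1}) + n - 1."""
    x = [Fraction(0)]
    for n in range(1, max_n + 1):
        x.append(Fraction(2, n) * sum(x) + (n - 1))
    return x


def internal_path_length(order):
    """Insert distinct keys in the given order into a BST; return I."""
    children = {}  # key -> [left son, right son]
    root, total = None, 0
    for key in order:
        children[key] = [None, None]
        if root is None:
            root = key
            continue
        node, depth = root, 0
        while True:
            depth += 1
            side = 0 if key < node else 1
            if children[node][side] is None:
                children[node][side] = key
                break
            node = children[node][side]
        total += depth
    return total


def average_ipl_by_enumeration(n):
    """Average I over all n! insertion orders of the keys 0..n-1."""
    orders = list(permutations(range(n)))
    return Fraction(sum(internal_path_length(p) for p in orders), len(orders))


x = average_ipl_by_recurrence(10)
for n in range(11):
    assert x[n] == 2 * (n + 1) * harmonic(n) - 4 * n
for n in range(7):
    assert average_ipl_by_enumeration(n) == x[n]
print([str(v) for v in x[:5]])  # ['0', '0', '1', '8/3', '29/6']
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.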

Another way to derive this formula:

(For yet another way, see Knuth vol.3, 2nd ed., p.427 or Standish pp.104-5.)

Let d_n = average distance from the root to a leaf (external node) in trees with n internal nodes = avg(E_n)/(n+1).   d_0 = 0.

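Consider adding a new key to a tree with n-1 internal nodes in all the possible ways, that is, with the new key equally likely to fall into each of the n leaves. In each case we replace a leaf at depth len with an internal node that has two leaves at depth len+1, thereby increasing the external path length by 2(len+1) - len = len+2.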

Averaging over all trees with n-1 internal nodes,
avg(E_n)   =   avg(E_{n-1}) + avg(len) + 2   =   avg(E_{n-1}) + d_{n-1} + 2

Since avg(E_{n-1}) = n d_{n-1},
d_n   =   avg(E_n)/(n+1)   =   (n d_{n-1} + d_{n-1} + 2) / (n+1)   =   d_{n-1} + 2/(n+1)

So, d_n   =   ∑_{i=1}^{n} 2/(i+1)   =   2 ∑_{i=2}^{n+1} 1/i   =   2H_{n+1} - 2

Finally, avg(E_n)   =   (n+1)d_n   =   2(n+1)H_{n+1} - 2(n+1)   =   2(n+1)H_n - 2n
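The leaf-replacement argument can also be simulated exactly. The sketch below is illustrative (the function names are hypothetical) and relies on the assumption used in the derivation: each new key is equally likely to replace any of the current leaves. It tracks the probability distribution over multisets of leaf depths, which is all that E depends on, and compares the resulting averages with the formulas for d_n and avg(E_n).

```python
from collections import defaultdict
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def average_external_path_lengths(max_n):
    """Return [avg(E_0), ..., avg(E_max_n)] under uniform leaf replacement."""
    # Maps a sorted tuple of leaf depths to its probability.
    # The empty tree is a single leaf at depth 0.
    dist = {(0,): Fraction(1)}
    averages = [Fraction(0)]
    for _ in range(max_n):
        grown_dist = defaultdict(Fraction)
        for leaves, prob in dist.items():
            share = prob / len(leaves)  # each leaf is equally likely to be hit
            for idx, depth in enumerate(leaves):
                # Replace one leaf at depth len with two leaves at depth len+1.
                grown = leaves[:idx] + leaves[idx + 1:] + (depth + 1, depth + 1)
                grown_dist[tuple(sorted(grown))] += share
        dist = dict(grown_dist)
        averages.append(sum(prob * sum(leaves) for leaves, prob in dist.items()))
    return averages


avg_e = average_external_path_lengths(8)
for n, value in enumerate(avg_e):
    d_n = value / (n + 1)
    assert d_n == 2 * harmonic(n + 1) - 2
    assert value == 2 * (n + 1) * harmonic(n) - 2 * n
print([str(v) for v in avg_e[:4]])  # ['0', '2', '5', '26/3']
```

For n = 3, for example, the leaf depths are (2,2,2,2) with probability 1/3 and (1,2,3,3) with probability 2/3, giving avg(E_3) = 8/3 + 6 = 26/3.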


Dan Hirschberg
Last modified: Oct 28, 2003