The square nodes are the external (or leaf) nodes of the extended binary tree. The original (circle) nodes are the internal nodes of the extended binary tree.
I = internal path length:
$$I = \sum_{\text{internal node } i} \text{length}(\text{path from Root to } \text{Key}_i)$$
= total number of key comparisons during the insertion process that created tree T.
If all nodes are equally likely to be looked for, the average number of comparisons during a successful search is 1 + I/n.
E = external path length:
$$E = \sum_{\text{external node } j} \text{length}(\text{path from Root to external node } j)$$
If all intervals are equally likely to be looked for, the average number of comparisons during an unsuccessful search is E/(n+1).
Theorem. E = I + 2n.
Easy proof by induction (left for the reader).
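The definitions above can be checked on a small example. The following is a minimal sketch, not part of the original notes: the names `Node`, `insert`, `path_lengths`, and `build`, and the sample key sequence, are illustrative assumptions. It builds a tree by repeated insertion, counts key comparisons, and measures I and E directly.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST; return (root, number of key comparisons made)."""
    if root is None:
        return Node(key), 0
    comparisons = 0
    node = root
    while True:
        comparisons += 1  # one comparison with each node on the search path
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, comparisons
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root, comparisons
            node = node.right


def path_lengths(node, depth=0):
    """Return (I, E) for the subtree at node, with depths measured from the root."""
    if node is None:
        # An empty link is an external (square) node at this depth.
        return 0, depth
    i_left, e_left = path_lengths(node.left, depth + 1)
    i_right, e_right = path_lengths(node.right, depth + 1)
    return depth + i_left + i_right, e_left + e_right


def build(keys):
    """Insert keys in order; return (root, total comparisons during insertion)."""
    root, total = None, 0
    for k in keys:
        root, c = insert(root, k)
        total += c
    return root, total


keys = [50, 30, 70, 20, 40, 60, 80, 10]  # arbitrary sample input
root, comparisons = build(keys)
I, E = path_lengths(root)
n = len(keys)
print("I =", I, "E =", E, "insertion comparisons =", comparisons)
print("avg successful search   =", 1 + I / n)
print("avg unsuccessful search =", E / (n + 1))
```

For this input the node depths are 0, 1, 1, 2, 2, 2, 2, 3, so I = 13, the insertion process makes 13 comparisons, and E = 29 = I + 2n.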
In the worst case I is $O(n^2)$ (a degenerate tree, e.g. keys inserted in sorted order), and in the best case it is $O(n \log n)$ (a balanced tree).
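To make the two extremes concrete, here is a small sketch that computes the internal path length for a sorted insertion order and for an order that always inserts the median first. The helper names `internal_path_length` and `balanced_order` are hypothetical, and the tree is stored in two dictionaries purely for brevity.

```python
def internal_path_length(keys):
    """Insert distinct keys into an unbalanced BST; return the sum of node depths."""
    left, right = {}, {}  # child links, indexed by the parent's key
    root = None
    total = 0
    for key in keys:
        if root is None:
            root = key
            continue
        node, depth = root, 0
        while True:
            depth += 1  # depth of the position below `node`
            children = left if key < node else right
            if node in children:
                node = children[node]
            else:
                children[node] = key
                break
        total += depth
    return total


def balanced_order(lo, hi):
    """Yield lo..hi (inclusive) so that the median of each range comes first."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from balanced_order(lo, mid - 1)
    yield from balanced_order(mid + 1, hi)


n = 15
worst = internal_path_length(range(1, n + 1))      # sorted input: a chain
best = internal_path_length(balanced_order(1, n))  # perfectly balanced tree
print(worst, best)
```

With n = 15 the sorted order gives a chain with I = n(n-1)/2 = 105, while the median-first order gives a perfect tree with I = 34.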
Let $x_n$ = the average value of I for trees of n nodes, assuming that all n! permutations of the input values used to create the tree are equally likely.
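Let i be the rank of the first key inserted (that is, it is the i-th smallest of the n keys). This key is stored in Root, the root of the created tree, whose two subtrees, Left and Right, contain i-1 and n-i nodes and have internal path lengths (as measured from the children of Root) $I_{\text{Left}}$ and $I_{\text{Right}}$.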
$$I = I_{\text{Left}} + I_{\text{Right}} + n - 1,$$
because each of the n-1 non-root nodes needs one more edge when path lengths are measured from Root.
The average value of $I_{\text{Left}}$ is $x_{i-1}$.
The average value of $I_{\text{Right}}$ is $x_{n-i}$.
$x_0 = 0$
For $n \ge 1$,
$$x_n = \frac{1}{n}\left(\sum_{i=1}^{n} x_{i-1} + \sum_{i=1}^{n} x_{n-i}\right) + n - 1 = \frac{2}{n}\sum_{i=0}^{n-1} x_i + n - 1$$
This recurrence is very similar to the Quicksort recurrence.
Solution: $x_n = 2(n+1)H_n - 4n \sim 2n \ln n$, where $H_n = 1 + 1/2 + \dots + 1/n$ is the n-th harmonic number.
As a result, since E = I + 2n, the average value of E is $x_n + 2n = 2(n+1)H_n - 2n$.
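The closed form can be sanity-checked against the recurrence with exact rational arithmetic. This is only a numerical check under the definitions above, not a derivation; the function names `harmonic`, `x_by_recurrence`, and `x_closed` are illustrative.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def x_by_recurrence(limit):
    """Return [x_0, ..., x_limit] using x_n = (2/n) * sum_{i<n} x_i + n - 1."""
    x = [Fraction(0)]
    prefix = Fraction(0)  # running sum x_0 + ... + x_{n-1}
    for n in range(1, limit + 1):
        prefix += x[n - 1]
        x.append(Fraction(2, n) * prefix + (n - 1))
    return x


def x_closed(n):
    """Claimed solution 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n


xs = x_by_recurrence(20)
for n in range(21):
    assert xs[n] == x_closed(n)
print(xs[1], xs[2], xs[3])
```

The first few values are x_1 = 0, x_2 = 1, and x_3 = 8/3, and the recurrence and the closed form agree exactly for every n up to 20.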
Another way to derive this formula:
(For yet another way, see Knuth vol.3, 2nd ed., p.427 or Standish pp.104-5.)
Let $d_n$ = average distance to a leaf = $\text{avg}(E_n)/(n+1)$, with $d_0 = 0$.
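Consider adding a new key to a tree with n-1 internal nodes, where the key is equally likely to fall into any of the n leaves. In each tree we replace a leaf at depth len with an internal node that has two leaves at depth len+1, thereby increasing the external path length by 2(len+1) - len = len+2.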
Averaging over all trees with n-1 internal nodes,
$$\text{avg}(E_n) = \text{avg}(E_{n-1}) + \text{avg}(len) + 2 = \text{avg}(E_{n-1}) + d_{n-1} + 2$$
$$d_n = \frac{\text{avg}(E_n)}{n+1} = \frac{n\,d_{n-1} + d_{n-1} + 2}{n+1} = d_{n-1} + \frac{2}{n+1}$$
So, $d_n = \sum_{i=1}^{n} \frac{2}{i+1} = 2\sum_{i=2}^{n+1} \frac{1}{i} = 2H_{n+1} - 2$
Finally, $\text{avg}(E_n) = (n+1)\,d_n = 2(n+1)H_{n+1} - 2(n+1) = 2(n+1)H_n - 2n$
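For small n the result can also be confirmed by brute force: build the tree for every one of the n! insertion orders, average the external path length, and compare with the formulas. The sketch below assumes keys 0..n-1 and stores nodes as nested lists; `external_path_length`, `average_E`, and `closed_form_E` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations


def external_path_length(keys):
    """E for the BST built by inserting keys in the given order."""
    root = None
    for key in keys:
        if root is None:
            root = [key, None, None]  # [key, left child, right child]
            continue
        node = root
        while True:
            side = 1 if key < node[0] else 2
            if node[side] is None:
                node[side] = [key, None, None]
                break
            node = node[side]

    def walk(node, depth):
        if node is None:
            return depth  # an external node contributes its depth
        return walk(node[1], depth + 1) + walk(node[2], depth + 1)

    return walk(root, 0)


def average_E(n):
    """Exact average of E over all n! insertion orders of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(external_path_length(p) for p in perms), len(perms))


def closed_form_E(n):
    """2(n+1)H_n - 2n."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 2 * n


for n in range(1, 7):
    avg = average_E(n)
    h_next = sum(Fraction(1, k) for k in range(1, n + 2))  # H_{n+1}
    assert avg == closed_form_E(n)
    assert avg / (n + 1) == 2 * h_next - 2  # d_n = 2H_{n+1} - 2
    print(n, avg)
```

The enumeration grows as n!, so this is practical only for very small n; it is meant as a check on the algebra rather than a way to compute the average.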