Given an ordered set S =
*a*_{1} <
*a*_{2} < ... <
*a*_{n},
we wish to process sequences of MEMBER queries.
We also know the probability of various requests occurring:

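*p*_{i} = Prob[ MEMBER(*a*_{i}, S) ], for 1 ≤ *i* ≤ *n*

*q*_{i} = Prob[ MEMBER(*x*, S) with *a*_{i} < *x* < *a*_{i+1} ], for 0 ≤ *i* ≤ *n*

where *a*_{0} = −∞ and *a*_{n+1} = +∞.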

To help analyze the time complexity, we add leaves to the binary search tree wherever we have a null link.

If *x* is the label of node *v*
then cost( MEMBER(*x*,S) ) = 1 + depth(*v*).

If *x* is not in S and
*a*_{i} < *x* < *a*_{i+1},
then the search ends at leaf *i* and cost( MEMBER(*x*,S) ) = depth(leaf *i*).

The average time complexity for this tree can be found by summing the cost of each access multiplied by the probability of that access.

cost(binary search tree T) =
∑_{i = 1 to n}
( *p*_{i} [1 + depth(*a*_{i})] )
+ ∑_{i = 0 to n}
( *q*_{i} [depth(leaf *i*)] )
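
As a concrete check of this cost measure, here is a minimal Python sketch that evaluates a given tree. It assumes the convention that a key at depth d costs 1 + d and a leaf at depth d costs d. The tuple representation `(k, left, right)` with `None` for a leaf, the name `tree_cost`, and the sample probabilities are illustrative assumptions, not part of the notes.

```python
def tree_cost(tree, p, q, i=0, depth=0):
    """Expected cost of a search tree given as nested tuples (k, left, right).

    p[1..n] are the key probabilities (p[0] is an unused placeholder),
    q[0..n] are the leaf probabilities, and i is the index of the
    leftmost leaf of this subtree.
    """
    if tree is None:
        # Leaf i: reached by unsuccessful searches with a_i < x < a_{i+1}.
        return q[i] * depth
    k, left, right = tree
    return (p[k] * (1 + depth)
            + tree_cost(left, p, q, i, depth + 1)    # leftmost leaf is still i
            + tree_cost(right, p, q, k, depth + 1))  # leftmost leaf is k


# Hypothetical probabilities for n = 3 keys (they sum to 1).
p = [0.0, 0.25, 0.125, 0.125]
q = [0.125, 0.125, 0.125, 0.125]

balanced = (2, (1, None, None), (3, None, None))
chain = (1, None, (2, None, (3, None, None)))

print(tree_cost(balanced, p, q))  # 1.875
print(tree_cost(chain, p, q))     # 2.0
```

For these sample values the balanced tree is cheaper than the right-leaning chain, even though the chain puts the most likely key at the root.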

__Problem__:
Given the *p*'s and *q*'s, find T to minimize cost.

The divide-and-conquer approach suggests determining which element
belongs at the root and then determining what each of the subtrees
looks like.
There seems to be no easy way of determining what the root should be,
which means that we would have to solve 2*n* subproblems,
as each of the *n* elements could be at the root and
for each choice we must solve the left and right subtrees.
(As an exercise, determine the time complexity of this recursive
approach. Start by giving an explicit recurrence.)
Plain recursion would solve the same subproblems over and over, so we use dynamic programming.

For 0 ≤ *i < j ≤ n*, let

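T_{i, j} = min cost tree for
problem {*a*_{i+1}...*a*_{j}}

*c*_{i, j} = cost(T_{i, j})

*r*_{i, j} = index of the root of T_{i, j}

and define weight

*w*_{i, j} = *q*_{i} + (*p*_{i+1} + *q*_{i+1}) + ... + (*p*_{j} + *q*_{j})

T_{i, j} consists of a root containing
*a*_{k}, for some *k* with *i* < *k* ≤ *j*,
a left subtree T_{i, k-1}, and a right subtree T_{k, j}.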

Also, boundary conditions:

T_{i, i} = the empty tree

*w*_{i, i} = *q*_{i}

*c*_{i, i} = 0

In T_{i, j}, the depth of all vertices in the
subtrees is precisely 1 more than what the depths were in
subtrees T_{i, k-1} and T_{k, j}.
Therefore,

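*c*_{i, j}
= (*c*_{i, k-1} + *w*_{i, k-1}) + *p*_{k} + (*c*_{k, j} + *w*_{k, j})

= *w*_{i, j} + *c*_{i, k-1} + *c*_{k, j}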
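
The simplification to a single weight term rests on the weights telescoping. The following worked step assumes the weight is defined as w_{i,j} = q_i + (p_{i+1} + q_{i+1}) + ... + (p_j + q_j), which agrees with the boundary value w_{i,i} = q_i and the update w_{i,j} = w_{i,j-1} + p_j + q_j used in the construction:

```latex
\begin{align*}
w_{i,k-1} + p_k + w_{k,j}
  &= \Bigl(q_i + \sum_{l=i+1}^{k-1} (p_l + q_l)\Bigr)
     + p_k
     + \Bigl(q_k + \sum_{l=k+1}^{j} (p_l + q_l)\Bigr) \\
  &= q_i + \sum_{l=i+1}^{j} (p_l + q_l) \\
  &= w_{i,j}.
\end{align*}
```

Since w_{i,j} does not depend on the choice of k, only the sum of the two subtree costs varies with the root.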

The optimal T_{i, j} will have root *a _{k}*
that minimizes the sum *c*_{i, k-1} + *c*_{k, j}.

__Construction of optimal binary search tree__

    for i := 0 to n do
        w_{i,i} := q_{i}
        c_{i,i} := 0
        r_{i,i} := 0
    for length := 1 to n do
        for i := 0 to n-length do
            j := i + length
            w_{i,j} := w_{i,j-1} + p_{j} + q_{j}
            m := value of k (with i < k ≤ j) which minimizes (c_{i,k-1} + c_{k,j})
            c_{i,j} := w_{i,j} + c_{i,m-1} + c_{m,j}
            r_{i,j} := m
            Leftson(r_{i,j}) := r_{i,m-1}
            Rightson(r_{i,j}) := r_{m,j}
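
The construction can be sketched in Python roughly as follows. This is one possible rendering under stated assumptions, not a definitive implementation: the names `optimal_bst` and `build_tree` are illustrative, `p[0]` is an unused placeholder so that indices match the text, ties between candidate roots go to the smallest k, and the tree is rebuilt from the r table afterwards instead of assigning Leftson and Rightson inside the loop. The sample probabilities are hypothetical.

```python
def optimal_bst(p, q):
    """Fill the w, c and r tables for keys a_1..a_n.

    p[1..n] are key probabilities (p[0] unused); q[0..n] are leaf probabilities.
    Returns (c, r); c[0][n] is the minimum cost and r[0][n] the root index.
    """
    n = len(q) - 1
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]  # boundary: T_{i,i} is empty, so c and r stay 0
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # Try every root a_k with i < k <= j; min() keeps the first minimizer.
            m = min(range(i + 1, j + 1), key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][m - 1] + c[m][j]
            r[i][j] = m
    return c, r


def build_tree(r, i, j):
    """Rebuild T_{i,j} as nested tuples (k, left, right); None marks a leaf."""
    if i == j:
        return None
    k = r[i][j]
    return (k, build_tree(r, i, k - 1), build_tree(r, k, j))


# Hypothetical probabilities for n = 3 keys (they sum to 1).
p = [0.0, 0.25, 0.125, 0.125]
q = [0.125, 0.125, 0.125, 0.125]
c, r = optimal_bst(p, q)
print(c[0][3])              # 1.875
print(build_tree(r, 0, 3))  # (2, (1, None, None), (3, None, None))
```

Rebuilding from r[0][n] downward visits only the subproblems that belong to the final tree, which avoids overwriting child links for a key that is the root of several different subproblems.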

The time complexity of this algorithm is *O*(*n*^{3}): there are *O*(*n*^{2}) pairs (*i*, *j*), and each considers up to *n* candidate roots.

A slight change
reduces the complexity to *O*(*n*^{2}).
(See, for example, Knuth, vol. 3, 2nd ed., pp. 436-439 and p. 456, #27.)

Modify the range of considered values of *k*:

    if length = 1 then
        m := j
    else
        m := value of k (with r_{i,j-1} ≤ k ≤ r_{i+1,j}) which minimizes (c_{i,k-1} + c_{k,j})
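
A minimal Python sketch of the construction with this restricted range is given below. The name `optimal_bst_fast` and the sample probabilities are assumptions for illustration. The sketch also assumes nonnegative probabilities and breaks ties toward the smallest k, so that the stored roots stay monotone and the range from r_{i,j-1} to r_{i+1,j} is never empty.

```python
def optimal_bst_fast(p, q):
    """Optimal BST tables using the restricted root range.

    p[1..n] are key probabilities (p[0] unused); q[0..n] are leaf probabilities.
    Returns (c, r); c[0][n] is the minimum cost and r[0][n] the root index.
    """
    n = len(q) - 1
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            if length == 1:
                lo = hi = j  # a single key a_j must be the root
            else:
                lo, hi = r[i][j - 1], r[i + 1][j]
            # Search only lo..hi; min() returns the first (smallest) minimizer.
            m = min(range(lo, hi + 1), key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][m - 1] + c[m][j]
            r[i][j] = m
    return c, r


# Hypothetical probabilities for n = 3 keys (they sum to 1).
p = [0.0, 0.25, 0.125, 0.125]
q = [0.125, 0.125, 0.125, 0.125]
c, r = optimal_bst_fast(p, q)
print(c[0][3], r[0][3])  # 1.875 2
```

For a fixed length, the ranges searched for consecutive values of i overlap only at their endpoints, so the work per length is *O*(*n*) and the total is *O*(*n*^{2}).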

Dan Hirschberg

Computer Science Department

University of California, Irvine, CA 92697-3435

Last modified: Oct 28, 2003