Adaptive Huffman coding was first conceived independently by Faller and Gallager [Faller 1973; Gallager 1978]. Knuth contributed improvements to the original algorithm [Knuth 1985] and the resulting algorithm is referred to as algorithm FGK. A more recent version of adaptive Huffman coding is described by Vitter [Vitter 1987]. All of these methods are defined-word schemes which determine the mapping from source messages to codewords based upon a running estimate of the source message probabilities. The code is adaptive, changing so as to remain optimal for the current estimates. In this way, the adaptive Huffman codes respond to locality. In essence, the encoder is "learning" the characteristics of the source. The decoder must learn along with the encoder by continually updating the Huffman tree so as to stay in synchronization with the encoder.

Another advantage of these systems is that
they require only one pass over
the data. Of course, one-pass methods are not very interesting if
the number of bits they transmit is significantly greater than that
of the two-pass scheme. Interestingly, the performance of these methods, in terms
of number of bits transmitted, can be better than that of static
Huffman coding. This does not contradict the optimality of the
static method as the static method is optimal only over all methods which
assume a time-invariant mapping. The performance of the adaptive
methods can also be worse than that of the static method. Upper
bounds on the redundancy of these methods are presented in this section.
As discussed in the introduction, the adaptive method of Faller, Gallager
and Knuth is the basis for the UNIX utility *compact*. The
performance of *compact* is quite good, providing typical compression
factors of 30-40%.

Figure 4.1 -- Algorithm FGK processing the ensemble
`EXAMPLE` (a) Tree after processing "`aa bb`";
11 will be transmitted for the next `b`.
(b) After encoding the third `b`;
101 will be transmitted for the next `space`;
the tree will not change;
100 will be transmitted for the first `c`.
(c) Tree after update following first `c`.

Initially, the code tree consists of a single
leaf node, called the 0-node. The 0-node is a special node used
to represent the `n`-`k` unused messages. For each message transmitted,
both parties must increment the corresponding weight and recompute
the code tree to maintain the sibling property. At the point in
time when `t` messages have been transmitted, `k` of them distinct,
and `k` < `n`, the tree is a legal Huffman code tree with `k`+1 leaves,
one for each of the `k` messages and one for the 0-node. If the
(`t`+1)st message is one of the `k` already seen, the algorithm
transmits `a`(`t`+1)'s current code, increments the appropriate
counter and recomputes the tree. If an unused message occurs,
the 0-node is split to create a pair of leaves, one for `a`(`t`+1),
and a sibling which is the new 0-node. Again the tree is recomputed.
In this case, the code for the 0-node is sent; in addition, the
receiver must be told which of the `n`-`k` unused messages has
appeared.
At each node a count of occurrences of the corresponding
message is stored. Nodes are numbered indicating their position in
the sibling property ordering. The updating of the tree can be
done in a single traversal from the `a`(`t`+1) node to the root.
This traversal must increment the count for the `a`(`t`+1) node
and for each of its ancestors. Nodes may be exchanged to maintain
the sibling property, but all of these exchanges involve a
node on the path from `a`(`t`+1) to the root.
Figure 4.2 shows the final code tree formed by this process on
the ensemble `EXAMPLE`.

Figure 4.2 -- Tree formed by algorithm FGK for ensemble `EXAMPLE`.

Disregarding overhead, the number of bits transmitted by
algorithm FGK for the `EXAMPLE` is 129. The
static Huffman algorithm would transmit 117 bits in processing the
same data. The overhead
associated with the adaptive method is actually less than that
of the static algorithm. In the adaptive case the only overhead
is the `n` lg `n` bits needed to represent each of the `n` different
source messages when they appear for the first time. (This is in fact
conservative; rather than transmitting a unique code for each of the
`n` source messages, the sender could transmit the message's position
in the list of remaining messages and save a few bits in the average
case.) In the static case, the source messages need to be
sent as does the shape of the code tree. As discussed in
Section 3.2,
an efficient representation of the tree shape requires 2`n` bits.
Algorithm FGK compares well with static Huffman coding on this ensemble
when overhead is taken into account.
Figure 4.3 illustrates an example on which algorithm FGK performs
better than static Huffman coding even without taking overhead
into account. Algorithm FGK transmits 47 bits for this ensemble
while the static Huffman code requires 53.

Figure 4.3 -- Tree formed by algorithm FGK for ensemble
"`e eae de eabe eae dcf`".

Vitter has proved that
the total number of bits transmitted by algorithm FGK for a message
ensemble of length `t` containing `n` distinct messages is bounded below
by `S - n` + 1, where `S` is the performance of the static method, and
bounded above by 2`S` + `t` - 4`n` + 2 [Vitter 1987]. So the performance
of algorithm FGK is never much worse than twice optimal.
Knuth provides a complete implementation of algorithm FGK
and a proof that the time required for each encoding or decoding
operation is `O`(`l`), where `l` is the current length of the codeword [Knuth 1985].
It should be noted that since the mapping is defined dynamically, during
transmission, the encoding and decoding algorithms stand alone; there is
no additional algorithm to determine the mapping as in static methods.

Figure 4.4 -- FGK tree with non-level order numbering.

The adaptive Huffman algorithm of Vitter (algorithm V)
incorporates two improvements over algorithm FGK. First, the
number of interchanges in which a node is moved upward in the
tree during a recomputation is limited to one. This
number is bounded in algorithm FGK only by l/2 where `l` is the
length of the codeword for `a`(`t`+1) when the recomputation begins.
Second, Vitter's method minimizes the values of SUM{ `l`(`i`) } and
MAX{`l`(`i`)} subject to the requirement of minimizing
SUM{ `w`(`i`) `l`(`i`) }.
The intuitive explanation of algorithm V's advantage over algorithm
FGK is as follows: as in algorithm FGK, the code tree constructed
by algorithm V is the Huffman code tree for the prefix of the
ensemble seen so far. The adaptive methods do not assume that the
relative frequencies of a prefix represent accurately the symbol
probabilities over the entire message. Therefore, the fact that algorithm V
guarantees a tree of minimum height (height = MAX{ `l`(`i`) }
and minimum external path length (SUM{ `l`(`i`) }) implies that it is
better suited for coding the next message of the ensemble, given that
any of the leaves of the tree may represent that next message.

These improvements are accomplished through the use of a new system
for numbering nodes. The numbering, called an implicit numbering,
corresponds to a level ordering of the nodes (from bottom to top
and left to right). Figure 4.4 illiustrates that the numbering of
algorithm FGK is not always a level ordering. The following invariant
is maintained in Vitter's algorithm: For each weight `w`, all leaves
of weight `w` precede (in the implicit numbering) all internal nodes
of weight `w`. Vitter proves that this invariant enforces the
desired bound on node promotions [Vitter 1987]. The invariant
also implements bottom merging, as discussed in
Section 3.2, to
minimize SUM{ `l`(`i`) } and MAX{ `l`(`i`) }. The difference between Vitter's
method and algorithm FGK is in the way the tree is updated between
transmissions. In order to understand the revised update operation,
the following definition of a block of nodes is necessary: Blocks
are equivalence classes of nodes defined by
`u` is equivalent to `v` iff `weight(u)` =
`weight(v)` and `u` and `v` are either both leaves or both internal nodes.
The leader of a block is the highest-numbered (in the implicit numbering)
node in the block. Blocks are ordered by increasing weight with the
convention that a leaf block always precedes an internal block of the
same weight. When an exchange of nodes is required to maintain
the sibling property, algorithm V requires that the node being
promoted be moved to the position currently occupied by the
highest-numbered node in the target block.

In Figure 4.5, the Vitter tree corresponding to Figure 4.1c is
shown. This is the first point in `EXAMPLE` at which
algorithm FGK and algorithm V differ significantly. At this
point, the Vitter tree has height 3 and external path length 12 while
the FGK tree has height 4 and external path length 14. Algorithm V
transmits codeword 001 for the second `c`; FGK transmits 1101.
This demonstrates the intuition given earlier that algorithm V
is better suited for coding the next message.
The Vitter tree corresponding to Figure 4.2, representing the final
tree produced in processing `EXAMPLE`, is only different from
Figure 4.2 in that the internal node of weight 5 is to the right
of both leaf nodes of weight 5.
Algorithm V transmits 124 bits in processing `EXAMPLE`, as compared
with the 129 bits of algorithm FGK and 117 bits of static Huffman
coding. It should be noted that these figures do not include overhead
and, as a result, disadvantage the adaptive methods.

Figure 4.5 -- Algorithm V processing the ensemble "`aa bbb c`".

Figure 4.6 ilustrates the tree built by Vitter's method for the ensemble
of Figure 4.3. Both SUM{`l`(`i`)} and MAX{`l`(`i`)} are
smaller in the tree of Figure 4.6. The number of bits transmitted
during the processing of the sequence is 47, the same used by
algorithm FGK. However, if the transmission continues with `d,b,c,f`
or an unused letter, the cost of algorithm V will be less than
that of algorithm FGK. This again illustrates the benefit of
minimizing the external path length SUM{`l`(`i`)}
and the height MAX{`l`(`i`)}.

Figure 4.6 -- Tree formed by algorithm V for the ensemble of Fig. 4.3.

It should be noted again that the strategy of minimizing external path
length and height is optimal under the assumption that any source
letter is equally likely to occur next. Other reasonable strategies
include one which assumes locality. To take advantage of locality,
the ordering of tree nodes with equal weights could be determined
on the basis of recency.
Another reasonable assumption about adaptive coding
is that the weights in the current tree correspond closely to the
probabilities associated with the source. This assumption becomes
more reasonable as the length of the ensemble increases.
Under this assumption, the expected cost of transmitting the next letter
is SUM{ `p`(`i`) `l`(`i`) } which is approximately
SUM{ `w`(`i`) `l`(`i`) }, so that neither algorithm FGK nor
algorithm V has any advantage.

Vitter proves that the performance of his
algorithm is bounded by `S - n` + 1 from below and `S + t` - 2`n` + 1
from above [Vitter 1987]. At worst then, Vitter's adaptive method
may transmit one more bit per codeword than the static Huffman
method. The improvements made by Vitter do not change the complexity of
the algorithm; algorithm V encodes and decodes in `O`(`l`) time as does
algorithm FGK.