Data Compression

Adaptive Huffman coding was first conceived independently by Faller and Gallager [Faller 1973; Gallager 1978]. Knuth contributed improvements to the original algorithm [Knuth 1985] and the resulting algorithm is referred to as algorithm FGK. A more recent version of adaptive Huffman coding is described by Vitter [Vitter 1987]. All of these methods are defined-word schemes which determine the mapping from source messages to codewords based upon a running estimate of the source message probabilities. The code is adaptive, changing so as to remain optimal for the current estimates. In this way, the adaptive Huffman codes respond to locality. In essence, the encoder is "learning" the characteristics of the source. The decoder must learn along with the encoder by continually updating the Huffman tree so as to stay in synchronization with the encoder.

Another advantage of these systems is that they require only one pass over the data. Of course, one-pass methods are not very interesting if the number of bits they transmit is significantly greater than that of the two-pass scheme. Interestingly, the performance of these methods, in terms of number of bits transmitted, can be better than that of static Huffman coding. This does not contradict the optimality of the static method as the static method is optimal only over all methods which assume a time-invariant mapping. The performance of the adaptive methods can also be worse than that of the static method. Upper bounds on the redundancy of these methods are presented in this section. As discussed in the introduction, the adaptive method of Faller, Gallager and Knuth is the basis for the UNIX utility compact. The performance of compact is quite good, providing typical compression factors of 30-40%.

4.1 Algorithm FGK

The basis for algorithm FGK is the Sibling Property, defined by Gallager [Gallager 1978]: A binary code tree has the sibling property if each node (except the root) has a sibling and if the nodes can be listed in order of nonincreasing weight with each node adjacent to its sibling. Gallager proves that a binary prefix code is a Huffman code if and only if the code tree has the sibling property. In algorithm FGK, both sender and receiver maintain dynamically changing Huffman code trees. The leaves of the code tree represent the source messages and the weights of the leaves represent frequency counts for the messages. At any point in time, k of the n possible source messages have occurred in the message ensemble.

Figure 4.1 -- Algorithm FGK processing the ensemble EXAMPLE (a) Tree after processing "aa bb"; 11 will be transmitted for the next b. (b) After encoding the third b; 101 will be transmitted for the next space; the tree will not change; 100 will be transmitted for the first c. (c) Tree after update following first c.

Initially, the code tree consists of a single leaf node, called the 0-node. The 0-node is a special node used to represent the n-k unused messages. For each message transmitted, both parties must increment the corresponding weight and recompute the code tree to maintain the sibling property. At the point in time when t messages have been transmitted, k of them distinct, and k < n, the tree is a legal Huffman code tree with k+1 leaves, one for each of the k messages and one for the 0-node. If the (t+1)st message is one of the k already seen, the algorithm transmits a(t+1)'s current code, increments the appropriate counter and recomputes the tree. If an unused message occurs, the 0-node is split to create a pair of leaves, one for a(t+1), and a sibling which is the new 0-node. Again the tree is recomputed. In this case, the code for the 0-node is sent; in addition, the receiver must be told which of the n-k unused messages has appeared. At each node a count of occurrences of the corresponding message is stored. Nodes are numbered indicating their position in the sibling property ordering. The updating of the tree can be done in a single traversal from the a(t+1) node to the root. This traversal must increment the count for the a(t+1) node and for each of its ancestors. Nodes may be exchanged to maintain the sibling property, but all of these exchanges involve a node on the path from a(t+1) to the root. Figure 4.2 shows the final code tree formed by this process on the ensemble EXAMPLE.

Figure 4.2 -- Tree formed by algorithm FGK for ensemble EXAMPLE.

Disregarding overhead, the number of bits transmitted by algorithm FGK for the EXAMPLE is 129. The static Huffman algorithm would transmit 117 bits in processing the same data. The overhead associated with the adaptive method is actually less than that of the static algorithm. In the adaptive case the only overhead is the n lg n bits needed to represent each of the n different source messages when they appear for the first time. (This is in fact conservative; rather than transmitting a unique code for each of the n source messages, the sender could transmit the message's position in the list of remaining messages and save a few bits in the average case.) In the static case, the source messages need to be sent as does the shape of the code tree. As discussed in Section 3.2, an efficient representation of the tree shape requires 2n bits. Algorithm FGK compares well with static Huffman coding on this ensemble when overhead is taken into account. Figure 4.3 illustrates an example on which algorithm FGK performs better than static Huffman coding even without taking overhead into account. Algorithm FGK transmits 47 bits for this ensemble while the static Huffman code requires 53.

Figure 4.3 -- Tree formed by algorithm FGK for ensemble "e eae de eabe eae dcf".

Vitter has proved that the total number of bits transmitted by algorithm FGK for a message ensemble of length t containing n distinct messages is bounded below by S - n + 1, where S is the performance of the static method, and bounded above by 2S + t - 4n + 2 [Vitter 1987]. So the performance of algorithm FGK is never much worse than twice optimal. Knuth provides a complete implementation of algorithm FGK and a proof that the time required for each encoding or decoding operation is O(l), where l is the current length of the codeword [Knuth 1985]. It should be noted that since the mapping is defined dynamically, during transmission, the encoding and decoding algorithms stand alone; there is no additional algorithm to determine the mapping as in static methods.

4.2 Algorithm V

Figure 4.4 -- FGK tree with non-level order numbering.

The adaptive Huffman algorithm of Vitter (algorithm V) incorporates two improvements over algorithm FGK. First, the number of interchanges in which a node is moved upward in the tree during a recomputation is limited to one. This number is bounded in algorithm FGK only by l/2 where l is the length of the codeword for a(t+1) when the recomputation begins. Second, Vitter's method minimizes the values of SUM{ l(i) } and MAX{l(i)} subject to the requirement of minimizing SUM{ w(i) l(i) }. The intuitive explanation of algorithm V's advantage over algorithm FGK is as follows: as in algorithm FGK, the code tree constructed by algorithm V is the Huffman code tree for the prefix of the ensemble seen so far. The adaptive methods do not assume that the relative frequencies of a prefix represent accurately the symbol probabilities over the entire message. Therefore, the fact that algorithm V guarantees a tree of minimum height (height = MAX{ l(i) } and minimum external path length (SUM{ l(i) }) implies that it is better suited for coding the next message of the ensemble, given that any of the leaves of the tree may represent that next message.

These improvements are accomplished through the use of a new system for numbering nodes. The numbering, called an implicit numbering, corresponds to a level ordering of the nodes (from bottom to top and left to right). Figure 4.4 illiustrates that the numbering of algorithm FGK is not always a level ordering. The following invariant is maintained in Vitter's algorithm: For each weight w, all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w. Vitter proves that this invariant enforces the desired bound on node promotions [Vitter 1987]. The invariant also implements bottom merging, as discussed in Section 3.2, to minimize SUM{ l(i) } and MAX{ l(i) }. The difference between Vitter's method and algorithm FGK is in the way the tree is updated between transmissions. In order to understand the revised update operation, the following definition of a block of nodes is necessary: Blocks are equivalence classes of nodes defined by u is equivalent to v iff weight(u) = weight(v) and u and v are either both leaves or both internal nodes. The leader of a block is the highest-numbered (in the implicit numbering) node in the block. Blocks are ordered by increasing weight with the convention that a leaf block always precedes an internal block of the same weight. When an exchange of nodes is required to maintain the sibling property, algorithm V requires that the node being promoted be moved to the position currently occupied by the highest-numbered node in the target block.

In Figure 4.5, the Vitter tree corresponding to Figure 4.1c is shown. This is the first point in EXAMPLE at which algorithm FGK and algorithm V differ significantly. At this point, the Vitter tree has height 3 and external path length 12 while the FGK tree has height 4 and external path length 14. Algorithm V transmits codeword 001 for the second c; FGK transmits 1101. This demonstrates the intuition given earlier that algorithm V is better suited for coding the next message. The Vitter tree corresponding to Figure 4.2, representing the final tree produced in processing EXAMPLE, is only different from Figure 4.2 in that the internal node of weight 5 is to the right of both leaf nodes of weight 5. Algorithm V transmits 124 bits in processing EXAMPLE, as compared with the 129 bits of algorithm FGK and 117 bits of static Huffman coding. It should be noted that these figures do not include overhead and, as a result, disadvantage the adaptive methods.

Figure 4.5 -- Algorithm V processing the ensemble "aa bbb c".

Figure 4.6 ilustrates the tree built by Vitter's method for the ensemble of Figure 4.3. Both SUM{l(i)} and MAX{l(i)} are smaller in the tree of Figure 4.6. The number of bits transmitted during the processing of the sequence is 47, the same used by algorithm FGK. However, if the transmission continues with d,b,c,f or an unused letter, the cost of algorithm V will be less than that of algorithm FGK. This again illustrates the benefit of minimizing the external path length SUM{l(i)} and the height MAX{l(i)}.

Figure 4.6 -- Tree formed by algorithm V for the ensemble of Fig. 4.3.

It should be noted again that the strategy of minimizing external path length and height is optimal under the assumption that any source letter is equally likely to occur next. Other reasonable strategies include one which assumes locality. To take advantage of locality, the ordering of tree nodes with equal weights could be determined on the basis of recency. Another reasonable assumption about adaptive coding is that the weights in the current tree correspond closely to the probabilities associated with the source. This assumption becomes more reasonable as the length of the ensemble increases. Under this assumption, the expected cost of transmitting the next letter is SUM{ p(i) l(i) } which is approximately SUM{ w(i) l(i) }, so that neither algorithm FGK nor algorithm V has any advantage.

Vitter proves that the performance of his algorithm is bounded by S - n + 1 from below and S + t - 2n + 1 from above [Vitter 1987]. At worst then, Vitter's adaptive method may transmit one more bit per codeword than the static Huffman method. The improvements made by Vitter do not change the complexity of the algorithm; algorithm V encodes and decodes in O(l) time as does algorithm FGK.