Data Compression

3. STATIC DEFINED-WORD SCHEMES

The classic defined-word scheme was developed over 30 years ago in Huffman's well-known paper on minimum-redundancy coding [Huffman 1952]. Huffman's algorithm provided the first solution to the problem of constructing minimum-redundancy codes. Many people believe that Huffman coding cannot be improved upon, that is, that it is guaranteed to achieve the best possible compression ratio. This is only true, however, under the constraints that each source message is mapped to a unique codeword and that the compressed text is the concatenation of the codewords for the source messages. An earlier algorithm, due independently to Shannon and Fano [Shannon and Weaver 1949; Fano 1949], is not guaranteed to provide optimal codes, but approaches optimal behavior as the number of messages approaches infinity. The Huffman algorithm is also of importance because it has provided a foundation upon which other data compression techniques have built and a benchmark to which they may be compared. We classify the codes generated by the Huffman and Shannon-Fano algorithms as variable-variable and note that they include block-variable codes as a special case, depending upon how the source messages are defined.

In Section 3.3 codes which map the integers onto binary codewords are discussed. Since any finite alphabet may be enumerated, this type of code has general-purpose utility. However, a more common use of these codes (called universal codes) is in conjunction with an adaptive scheme. This connection is discussed in Section 5.2.

Arithmetic coding, presented in Section 3.4, takes a significantly different approach to data compression from that of the other static methods. It does not construct a code, in the sense of a mapping from source messages to codewords. Instead, arithmetic coding replaces the source ensemble by a code string which, unlike all of the other codes discussed here, is not the concatenation of codewords corresponding to individual source messages. Arithmetic coding is capable of achieving compression results which are arbitrarily close to the entropy of the source.

3.1 Shannon-Fano Coding

The Shannon-Fano technique has as an advantage its simplicity. The code is constructed as follows: the source messages a(i) and their probabilities p( a(i) ) are listed in order of nonincreasing probability. This list is then divided in such a way as to form two groups of as nearly equal total probabilities as possible. Each message in the first group receives 0 as the first digit of its codeword; the messages in the second half have codewords beginning with 1. Each of these groups is then divided according to the same criterion and additional code digits are appended. The process is continued until each subset contains only one message. Clearly the Shannon-Fano algorithm yields a minimal prefix code.

a    1/2     0
b    1/4     10
c    1/8     110
d    1/16    1110
e    1/32    11110
f    1/32    11111

Figure 3.1 -- A Shannon-Fano Code.

Figure 3.1 shows the application of the method to a particularly simple probability distribution. The length of each codeword x is equal to -lg p(x). This is true as long as it is possible to divide the list into subgroups of exactly equal probability. When this is not possible, some codewords may be of length -lg p(x)+1. The Shannon-Fano algorithm yields an average codeword length S which satisfies H <= S <= H + 1. In Figure 3.2, the Shannon-Fano code for ensemble EXAMPLE is given. As is often the case, the average codeword length is the same as that achieved by the Huffman code (see Figure 1.3). That the Shannon-Fano algorithm is not guaranteed to produce an optimal code is demonstrated by the following set of probabilities: { .35, .17, .17, .16, .15 }. The Shannon-Fano code for this distribution is compared with the Huffman code in Section 3.2.

g      8/40    00
f      7/40    010
e      6/40    011
d      5/40    100
space  5/40    101
c      4/40    110
b      3/40    1110
a      2/40    1111

Figure 3.2 -- A Shannon-Fano Code for EXAMPLE
              (code length=117).

3.2. Static Huffman Coding

Huffman's algorithm, expressed graphically, takes as input a list of nonnegative weights {w(1), ... ,w(n) } and constructs a full binary tree [a binary tree is full if every node has either zero or two children] whose leaves are labeled with the weights. When the Huffman algorithm is used to construct a code, the weights represent the probabilities associated with the source letters. Initially there is a set of singleton trees, one for each weight in the list. At each step in the algorithm the trees corresponding to the two smallest weights, w(i) and w(j), are merged into a new tree whose weight is w(i)+w(j) and whose root has two children which are the subtrees represented by w(i) and w(j). The weights w(i) and w(j) are removed from the list and w(i)+w(j) is inserted into the list. This process continues until the weight list contains a single value. If, at any time, there is more than one way to choose a smallest pair of weights, any such pair may be chosen. In Huffman's paper, the process begins with a nonincreasing list of weights. This detail is not important to the correctness of the algorithm, but it does provide a more efficient implementation [Huffman 1952]. The Huffman algorithm is demonstrated in Figure 3.3.

[FIGURE 3.3]

Figure 3.3 -- The Huffman process: (a) The list; (b) the tree.

The Huffman algorithm determines the lengths of the codewords to be mapped to each of the source letters a(i). There are many alternatives for specifying the actual digits; it is necessary only that the code have the prefix property. The usual assignment entails labeling the edge from each parent to its left child with the digit 0 and the edge to the right child with 1. The codeword for each source letter is the sequence of labels along the path from the root to the leaf node representing that letter. The codewords for the source of Figure 3.3, in order of decreasing probability, are {01,11,001,100,101,000,0001}. Clearly, this process yields a minimal prefix code. Further, the algorithm is guaranteed to produce an optimal (minimum redundancy) code [Huffman 1952]. Gallager has proved an upper bound on the redundancy of a Huffman code of p(n) + lg [(2 lg e)/e] which is approximately p(n) + 0.086, where p(n) is the probability of the least likely source message [Gallager 1978]. In a recent paper, Capocelli et al. provide new bounds which are tighter than those of Gallagher for some probability distributions [Capocelli et al. 1986]. Figure 3.4 shows a distribution for which the Huffman code is optimal while the Shannon-Fano code is not.

In addition to the fact that there are many ways of forming codewords of appropriate lengths, there are cases in which the Huffman algorithm does not uniquely determine these lengths due to the arbitrary choice among equal minimum weights. As an example, codes with codeword lengths of {1,2,3,4,4} and of {2,2,2,3,3} both yield the same average codeword length for a source with probabilities {.4,.2,.2,.1,.1}. Schwartz defines a variation of the Huffman algorithm which performs "bottom merging"; that is, orders a new parent node above existing nodes of the same weight and always merges the last two weights in the list. The code constructed is the Huffman code with minimum values of maximum codeword length (MAX{ l(i) }) and total codeword length (SUM{ l(i) }) [Schwartz 1964]. Schwartz and Kallick describe an implementation of Huffman's algorithm with bottom merging [Schwartz and Kallick 1964]. The Schwartz-Kallick algorithm and a later algorithm by Connell [Connell 1973] use Huffman's procedure to determine the lengths of the codewords, and actual digits are assigned so that the code has the numerical sequence property. That is, codewords of equal length form a consecutive sequence of binary numbers. Shannon-Fano codes also have the numerical sequence property. This property can be exploited to achieve a compact representation of the code and rapid encoding and decoding.

                        S-F      Huffman

a(1)        0.35        00       1
a(2)        0.17        01       011
a(3)        0.17        10       010
a(4)        0.16        110      001
a(5)        0.15        111      000

Average codeword length 2.31     2.30

Figure 3.4 -- Comparison of Shannon-Fano and Huffman Codes.

Both the Huffman and the Shannon-Fano mappings can be generated in O(n) time, where n is the number of messages in the source ensemble (assuming that the weights have been presorted). Each of these algorithms maps a source message a(i) with probability p to a codeword of length l (-lg p <= l <= - lg p + 1). Encoding and decoding times depend upon the representation of the mapping. If the mapping is stored as a binary tree, then decoding the codeword for a(i) involves following a path of length l in the tree. A table indexed by the source messages could be used for encoding; the code for a(i) would be stored in position i of the table and encoding time would be O(l). Connell's algorithm makes use of the index of the Huffman code, a representation of the distribution of codeword lengths, to encode and decode in O(c) time where c is the number of different codeword lengths. Tanaka presents an implementation of Huffman coding based on finite-state machines which can be realized efficiently in either hardware or software [Tanaka 1987].

As noted earlier, the redundancy bound for Shannon-Fano codes is 1 and the bound for the Huffman method is p(n + 0.086 where p(n) is the probability of the least likely source message (so p(n) is less than or equal to .5, and generally much less). It is important to note that in defining redundancy to be average codeword length minus entropy, the cost of transmitting the code mapping computed by these algorithms is ignored. The overhead cost for any method where the source alphabet has not been established prior to transmission includes n lg n bits for sending the n source letters. For a Shannon-Fano code, a list of codewords ordered so as to correspond to the source letters could be transmitted. The additional time required is then SUM l(i), where the l(i) are the lengths of the codewords. For Huffman coding, an encoding of the shape of the code tree might be transmitted. Since any full binary tree may be a legal Huffman code tree, encoding tree shape may require as many as lg 4^n = 2n bits. In most cases the message ensemble is very large, so that the number of bits of overhead is minute by comparison to the total length of the encoded transmission. However, it is imprudent to ignore this cost.

If a less-than-optimal code is acceptable, the overhead costs can be avoided through a prior agreement by sender and receiver as to the code mapping. Rather than using a Huffman code based upon the characteristics of the current message ensemble, the code used could be based on statistics for a class of transmissions to which the current ensemble is assumed to belong. That is, both sender and receiver could have access to a codebook with k mappings in it; one for Pascal source, one for English text, etc. The sender would then simply alert the receiver as to which of the common codes he is using. This requires only lg k bits of overhead. Assuming that classes of transmission with relatively stable characteristics could be identified, this hybrid approach would greatly reduce the redundancy due to overhead without significantly increasing expected codeword length. In addition, the cost of computing the mapping would be amortized over all files of a given class. That is, the mapping would be computed once on a statistically significant sample and then used on a great number of files for which the sample is representative. There is clearly a substantial risk associated with assumptions about file characteristics and great care would be necessary in choosing both the sample from which the mapping is to be derived and the categories into which to partition transmissions. An extreme example of the risk associated with the codebook approach is provided by author Ernest V. Wright who wrote a novel Gadsby (1939) containing no occurrences of the letter E. Since E is the most commonly used letter in the English language, an encoding based upon a sample from Gadsby would be disastrous if used with "normal" examples of English text. Similarly, the "normal" encoding would provide poor compression of Gadsby.

McIntyre and Pechura describe an experiment in which the codebook approach is compared to static Huffman coding [McIntyre and Pechura 1985]. The sample used for comparison is a collection of 530 source programs in four languages. The codebook contains a Pascal code tree, a FORTRAN code tree, a COBOL code tree, a PL/1 code tree, and an ALL code tree. The Pascal code tree is the result of applying the static Huffman algorithm to the combined character frequencies of all of the Pascal programs in the sample. The ALL code tree is based upon the combined character frequencies for all of the programs. The experiment involves encoding each of the programs using the five codes in the codebook and the static Huffman algorithm. The data reported for each of the 530 programs consists of the size of the coded program for each of the five predetermined codes, and the size of the coded program plus the size of the mapping (in table form) for the static Huffman method. In every case, the code tree for the language class to which the program belongs generates the most compact encoding. Although using the Huffman algorithm on the program itself yields an optimal mapping, the overhead cost is greater than the added redundancy incurred by the less-than-optimal code. In many cases, the ALL code tree also generates a more compact encoding than the static Huffman algorithm. In the worst case, an encoding constructed from the codebook is only 6.6% larger than that constructed by the Huffman algorithm. These results suggest that, for files of source code, the codebook approach may be appropriate.

Gilbert discusses the construction of Huffman codes based on inaccurate source probabilities [Gilbert 1971]. A simple solution to the problem of incomplete knowledge of the source is to avoid long codewords, thereby minimizing the error of underestimating badly the probability of a message. The problem becomes one of constructing the optimal binary tree subject to a height restriction (see [Knuth 1971; Hu and Tan 1972; Garey 1974]). Another approach involves collecting statistics for several sources and then constructing a code based upon some combined criterion. This approach could be applied to the problem of designing a single code for use with English, French, German, etc., sources. To accomplish this, Huffman's algorithm could be used to minimize either the average codeword length for the combined source probabilities; or the average codeword length for English, subject to constraints on average codeword lengths for the other sources.

3.3 Universal Codes and Representations of the Integers

A code is universal if it maps source messages to codewords so that the resulting average codeword length is bounded by c1(H + c2). That is, given an arbitrary source with nonzero entropy, a universal code achieves average codeword length which is at most a constant times the optimal possible for that source. The potential compression offered by a universal code clearly depends on the magnitudes of the constants c1 and c2. We recall the definition of an asymptotically optimal code as one for which average codeword length approaches entropy and remark that a universal code with c1=1 is asymptotically optimal.

An advantage of universal codes over Huffman codes is that it is not necessary to know the exact probabilities with which the source messages appear. While Huffman coding is not applicable unless the probabilities are known, it is sufficient in the case of universal coding to know the probability distribution only to the extent that the source messages can be ranked in probability order. By mapping messages in order of decreasing probability to codewords in order of increasing length, universality can be achieved. Another advantage to universal codes is that the codeword sets are fixed. It is not necessary to compute a codeword set based upon the statistics of an ensemble; any universal codeword set will suffice as long as the source messages are ranked. The encoding and decoding processes are thus simplified. While universal codes can be used instead of Huffman codes as general-purpose static schemes, the more common application is as an adjunct to a dynamic scheme. This type of application will be demonstrated in Section 5.

Since the ranking of source messages is the essential parameter in universal coding, we may think of a universal code as representing an enumeration of the source messages, or as representing the integers, which provide an enumeration. Elias defines a sequence of universal coding schemes which map the set of positive integers onto the set of binary codewords [Elias 1975].

      gamma          delta

1     1              1
2     010            0100
3     011            0101
4     00100          01100
5     00101          01101
6     00110          01110
7     00111          01111
8     0001000        00100000
16    000010000      001010000
17    000010001      001010001
32    00000100000    0011000000

Figure 3.5 -- Elias Codes.

The first Elias code is one which is simple but not optimal. This code, gamma, maps an integer x onto the binary value of x prefaced by floor( lg x) zeros. The binary value of x is expressed in as few bits as possible, and therefore begins with a 1, which serves to delimit the prefix. The result is an instantaneously decodable code since the total length of a codeword is exactly one greater than twice the number of zeros in the prefix; therefore, as soon as the first 1 of a codeword is encountered, its length is known. The code is not a minimum redundancy code since the ratio of expected codeword length to entropy goes to 2 as entropy approaches infinity. The second code, delta, maps an integer x to a codeword consisting of gamma (floor[lg x] +1) followed by the binary value of x with the leading 1 deleted. The resulting codeword has length floor[lg x] + 2 floor[lg (1 + floor[lg x] ) ] + 1. This concept can be applied recursively to shorten the codeword lengths, but the benefits decrease rapidly. The code delta is asymptotically optimal since the limit of the ratio of expected codeword length to entropy is 1. Figure 3.5 lists the values of gamma and delta for a sampling of the integers. Figure 3.6 shows an Elias code for string EXAMPLE. The number of bits transmitted using this mapping would be 161, which does not compare well with the 117 bits transmitted by the Huffman code of Figure 1.3. Huffman coding is optimal under the static mapping model. Even an asymptotically optimal universal code cannot compare with static Huffman coding on a source for which the probabilities of the messages are known.

Source   Frequency   Rank       Codeword
message

g           8          1        delta(1)=1
f           7          2        delta(2)=0100
e           6          3        delta(3)=0101
d           5          4        delta(4)=01100
space       5          5        delta(5)=01101
c           4          6        delta(6)=01110
b           3          7        delta(7)=01111
a           2          8        delta(8)=00100000

Figure 3.6 -- An Elias Code for EXAMPLE (code length=161).

A second sequence of universal coding schemes, based on the Fibonacci numbers, is defined by Apostolico and Fraenkel [Apostolico and Fraenkel 1985]. While the Fibonacci codes are not asymptotically optimal, they compare well to the Elias codes as long as the number of source messages is not too large. Fibonacci codes have the additional attribute of robustness, which manifests itself by the local containment of errors. This aspect of Fibonacci codes will be discussed further in Section 7.

The sequence of Fibonacci codes described by Apostolico and Fraenkel is based on the Fibonacci numbers of order m >= 2, where the Fibonacci numbers of order 2 are the standard Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, .... In general, the Fibonnaci numbers of order m are defined by the recurrence: Fibonacci numbers F(-m+1) through F(0) are equal to 1; the kth number for k >= 1 is the sum of the preceding m numbers. We describe only the order 2 Fibonacci code; the extension to higher orders is straightforward.

N             R(N)               F(N)

 1                           1   11
 2                       1   0   011
 3                   1   0   0   0011
 4                   1   0   1   1011
 5               1   0   0   0   00011
 6               1   0   0   1   10011
 7               1   0   1   0   01011
 8           1   0   0   0   0   000011
16       1   0   0   1   0   0   0010011
32   1   0   1   0   1   0   0   00101011

    21  13   8   5   3   2   1

Figure 3.7 -- Fibonacci Representations and Fibonacci Codes.

Every nonnegative integer N has precisely one binary representation of the form R(N) = SUM(i=0 to k) d(i) F(i) (where d(i) is in {0,1}, k <= N, and the F(i) are the order 2 Fibonacci numbers as defined above) such that there are no adjacent ones in the representation. The Fibonacci representations for a small sampling of the integers are shown in Figure 3.7, using the standard bit sequence, from high order to low. The bottom row of the figure gives the values of the bit positions. It is immediately obvious that this Fibonacci representation does not constitute a prefix code. The order 2 Fibonacci code for N is defined to be: F(N)=D1 where D=d(0)d(1)d(2)...d(k) (the d(i) defined above). That is, the Fibonacci representation is reversed and 1 is appended. The Fibonacci code values for a small subset of the integers are given in Figure 3.7. These binary codewords form a prefix code since every codeword now terminates in two consecutive ones, which cannot appear anywhere else in a codeword.

Fraenkel and Klein prove that the Fibonacci code of order 2 is universal, with c1=2 and c2=3 [Fraenkel and Klein 1985]. It is not asymptotically optimal since c1>1. Fraenkel and Klein also show that Fibonacci codes of higher order compress better than the order 2 code if the source language is large enough (i.e., the number of distinct source messages is large) and the probability distribution is nearly uniform. However, no Fibonacci code is asymptotically optimal. The Elias codeword delta(N) is asymptotically shorter than any Fibonacci codeword for N, but the integers in a very large initial range have shorter Fibonacci codewords. For m=2, for example, the transition point is N=514,228 [Apostolico and Fraenkel 1985]. Thus, a Fibonacci code provides better compression than the Elias code until the size of the source language becomes very large. Figure 3.8 shows a Fibonacci code for string EXAMPLE. The number of bits transmitted using this mapping would be 153, which is an improvement over the Elias code of Figure 3.6 but still compares poorly with the Huffman code of Figure 1.3.

Source   Frequency   Rank       Codeword
message

g           8          1        F(1)=11
f           7          2        F(2)=011
e           6          3        F(3)=0011
d           5          4        F(4)=1011
space       5          5        F(5)=00011
c           4          6        F(6)=10011
b           3          7        F(7)=01011
a           2          8        F(8)=000011

Figure 3.8 -- A Fibonacci Code for EXAMPLE (code length=153).

3.4 Arithmetic Coding

The method of arithmetic coding was suggested by Elias, and presented by Abramson in his text on Information Theory [Abramson 1963]. Implementations of Elias' technique were developed by Rissanen, Pasco, Rubin, and, most recently, Witten et al. [Rissanen 1976; Pasco 1976; Rubin 1979; Witten et al. 1987]. We present the concept of arithmetic coding first and follow with a discussion of implementation details and performance.

In arithmetic coding a source ensemble is represented by an interval between 0 and 1 on the real number line. Each symbol of the ensemble narrows this interval. As the interval becomes smaller, the number of bits needed to specify it grows. Arithmetic coding assumes an explicit probabilistic model of the source. It is a defined-word scheme which uses the probabilities of the source messages to successively narrow the interval used to represent the ensemble. A high probability message narrows the interval less than a low probability message, so that high probability messages contribute fewer bits to the coded ensemble. The method begins with an unordered list of source messages and their probabilities. The number line is partitioned into subintervals based on cumulative probabilities.

A small example will be used to illustrate the idea of arithmetic coding. Given source messages {A,B,C,D,#} with probabilities {.2, .4, .1, .2, .1}, Figure 3.9 demonstrates the initial partitioning of the number line. The symbol A corresponds to the first 1/5 of the interval [0,1); B the next 2/5; D the subinterval of size 1/5 which begins 70% of the way from the left endpoint to the right. When encoding begins, the source ensemble is represented by the entire interval [0,1). For the ensemble AADB#, the first A reduces the interval to [0,.2) and the second A to [0,.04) (the first 1/5 of the previous interval). The D further narrows the interval to [.028,.036) (1/5 of the previous size, beginning 70% of the distance from left to right). The B narrows the interval to [.0296,.0328), and the # yields a final interval of [.03248,.0328). The interval, or alternatively any number i within the interval, may now be used to represent the source ensemble.

Source    Probability   Cumulative    Range
message                 probability

   A            .2          .2        [0,.2)
   B            .4          .6        [.2,.6)
   C            .1          .7        [.6,.7)
   D            .2          .9        [.7,.9)
   #            .1         1.0        [.9,1.0)

Figure 3.9 -- The Arithmetic coding model.

Two equations may be used to define the narrowing process described above:

   newleft = prevleft + msgleft*prevsize      (1)
   newsize = prevsize * msgsize               (2)

The first equation states that the left endpoint of the new interval is calculated from the previous interval and the current source message. The left endpoint of the range associated with the current message specifies what percent of the previous interval to remove from the left in order to form the new interval. For D in the above example, the new left endpoint is moved over by .7 * .04 (70% of the size of the previous interval). The second equation computes the size of the new interval from the previous interval size and the probability of the current message (which is equivalent to the size of its associated range). Thus, the size of the interval determined by D is .04*.2, and the right endpoint is .028+.008=.036 (left endpoint + size).

The size of the final subinterval determines the number of bits needed to specify a number in that range. The number of bits needed to specify a subinterval of [0,1) of size s is -lg s. Since the size of the final subinterval is the product of the probabilities of the source messages in the ensemble (that is, s=PROD{i=1 to N} p(source message i) where N is the length of the ensemble), we have -lg s = -SUM{i=1 to N lg p(source message i) = - SUM{i=1 to n} p(a(i)) lg p(a(i)), where n is the number of unique source messages a(1), a(2), ... a(n). Thus, the number of bits generated by the arithmetic coding technique is exactly equal to entropy, H. This demonstrates the fact that arithmetic coding achieves compression which is almost exactly that predicted by the entropy of the source.

In order to recover the original ensemble, the decoder must know the model of the source used by the encoder (eg., the source messages and associated ranges) and a single number within the interval determined by the encoder. Decoding consists of a series of comparisons of the number i to the ranges representing the source messages. For this example, i might be .0325 (.03248, .0326, or .0327 would all do just as well). The decoder uses i to simulate the actions of the encoder. Since i lies between 0 and .2, he deduces that the first letter was A (since the range [0,.2) corresponds to source message A). This narrows the interval to [0,.2). The decoder can now deduce that the next message will further narrow the interval in one of the following ways: to [0,.04) for A, to [.04,.12) for B, to [.12,.14) for C, to [.14,.18) for D, and to [.18,.2) for #. Since i falls into the interval [0,.04), he knows that the second message is again A. This process continues until the entire ensemble has been recovered.

Several difficulties become evident when implementation of arithmetic coding is attempted. The first is that the decoder needs some way of knowing when to stop. As evidence of this, the number 0 could represent any of the source ensembles A, AA, AAA, etc. Two solutions to this problem have been suggested. One is that the encoder transmit the size of the ensemble as part of the description of the model. Another is that a special symbol be included in the model for the purpose of signaling end of message. The # in the above example serves this purpose. The second alternative is preferable for several reasons. First, sending the size of the ensemble requires a two-pass process and precludes the use of arithmetic coding as part of a hybrid codebook scheme (see Sections 1.2 and 3.2). Secondly, adaptive methods of arithmetic coding are easily developed and a first pass to determine ensemble size is inappropriate in an on-line adaptive scheme.

A second issue left unresolved by the fundamental concept of arithmetic coding is that of incremental transmission and reception. It appears from the above discussion that the encoding algorithm transmits nothing until the final interval is determined. However, this delay is not necessary. As the interval narrows, the leading bits of the left and right endpoints become the same. Any leading bits that are the same may be transmitted immediately, as they will not be affected by further narrowing. A third issue is that of precision. From the description of arithmetic coding it appears that the precision required grows without bound as the length of the ensemble grows. Witten et al. and Rubin address this issue [Witten et al. 1987; Rubin 1979]. Fixed precision registers may be used as long as underflow and overflow are detected and managed. The degree of compression achieved by an implementation of arithmetic coding is not exactly H, as implied by the concept of arithmetic coding. Both the use of a message terminator and the use of fixed-length arithmetic reduce coding effectiveness. However, it is clear that an end-of-message symbol will not have a significant effect on a large source ensemble. Witten et al. approximate the overhead due to the use of fixed precision at 10^-4 bits per source message, which is also negligible.

The arithmetic coding model for ensemble EXAMPLE is given in Figure 3.10. The final interval size is p(a)^2*p(b)^3*p(c)^4*p(d)^5*p(e)^6*p(f)^7*p(g)^8*p(space)^5. The number of bits needed to specify a value in the interval is -lg(1.44*10^-35)=115.7. So excluding overhead, arithmetic coding transmits EXAMPLE in 116 bits, one less bit than static Huffman coding.

Source    Probability   Cumulative    Range
message                 probability

   a            .05         .05       [0,.05)
   b            .075        .125      [.05,.125)
   c            .1          .225      [.125,.225)
   d            .125        .35       [.225,.35)
   e            .15         .5        [.35,.5)
   f            .175        .675      [.5,.675)
   g            .2          .875      [.675,.875)
   space        .125       1.0        [.875,1.0)

Figure 3.10 -- The Arithmetic coding model of EXAMPLE.

Witten et al. provide an implementation of arithmetic coding, written in C, which separates the model of the source from the coding process (where the coding process is defined by Equations 3.1 and 3.2) [Witten et al. 1987]. The model is in a separate program module and is consulted by the encoder and by the decoder at every step in the processing. The fact that the model can be separated so easily renders the classification static/adaptive irrelevant for this technique. Indeed, the fact that the coding method provides compression efficiency nearly equal to the entropy of the source under any model allows arithmetic coding to be coupled with any static or adaptive method for computing the probabilities (or frequencies) of the source messages. Witten et al. implement an adaptive model similar to the techniques described in Section 4. The performance of this implementation is discussed in Section 6.