CompSci 267 Homework set #4

  1. In LZSS (the variant of LZ77 due to Storer and Szymanski), a short match can be represented either as one (F,P,L) token (flag, pointer, length) or as a sequence of (F,C) tokens (flag, character), one per character. If the window length is W=4096 and the maximum match length is M=256, what is the shortest match that should be represented as a match rather than as uncompressed characters?
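
A minimal sketch of the break-even arithmetic. It assumes 8-bit characters, a 1-bit flag on every token, and fixed-width pointer and length fields just wide enough to address W window positions and M lengths; the question does not fix these bit widths, so treat them as assumptions. The helper name `lzss_break_even` is hypothetical:

```python
def lzss_break_even(window: int, max_len: int,
                    char_bits: int = 8, flag_bits: int = 1) -> int:
    """Shortest match length for which one (F,P,L) token beats literal (F,C) tokens.

    Assumed costs: pointer = bits to address `window` positions,
    length = bits to encode `max_len` distinct lengths, literal = flag + char.
    """
    pointer_bits = (window - 1).bit_length()   # 4096 positions -> 12 bits
    length_bits = (max_len - 1).bit_length()   # 256 lengths -> 8 bits
    match_bits = flag_bits + pointer_bits + length_bits
    literal_bits = flag_bits + char_bits
    length = 1
    # Grow the match until spelling it out as literals costs strictly more.
    while length * literal_bits <= match_bits:
        length += 1
    return length

print(lzss_break_even(4096, 256))  # 3
```

Under these assumptions a match token costs 1 + 12 + 8 = 21 bits against 9 bits per literal, so two literals (18 bits) are still cheaper and three (27 bits) are not. Ties go to literals, because the loop only stops once the match token is strictly cheaper.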

  2. Analyze the LZW compression of the string "aaaa..." of length 1 million.
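
A sketch of the phrase-by-phrase behavior, assuming standard LZW (emit the longest dictionary match, then add that match plus the next symbol). On a run of one symbol the emitted phrases have lengths 1, 2, 3, ..., so the hypothetical helper `lzw_run_stats` tracks only phrase lengths instead of building a real dictionary:

```python
def lzw_run_stats(n: int) -> tuple[int, int, int]:
    """Model LZW on a run of n copies of one symbol.

    Returns (codes emitted, dictionary entries added, length of the last phrase).
    Assumes standard LZW: emit the longest match, then add match + next symbol.
    """
    if n == 0:
        return 0, 0, 0
    consumed = 0   # input symbols already covered by emitted codes
    longest = 1    # longest run-phrase in the dictionary (initially the symbol itself)
    emitted = 0    # codes emitted inside the loop; each one adds one entry
    while n - consumed > longest:
        # The longest phrase matches but cannot be extended: emit it,
        # and the dictionary gains a phrase one symbol longer.
        consumed += longest
        emitted += 1
        longest += 1
    # The remaining symbols already form a dictionary phrase: one final code,
    # which adds no entry because the input ends.
    return emitted + 1, emitted, n - consumed

print(lzw_run_stats(1_000_000))  # (1414, 1413, 1009)
```

For n = 1,000,000, phrases of lengths 1 through 1413 cover 998,991 symbols and one final code covers the remaining 1009. The counts do not depend on how many other symbols the initial dictionary holds; only the numeric code values do.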

  3. What is the longest string that can be retrieved from the LZW dictionary during decoding when the input text had length 1 billion?
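
A sketch of one way to bound this, under the assumption (taken from the LZW update rule) that a dictionary entry of length m can only exist after phrases of lengths 1, 2, ..., m-1 have been emitted, so emitting a phrase of length m consumes at least 1 + 2 + ... + m = m(m+1)/2 input symbols; a run of one repeated symbol meets this bound. The hypothetical `longest_lzw_phrase` returns the largest such m using exact integer arithmetic:

```python
from math import isqrt

def longest_lzw_phrase(n: int) -> int:
    """Largest m with m*(m+1)/2 <= n, computed with exact integer arithmetic.

    Assumption: a phrase of length m needs earlier phrases of lengths
    1..m-1, so emitting it consumes at least 1 + 2 + ... + m symbols.
    """
    # m*(m+1)/2 <= n  <=>  (2m+1)^2 <= 8n+1
    return (isqrt(8 * n + 1) - 1) // 2

print(longest_lzw_phrase(10**9))  # 44720
```

For n = 10^9 this gives 44720, since 44720 * 44721 / 2 = 999,961,560 fits within the input while 44721 * 44722 / 2 = 1,000,006,281 does not. `math.isqrt` (Python 3.8+) avoids floating-point rounding in the square root.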

  4. Assume the two-symbol alphabet {a, b}. Show the first 15 dictionary entries for the LZW encoding of the string ababababababab...
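
A minimal LZW sketch that records the dictionary in code order, assuming the initial dictionary holds only the single symbols a and b (codes 0 and 1). The helper name `lzw_dictionary` is hypothetical, and the input is simply a long enough prefix of the repeating string:

```python
def lzw_dictionary(text: str, alphabet: str) -> list[str]:
    """Run LZW over text and return every dictionary entry in code order."""
    table = {sym: code for code, sym in enumerate(alphabet)}  # phrase -> code
    entries = list(alphabet)                                   # code -> phrase
    w = ""                                                     # current match
    for ch in text:
        if w + ch in table:
            w += ch                       # the match can be extended
        else:
            # A real encoder emits table[w] here; only the dictionary is kept.
            table[w + ch] = len(entries)  # new entry: match plus next symbol
            entries.append(w + ch)
            w = ch                        # restart matching at this symbol
    return entries

entries = lzw_dictionary("ab" * 50, "ab")
for code, phrase in enumerate(entries[:16]):
    print(code, phrase)
```

After the two initial symbols the new entries are ab, ba, aba, abab, bab, baba, ababa, ababab, babab, bababa, abababa, abababab, bababab, babababa. Whether the two initial single-symbol entries count toward the 15 is a matter of convention.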

  5. In BWT, the last column, L, of the sorted matrix contains concentrations of identical characters, which is why L is easy to compress. However, the first column, F, of the same matrix is even easier to compress since it contains runs, not just concentrations. Why select column L and not column F?
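
One way to see the difference: L, together with the row index of the original string, can be inverted, while F is just the symbols of S in sorted order and is shared by every anagram of S. The sketch below rebuilds S from L and recovers F by sorting L. It assumes the plain sorted-rotation convention with no end-of-string marker and uses L = "nnbaaa", index 3 (the transform of "banana" under that convention) as an example; the name `inverse_bwt` is illustrative:

```python
def inverse_bwt(last: str, index: int) -> str:
    """Rebuild S from the last column L and the row index of S in the sorted matrix.

    Assumes the plain sorted-rotation BWT with no end-of-string marker.
    """
    # Stable sort of L's positions by symbol: order[k] is the position in L
    # whose symbol sits at F[k], i.e. row order[k] is row k rotated left by one.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out = []
    row = index
    for _ in range(len(last)):
        row = order[row]            # step to the next left rotation
        out.append(last[row])       # its last symbol is the next symbol of S
    return "".join(out)

L, index = "nnbaaa", 3              # BWT of "banana" under this convention
F = "".join(sorted(L))              # F is recoverable from L by sorting
print(F, inverse_bwt(L, index))     # aaabnn banana
print(F == "".join(sorted("aabnna")))  # True: an anagram has the same F
```

The stable sort matters: it keeps equal symbols in their L order, which is what makes the k-th occurrence of a symbol in F correspond to its k-th occurrence in L.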

  6. Using BWT on the string S="sssssssssh", calculate the string L and its MTF encoding.
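
A minimal sketch of the forward transform followed by move-to-front coding. It assumes the plain sorted-rotation convention (no end-of-string marker) and an MTF list that starts as the distinct symbols of S in sorted order, ['h', 's']; a different initial list, such as all 256 byte values, changes the MTF numbers. The names `bwt` and `mtf_encode` are hypothetical:

```python
def bwt(s: str) -> tuple[str, int]:
    """Return (L, index): last column of the sorted rotation matrix and the row holding s."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last = "".join(row[-1] for row in rotations)
    return last, rotations.index(s)

def mtf_encode(text: str, alphabet: list[str]) -> list[int]:
    """Move-to-front: emit each symbol's current position, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in text:
        pos = table.index(ch)
        out.append(pos)
        table.insert(0, table.pop(pos))
    return out

S = "sssssssssh"
L, index = bwt(S)
codes = mtf_encode(L, sorted(set(S)))  # assumption: list starts as ['h', 's']
print(L, index, codes)  # sssssssssh 9 [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Because 'h' sorts before 's', the rotation "hsssssssss" comes first and S itself comes last, so under this convention L happens to equal S and the primary index is 9. Sorting full rotations costs O(n^2 log n) and is only meant for short strings like this one.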