CompSci 261, Spring 2017, Homework 6

  1. In class, we described how to use a suffix tree of a large document $D$ to search for the positions in $D$ that match a query string $q$. Suppose we don't want to list the positions that match $q$, but instead we want to count how many matches to $q$ there are in $D$. Describe how to augment the suffix tree by storing a small amount of additional information at each node so that these counting queries can be answered quickly.

  2. Suppose that a binary search tree $T$ has been augmented so that every tree node has an extra "size" field that stores the number of nodes in its subtree (including the node itself), allowing us to solve rank and unrank queries in time $O(\log n)$. Describe how to compute the median of the keys in $T$ (the key that would be in the middle position in a sorted list of the keys) by using a constant number of rank or unrank queries.

  3. The lower bound that we showed in class proved that, if we maintain a dynamic array of $n$ machine words, and perform exactly $n$ updates and $n$ queries (where an update changes one array value and a query asks for the sum of the values in a prefix of an array) then the total time for this sequence of operations must be $\Omega(n\log n)$. What about when the number of operations is not equal to the length of the array? If we have an array of $s$ words, and we perform $t$ updates and queries, what is the lower bound for the number of operations we need? (Hint: consider separately the cases when $s\gt t$ and when $s\lt t$, and use a $\min$ operation to combine the bounds for these two cases. When $s\gt t$, how much of the array can we actually use? When $s\lt t$, apply the lower bound from class to subsequences of the operations.)

  4. Consider what happens when we apply fractional cascading to an input consisting of $i$ lists, each of which has $2^i$ sorted numbers in it (with each number appearing in only one of the lists). Write a formula in terms of $i$ for the sum of the lengths of the lists in the resulting fractionally cascaded data structure.