CompSci 261, Spring 2017, Homework 6

  1. In class, we described how to use a suffix tree of a large document $D$ to search for the positions in $D$ that match a query string $q$. Suppose we don't want to list the positions that match $q$, but instead we want to count how many matches to $q$ there are in $D$. Describe how to augment the suffix tree by storing a small amount of additional information at each node so that these counting queries can be answered quickly.

    Solution: Store at each node the number of leaves under the node. The answer to a query is found by doing a search as usual, and if the search ends without mismatches at a node, returning the number stored there. If the search ends in the middle of an edge, return the number at the child endpoint of the edge.
  2. Suppose that a binary search tree $T$ has been augmented so that every tree node has an extra "size" field that stores the number of nodes in its subtree (including the node itself), allowing us to solve rank and unrank queries in time $O(\log n)$. Describe how to compute the median of the keys in $T$ (the key that would be in the middle position in a sorted list of the keys) by using a constant number of rank or unrank queries.

    Solution: If you know $n$, the number of keys in the list (a reasonable enough assumption), it is unrank(floor($n/2$)). If you don't already know $n$, you can compute it as $n={}$rank($+\infty$).
  3. The lower bound that we showed in class proved that, if we maintain a dynamic array of $n$ machine words, and perform exactly $n$ updates and $n$ queries (where an update changes one array value and a query asks for the sum of the values in a prefix of an array) then the total time for this sequence of operations must be $\Omega(n\log n)$. What about when the number of operations is not equal to the length of the array? If we have an array of $s$ words, and we perform $t$ updates and queries, what is the lower bound for the number of operations we need? (Hint: consider separately the cases when $s\gt t$ and when $s\lt t$, and use a $\min$ operation to combine the bounds for these two cases. When $s\gt t$, how much of the array can we actually use? When $s\lt t$, apply the lower bound from class to subsequences of the operations.)

    Solution: $\Omega(t\log\min(s,t))$.
  4. Consider what happens when we apply fractional cascading to an input consisting of $i$ lists, each of which has $2^i$ sorted numbers in it (with each number appearing in only one of the lists). Write a formula in terms of $i$ for the sum of the lengths of the lists in the resulting fractionally cascaded data structure.

    Solution: $(i-1)2^{i+1}+2$. Probably there's a good way to derive this from first principles, but I found the answer by writing a short program to calculate these numbers and then guessing a formula that works. Once you know the formula it's not difficult to prove by induction.