CompSci 261, Winter 2018, Homework 6

  1. Suppose that we want to solve dynamic range closest pair problems. We will maintain a data structure for a set of numbers, where the operations are to insert a number into the set, delete a number from the set, or find the closest pair of numbers whose values are in the range $[L,R]$.

    Describe how to solve this problem in logarithmic time per query using two pieces of extra information at each node $x$ of a binary search tree: The identity of the successor to $x$ (the node $y$ that comes immediately after $x$ in the sorted order of the current set), and the identity of the node $z$ that is a descendant of $x$ (possibly equal to $x$ itself) and, among descendants of $x$, has the closest successor. How can we use this information to answer each query efficiently?

    Solution: We saw in class how to represent the subset of data values within a range $[L,R]$ as the union of $O(\log n)$ subtrees of a binary search tree (contained entirely within the range) and $O(\log n)$ individual binary search tree nodes (along the paths to $L$ and $R$ in the binary search tree). For this problem, we want to construct the same representation of the range $[L,R']$ where $R'$ is the second-largest value of the range (so we leave out one value, the largest one). This can be done by first performing a binary search to find the largest data value within the query range, modifying the range to go up to but not include that value, and then representing the modified range in the same way we saw in class. With this modification, the closest pair within the range is the closest among the pairs of $z$ and its successof for roots of entire subtrees within the range, or the pairs of $x$ and its successor for the individual nodes within the range that lie on the search paths for the two range endpoints.

  2. If we modify the data structure of problem 1 by inserting a new number and a new node for that number, and then rebalancing the tree by performing some rotations, describe how to update the added information at the nodes of the tree. Which nodes might need to have their successor pointers changed, and what should their new successors be? Which nodes need to have their descendant-with-closest-successor pointers changed, and what should the new values of these pointers be?

    Solution: Let the new number be $x$. Search for its successor and store that information at the node for $x$. Search for its predecessor and store $x$ as the successor of the predecessor. And at each ancestor $a$ of $x$, its predecessor, or any of the other nodes modified by the insertion ($O(\log n)$ nodes in all), recompute the descendant $z$ of $a$ with the nearest successor by comparing three candidates for $z$: the two descendants-with-nearest-successors at the two children of $a$, and $a$ itself.

  3. Suppose that we wish to find shortest paths from a single starting vertex in graphs with $n$ vertices, $2n$ edges, and edge lengths that are integers in the range from $1$ to $\log_2 n$.

    1. What is the running time of Dijkstra's algorithm on these graphs using a Fibonacci heap for its priority queue?

      Solution: $O(n\log n)$

    2. What is the running time of Dijkstra's algorithm on these graphs using a bucket queue for its priority queue?

      Solution: $O(n\log n)$! The priorities are numbers in the range from $0$ to $n\log n$, but the order in which Dijkstra's algorithm finds and removes elements causes their priorities to be monotonically increasing. The time for a bucket queue find-and-remove is proportional to the difference between consecutive priorities, and these differences add in a telescoping sequence to $n\log n$. The time for a bucket queue decrease-priority operation is constant, so the total time is $O(n\log n)$.

      Alternatively, you could have found a time bound of $O(n+m+dc)$ for Dijkstra using bucket queues by following the link to the Wikipedia bucket queue article; plugging in the problem parameters $m=2n$ and $c=\log n$, noting that every graph has $d\le n-1$, and simplifying the constant factors out of the $O$-notation gives $O(n\log n)$.

    3. If you computed the worse answer $O(n^2\log n)$ using the fact that the largest priority that can be in the priority queue is $O(n\log n)$, and that an (un-optimized) bucket queue takes time proportional to its largest priority, then you should still get full credit.

    4. What is the running time of Dijkstra's algorithm on these graphs using a van Emde Boas tree for its priority queue?

      Solution: The expected answer (worth full credit) is: The largest priority that can be in the priority queue is $O(n\log n)$, and the time per operation in a van Emde Boas tree is the double logarithm of the maximum priority, so the total time becomes $O(n\log\log n)$.

      However, it's possible to modify the van Emde Boas tree to do even better. The key idea is that, at any step of the algorithm, the priorities in the vEB tree are all at most $\log n$ apart from each other. If you rebuild the vEB tree from scratch whenever its largest priority grows to $2\log n$, each vertex will only ever be involved in one rebuild operation, so the total number of operations in the vEB trees (including the total number of vertices in all rebuild operations) will still be $O(n)$. But then we only need to prioritize numbers up to $O(\log n)$, and the time per operation is the double logarithm of this. The total time becomes $O(n\log\log\log n)$.

    5. Which of these running times is best?

      Solution: The van Emde Boas tree solution, either with the answer I expected or with the better modified version, is better than all the other choices listed here.