CompSci 261, Winter 2018, Homework 4

  1. Suppose that the streaming majority-winner algorithm is given as input a random ordering of the input sequence A,A,A,B,B,C. Each different ordering of this input is equally likely. What is the probability that the algorithm returns A as its result? (Note that it is allowed to write a computer program to solve this, although it should also be possible to solve by hand calculation.)

  2. Recall that a priority queue maintains a set of elements with associated priority values, subject to operations that insert an element, delete an element, or find the element with the minimum priority. You may also assume that a priority queue has two more operations that return its length and that return a list of its elements. Give pseudocode that uses these operations to implement a streaming algorithm for maintaining the MinHash sample of a data stream (the $k$ elements of the stream with minimum hash value for some predetermined hash value $h(x)$). Your pseudocode should define two functions, process(x) which handles a new element of the stream, and sample() which returns the current MinHash sample. Your answer should describe only these two functions; do not describe how to implement a priority queue.

  3. In a stream of distinct elements, consider the following streaming approximate-median algorithm: we maintain a random sample of three elements, and then estimate the median of the stream by returning the middle element of our sample. What is the probability that this estimate is within the middle third of the stream elements (that is, that it is greater than at least 1/3 of the stream and less than at least 1/3 of the stream)? (You may state the answer as a number rather than a formula, ignoring terms in the answer that go to zero as the stream length becomes large.)

  4. Describe a data structure that can handle a stream of insertion and deletion operations on a collection of numbers from $0$ to $N-1$ (allowing the same number to appear multiple times in the collection), and that after each operation reports a number $x$ such that, if there is a majority element, $x$ is that element. That is, if $k$ numbers have been inserted but not deleted, and one number appears more than $k/2$ times among them, then $x$ should be that number. For full credit, your structure should use at most $O(\log N)$ words of memory. You may assume that the stream does not include any deletion operations for numbers that have not been inserted. (Hint: Solve it for $N=2$ first, and then consider the binary representation of the numbers.)