CS 261, Spring 2013, Homework 3 Due Thursday, May 2 1. Suppose you have two Bloom filters FA and FB (each having the same number of cells and the same hash functions) representing the two sets A and B. Let FC = FA & FB be the Bloom filter formed by computing the bitwise Boolean and of FA and FB. (a) FC may not always be the same as the Bloom filter that would be constructed by adding the elements of the set (A intersect B) one at a time. Explain why not. (b) Does FC correctly represent the set (A intersect B), in the sense that it gives a positive answer for membership queries of all elements in this set? Explain why or why not. 2. Suppose that we want to store a set S of n = 20 elements, drawn from a universe of U = 10000 possible keys, in a Bloom filter of exactly N = 100 cells, and that we care only about the accuracy of the Bloom filter and not its speed. For this problem size, what is the best choice of the number of cells per key (the parameter k in the lecture)? (I.e., what value of k gives the smallest possible probability that a key not in S is a false positive?) What is the probability of a false positive for this choice of k? 3. Recall that the MinHash sketch stores the k elements of a set that have the smallest hash values (for a parameter k that determines the accuracy of the sketch, and for a fixed hash function that is used for all such sketches). Suppose that we wish to compute the MinHash sketch for a set S of n elements, where n is so big that S does not fit into main memory, and can only be accessed as a data stream. Design an algorithm for computing the sketch using only an amount of storage sufficient to hold O(k) set elements or their hash values, in a single pass over the data, in total time O(n). You may assume that each evaluation of the hash function takes time O(1). You may also find it helpful to recall that the xth largest element of a set of y values can be computed in time O(y) (this is called "linear time selection"). You should not assume that k is a constant that can be omitted from O-notation.