CompSci 261, Spring 2017, Homework 2

  1. For a hash table that uses hash chaining with $n$ keys, $N$ table cells, and load factor $\alpha=n/N$, prove that the probability that key $k$ belongs to a chain of size exactly $s$ is at most $\alpha^{s-1}/(s-1)!$.

    (Hint: How many ways are there for $s-1$ other elements to be placed into the same chain and what is the probability that any one of these ways happens?)

  2. The simplest way that a cuckoo hash table could fail to insert all keys would be for three of its keys to have the same two hash table cells as their hash values.

    1. For a cuckoo hash table with $n$ keys, two tables of $N/2$ table cells each, and load factor $\alpha=n/N$, what is the expected number of triples of keys for which this happens? You can calculate this expected number as the total number of triples of keys, multiplied by the probability that any given triple has this problem.

    2. Use Markov's inequality (and the observation that when the number of bad triples is less than one, it can only be zero) to prove that the probability of getting this kind of failure is O(1/n) when $\alpha$ is a constant less than $1/2$.

  3. Give pseudocode (or, if you prefer, Python code) for an algorithm that uses a dictionary to process a list of items and returns as output a list of the items that appear only once in the input. For example, when your algorithm is run with the list [2,7,1,8,2,8] as its argument, it should return the list [1,7] or [7,1] (the output ordering is unimportant). Your algorithm should not sort the input (because sorting is too slow) and when the input length is $n$ it should use $O(n)$ dictionary operations. The dictionary operations you can use include creating an empty dictionary, testing whether a key is present in the dictionary, setting the value associated with a key (or adding the key if it is not already present), looking up the value associated with a key, and listing all key-value pairs.

  4. Suppose that you are choosing a hashing algorithm to use as part of a web server, and you want to prevent denial-of-service attacks in which an attacker, knowing which algorithm you use and which keys are in your hash table, sends requests that try to make your server use as much processing time as possible. You may assume that these requests can only cause your server to perform hash table lookups, but not to change the contents of the hash table. Which of linear probing, cuckoo hashing, or hash chaining would be the most secure against this type of attack? Explain your answer.