CS 261, Spring 2023, Practice Problem Set 2

Suppose that we use hash chaining for a hash table with \(n\) keys, but with a hash table that is too small, so that the load factor \(\alpha\) is \(3\log n\). (This will cause operations on the hash table to take logarithmic time rather than constant time.) Prove that, for a random hash function and for this load factor, there is only \(o(1)\) probability of a hash chain existing with size at least \(6\log n\).

Hint: Use the Chernoff bound for the probability of having at least \(6\log n\) elements in a single cell. Then, use the union bound to combine the Chernoff bounds from all the cells.
Suppose that we want a very small 2-independent hash function, one that maps three different inputs \(x\), \(y\), and \(z\) to two outputs \(0\) or \(1\).
1. One possible choice for a 2-independent family \(H\) would be the family of all functions from these three inputs to these two outputs. How many functions are in \(H\), for this choice?
2. Find a different choice of \(H\) that is also 2-independent, but has fewer functions than the family of all functions. Express your choice as a table, where the rows are the functions in \(H\) and each column lists the values of each function for one of the three inputs. How many functions are in your family?
In Python, there is a function hash() built into the language that can map most types of object to an integer. For an integer \(x\), the value of hash(\(x\)) is just \(x\) mod \(2^{31}-1\), so for all integers smaller than \(2^{31}-1\), the hash of \(x\) is just \(x\) itself. Other types of objects have less-predictable hash values. The hash value, computed in this way, is used as the hash function for dictionaries by taking the result modulo the dictionary size. Python uses open addressing, but not linear probing, for its dictionaries.
1. If you know or can predict the size \(s\) of a hash table, describe how to choose \(s/2\) numbers that will all collide with each other.
2. What is the average time per operation to insert all these numbers into a linear-probing hash table, with this hash function? (Calculate the total time, using \(O\)-notation, and then divide by the number of insertions. Do not use the analysis of linear probing with random hash functions. You can assume that the size of the table is \(s\) throughout the sequence of insertions.)
Starting from the cuckoo hash table shown on the lecture note slide ``Graph view of cuckoo state'', describe the sequence of operations that would be performed if we insert a new key \(K\) with value \(5\), \(h_1(K)=3\), and \(h_2(K)=1\), and show what the graph view of the hash table looks like after the insertion is complete. Which of the cases listed in the slide ``Graph view of key insertion'' describes this insertion?