CompSci 261, Winter 2018, Homework 2

  1. In the deletion algorithm for linear probing, we need to know whether an index $i$ appears between $h(k)$ and $j$, where $h(k)$ is the starting index for key $k$ and all three of $i$, $h(k)$, and $j$ belong to the same block of occupied cells. Give pseudocode or a formula for performing this betweenness test, that correctly handles the case when the block of cells wraps around from the end of the hash table back to its start.

  2. Recall that, for a linear probing hash table with load factor at most $1/2$, the multiplicative Chernoff bound tells us that the probability of seeing a block of exactly $B$ filled cells in one particular location in the hash array is at most $(e/4)^{B/2}$. Use this formula to give a bound on the expected number of blocks of filled cells of this length throughout the entire hash array. What is the largest value of $B$ for which your bound on the expected number is at least $1$? (Hint: How many different blocks of length $B$ are there?)

  3. The version of cuckoo hashing that we described in class uses two hash functions $h_1(k)$ and $h_2(k)$ that both map a key $k$ to a position within a single array $A$ of key/value pairs. Sometimes cuckoo hashing is instead implemented with two different arrays $A_1$ and $A_2$, so that $h_1(k)$ maps key $k$ to a position in array $A_1$ and $h_2(k)$ maps key $k$ to a position in array $A_2$. Recall that one way for cuckoo hashing to fail is for three keys to all map to the same two array cells.

    Compute the probability that a given triple of keys all map to the same two array cells, for both kinds of cuckoo hash table, when there are $n$ keys and a total of $2n$ array cells. (For the version of cuckoo hashing with two tables, you can assume that each of the two arrays has $n$ cells.) For which of these two versions of cuckoo hashing is there a greater probability of this type of failure, or are they both equal in this respect?

  4. Recall that the load factor of a hash table is defined as $n/N$, where $n$ is the number of key-value pairs stored in the table and $N$ is the number of cells in the hash array. If a single reference to an object can be stored in a single machine word, then a single key-value pair takes two machine words to store, so the minimum amount of memory required for any structure that lists all pairs would be $2n$, while an open-addressing hash table would use $2N$ machine words. Thus, the load factor also equals $2n/2N$, the ratio between the minimum amount of memory needed and the amount of memory actually used.

    Now, consider a hash chaining data structure, where we store the key-value pairs within any cell of the hash table as a linked list (a collection of objects that each have three references or pointers, for the key, value, and next object in the list) and where each table cell stores a single reference or pointer to the first object in its list. If this structure uses $M$ memory cells, define the "effective load factor" of this structure to be $2n/M$, where $2n$ is again the minimum storage required for $n$ key-value pairs and $M$ is the storage actually used.

    Write a formula that translates the load factor $f$ of a hash chaining data structure to the effective load factor of the same structure. What is the value of $f$ for which the effective load factor is $1/2$?