ICS 46 Spring 2022, Exercise Set 5

Getting started

First of all, be sure that you've read the Reinforcement Exercises page, which explains everything you'll need to know, generally, about how the reinforcement exercises will work this quarter. Make sure you read that page before you continue with this one.

Before you begin work on these reinforcement exercises, there's a chore that you'll need to complete on your ICS 46 VM to get it set up to proceed.

Refreshing your ICS 46 VM environment

Even if you previously downloaded your ICS 46 VM, you will probably need to refresh its environment before proceeding with these exercises. Log into your VM and issue the command ics46 version to see what version of the ICS 46 environment you currently have stored on your VM. Note, in particular, the timestamp; if you see a version with a timestamp older than the one listed below, you'll need to refresh your environment by running the command ics46 refresh to download the latest one before you proceed with these exercises.

2022-05-03 06:52:06
Exercise Set 5 template added

If you're unable to get outgoing network access to work on the ICS 46 VM — something that afflicts a handful of students each quarter — then the ics46 refresh command won't work, but an alternative approach is to download the latest environment from the link below, then to upload the file on to your ICS 46 VM using SCP. (See the Project #0 write-up for more details on using SCP.) Once the file is on your VM, you can run the command ics46 refresh_local NAME_OF_ENVIRONMENT_FILE, replacing NAME_OF_ENVIRONMENT_FILE with the name of the file you uploaded; note that you'd need to be in the same directory where the file is when you run the command.

ics46-2022spring-environment.tar.gz

Creating your project directory on your ICS 46 VM

A project template has been created specifically for this set of exercises, containing a similar structure to the basic template you saw in Project #0. Among other things, it contains a version of the gather script that's different from the ones in the projects, so you'll absolutely need to use the set5 template for these exercises, as opposed to the basic one.

Decide on a name for your project directory, then issue the command ics46 start YOUR_CHOSEN_PROJECT_NAME set5 to create your new project directory using the set5 template. (For example, if you wanted to call your project directory set5, you would issue the command ics46 start set5 set5.)

Problem 1 (2 points)

We've now seen three different ways to implement efficient search structures: AVL Trees, Skip Lists, and Hash Tables. All three of these data structures solve fundamentally the same problem: organizing pieces of information based on a unique key used to find it later. Sometimes, they're basically interchangeable; it doesn't always matter which choice we make. Sometimes, one or more of them should be disqualified from consideration, because it doesn't meet a requirement that's met by at least one of the others.

In each of the scenarios listed below, briefly explain whether it matters which of these three data structures we use. If so, list which ones you've disqualified from consideration and briefly explain why.

You'll be storing information about students, keyed by a student ID. The most important operations will be looking up a student based on their ID and printing a list of students sorted by their last names.
You'll be storing calendar information for one person, keyed by the date (i.e., on each date, you'll keep track of what's on their schedule). The most important operation will be finding out what's on their schedule for the next n days (i.e., today and the next n - 1 days), where n might potentially be as much as 365.
You'll be storing information about the songs in a media collection, keyed by a combination of an artist and a title, which are both strings, and the combination of which are assumed to be unique. The most important operation will be starting with a song and finding other songs that are similar-sounding.

What to submit

Add one PDF file to your core directory with this name: problem1.pdf, which contains your answer to this question.

Problem 2 (2 points)

Suppose that you were going to store each of the following sets of keys in a hash table. For each of them, propose and briefly describe a hash function, based on our conversation about Hash Tables. Briefly explain why your function meets the characteristics of "good" hash functions. Not that we're not asking you to write your function in C++; an English description is fine (and preferred) here.

The keys are arbitrary and randomly-selected dates on the calendar between January 1, 2000 and December 31, 2099, which are made up of three non-negative integers: a year, a month, and a day.
The keys are vehicle identification numbers (VINs) for a hypothetical state in the United States. Suppose that each VIN is made up of 16 characters, assigned as follows:
- The first character is a digit indicating the decade in which the vehicle was built, with 0 meaning the years 2000-2009, 1 meaning the years 2010-2019, and so on. (You can assume the scheme "rolls over" in 2100, so that 0 would also denote the years 2100-2109, and so on.)
- The second and third characters are letters indicating which county — counties are smaller geographical areas within each state — the vehicle was originally registered in.
- The fourth through sixth characters are digits that are assigned at random.
- The seventh through tenth characters are letters that are assigned at random.
- The eleventh through sixteenth characters are assigned sequentially for each sequence of values in the first ten characters. For example, the first vehicle whose VIN begins with 2BD117XQZR would have a VIN ending in 000000; the next VIN that begins with 2BD117XQZR would end in 000001; and so on. This ensures that all VINs are unique.

Note that the capacity of the hash table is not a factor here; your hash function should generate hash values that are arbitrary non-negative integers, and it's assumed that the hash table would take those values and use the % operator to bring them within the range of indices in the table.

What to submit

Add one PDF file to your core directory with this name: problem2.pdf, which contains your two proposed hash functions.

Problem 3 (2 points)

In our discussion of Priority Queues, we talked about implementing a priority queue as a binary min heap. Suppose instead that we decided to implement one as a quaternary min heap instead. To consider that, we'll first need to define some new terms.

A quaternary tree is an N-ary tree of order 4, in which the four subtrees of each internal node are considered to have an "ordering" from left to right (i.e., the "first" subtree is on the far left, the "fourth" subtree is on the far right, the "second" is just to the right of the first, and the "third" is between the second and fourth).

A complete quaternary tree is a quaternary tree in which every level is completely filled (i.e., every internal node that can be on that level is on that level), except the bottom level might be missing some internal nodes; if it is, all of the internal nodes that are present will be as far to the left as possible.

A quaternary min heap is a complete quaternary tree of order 4 that is a min heap.

Now let's consider how a priority queue implementation based around a quaternary min heap might be different from the one we discussed, which used a binary min heap, by wrapping our minds around a few questions about it.

Suppose you wanted to skip the creation of node objects and the linking of those nodes with pointers by using a similar technique to the one we used in the binary min heap implementation: storing the elements in a std::vector. Could you do this such that the nodes were numbered in a zero-based fashion (i.e., numbering the root node 0)? If so, what would be the formula, given a node numbered i, for finding a node's parent and each of its four children? If not, what is it about a quaternary min heap that makes this impossible to do?
What is an asymptotic notation for the height of a quaternary min heap containing n elements? How is that different from the height of a binary min heap?
Which implementation — our new quaternary version, or the binary version we discussed in lecture — would you expect to have an asymptotically faster implementation of enqueue and dequeueMin? Why?
Which implementation would you expect to be faster in practice when performing an enqueue or dequeueMin? Why?

What to submit

Add one PDF file to your core directory with this name: problem3.pdf, which contains your answer to four questions.

Problem 4 (2 points)

We've seen two data structures that, when you draw diagrams of them, look fairly similar to one another: trees and directed acyclic graphs (DAGs). But it's worth considering just how similar they actually are. The more similar they are, the more our knowledge of one of them will inform our knowledge of the other; the more different they are, the more our knowledge of one of them will mislead us about the other instead.

Could you take any tree and represent it instead as a DAG, while maintaining the same relationships between the tree's nodes? If so, what would need to change to make that possible (i.e., what would you do to the tree to make it into a DAG)? If not, why not?
Could you take any DAG and represent it instead as a tree, while maintaining the same relationships between the DAG's vertices? If so, what would need to change to make that possible (i.e., what would you do to the DAG to make it into a tree)? If not, why not?

While, as usual, we require your answers to be typed rather than written by hand, you can feel free to include a diagram to supplement your answer in each part — drawn by hand and scanned or created on a computer — as well, though we'll still need a written answer; the diagram is a supplement to your answer, not the answer itself. And, of course, a diagram is certainly not required, if you can explain your answer clearly without one.

What to submit

Add one PDF file to your core directory with this name: problem4.pdf, which contains your answer to these two questions.

Problem 5 (2 points)

We saw during our discussion of the implementation of Graphs that there are two broad implementation strategies that we can choose if we're implementing a graph: adjacency lists or an adjacency matrix. The difference turns out not to be academic, nor is it a matter of convenience or familiarity; the distinction is a critical one that we need to get right.

For each of the scenarios described below, choose one of the implementation strategies or the other — either adjacency lists or an adjacency matrix — and briefly explain, in a sentence or two, why your choice is the right one.

Each vertex is intended to represent a web page somewhere on the World Wide Web. A directed edge leads from the vertex v₁ to the vertex v₂ whenever there is a link from v₁'s page leading to v₂'s.
Each vertex represents a city in the United States in which there is an airport. If it's possible to fly from one city to another, even if it requires multiple flights (e.g., flying from a small airport in Afton, Wyoming to Orange County, California might require one flight from the small airport in Afton, Wyoming to a larger nearby city like Denver, followed by a flight from Denver to Orange County), a directed edge contains the cost and other details of the cheapest sequence of flights from one city to the other.
Some vertices represent people and other vertices represent movies available for viewing on an online service. A directed edge leads from a vertex representing a person to a vertex representing a movie whenever that person has watched that movie, with the edge storing some information about that viewing, such as when it occurred or how the person rated the movie afterward.

What to submit

Add one PDF file to your core directory with this name: problem5.pdf, which contains your choices and explanations for these three scenarios.

Deliverables

In Canvas, you'll find a separate submission area for each problem. Submit your solution to each problem into the appropriate submission area. Be sure that you're submitting the correct file into the correct area (i.e., submitting your Problem 1 solution to the area for Problem 1, and so on). Under no circumstances will we offer credit for files submited in the incorrect area.

Submit each file as-is, without putting it into a .tar.gz file or arranging it in any other way. (There is a gather script in the set5 template, but there's no need to use it.) If we asked for a PDF, for example, all we want is a PDF; no more, no less. If you submit something other than what we asked for, we will not be offering you any credit on the submission. There are no exceptions to this rule.

Of course, you should also be aware that you're responsible for submitting precisely the version of your work that you want graded. We won't regrade an exercise simply because you submitted the wrong version accidentally, and we won't be able to offer any credit on exercises that you forgot to submit.

Can I submit after the deadline?

Unlike some of the projects in this course, the reinforcement exercises cannot be submitted after the deadline; there is no late policy for these. Each is worth only 3% of your grade, so it's not a disaster if you miss one of them along the way.

Can I work on this outside of the VM?

Yes, but be aware that you must submit the files in the appropriate format, and that any C++ code you submit must compile and run on the ICS 46 VM.