Program 3

Implementing Priority Queues and Maps
(and their iterators) with Binary Trees

ICS-46: Data Strcuture Implementation and Analysis


Introduction This programming assignment is designed to ensure that you know how to implement two templated classes (Priority Queue and Map) with binary trees (max-heap and Binary Search Tree respectively). Your implementations will also include fully-functional iterators for these classes. For the priority queue you will be writing code that processes the underlying array storing the values as a max-heap. For the map you will be writing iterative and recursive code that processes the BST.

You can test these implementations by using the standard drivers (provided with the download) and GoogleTests that we will use when grading your code for correctness; recall that you can augment the GoogleTest with whatever code you want, to aid your debugging: a GoogleTest is just a C++ program. You can also test the code you wrote for Programming Assignment #1 (which used array implementations of these classes) by substituting these binary tree implementations -typically by changing a few typedef statements. In fact, you will be required to translate my solution to the wordgenerator program to use HeapPriorityQueue and BSTMap and indicate the performance improvement (over the original priority queue and map classes) on a large text file (wghuck.txt).

Write and use the standard insertion (<<) operator and str() method in each class for debugging. Note that there are few tested requirements for what these operators/methods return, but this code will make debugging easier, and we may examine it by hand.

For str in HeapPriorityQueue return in a std::string the entire contents array for all indexes (ascending) along with the other instance variables: from this information you can easily reconstruct the max-heap-as-binary_tree. After enqueueing b, a, c, the HeapPriorityQueue's str() prints as heap_priority_queue[0:a,1:b,2:c,3:](length=4,used=3,mod_count=4).

For str in BSTMap return a std::string that shows the BST rotated left by 90 degrees along with the other instance variables, so you can easily examine the structure of the BST. After assocating b with 2, a with 1, and c with 3, the BSTMap's str() prints as

bst_map[
..c->3
b->2
..a->1
](used=3,mod_count=3)

You should download the program3 project folder and use it to create an CLion project. You will write the required methods in the heap_priority_queue.hpp and bst_map.hpp files in this project, and submit each separately in Checkmate. The project folder also contains two pairs of .hpp and .cpp files: a driver/GoogleTest pair for each class that you will write, and the driver.cpp file which has a main function that can be made to run any of the two drivers.

Important: Only one of the .cpp files with a main method can be active/compiling at any time. In the download, only the driver.cpp file is active; the GoogleTests are inactive. To make a progam inactive, select it (in the editor tab), use the Ctrl-a command to select all its lines, and then type Ctrl-/ (or click Source at the top left of the menu and choose Toggle Comment): ever line will now appear in a comment (so the main function is commented-out); by using these same instructions, you can toggle back those lines to not have comments.

I recommend that you work on this assignment in pairs. Try to find someone who lives near you, with similar programming skills, and work habits/schedule: e.g., talk about whether you prefer to work mornings, nights, or weekends; what kind of commitment you will make to submit program early.

Only ONE STUDENT should submit the assignment (all parts of it). If students work in pairs, BOTH NAMES and their UCInetID names must appear in a comment at the top of each submitted program. For example if Romeo Montague (whose UCInetID is romeo1) submitted a program that he worked on with his partner Juliet Capulet (whose UCInetID is jcapulet) the comment at the top of each submitted file would appear as:

  // Submitter: romeo1(Montague, Romeo)
  // Partner  : jcapulet(Capulet, Juliet)
  // We certify that we worked cooperatively on this programming
  //   assignment, according to the rules for pair programming
If you do not know what the terms cooperatively and/or rules for pair programming mean, please read about Pair Programming before starting this assignment. If the names do not appear at the top of all your submissions in exactly this form, points will be deducted. If you are submitting by yourself, you may omit all lines but the first (Submitter). Please do turn in each program as you finish it, so that I can accurately assess the progress of the class as a whole during this assignment; do not turn in all the programs at the same time.

Print this document and carefully read it, marking any parts that contain important detailed information that you find (for review before you turn in the files).

This assignment has 3 parts: pairs should work on each part together, not split them up and do them separately. Part 1 is worth 30 points; part 2 is worth 25 points; part 3 is worth 5 pts. Remember I'm going to be running MOSS on the parts of this assignment.

Important: The courselib contains array implementations for all these data types (and you have written linked-list versions of them); although this assignment requires you to use binary trees, there are still many strong similarities at a high level in all these implementations. So, I encourage you to examine these implementations closely, and understand them; possibly, experiment with them (using their drivers or GoogleTests), while you are writing your binary tree implementations: this advice is especially true as you begin to study, understand, and implement iterators. Please feel free about asking questions about these methods: both their syntax and semantics.


Priority Queues (30 pts) Priority Queues can be implemented by a variety of data structures (where the highest priority value is always removed first). How does a specific priority queue determine which value has the highest priority? We supply the priority queue with a gt (greater-than) function that computes whether or not its first argument has a greater priority than its second argument. This gt can be passed to any constructor or instantiated in the template. So, we cannot ask, "What is the priority of a value." But, we can ask "Does the priority of a first value have a higher priority than a second value", by calling the gt function. For example, we cannot determine the priority of a std::string value, but we can determine whether one std::string value has a higher priority than another std::string value.

We can implement priority queues efficiently using a max-heap (stored in an array), whose enqueue and dequeue operations are each O(Log N). Enqueuing/Dequeuing N values is O(N Log N).

The file heap_priority_queue.hpp declares the appropriate constructors, methods, operators, and instance variables. Note the simple helper methods left_child, right_child, parent, is_root, and in_heap, which abstract the mapping functions and two useful bool functions. Use these methods to simplify writing the helper methods percolate_up and percolate_down (int i), which do most of the work in the enqueue and dequeue methods (respectively) and may be called in other places as well (e.g., the iterator's erase method). Think carefully about how to write these percolate methods simply (or you will never get them properly debugged): my method bodies used for loops and were 2 lines and 9 lines long each (each using std::swap included from <utility>).

I suggest copying/pasting the methods from the array_priority_queue.hpp file, and then translating these methods from using an array to using a max-heap, since both fundamentally process arrays (e.g., see/call the ensure_length helper method). Pay close attention to ensure all instance variables receive values in the constructors and are used/set correctly in queries and commands.

Special Requirements:

  • copy, initializer_list and iterable constructors, and operator=: These operations can all execute in O(N); they should not work by enqueueing each value into an empty max-heap, which would be O(N Log N). If there are two gt functions and they are the same, then directly copy the array. If there is only one gt function or the two are different, then first copy the array, and then make it into a max-heap by using the linear "heapify" algorithm discussed in the notes.

  • Note that the two priority queues in the copy constructor and operator= each store their own gt function, which may be the same or different. The one priority queue in the initializer_list and iterable constructors stores only its gt function; in these cases copy the information (by iterating through the parameter) into an array first and then always heapify it.

  • For operator=, copy the gt function on the right-hand side priority queue into the gt function for the (target) left-hand side priority queue (because after assignment, these priority queues must be ==; the gts may already be the same).

Iterators: There is no simple/efficient way to write an iterator for this max-heap implementation of priority queues, because the iterator is supposed to produce its values from highest to lowest priority, but the array is a max-heap (so we cannot just traverse its values); being able to erase values with an iterator makes this task even harder. Here is how to implement the iterator for this class.

  • Notice the ics::HeapPriorityQueue it; instance variable in the Iterator class in the HeapPriorityQueue class.

  • Call the first constructor for Iterator (the one that specifies a last bool parameter) when called from begin. It doesn't matter what value is passed: the point is to call the first Iterator constructor, not the second. The initializer for this Iterator constructor should copy the ref_pq HeapPriorityQueue into the instance variable it, and use this copy for iteration purposes. Examining the the current iterator value calls peek() on it; advancing the iterator calls dequeue it Overall, the "cursor" is the highest priority value in the it priority queue (the one that can be peeked and dequeued).

  • Call the second constructor for Iterator (the one that does not specify a last bool parameter) when called from end. The initializer for this Iterator constructor should leave it empty).

  • Two iterators that point to the same ref_pq are equal (index the same value) if their sizes are equal.

  • To erase the iterator's "cursor" use the following algorithm:
    1. Dequeue a value.
    2. Scan the max-heap in the real (ref_pq) priority queue to find the index of the value to erase.
    3. Use that index like the root in the dequeue algorithm with one difference: the value put there from the end of the max-heap might have to be percolated up instead of down. Look at the iterator_erase_heap_special_case GoogleTest to see an example of a max-heap (construct its array), where removing a value ("f") causes the value to percolate up.
This approach makes the iterators easy to write but expensive in time and space to use (especially erase): e.g., storing the second heap requires O(N) extra space; creating the iterator requires O(N); ++ is O(Log N) not O(1): erase can be O(N) since it performs a linear scan of the array representing a heap.

Feel free to question/discuss iterators for HeapPriorityQueue on the message boards, so long as no code is posted or described in too much detail.


Maps (25 pts) Maps can be implemented by a variety of data structures: they all associate unique keys with values (possibly duplicated), by storing an ics::pair holding both the key and its value. The data structure should allow us quickly to find the value associated with a key, and possibly change/update such a value. We must supply BSTMap with a lt (less-than) function that computes whether its first argument key belongs in the left subtree (if not, and the two values are !=, it belongs in the right subtree). This lt can be passed to any constructor or instantiated in the template.

We can implement maps efficiently by using a binary search tree (BST) whose order property is determined by comparing only the key (first) part of the pair. Most operations updating BSTs are O(height of BST) which for non-pathological BSTs is O(Log N).

The file bst_map.hpp declares the appropriate constructors, methods, operators, and instance variables. Note the declaration of many helper methods that operate on BSTs: most (in fact, all but find_key) are more simply implemented recursively. These recursive methods typically are passed the root of a BST (using some parameter mode), and then call themselves recursively on a left and/or right subtree (passing their left/right child as the root of a smaller tree).

Many of the standard map methods call one or more of these helper methods to retrieve information from a BST or change it (e.g., add/remove key->value assocations). These include the functions find_key, has_value, copy, copy_to_queue (a helper for creating iterators), equals (a helper for operator == ), and string_rotated (a helper for str(); and the mutators insert (a helpder for put), find_addempty (a helper for []), remove_closest (a helper for remove), remove (a helper for erase), and delete_BST.

I suggest examining code in the array_map.hpp file to help you understand some of the bookkeeping requirements of these methods. Many of the recursive helper functions you need to write have similar functions in the notes on processing binary trees (including BSTs) recursively. Pay close attention to ensure all instance variables receive values in the constructors and are used/set correctly in queries and commands.

Special Requirements:

  • copy and operator=: These operations can sometimes execute in O(N); they should not always work by putting each key/value pair into an empty BST, which would likely be O(N Log N), but in a pathological case would be O(N^2).

  • If the lt functions for the two maps are the same, then directly copy the entire BST in O(N): it is already "a BST with the correct order property". But, if the lt functions for the two maps are different, then put all the key/value pairs from one into the other (normally O(N Log N), but pathologically O(N^2)).

  • operator=: don't worry about reusing TNs in the lhs BST; just deallocate all them and copy the tree of the right-hand side.

Iterators: Note that map iterators produce Entry which is defined by typedef ics::pair<KEY,T> Entry; There are moderately simple/efficient ways to write iterators for a BST implementation of maps, but only if we don't allow iterators to erase any key/value. Because we are allowing erasure, we will adopt a simple but time/space-inefficient iterator for this class: it has some similarities implementing iterators for the heap priority queue.

  • Notice the ics::ArrayQueue it; instance variable in the Iterator class in the BSTMap class.

  • In the constructor for Iterator, the second argument should be true when called from begin but false when called from end.

  • When this parameter is true the Iterator constructor should copy all key->value pairs in the BST storing the map into the it queue; it then uses this queue for iteration purposes (when false it should leave it empty): examining the current iterator value calls peek() on it; advancing the iterator calls dequeue on it. Overall, the "cursor" is the front key->value pair in the it queue (the one that can be peeked and dequeued).

  • Two iterators that point to the same ref_map are equal (index the same value) if their queue sizes are equal.
  • To erase a the iterator's "cursor" erase its key->value pair from the BST storing the map.
This approach makes the iterators easy to write but expensive in time to create (copying all the key->value pairs from the BST to the queue is O(N)) and space (storing each key->pair value in the BST and in the queue requires O(N) extra space).

Feel free to question/discuss iterators for BSTMap on the message boards, so long as no code is posted or described in too much detail.


wordgenerator (5 pts) The download contains the file wordgenerator.cpp: my solution to the last part of Programming Assignment #1 (which also times itself running for building the corpus and creating a priority queue of its values in order). It also contains wghuck.txt: the text, appropriately formated, of Mark Twain's book, "The Adventures of Huckleberry Finn". For this part of the assignment, you must alter the #includes and any typedefs and code in this program, so that it uses the improved data type implementations (HeapPriorityQueue and BSTMap) instead of the "high-complexity" (inefficient) simple array versions of these data types.

In the comment at the bottom of this program, fill in the running times produced by the program (see the output line starting read_corpus time = and print_corpus (sort not print) time =) using both the array version and this improved version. Use an order statistic of 2.


Testing There are various ways to test each of the classes you are writing in this programming assignment. First, though, you should write all the methods, paying careful attention to the array implementations and previously written linked list implementations. For some, you might just boiler-plate simple code that is not correct, but allows the methods to compile, allowing other methods in the classes to be tested. For example (from the HeapPriorityQueue class)
template<class T, bool (*tgt)(const T& a, const T& b)>
bool HeapPriorityQueue<T,tgt>::empty() const {return false; //On this line to indicate boiler-plate
  return used == 0;
}

template<class T, bool (*tgt)(const T& a, const T& b)>
T& HeapPriorityQueue<T,tgt>::peek () const {return pq[0];   //On this line to indicate boiler-plate
}
Also, a constructor or void method will compile correctly with no statements in its body. Finally, don't forget to write actual code for the code you've boiler-plated.

The easiest way to start testing//debugging is by using the driver program. It allows you to perform any method call supported by the templated classes, and see the state of the class (or view the debugger). Start with with the default constructor and the insertion (<<) operator and str() method, so you can use them to debug the other methods; then test whatever command puts data into the data structure; then the queries; then the remaining commands; then the operators; then the iterators; then finally the remaining constructors.

After you test and debug your code with the driver, try running the appropriate GoogleTest code. Again, this form of testing is useful only as you approach a finished solution. We will use the GoogleTest, and visual inspection, to grade this assignment. Important Note: While you are debugging your classes, you can edit the GoogleTest (for example, putting std::cout statements at strategic points), but don't accidentally remove any of the assertions, otherwise you won't be fully checking your code the way we will. All debugging std::cout should end in std::endl to flush the output stream: ensure the output it displayed before executing the next statement (which may throw an exception, and thus the output buffer may not be flushed).

When you run the GoogleTest, initially choose small values for the first and third prompts (just press return to the second prompt) or comment-out these prompts and assign small values to these variables instead. Besides an indication of which tests pass and fail, the console window will show a speed for the speed test (which will vary depending on how fast a machine you run your code on): don't worry about it. When your code is passing all the tests, put in values like 10,000 for these prompts.