Program 5

Implementing Graphs with Maps and Sets
(and Dijkstras Shortest Path Algorithm)

ICS-46: Data Strcuture Implementation and Analysis


Introduction This programming assignment is designed to ensure that you understand the Graph data type, know how to implement it with (Maps and Sets) and practice using it to implement an extended version of Dijkstra's shortest-path algorithm.

You are already familiar with templeted classes implementing the Map and Set data types; in this assignment you will implement HashGraph using your implementations of HashOpenMap and HashOpenSet (or my solutions from Programming Assignment #4). Included for the first time are methods that store a graph's information into a text file, and load (build) a graph from a text file in this format. This assignment, unlike the previous three assignments requires implementing no new complicated data structure and no new iterator: like the Equivalence class on Quiz #7, you will write it using fast data structure implementations of the same standard data types that we have been using since Programming Assignment #1.

You can test the HashGraph implementation by using a driver for Graphs (provided with the download) and GoogleTest that we will use when grading your code for correctness; recall that you can augment the GoogleTest with whatever code you want, to aid your debugging: a GoogleTest is just a C++ program. You can also test the HashGraph implementation when writing/debugging the extended Dijkstras algorithm.

Look at the specifications below, but also look at the comments in the HashGraph class provided, which you will fill in with code. It includes a LocalInfo class that I have completely implemented, including a << method for displaying LocalInfo objects. It is an excellent idea to study this class to better understand it, and how to use it when implementing your HashGraph class. You will use it to write a << method for graphs; an example of its output is shown below (for the graph specified in the standard.txt file): note that the node maps in a graph are printed in alphabetical order, but the sets on each line have no ordering.

  graph g = graph[
    a -> LocalInfo[
           out_nodes = set[c,b,d]
           out_edges = set[->c(13),->b(12),->d(14)]
           in_nodes  = set[d]
           in_edges  = set[d(41)->]]
    b -> LocalInfo[
           out_nodes = set[d]
           out_edges = set[->d(24)]
           in_nodes  = set[a]
           in_edges  = set[a(12)->]]
    c -> LocalInfo[
           out_nodes = set[d]
           out_edges = set[->d(34)]
           in_nodes  = set[a]
           in_edges  = set[a(13)->]]
    d -> LocalInfo[
           out_nodes = set[a]
           out_edges = set[->a(41)]
           in_nodes  = set[b,c,a]
           in_edges  = set[b(24)->,a(14)->,c(34)->]]
    e -> LocalInfo[
           out_nodes = set[]
           out_edges = set[]
           in_nodes  = set[]
           in_edges  = set[]]
  ]

You should download the program5 project folder and use it to create an Clion project (ultimately needing to connect it to both the courselib and googletest libraries). You will write the required methods in the hash_graph.hpp, heap_adujstable_hash_priority_queue.hpp, dijkstra.hpp, and dijkstra.cpp files in this project, and submit each separately in Checkmate. The project folder will also eventually contain my solution files hash_map.hpp and hash_set.hpp, after Programming Assignment #4 is due; it also contains my solution to the heap_priority_queue.hpp, but that information is duplicated and extended in the heap_adujstable_hash_priority_queue.hpp file. The project folder also contains the driver.cpp (and driver_graph.cpp and driver_heap_adjustable_hash_priority_queue.cpp) and the GoogleTest files test_graph.cpp and test_heap_adjustable_hash_priority_queue.cpp. Finally, the file standard.txt is the textual representation of a graph used in the GoogleTest (and which you can use with the driver as well); the files flightdist.txt, and flightcost.txt will be used in the Dijkstra part of the assignment. All are stored in the input files folder.

The HashGraph part of the assignment is worth 30 points; the HeapAdjustableHashPriorityQueue part is worth 5 points (but must be written to write Dijkstra correctly) and Dijkstra part is worth 25 points.

Important: Only one of the .cpp files with a main method can be active/compiling at any time. In the download, only the driver.cpp and its driver_graph.cpp file is active; the drivers for the priority queue and GoogleTests are inactive. To make a progam inactive, select it (in the editor tab), use the Ctrl-a command to select all its lines, and then type Ctrl-/ (or click Source at the top left of the menu and choose Toggle Comment): ever line will now appear in a comment (so the main function is commented-out); by using these same instructions, you can toggle back those lines to not have comments.

This final assignment is to be done individually. Each student should submit all parts of the the assignment. Remember I'm going to be running MOSS on the parts of this assignment.

Print this document and carefully read it, marking any parts that contain important detailed information that you find (for review before you turn in the files).


The HashGraph Class The HashGraph template class (templated by class T) acts as a repository for all the information (nodes and edges) that comprise a graph. It is organized in such a way (with redundancy) that allows us quickly to access information a variety of information that is useful in graph algorithms. Part of its speed (and complexity) comes from storing (and updating) the redundant information.

We will represent each graph primarily using two maps: one with nodes as keys and one with edges as keys (each edge is a pair of std::string node names) whose associated value's type is specified in the template. With this representation, it is easy to find the value associated with an edge, find all the nodes and edges in the graph, as well as find all the nodes and edges directly connected (into and out of) some specific node

  • node_values maps its key (a node name represented by a std::string) to its associated LocalInfo object, which stores four Sets:
    1. nodes with edges that lead to them from this node (destination nodes with this node is their origin)
    2. nodes whose edges lead to this node (origin nodes with this node as their destination),
    3. edges leading to other nodes from this node, (all edges with this node is their origin node)
    4. edges from other nodes leading to this node (all edges with this node is their destination node)

  • edge_values maps its key (a pair of std::string, whose first is the origin node and whose second is the destination node).
Note that the keys to these maps are std::string and an ics::pair of std::string: see the hash_str and hash_pair_str hashing functions, which are defined as static helper functions in HashGraph

We partition the connected nodes into two sets: outgoing and incoming, by using the out_nodes and in_nodes instance variables in the LocalInfo class. Likewise, we partition the edges into two sets: outgoing and incoming, by using the out_edges and in_edges instance variables in the LocalInfo class.

All these sets could be computed when necessary from the main maps, but it is more efficient to cache their values and then be able to use these values directly, without (re)computing them. The downside of caching is that it takes more space and whenever we udpate the main maps, we must also update the related LocalInfo sets too, which is where some of the complexity of this programming assignment comes from.

A variety of methods return const references to the two main maps and the sets in LocalInfo. We can iterate over this information, but we should never directly change these maps and sets (even through an iterator) returned by these methods. We should mutate the Graph only through the methods it supplies, which ensure that the necessary redundant information is kept correctly consistent. Of course, we can use the iterator constructors to build other data types with this information.

Here is a brief list of all the methods in HashGraph . All of these methods are documented more fully in the template class in the download.

    Commands:
    1. There are two constructors: the default constructor constructs an empty graph; the copy constructor constructs a graph that is a copy of its parameter.

    2. add_node adds a node to the graph, with no outgoing or incoming nodes/edges: i.e. empty LocalInfo sets, updating node_values.

    3. add_edge adds an edge to the graph, updating the edge_values and LocalInfo for its origin and destination nodes: it adds these two nodes automatically, if they do not already exist in the graph.

    4. remove_node removes a node from the graph, as well as all the edges that refer to that node, updating the edge_values and LocalInfo for the affected nodes (affected nodes are either origin or destination nodes for the removed edges).

    5. remove_edge removes an edge from the graph, updating the LocalInfor for its origin and destination nodes and updaint edge_values.

    6. clear removes all nodes and edges from the graph.

    7. load reads a graph from a text file (see the comment for details on the form of the text file: also see the file standard.txt for an example).

    8. store stores a graph into a text file (see the comment for details on the form of the text file) The details ensure that graphs that are stored into a text file can then be loaded back into a graph.

    Queries

    1. empty returns whether or not the graph is empty: no nodes/edges.

    2. node_count returns the number of nodes in the graph.

    3. edge_count returns the number of edges in the graph.

    4. has_node returns whether or not a node is in the graph.

    5. has_edge returns whether or not an edge is in the graph.

    6. edge_value returns the value (of class T) associated with the edge that is its parameter (or null if that edge is not in the graph). Throw the GraphError exception if the edge is not in the graph.

    7. in_degree returns the number of incoming edges to a node. Throw the GraphError exception if the node not in the graph.

    8. out_degree returns the number of outgoing edges from a node. Throw the GraphError exception if the node not in the graph.

    9. degree returns the number of incoming+outgoing edges of a node. Throw the GraphError exception if the node is not in the graph.

    10. all_nodes returns the node_values map.

    11. all_edges returns the edge_values map.

    12. out_nodes returns the set of nodes in the graph that are destinations of the parameter node via some edge in the graph. Throw the GraphError exception if the node is not in the graph.

    13. in_nodes returns the set of nodes in the graph that are originss of the parameter node via some edge in the graph. Throw the GraphError exception if the node is not in the graph.

    14. out_edges returns the set of edges in the graph that have the parameter node as their origin. Throw the GraphError exception if the node is not in the graph.

    15. in_edges returns the set of edges in the graph that have the parameter node as their destination. Throw the GraphError exception if the node is not in the graph.

    Operators

    1. << insert on stream operator

    2. = the assignment operator This is a bit tricky because the from_graph pointer in all the LocalInfo objects in the copied map must be updated (see the connect function) to the new HashGraph.

    3. == equality operator (same nodes and edges). It is inefficient to test equality for the LocalInfo for each node; instead, what should be checked is that both graphs have the same set of node names, and that both edge maps are equal (same edges with same values).

    4. != inequality operator (different nodes or edges); see above.
Although this class contains many methods, many are similar in functionality (and code). Most are implemented easily via the powerful data types (maps and sets) used to represent the graph. As with many programming problems, understanding the concepts involved will take lots of time; writing code must come afterward, and it should be more straightforward.

An Adjustable PriorityQueue Class (modifying the HeapPriorityQueue class) The standard efficient implementation of a priority queue (pq) data type uses a heap data structure. It stores each pq value in an array (efficiently storing the heap binary tree) that conforms to the the max-heap ordering property. We implemented a priority queue in this way in Programming Assignment #3. Once a value is enqueued into the pq, we should not mutate it, because doing so would likely invalidate the heap ordering property: the mutated value will not likely be stored at the correct index to keep the heap ordering property satisified.

In an adjustable priority queue data type, such updating is allowed via the update method, which specifies as arguments an old pq value (one already in the pq) and what new pq value to replace it by (so there still is no mutation, just replacement of one value by another in the array). The update operation must

  1. find the index of the array storing the old pq value
  2. store the new pq value in the array at that index, replacing the old pq value
  3. percolate the new pq value upward or downward (as appropriate) to restore the heap ordering property in the array

In the standard heap implementation, there is no efficient way to do step 1: finding the index of the array storing the old pq value. The heap ordering property does little good when searching for a value: if the value isn't at the root, we must still search both subtrees (although, if we are searching a max-heap, we could avoid searching below any internal node that compares "less than" the value we are searching for, because all values below it would be even smaller). In fact, in the standad heap priority queue implementation, when we had to find the index of a pq value (for erasing it, using an iterator), we did so by just linearly scanning the array for that value. Thus, in this implementation, even step 1 above would be of complexity O(N).

An improved implementation of an adjustable priority queue, would still store a heap in an array (for quickly removing the maximum priority value, with complexity O(Log N)). In addition, it would store an auxillary map whose keys are pq values, and associated with each key/pq value would be the index in the heap array at which the value is stored. (so, we now have a requirements that the pq values must be unique; using this map we can efficiently, with complexity O(1), check whether or not an enqueued value is unique.) Thus, we would do step 1 by using the map to locate the index of the old value in the array storing the heap. Using a map implemented by a hash table, this operation is of complexity O(1).

We can then efficiently perform steps 2 and 3: storing the new pq value in the array at the same index, with complexity O(1); percolating the new value upward or downward to restore the heap ordering property in the array, with complexity O(Log N).

But, as we percolate values in the array, we must also remember to update the map that we use to remember the index of every value stored in the array, because the indexes of the values are changing. Each swap (there are at most O(Log N) swaps), must update the map using an operation with complexity O(1). So the resulting complexity for the entire update operation is O(1)+O(1)+O(1)*O(Log N) = O(Log N).

The download includes the file heap_adjustable_hash_priority_queue.hpp which updates my original solution, heap_priority_queue.hpp. It already includes...

  1. An updated template that includes both the tgt function that can be used by the priority queue and the thash function that can be used by the hash map keeping track of indexes.

  2. The declarations for the updated constructors, including both the cgt function that can be used by the priority queue and the chash function (and the_load_threshold) that can be used by the hash map keeping track of indexes.

  3. The declaration of the public update method: a method you will have to write (see below).

  4. The declaration of the public method named kludge_test_all_index_equivalences, which checks whether the lookup map contains the correct index for each value in the priority queue (pq) array. This method is used in the googletest to verify all indexes are correct. You can also call this method within your code while debugging it.

  5. The declaration of the private instance variable named lookup that stores the map (which is implemented using a hash table).

  6. The declaration of the private method named swap that you will write and call (in percolate_up and percolate_down) to both swap the values at two indexs in the pq array and update the lookup map to reflect the swap.

  7. The definitions of the updated constructors. These are are tedious to write Examine how the constructors that store values into the priority queue (a) check for duplicate values (not allowed in this data type) and (b) ensure that the lookup map's indexes match the indexes in the pq array.

  8. An updated str() method that shows the indexes in the pq array for each value and the indexes associated with each value in the lookup map. This is useful for visually checking the invariant explained below.
Note that the invariant between the pq array and lookup map is that for every i (an index used in pq): i == lookup[pq[i]]. You can check this by looking at the information shown in the result returned by str(), which includes the indexes in the pq array for each value and the indexes associated with each value in lookup map. This is also what the kludge_test_all_index_equivalences method checks.

Update the methods in the heap_adjustable_hash_priority_queue.hpp file as follows (all updates must preserve the invariant). Note that each method to update includes // update: as a comment.

  1. When putting a value into the adjustable priority queue (via enqueue or update), we must ensure that it is not already stored there (and raise an exception if it is); we must update the lookup map to record the location of this value initially, and as it percolates.

  2. In dequeue, we should erase that dequeud value from the lookup map (the value will no longer in the pq array). The lookup map must also be updated when moving the last value (if there is one) into array index 0, before percolating it downward.

  3. In clear, we must also clear the lookup map.

  4. In the = operator, when putting values in the left-hand pq array, also put them in the lookup map using the appropriate index.

  5. In the new swap private/helper method, swap the two values in the pq array and swap the indexes associated with these value in the lookup map.

  6. In percolate_up and percolate_down change the std::swap to a call of the private method swap in which lookup is updated to reflect each index change for the two pq values swapped.

  7. When erasing values using the iterator: lookup is used to find the the index of the value to be erased (now in O(1) time); that value should be removed from the lookup map, and finally lookup must also be updated when moving the last value into that array index, before percolating it (similar to what is required in dequeue).

Extended Dijkstra's Algorithm
Implementing a Graph Algorithm with the HashGraph class
In this part of the programming assignment, you will implement an extended version of Dijkstra's Algorithm, using the HashGraph class -as well as various other templated classes. This algorithm reads a graph whose edges store integers (representing non-negative costs), and then prompts the user for a start node. It then computes a map that contains the minimum cost (shortest) paths to all reachable nodes from that start node, and prints this map. Finally, the user is prompted for any number of stop nodes, and prints the minimum cost to reach that node and the list the nodes on this path.

The project folder contains the files dijkstra.hpp and dijkstra.cpp. The dijkstra.hpp provides an Info class and requires you to write two methods: one to compute shortest paths; one to reconstruct the nodes on a shortest path. The dijkstra.cpp contains the main function (and any other helper functions that you want to write) which performs this task, calling the methods in dijkstra.hpp as needed.

See the lecture notes, which summarize the extended Dijkstra algorithm: translate it into C++.

Copy the form of the following interaction (the output of the graph is elided in a few places for formatting purposes; the 3 lines of the map shown after the start node is actually one long line).

  Enter graph file name[flightcost.txt]: flightdist.txt
  graph[
    BOS -> LocalInfo[
           out_nodes = set[MIA,JFK,SFO,ORD]
           out_edges = set[->MIA(1258),->JFK(187),->ORD(867),->SFO(2704)]
           in_nodes  = set[MIA,JFK,ORD,SFO]
           in_edges  = set[MIA(1258)->,ORD(867)->,JFK(187)->,SFO(2704)->]]
    BWI -> LocalInfo[
           out_nodes = set[MIA,JFK,ORD]
           out_edges = set[->MIA(946),->ORD(621),->JFK(184)]
           in_nodes  = set[MIA,JFK,ORD]
           in_edges  = set[MIA(946)->,JFK(184)->,ORD(621)->]]
    DFW -> LocalInfo[
           out_nodes = set[JFK,ORD,SFO,MIA,LAX]
           out_edges = set[->ORD(802),->SFO(1464),->MIA(1121),->LAX(1235),->JFK(1391)]
           in_nodes  = set[JFK,SFO,ORD,MIA,LAX]
           in_edges  = set[SFO(1464)->,ORD(802)->,MIA(1121)->,LAX(1235)->,JFK(1391)->]]
    JFK -> LocalInfo[
           out_nodes = set[BWI,ORD,MIA,PVD,BOS,DFW]
           out_edges = set[->BWI(184),->ORD(740),->MIA(1090),->PVD(144),->BOS(187),->DFW(1391)]
           in_nodes  = set[BWI,ORD,PVD,MIA,BOS,DFW]
           in_edges  = set[BWI(184)->,ORD(740)->,PVD(144)->,MIA(1090)->,BOS(187)->,DFW(1391)->]]
    LAX -> LocalInfo[
           out_nodes = set[MIA,DFW,SFO]
           out_edges = set[->MIA(2342),->SFO(337),->DFW(1235)]
           in_nodes  = set[MIA,DFW,SFO]
           in_edges  = set[MIA(2342)->,SFO(337)->,DFW(1235)->]]
    MIA -> LocalInfo[
           out_nodes = set[BWI,JFK,BOS,DFW,LAX]
           out_edges = set[->BWI(946),->DFW(1121),->LAX(2342),->BOS(1258),->JFK(1090)]
           in_nodes  = set[BWI,JFK,BOS,DFW,LAX]
           in_edges  = set[BWI(946)->,DFW(1121)->,LAX(2342)->,BOS(1258)->,JFK(1090)->]]
    ORD -> LocalInfo[
           out_nodes = set[JFK,BWI,SFO,PVD,BOS,DFW]
           out_edges = set[->BOS(867),->DFW(802),->JFK(740),->PVD(849),->BWI(621),->SFO(1846)]
           in_nodes  = set[BWI,JFK,SFO,PVD,BOS,DFW]
           in_edges  = set[BOS(867)->,JFK(740)->,DFW(802)->,PVD(849)->,BWI(621)->,SFO(1846)->]]
    PVD -> LocalInfo[
           out_nodes = set[JFK,ORD]
           out_edges = set[->JFK(144),->ORD(849)]
           in_nodes  = set[ORD,JFK]
           in_edges  = set[ORD(849)->,JFK(144)->]]
    SFO -> LocalInfo[
           out_nodes = set[BOS,ORD,LAX,DFW]
           out_edges = set[->LAX(337),->BOS(2704),->ORD(1846),->DFW(1464)]
           in_nodes  = set[BOS,ORD,LAX,DFW]
           in_edges  = set[LAX(337)->,BOS(2704)->,ORD(1846)->,DFW(1464)->]]
  ]
  Enter start node (must be in graph): BWI
  map[JFK->Info[JFK,184,BWI],BOS->Info[BOS,371,JFK],BWI->Info[BWI,0,?],
  ORD->Info[ORD,621,BWI],LAX->Info[LAX,2658,DFW],MIA->Info[MIA,946,BWI],
  PVD->Info[PVD,328,JFK],SFO->Info[SFO,2467,ORD],DFW->Info[DFW,1423,ORD]]

  Enter stop node (must be in graph or QUIT): LAX
  Cost is 2658; path is queue[BWI,ORD,DFW,LAX]:rear

  Enter stop node (must be in graph or QUIT): SFO
  Cost is 2467; path is queue[BWI,ORD,SFO]:rear

  Enter stop node (must be in graph or QUIT): QUIT
You can test your code on the flightdist.txt and flightcost.txt text files: each represents a graph with int edge values.

Testing

The easiest way to start testing/debugging the HashGraph class is by using its driver program. It allows you to perform any method call supported by the templated classes, and see the state of the class (or view the debugger). Here is reasonable order to write and test the code: unlike implementing data types with data structures, these method can be written and tested more independently.

Write the default constructor and the add_node and add_edge methods first. Ensure that when adding an edge, you update LocalInfo for the origin and destination nodes. When these work successfully, write the has_node, has_edge, and edge_value methods; also in_degree, out_degree, and degree. Now return to the commands that remove information from the graph: remove_edge is much simpler than remove_node, because removing a node requires removing any edge using that node, which means updating the LocalInfo for all nodes that are connected by an edge to the removed node. Then finish writing the commands: the clear, load and store methods. Next, write the copy constructor and all the remaining queries: most are just a single line of code, after the checking/throwing of an exception. Finally, write the =, ==, and != operators.

After you debug your code with the driver, try running the appropriate GoogleTest code. Again, this form of testing is useful only as you approach a finished solution. We will use the GoogleTest, and visual inspection, to grade this assignment. Important Note: You can put std::cout statements in the GoogleTest code (but don't accidentally remove any of the assertions, otherwise you won't be fully checking your code the way we will) while you are debugging your classes. All debugging std::cout should end in std::endl to flush the output stream: ensure the output it displayed before executing the next statement

In this GoogleTest I have commented-out a try-catch block surrounding the test; if you are having problems with code inside a test, uncomment this code and you might get some useful information.

Likewise, the easiest way to start testing/debugging the HeapAdjustableHashPriorityQueue classes is by using its driver program and checking that the invariant is correct; that is easy to do in the driver because it shows both the heap array and the map showing at what index each value is stored. After you debug your code with the driver, try running the appropriate GoogleTest code (see comments above).

The easiest way to start testing/debugging the Dijkstra algorithm is by embedding print statements (possibly controlled by a debugging boolean) that trace the progress of the algorithm, including printing each important data structure as the algorithm executes.