Dijkstra's All Shortest Path Algorithm

The most complicated graph algorithm that we will examine in detail in ICS-46 is Dijkstra's All Shortest (least costly) Paths Algorithm. Given a graph and a starting node, it computes the shortest paths to all the nodes that can be reached in the graph from the starting node. The graph is directed and has weighted edges; the edge weights must all be non-negative. The length of a path (and the measure of "shortness") is the sum of the weights of all the edges on the path.

Actually, I will call it an Extended version of Dijkstra's algorithm, because we will store enough information not only to compute the cost of the shortest paths, but to reconstruct these paths as well: what nodes/edges lie between the starting and ending nodes, and in what order they must be followed. The standard Dijkstra algorithm computes just the cost of all the shortest paths, but omits information for reconstructing the paths.

This algorithm generalizes the reachability algorithm that we wrote in Programming Assignment #1, by finding not just all nodes reachable from a start node, but the minimum cost path to reach each node; in Programming Assignment #1, our map stored no edge weights to compute such path costs.

GPS systems use a variant of this algorithm to search from where you are to where you want to go. But that variant is more specialized/efficient, because it knows where you want to go at the time the algorithm starts: with Dijkstra's algorithm, the destination(s) are supplied at the end, after the algorithm does all its work computing shortest paths.

This algorithm uses many data types to get its job done: a Graph, two Maps, a Priority Queue, a Queue, and a Stack. I will describe the algorithm in detail below, and then we will hand simulate it on a moderate-sized graph in class. You will implement this algorithm as part of Programming Assignment #5.
Most data in the algorithm is stored as objects from a class named Info. Each Info object describes one node and contains (1) an approximation to the cost of the minimum path to reach that node (initialized to +infinity and updated in the algorithm until the actual minimum cost is found), and (2) the name of the node BEFORE it on the shortest path to it (initialized to "?" and updated in the algorithm).

Objects in the Info class are ultimately stored in three collections: as the values in two Maps (the key is the node name), and in an Adjustable Priority Queue. See the writeup of Programming Assignment #5, which discusses the Adjustable variant of a PriorityQueue in detail. Basically, we can use the update method to replace one priority queue value with another; when update is executed, the priority queue adjusts itself according to the new value's priority (in a heap implementation, it percolates the value up or down to the correct position).

The basic algorithm is

Get the inputs:

  1) Input the Graph. In your code, prompt the user for a file name and load and print its graph, whose edges are labeled by non-negative integer values.

  2) Input the start node. In your code, prompt the user to enter a start node. The algorithm will compute all shortest paths FROM that particular START node.

Compute the shortest paths:

  3) Call the extended_dijkstra function with this information, returning a map of the minimum costs to reach each node and the information needed to construct the minimum cost path to any node.

In the extended_dijkstra function:

  3a) Declare the answer_map to be empty and the info_map to contain each node in the graph as a key in the map, with its associated value a newly constructed/initialized object of Info for that node. Update the start node in the info_map by setting its total_cost to 0 (since we start at that node, the cost to reach it is 0).
      Note the info_map contains nodes whose minimum distances from the start node ARE NOT YET KNOWN; the answer_map contains nodes whose minimum distances ARE KNOWN.

  3b) Declare the info_pq, and load it with the current contents of the info_map: here the smallest cost has the highest priority.

  3c) Loop so long as the info_map is not empty...

    3c1) Remove the Info from info_pq with the smallest associated cost. Initially it will remove the start node (cost 0; the other costs are all infinity). If its cost is infinity, then no more nodes in info_map are reachable, so terminate.

    3c2) Call "min_node" the node from this Info and "min_cost" its cost. We are now guaranteed to know the least costly path from the start node to min_node.

    3c3) Remove this key->value from the info_map and put it into the answer_map.

    3c4) For every node d that is a destination from the min_node and not already in the answer_map, get d's Info using info_map and see if its cost is infinite or greater than the cost of the path from the start node to min_node plus the cost of the edge from min_node to d. If it is infinite, or the computed sum is smaller than its stored value,
           (1) in info_map, update the cost in d's Info to this smaller number, and update the predecessor of d to be min_node;
           (2) update d's Info in the adjustable info_pq with this new information.

    3c5) Continue around the loop.

  3d) When the loop finishes, the Info values in answer_map are filled with the minimum cost to reach each node and the node preceding it on the minimum path. Return this information.

Find the minimum cost paths to any nodes in the graph:

  4) Repeatedly prompt the user for a stop node and show the minimum cost and minimum cost path to reach that node, by calling recover_path, which returns a Queue of the nodes on the shortest path between the start node and stop node.
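
Steps 3a-3d and the recover_path function of step 4 can be sketched in C++ roughly as follows. This is only a minimal sketch under stated assumptions, not the code for Programming Assignment #5: it uses standard-library containers instead of the course data types, the Graph alias and the INFINITE constant are hypothetical names, and a std::set of (cost, node) pairs stands in for the Adjustable Priority Queue (erasing the old pair and inserting the new one plays the role of update).

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <queue>
#include <set>
#include <stack>
#include <string>
#include <utility>

// Hypothetical stand-in for the course Graph: node -> (destination -> edge cost).
typedef std::map<std::string, std::map<std::string, int> > Graph;

const int INFINITE = std::numeric_limits<int>::max();  // plays the role of +infinity

struct Info {
  int total_cost;           // best cost found so far to reach this node
  std::string predecessor;  // node before this one on that best path
  Info() : total_cost(INFINITE), predecessor("?") {}
};

typedef std::map<std::string, Info> InfoMap;

// Steps 3a-3d: returns the answer_map for every node reachable from start.
InfoMap extended_dijkstra(const Graph& g, const std::string& start) {
  InfoMap answer_map, info_map;
  // 3a) every node (edge sources and destinations) gets a freshly initialized Info
  for (Graph::const_iterator n = g.begin(); n != g.end(); ++n) {
    info_map.insert(std::make_pair(n->first, Info()));
    for (std::map<std::string, int>::const_iterator e = n->second.begin(); e != n->second.end(); ++e) {
      assert(e->second >= 0);  // the algorithm requires non-negative edge weights
      info_map.insert(std::make_pair(e->first, Info()));
    }
  }
  info_map[start].total_cost = 0;

  // 3b) ordered set of (cost, node): begin() always holds the smallest cost
  std::set<std::pair<int, std::string> > info_pq;
  for (InfoMap::const_iterator i = info_map.begin(); i != info_map.end(); ++i)
    info_pq.insert(std::make_pair(i->second.total_cost, i->first));

  while (!info_map.empty()) {                                // 3c
    const int min_cost = info_pq.begin()->first;             // 3c1, 3c2
    const std::string min_node = info_pq.begin()->second;
    info_pq.erase(info_pq.begin());
    if (min_cost == INFINITE) break;                         // nothing else is reachable

    answer_map[min_node] = info_map[min_node];               // 3c3
    info_map.erase(min_node);

    Graph::const_iterator out = g.find(min_node);
    if (out == g.end()) continue;                            // min_node has no outgoing edges
    for (std::map<std::string, int>::const_iterator e = out->second.begin(); e != out->second.end(); ++e) {  // 3c4
      InfoMap::iterator d = info_map.find(e->first);
      if (d == info_map.end()) continue;                     // d is already in answer_map
      const int new_cost = min_cost + e->second;             // assumes the sum fits in an int
      if (new_cost < d->second.total_cost) {                 // also true when the stored cost is INFINITE
        info_pq.erase(std::make_pair(d->second.total_cost, d->first));  // "update": drop the old entry...
        d->second.total_cost = new_cost;
        d->second.predecessor = min_node;
        info_pq.insert(std::make_pair(new_cost, d->first));             // ...and insert the new one
      }
    }
  }
  return answer_map;                                         // 3d
}

// Step 4: follow predecessors back from stop to start; the Stack reverses them.
// Assumes start is the same node that was passed to extended_dijkstra.
std::queue<std::string> recover_path(const InfoMap& answer_map,
                                     const std::string& start,
                                     const std::string& stop) {
  std::queue<std::string> path;
  if (answer_map.find(stop) == answer_map.end()) return path;  // unreachable: empty path
  std::stack<std::string> reversed;
  for (std::string node = stop; node != start; node = answer_map.at(node).predecessor)
    reversed.push(node);
  reversed.push(start);
  while (!reversed.empty()) { path.push(reversed.top()); reversed.pop(); }
  return path;
}
```

For example, on a graph with edges a->b (cost 1), a->c (cost 4), b->c (cost 2), and c->d (cost 1), calling extended_dijkstra with start node a records cost 3 and predecessor b for node c, and recover_path from a to d yields the queue a, b, c, d.
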
By repeatedly following the predecessors in the map (a Stack is useful here), we can reconstruct a queue containing the entire minimum cost path, from the start node to any reachable node.

Although I won't show the hand simulation here, I will do one in class and you should be able to hand simulate this algorithm from memory. The steps listed above are very detailed when applied to actual C++ data types, but when I perform the algorithm on the whiteboard it will be more intuitive and reasonably fast.

Again, I called this algorithm an Extended version of Dijkstra's algorithm, because we will store enough information not only to compute the cost of the shortest paths, but to reconstruct them as well: what nodes/edges lie between the starting and ending nodes. The standard Dijkstra algorithm computes just the cost of all the shortest paths, without storing information to reconstruct the paths.

------------------------------------------------------------------------------

Simplistic Analysis: Dense and Sparse Graphs

Assume that we have a dense graph: each of its N nodes has O(N) edges. The loop specified in step 3c loops once for each node, so it iterates O(N) times. Each time it loops it executes: part 3c1 takes O(Log N), part 3c3 takes O(1), and part 3c4 takes O(N)xO(Log N), because in a dense graph each node can have O(N) outgoing edges to process and enqueue; so the entire algorithm is O(N) x ( O(Log N) + O(1) + O(N)xO(Log N) ) or O(N^2 Log N).

Assume that we have a sparse graph: each of its N nodes has O(1) edges. The loop specified in step 3c loops once for each node, so it iterates O(N) times. Each time it loops it executes: part 3c1 takes O(Log N), part 3c3 takes O(1), and part 3c4 takes O(1)xO(Log N), because in a sparse graph each node has some constant number of outgoing edges to process and enqueue; so the entire algorithm is O(N) x ( O(Log N) + O(1) + O(1)xO(Log N) ) or O(N Log N).

So we can unify both cases by writing the bound as O( (M+N) Log N ) (recall M is the number of edges in the graph): in a dense graph M is O(N^2), which gives O(N^2 Log N), and in a sparse graph M is O(N), which gives O(N Log N).

------------------------------------------------------------------------------

Proof of Correctness of Dijkstra's Algorithm: By Induction and Contradiction

Prove: The final answer_map contains all the correct answers (shortest paths).

Proof by induction on the size of the answer_map.

  1) size 1: The answer_map contains just the starting node (call it s), with cost 0. That is the correct answer.

  2) Assume that the answer_map, when its size is between 1 and k inclusive, has all the correct answers. Prove that the next value put in the answer_map, increasing its size by 1, is correct.

Let v be the next node added to answer_map. Call the node it came from u (where u is some node in the answer_map, and there is an edge from u->v). We prove that the path from s->u extended by the edge from u->v is the shortest path from s to v.

Proof by Contradiction: Assume that there is a shorter path from s->v through a node y that is not yet in the answer_map; let x be the last node on this path that is in the answer_map. Then there is a path from s->x and from x->y and from y->v. We have the following picture (with s, x, and u in the answer_map and y and v not in the answer_map)

      in answer_map
       +---------+
       |         |
       |    x ---+---y
       |   /     |   |
       | ...     |   |
       | /       |   |
       | s       |   |
       | \       |   |
       | ...     |   |
       |   \     |   |
       |    u ---+-- v
       |         |
       +---------+

Now, because v was chosen to be the next node in answer_map, the distance from s->v must be less than or equal to the distance from s->y (otherwise y would have been chosen). Because the edges leading from y to v (there must be at least one) must have non-negative values, it means that any path s->y->v must be as long as or longer than the path s->u->v:

  length(s->u->v) <= length(s->x->y) <= length(s->x->y) + length(y->v) = length(s->x->y->v)

So, there can be no such node y that leads to a SHORTER path from s to v.
So our assumption is wrong, and the shortest path from s->v is correctly added to answer_map.