Lecture notes for February 6, 1996

o---o |\ /| | X | |/ \| o---ohas sixteen spanning trees:

o---o o---o o o o---o | | | | | | | | | | | | | | | | | | o o o---o o---o o---o o---o o o o o o o \ / |\ / \ / \ /| X | X X X | / \ |/ \ / \ / \| o o o o o---o o o o o o---o o o o---o |\ | / | /| \ | \ | / | / | \ | \| / |/ | \ o o o---o o o o---o o---o o o o o o---o |\ | / \ | /| | \ | / \ | / | | \ |/ \| / | o o o---o o---o o o

This problem can be solved by many different algorithms. It is the topic of some very recent research. There are several "best" algorithms, depending on the assumptions you make:

- A randomized algorithm can solve it in linear expected time. [Karger, Klein, and Tarjan, "A randomized linear-time algorithm to find minimum spanning trees", J. ACM, vol. 42, 1995, pp. 321-328.]
- It can be solved in linear worst case time if the weights are small integers. [Fredman and Willard, "Trans-dichotomous algorithms for minimum spanning trees and shortest paths", 31st IEEE Symp. Foundations of Comp. Sci., 1990, pp. 719--725.]
- Otherwise, the best solution is very close to linear but not exactly linear. The exact bound is O(m log beta(m,n)) where the beta function has a complicated definition: the smallest i such that log(log(log(...log(n)...))) is less than m/n, where the logs are nested i times. [Gabow, Galil, Spencer, and Tarjan, Efficient algorithms for finding minimum spanning trees in undirected and directed graphs. Combinatorica, vol. 6, 1986, pp. 109--122.]

Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance in the example above, twelve of sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set.

On the other hand, if you draw a path tracing
around the minimum spanning tree, you trace each edge twice and
visit all points, so the TSP weight is less than twice the MST
weight. Therefore this tour is within a factor of two of optimal.
There is a more complicated way (*Christofides' heuristic*) of
using minimum spanning trees to find a tour within a factor of 1.5
of optimal; I won't describe this here but it might be covered in
ICS 163 (graph algorithms) next year.

A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time.

For simplicity, we assume that there is a unique minimum spanning tree. (Problem 4.3 of Baase is related to this assumption). You can get ideas like this to work without this assumption but it becomes harder to state your theorems or write your algorithms precisely.

Lemma: Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e=(u,v), with u in X and v not in X. Then because T is a spanning tree it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than t since e has smaller weight than f. So T was not minimum, which is what we wanted to prove.

Kruskal's algorithm: sort the edges of G in increasing order by length keep a subgraph S of G, initially empty for each edge e in sorted order if the endpoints of e are disconnected in S add e to S return SNote that, whenever you add an edge (u,v), it's always the smallest connecting the part of S reachable from u with the rest of G, so by the lemma it must be part of the MST.

This algorithm is known as a *greedy algorithm*, because it
chooses at each step the cheapest edge to add to S. You should be
very careful when trying to use greedy algorithms to solve other
problems, since it usually doesn't work. E.g. if you want to find a
shortest path from a to b, it might be a bad idea to keep taking
the shortest edges. The greedy idea only works in Kruskal's
algorithm because of the key property we proved.

Prim's algorithm: let T be a single vertex x while (T has fewer than n vertices) { find the smallest edge connecting T to G-T add it to T }Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST.

Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.

Prim with heaps: make a heap of values (vertex,edge,weight(edge)) initially (v,-,infinity) for each vertex let tree T be empty while (T has fewer than n vertices) { let (v,e,weight(e)) have the smallest weight in the heap remove (v,e,weight(e)) from the heap add v and e to T for each edge f=(u,v) if u is not already in T find value (u,g,weight(g)) in heap if weight(f) < weight(g) replace (u,g,weight(g)) with (u,f,weight(f)) }Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing it's weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternately by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).

Boruvka's algorithm: make a list L of n trees, each a single vertex while (L has more than one tree) for each T in L, find the smallest edge connecting T to G-T add all those edges to the MST (causing pairs of trees in L to merge)As we saw in Prim's algorithm, each edge you add must be part of the MST, so it must be ok to add them all at once.

Analysis: This is similar to merge sort. Each pass reduces the number of trees by a factor of two, so there are O(log n) passes. Each pass takes time O(m) (first figure out which tree each vertex is in, then for each edge test whether it connects two trees and is better than the ones seen before for the trees on either endpoint) so the total is O(m log n).

Analysis: O(m log log n) for the first part, O(m + (n/log n) log n) = O(m + n) for the second, so O(m log log n) total.

ICS 161 -- Dept.
Information & Computer Science -- UC Irvine

Last update: