Strongly connected components

Strong connectivity and equivalence relations

In undirected graphs, two vertices are connected if they have a path connecting them. How should we define connected in a directed graph?
We say that a vertex a is strongly connected to b if there exist two paths, one from a to b and another from b to a.
Note that we allow the two paths to share vertices or even to share edges. We will use a ~ b as shorthand for "a is strongly connected to b". We will allow very short paths, with one vertex and no edges, so that any vertex is strongly connected to itself.

Recall that a relation is another word for a collection of pairs of objects (if you like, you can think of a relation as being a directed graph, but not the same one we're using to define connectivity). An equivalence relation a # b is a relation that satisfies three simple properties:

• Reflexive property: For all a, a # a. Any vertex is strongly connected to itself, by definition.
• Symmetric property: If a # b, then b # a. For strong connectivity, this follows from the symmetry of the definition. The same two paths (one from a to b and another from b to a) that show that a ~ b, looked at in the other order (one from b to a and another from a to b) show that b ~ a.
• Transitive property: If a # b and b # c, then a # c. Let's expand this out for strong connectivity: if a ~ b and b ~ c, we have four paths: a-b, b-a, b-c, and c-b. Concatenating them in pairs a-b-c and c-b-a produces two paths connecting a-c and c-a, so a ~ c, showing that the transitive property holds for strong connectivity.
Since all three properties are true of strong connectivity, strong connectivity is an equivalence relation.

Note that it was critical for our definition that we allowed the paths a-b and b-a to overlap. If we made a small change such as defining two vertices to be connected if they are part of a directed cycle, we wouldn't be able to concatenate the paths and show that the transitive property holds.

Equivalence classes and strongly connected components

For any equivalence relation a # b, we can define equivalence classes by the formula
```    [a] = { b | a # b }
```
(in English, the equivalence class of a, which we call "[a]", is defined to be simply the set of things related to a). The equivalence classes for strong connectivity are called strongly connected components.

These sets have the property that they partition the space of all vertices into disjoint subsets.

(This is not hard to prove. First, any vertex a is a member of [a] by reflexivity, so the equivalence classes cover all of the input. And second, if b is in [a] then [a]=[b] (by symmetry and transitivity, any element of one is an element of the other) so any two different equivalence classes must be disjoint.)

If we can find all the strongly connected components of a graph, it would be easy to test whether any two vertices are strongly connected: just see if they're in the same component.

Component graph and weak connectivity

Strongly connected components also have a use in other graph algorithms: if you replace every strongly connected component by a single vertex, you get a smaller directed acyclic graph, known as the component graph or condensation (Baase ex. 4.42 asks you to prove this fact.)

For some graph problems, you can use this idea to get an algorithm that reduces the problem to subproblems on each component, plus one more subproblem on the component graph. Here's an example (this problem isn't in Baase, and I didn't get to this in my lecture, so I won't test you on it):

Suppose we define two vertices a and b to be weakly connected (also known as semiconnected) if there's either a path from a to b or one from b to a (but not necessarily both). We say the graph is weakly connected if this is true for every pair of vertices. Then it's not hard to show that a graph is weakly connected if and only if its component graph is a path. So by computing the strongly connected components, we can also test weak connectivity.

Computing a single component

From the definition above, it is easy to find a single strongly connected component [x]. Simply use BFS, DFS, or any other similar algorithm to find a set S of all vertices reachable from x by a path. Do the same thing in the graph formed by reversing all the edges of our original graph, to find a set T of all vertices that can reach x by a path. According to the definition above, [x] is just the intersection of S and T.

So in O(m) time we can find a single component. Since there are O(n) components, we can find them all in time O(mn). But this slower than necessary. The point of today's lecture is to show how to solve the problem in linear time. (The solution we describe, based on depth first search, was invented by Bob Tarjan in 1972. Baase ex. 4.50 outlines an alternative linear time algorithm.)

Depth first search again

A tangent on pseudo-code: I haven't been writing the same pseudocode as in the book for the same reason I haven't been speaking the same sentences in the book. The ideas matter, the exact pseudocode doesn't. So if you're asking which should I memorize, the book or the lecture the answer is neither, you should get to understand them to the point where they seem like the same idea and remember that idea. With that in mind, here's pseudocode for DFS (directed graph version) that looks a little different from what we did last time.

One complication (that I forgot to mention in lecture) is that we want to build a DFS tree that involves all the vertices of the graph. If we just start somewhere in the graph, not all vertices might be reachable, and the DFS will not get to them. One solution would be to restart the DFS every time this happens, but to make things a little simpler, I'm going to modify the graph by adding a new vertex connected by outward-going edges to everything else. This doesn't change the strongly connected components (except to add one new component for the one new vertex) but keeps the rest of the algorithm simpler.

```    DFS(G)
{
make a new vertex x with edges x->v for all v
build directed tree T, initially a single vertex {x}
visit(x)
}

visit(p)
{
for each edge p->q
if q is not already in T
{
add p->q to T
visit(q)
}
}
```
This version of the pseudo-code makes it obvious that only certain edges can occur: if q is not already in T, p->q gets added, so if p->q does not end up in tree, q must be already in tree. There are three possible places q could be: an ancestor of p (in which case we call p->q a back edge), a descendant of p (in which case we call p->q a forward edge), or in a previous branch of the tree (in which case we call p->q a cross edge). The one case that's ruled out is that q can not be in a later branch of the tree.

DFS trees and strongly connected components

The key property, that relates DFS to strong connectivity, is that strongly connected components form subtrees of the DFS tree. (In other words, a component can not be in two separate parts of the tree.)

Why?

Note that if we have paths a-b and b-a, any two intermediate vertices of those paths would have to be also in the same component (since e.g. if we have a-c-b then we already have a path a-c and by concatenating c-b-a we also get a path c-a).

So suppose one component ended up in two parts of the tree. Then it would have to have edges from one part to the other (the definition of strong connectivity tells us there must be paths, but the observation above about intermediate vertices being part of the same component tells us they would actually just be edges).

The two parts couldn't be in side by side branches of the tree, because there would be no edges in one of the two directions. But on the other hand, if one part contains an ancestor x of a vertex y in the other part, we can use the argument above about intermediate vertices to show that the path in the tree from x to y is also in the same component, contradicting the assumption that x and y are in different parts of the tree. So it is not possible to have a component in two separate parts of the DFS tree, which is what we wanted to prove.

Since the components of the graph are just subtrees of the DFS tree, to find components, we just have to break tree at certain edges, and the components will be formed by what's left of the tree. We'll say a vertex is a "head" of a component if it's the topmost (i.e. if we should break the edge coming into it). By the observations above, the problem has turned into one of determining whether a given vertex v is a head.

To test this, look at the subtree of the DFS tree, rooted at v. Suppose this subtree does not have any back or cross edges going out of it. Then clearly, v must be the head of [v], since there are no paths from v to any vertex higher in the tree.

Just as clearly, if there is a back edge u-w from this subtree to an ancestor of v, v is not a head. In this case, the edge u-w together with the paths in the DFS tree from w to v and from v to u form a cycle, which must all be part of the same component [v]. But w is higher in the tree than v, so v can not be the head of this component.

The complicated case happens when the only edges going out of the subtree rooted at v are cross edges to other branches of the DFS tree. To make this complicated case a little easier, we'll set up our algorithm so that as soon as the DFS finishes visiting a vertex, if it is a head, we delete it and its component from the graph. We can show that if our algorithm does this, then whenever we see a cross edge out of the subtree from v, v is not a head.

Proof: This is where we use the fact that DFS trees have cross edges only to previously visited branches of the tree, not to later branches. Suppose we see a cross edge u-w. Let z be the head of [w], then z is visited no later than w. If z were in a separate branch of the tree, we'd have finished visiting it and deleted both it and w, contradicting the assumption that we're seeing edge u-w. So z is an ancestor of v, and putting edge u-w together with the paths w-z (by assumption that z is the head of [w]), z-v (since z is an ancestor of v) and v-u (since v is an ancestor of u) gives us a cycle, showing that v is in [z] and therefore v is not a head.
Summarizing, we see that we can test whether a vertex is a head by looking for the existence of back or cross edges out of its subtree.

Strong connectivity algorithm

Define the DFS numbering dfsnum(v) to be the number of vertices visited before v in the DFS. Then if there is a back or cross edge out of the subtree of v, it's to something visited before v and therefore with a smaller dfsnum. We use this by defining the low value low(v) to be the smallest dfsnum of a vertex reachable by a back or cross edge from the subtree of v. If there is no such edge, low(v)=dfsnum(v). Then rephrasing what we've seen so far, v is a head of a component exactly when low(v)=dfsnum(v). The advantage of using these definitions is that dfsnum(v) is trivial to calculate as we perform the DFS, and low(v) is easily computed by combining the low values from the children of v with the values coming from back or cross edges out of v itself.

We use one more simple data structure, a stack L (represented as a list) which we use to identify the subtree rooted at a vertex. We simply push each new vertex onto L as we visit it; then when we have finished visiting a vertex, its subtree will be everything pushed after it onto L. If v is a head, and we've already deleted the other heads in that subtree, the remaining vertices left on L will be exactly the component [v].

We are now ready to describe the actual algorithm. It simply performs a DFS, keeping track of the low and dfsnum values defined above, using them to identify heads of components, and when finding a head deleting the whole component from the graph, using L to find the vertices of the component.

```    DFS(G)
{
make a new vertex x with edges x->v for all v
initialize a counter N to zero
initialize list L to empty
build directed tree T, initially a single vertex {x}
visit(x)
}

visit(p)
{
add p to L
dfsnum(p) = N
increment N
low(p) = dfsnum(p)
for each edge p->q
if q is not already in T
{
add p->q to T
visit(q)
low(p) = min(low(p), low(q))
} else low(p) = min(low(p), dfsnum(q))

if low(p)=dfsnum(p)
{
output "component:"
repeat
remove last element v from L
output v
remove v from G
until v=p
}
}
```
We have already seen an explanation for why this algorithm works. It only remains to point out that it takes linear time -- the basic framework is just DFS, and the added manipulations of low, dfsnum, and L do not slow this down at all. So we can find strongly connected components in linear time.

ICS 161 -- Dept. Information & Computer Science -- UC Irvine
Last update: