(Last modified Tue Jun 03 14:17 2008)
Figure 1. Control flow graph for m(x,y)
Figure 2. Control and data flow graph for m(x,y)
There are many graph-based testing criteria: some are listed below. The criteria are easiest to understand in terms of programs and these two graphs:
Uses include appearances in the RHS of an assignment (including "invisible" ones like x += 1, which is both a def and a use of x because it means x = x+1), as actual parameters of calls, as return values from a block, and in a branch or loop condition.
Every control-flow path from a def of x to a use of x that is def-clear(x) (it passes through no intervening defs of x) is an edge in the data flow graph.
However, they can be applied to any graph describing any system artifact, for example specifications, designs, and requirements (see below).
Here is an example program (that does nothing in particular) whose control flow and data flow graphs are shown in the figures.
1 int m(int x, int y) {
2 while (x > 10) {
3 x -= 10;
4 if (x == 10) {
5 break;
6 }
7 }
8 x = square(x);
9 if (y < 20 && x%2 == 0) {
10 y += 20;
11 }
12 else {
13 y -= 20;
14 }
15 return 2*x + y;
16 }
We list paths through the program using only the bold statement numbers; we don't list 6, 7, 11, 14, or 16 because those lines contain no def or use.
We list the line with the loop condition every time the condition is checked:
break
because x==10:
1,2,3,4,8, ...
break
because x==10:
1,2,3,4,2,3,4,8, ...
Test set for All-statements coverage:
(a) x=20, y=10: 1,2,3,4,5,8,9,10,15.
(b) x=20, y=30: 1,2,3,4,5,8,9,13,15.
Test set for All-edges coverage:
(a) x=20, y=10: 1,2,3,4,5,8,9,10,15.
(b) x=15, y=30: 1,2,3,4,2,8,9,13,15.
(Infinite) test set for All-paths coverage:
| Test | x | y | Path |
|---|---|---|---|
| (a) | x=4 | y=10 | 1,2,8,9,10,15 |
| (b) | x=5 | y=10 | 1,2,8,9,13,15 |
| (c) | x=14 | y=10 | 1,2,3,4,2,8,9,10,15 |
| (d) | x=15 | y=10 | 1,2,3,4,2,8,9,13,15 |
| (e) | x=20 | y=10 | 1,2,3,4,5,8,9,10,15 |
| (f) | x=20 | y=20 | 1,2,3,4,5,8,9,13,15 |
| ... | | | |
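The paths in these test sets can be checked mechanically. The following is a minimal sketch in Java that instruments m(x,y) to record the statement numbers it passes through. The class name PathTrace and the body of square() are assumptions made for illustration; the original does not show square().

```java
import java.util.ArrayList;
import java.util.List;

final class PathTrace {
    // Hypothetical stand-in for square(); its body is not given in the original.
    static int square(int x) { return x * x; }

    // m(x,y) from the example, instrumented to record the listed statement numbers.
    static List<Integer> trace(int x, int y) {
        List<Integer> path = new ArrayList<>();
        path.add(1);                 // entry: actual parameters bound to x and y
        while (true) {
            path.add(2);             // recorded every time the loop condition is checked
            if (!(x > 10)) break;
            path.add(3);
            x -= 10;
            path.add(4);
            if (x == 10) {
                path.add(5);         // the break statement
                break;
            }
        }
        path.add(8);
        x = square(x);
        path.add(9);
        if (y < 20 && x % 2 == 0) {
            path.add(10);
            y += 20;
        } else {
            path.add(13);
            y -= 20;
        }
        path.add(15);                // return 2*x + y
        return path;
    }

    public static void main(String[] args) {
        System.out.println(trace(20, 10)); // [1, 2, 3, 4, 5, 8, 9, 10, 15]
        System.out.println(trace(15, 30)); // [1, 2, 3, 4, 2, 8, 9, 13, 15]
    }
}
```

Running trace on each (x, y) pair in a test set gives the path to compare against the listed one.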
Figure 3. Subsumption among graph coverage criteria
The most commonly encountered criteria are:
All those are control flow criteria. The most commonly encountered data flow criteria are:
Researchers have proved that the various criteria are partially ordered, from the strongest (all-paths) which subsumes all the others, through criteria of intermediate strength (such as all-edges which subsumes all-branches and all-statements), to the weakest (all-statements). For example, any test set that satisfies all-branches also will satisfy all-statements; any test set satisfying all-edges also satisfies all-branches (and thus all-statements).
There are programs for which one or more of these criteria cannot be strictly satisfied. For example, any program with unreachable code can't be covered using any of the criteria above in the strictest sense. Many programs with loops can't be strictly all-paths covered by any finite test set. More subtly, some programs have dependencies between the conditions for two or more if statements or loops, so that although it looks like one could take the program through all paths through the branches, the dependencies make it impossible to execute some paths, some edges in the graph cannot be taken, and even some statements are unreachable. Here is a simple example for which statements 4 and 7 are unreachable:
1 void bad(boolean b) {
2 if (b) {
3 if (!b) {
4 b = !b; // unreachable
5 }
6 while (!b) {
7 b = !b; // also unreachable
8 }
9 }
10 }
Consequently, the criteria are usually used to mean only the reachable items (paths, edges, nodes).
Although it is delightful to have these criteria and understand how they are related, they are challenging to work with manually for systems of realistic size, and some kinds of automated support are problematic. For example, deriving a set of test cases that achieve all-paths coverage of an arbitrary program is equivalent to solving the halting problem for that program. However, it is much easier to verify coverage than to derive test cases to achieve coverage, and there are software tools to calculate or monitor the coverage of specific test sets.
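Verifying the coverage of a given test set is mechanical. Here is a minimal sketch in Java that reports which control flow edges of m(x,y) a set of paths leaves uncovered. The edge list is an assumption: it was transcribed by hand from the program text rather than taken from Figure 1, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class EdgeCoverage {
    // Edges of the control flow graph of m(x,y), written "from->to".
    // Assumption: derived by hand from the program text, not from Figure 1.
    static final String[] EDGES = {
        "1->2", "2->3", "2->8", "3->4", "4->2", "4->5", "5->8",
        "8->9", "9->10", "9->13", "10->15", "13->15"
    };

    // Returns the edges that no path in the test set exercises.
    static Set<String> uncovered(int[][] paths) {
        Set<String> missing = new LinkedHashSet<>();
        for (String e : EDGES) missing.add(e);
        for (int[] path : paths) {
            for (int i = 0; i + 1 < path.length; i++) {
                missing.remove(path[i] + "->" + path[i + 1]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        int[][] allStatements = {
            {1, 2, 3, 4, 5, 8, 9, 10, 15},
            {1, 2, 3, 4, 5, 8, 9, 13, 15}
        };
        int[][] allEdges = {
            {1, 2, 3, 4, 5, 8, 9, 10, 15},
            {1, 2, 3, 4, 2, 8, 9, 13, 15}
        };
        System.out.println(uncovered(allStatements)); // [2->8, 4->2]
        System.out.println(uncovered(allEdges));      // []
    }
}
```

Under this edge list, the all-statements test set misses the two edges that leave the loop without a break, which is why all-edges needs a different second test case.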
Figure 4. Call graph
Figure 4 shows a call graph for classes or methods A through F (the use of the graph is the same for either case). For classes, this graph is also the USES graph. We can cover such graphs using control-flow criteria like we did for program control-flow graphs.
The call graph of a procedural program is connected (unless there are functions that are never called). In contrast, the call graph of a single OO class is typically disconnected, since many classes' methods are designed to be used by other classes rather than the class containing them. OO call graph coverage is more useful if all the classes of a program are considered, rather than only a single class.
Figure 5. Covering an OO call graph
Inheritance and polymorphism complicate call and USES graph coverage. Figure 5 shows a simple call graph in which a method in class A calls two methods in class C; in A, objects of type C may be either objects of class C, of subclass C1, or of subclass C2. The full call graph involves calls to all the m1() and m2() methods: the call in A to m1() may be to C's, C1's, or C2's m1(), and the call in A to m2() may be to C's or C2's m2() (C1 does not redefine m2()).
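The situation described for Figure 5 can be sketched in Java as follows. The class names A, C, C1, and C2 and the methods m1() and m2() come from the text; the wrapper class PolyCalls, the method run(), and the recording set are hypothetical, added only so the call targets can be observed.

```java
import java.util.Set;
import java.util.TreeSet;

final class PolyCalls {
    // Records which method bodies actually ran (illustrative instrumentation).
    static final Set<String> targets = new TreeSet<>();

    static class C {
        void m1() { targets.add("C.m1"); }
        void m2() { targets.add("C.m2"); }
    }

    static class C1 extends C {
        @Override void m1() { targets.add("C1.m1"); }   // m2() is inherited from C
    }

    static class C2 extends C {
        @Override void m1() { targets.add("C2.m1"); }
        @Override void m2() { targets.add("C2.m2"); }
    }

    static class A {
        // Two call sites, but five possible call targets.
        void run(C c) {
            c.m1();
            c.m2();
        }
    }

    public static void main(String[] args) {
        A a = new A();
        for (C c : new C[] { new C(), new C1(), new C2() }) {
            a.run(c);
        }
        System.out.println(targets); // [C.m1, C.m2, C1.m1, C2.m1, C2.m2]
    }
}
```

Covering the two calls in A with only objects of class C would reach two of the five targets; reaching all of them needs one object of each of the three classes.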
We can also examine call and USES graphs combining two or more modules or classes using data-flow criteria and last defs and first uses. Here we examine the code preceding a call to identify the last defs of each of the call's actual parameters, and the code of the method or function to identify the first uses of each formal parameter.
A def of x is a last def before method M if there is a def-clear(x) path from the def to the call of M using x. Similarly, a use of x is a first use in (after) method M if there is a def-clear(x) and use-clear(x) path from the beginning of M (from the call of M) to the use. The binding of the actual parameter to the formal parameter is not counted as a first use.
1 int m(int x, int y) { // A last def of x before line 8
2 while (x > 10) { // First use of x in m(x,y)
3 x -= 10; // The other last def of x before line 8
4 if (x == 10) {
5 break;
6 }
7 }
8 x = square(x);
9 if (y < 20 && x%2 == 0) { // A first use of x after line 8
10 y += 20;
11 }
12 else {
13 y -= 20;
14 }
15 return 2*x + y; // The other first use of x after line 8
16 }
In the example program m(x,y) above (presented a second time here), line 1 of that program is not a first use of x because that is the binding of the actual parameter to x. The first use of x is on line 2, in the while condition. m(x,y) has only one first use of x because all later uses of x are on paths that include line 2, and thus are not use-clear(x).
The last defs of x before the call square(x) in line 8 are lines 1 and 3. There are two last defs of x before the call to square(x) because there is a def-clear(x) path from each of them to the call.
The first uses of x after it receives square(x)'s return value are on lines 9 and 15. There are two first uses because && is defined to be minimal, or short-circuit (the second operand is not evaluated if the first operand is false), in Java, C, and many other languages: x%2 == 0 in line 9 is only evaluated if y < 20 is true. If y < 20 is false then y < 20 && x%2 == 0 is false regardless of x%2 == 0, in all programming languages with which I am familiar, so x is not used on line 9 and its first use on that path is on line 15.
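The short-circuit behavior of && can be observed directly. This is a minimal Java sketch in which the second clause of line 9's condition is wrapped in a counting helper; the names ShortCircuit, isEven, line9, and evaluations are hypothetical and exist only for this illustration.

```java
final class ShortCircuit {
    // Counts how often the second clause (the use of x) is evaluated.
    static int evaluations = 0;

    static boolean isEven(int x) {
        evaluations++;
        return x % 2 == 0;
    }

    // The condition of line 9, with x%2 == 0 wrapped so its evaluation is visible.
    static boolean line9(int x, int y) {
        return y < 20 && isEven(x);
    }

    public static void main(String[] args) {
        line9(4, 30);                       // y < 20 is false: x is not used on line 9
        System.out.println(evaluations);    // 0
        line9(4, 10);                       // y < 20 is true: x is used on line 9
        System.out.println(evaluations);    // 1
    }
}
```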
Figure 6. Subsumption among logic coverage criteria
Here we will look at logic formulae as they can be constructed in a programming language statement. The language is a predicate logic, i.e. first-order logic without the quantifiers ∀ and ∃, in which a formula in the logic is termed a predicate and what would be a predicate in FOL is termed a function. A predicate can be:
!p
p1&&p2
p1||p2
There are coverage criteria for programs based on the predicates in the programs.
The next level is to consider which clauses can control the value of the predicate and under what circumstances. Let C be a clause in predicate P. If there is a combination of true and false values for all the other clauses in P, such that changing C's truth value results in changing P's truth value, then C is a major clause of P, the other clauses are the corresponding minor clauses, and for those combinations of truth values for the minor clauses, C determines P.
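Determination can be tested by flipping one clause and seeing whether the predicate changes. The following Java sketch does this for a predicate given as a function of its clause values; the class name Determination, the method determines(), and the encoding of clauses as a boolean array are assumptions made for illustration.

```java
import java.util.function.Predicate;

final class Determination {
    // True if flipping the clause at index `major` changes the value of p
    // for the given clause values, i.e. that clause determines p here.
    static boolean determines(Predicate<boolean[]> p, boolean[] clauses, int major) {
        boolean[] flipped = clauses.clone();
        flipped[major] = !flipped[major];
        return p.test(clauses) != p.test(flipped);
    }

    public static void main(String[] args) {
        // clauses[0] stands for y<20 and clauses[1] for x%2==0
        Predicate<boolean[]> p = c -> c[0] && c[1];
        System.out.println(determines(p, new boolean[] { true, true }, 0));  // true
        System.out.println(determines(p, new boolean[] { true, false }, 0)); // false
    }
}
```

For a conjunction, a clause determines the predicate exactly when the other clause is true, which is what the two calls show.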
Active clause coverage is closely related to the MC/DC (modified condition/decision coverage) criterion, but is defined without ambiguity. There are three "flavors" of active clause coverage, depending on what is required to be the same for the two cases:
In the example program m(x,y), there are two predicates (x>10 and y<20&&x%2==0) and three clauses (x>10, y<20, and x%2==0).
A test set providing predicate coverage (but not clause coverage):
| x | y | x>10 | y<20&&x%2==0 |
|---|---|---|---|
| 8 | 20 | F | F |
| 12 | 19 | T | T |
A test set providing clause coverage (but not combinatorial coverage):
| x | y | x>10 | y<20 | x%2==0 |
|---|---|---|---|---|
| 9 | 20 | F | F | F |
| 12 | 19 | T | T | T |
A test set providing general active clause coverage and correlated active clause coverage (I don't believe this program has a test set providing only general active clause coverage):
| x | y | x>10 | y<20 | x%2==0 | y<20&&x%2==0 | Major clause |
|---|---|---|---|---|---|---|
| 9 | 20 | F | F | F | F | x>10 major |
| 12 | 20 | T | F | T | F | y<20 major |
| 11 | 19 | T | T | F | F | x%2==0 major |
| 12 | 19 | T | T | T | T | x>10 major, y<20 major, and x%2==0 major |
A test set providing restricted active clause coverage:
| x | y | x>10 | y<20 | x%2==0 | y<20&&x%2==0 | Major clause |
|---|---|---|---|---|---|---|
| 8 | 19 | F | T | T | T | x>10 major |
| 11 | 19 | T | T | F | F | x%2==0 major |
| 12 | 20 | T | F | T | F | y<20 major |
| 12 | 19 | T | T | T | T | x>10 major, y<20 major, and x%2==0 major |
The test set providing combinatorial coverage:
| x | y | x>10 | y<20 | x%2==0 |
|---|---|---|---|---|
| 9 | 20 | F | F | F |
| 8 | 20 | F | F | T |
| 9 | 19 | F | T | F |
| 8 | 19 | F | T | T |
| 11 | 20 | T | F | F |
| 12 | 20 | T | F | T |
| 11 | 19 | T | T | F |
| 12 | 19 | T | T | T |
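Whether a test set reaches all eight combinations can be checked by encoding the three clause values of each test as a number. This is a minimal Java sketch, with hypothetical names ClauseCombos, combo, and combinatorial; as in the tables, x and y are the values passed to m(x,y).

```java
import java.util.HashSet;
import java.util.Set;

final class ClauseCombos {
    // Encodes the truth values of the three clauses as a number from 0 to 7.
    static int combo(int x, int y) {
        int bits = 0;
        if (x > 10)     bits |= 4;
        if (y < 20)     bits |= 2;
        if (x % 2 == 0) bits |= 1;
        return bits;
    }

    // True if the tests (pairs {x, y}) exercise all 2^3 clause combinations.
    static boolean combinatorial(int[][] tests) {
        Set<Integer> seen = new HashSet<>();
        for (int[] t : tests) {
            seen.add(combo(t[0], t[1]));
        }
        return seen.size() == 8;
    }

    public static void main(String[] args) {
        int[][] clauseSet = { {9, 20}, {12, 19} };
        int[][] fullSet = { {9, 20}, {8, 20}, {9, 19}, {8, 19},
                            {11, 20}, {12, 20}, {11, 19}, {12, 19} };
        System.out.println(combinatorial(clauseSet)); // false
        System.out.println(combinatorial(fullSet));   // true
    }
}
```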
Test set providing coverage:
| x | y | x>10 | y<20 | x%2==0 | y<20&&x%2==0 |
|---|---|---|---|---|---|
| 9 | 20 | F | F | F | F |
| 8 | 20 | F | F | T | F |
| 9 | 19 | F | T | F | F |
| 8 | 19 | F | T | T | T |
| 11 | 20 | T | F | F | F |
| 12 | 20 | T | F | T | F |
| 11 | 19 | T | T | F | F |
| 12 | 19 | T | T | T | T |
In this approach, we partition the system's input space into blocks, such that all the inputs in each block are "equally useful" for testing. Each block is an equivalence class in the partition. It is necessary that the blocks be complete and disjoint, that is, that every test case falls into a block, and no test case falls into two blocks. Then the test cases cover the input space if there is at least one test case for each block.
Input space partitioning is usually done from the system's specification rather than its code.
Let's consider the substring(int begin, int end) method of java.lang.String.
| Block | Characterization |
|---|---|
| Returns the empty string | 0≤begin AND begin==end AND end≤length() |
| Returns a nonempty string | 0≤begin<end≤length() |
| Fails (1) | end<begin |
| Fails (2) | begin<0 AND begin≤end |
| Fails (3) | length()<end AND 0≤begin≤end |
Then a testing criterion would be to have a test case for each of the five blocks.
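A partition can be written as a function that assigns every input to exactly one block. The Java sketch below does this for substring and compares the block with what the method actually does. The block boundaries follow the documented behavior of java.lang.String.substring (the empty string when begin equals end, and an IndexOutOfBoundsException when begin is negative, end exceeds length(), or begin exceeds end); the class, enum, and method names are hypothetical.

```java
final class SubstringBlocks {
    enum Block { EMPTY, NONEMPTY, FAILS_1, FAILS_2, FAILS_3 }

    // Assigns every (begin, end) pair for a string of the given length to exactly one block.
    static Block classify(int begin, int end, int length) {
        if (end < begin)  return Block.FAILS_1;
        if (begin < 0)    return Block.FAILS_2;
        if (end > length) return Block.FAILS_3;
        return begin == end ? Block.EMPTY : Block.NONEMPTY;
    }

    // What substring actually does, reduced to three observable outcomes.
    static String observe(String s, int begin, int end) {
        try {
            return s.substring(begin, end).isEmpty() ? "empty" : "nonempty";
        } catch (IndexOutOfBoundsException e) {
            return "fails";
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        // One test case per block.
        int[][] tests = { {2, 2}, {1, 3}, {3, 1}, {-1, 2}, {1, 9} };
        for (int[] t : tests) {
            System.out.println(classify(t[0], t[1], s.length()) + " " + observe(s, t[0], t[1]));
        }
    }
}
```

Because classify returns from the first matching test, no input can land in two blocks, and the final return catches everything that is left.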
It would be easy to create a non-partition of the input space, for example
| Block | Characterization |
|---|---|
| Returns the empty string | 0≤begin AND begin==end AND end≤length() |
| Returns a nonempty string | 0≤begin<end≤length() |
| Fails (1) | end<begin |
| Fails (2) | begin<0 |
| Fails (3) | length()<end |
because some test cases fall into more than one block (for example, the case where end<begin<0, which falls into both Fails(1) and Fails(2)).
In testing software whose behavior is specified by a grammar, there are coverage metrics based on the grammar that can be used. Note that a regular expression corresponds to a grammar, so that these criteria work for regular expressions as well.
There are correspondences between grammars and graphs (FSP is one example): the obvious general one represents strings of symbols (terminal and nonterminal) by states and productions by transitions between states. Therefore these syntax coverage criteria correspond to graph coverage criteria: terminal symbol coverage is analogous to all-nodes, production coverage to all-edges, and derivation coverage to all-paths.
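Production coverage can be illustrated on a very small grammar. The Java sketch below uses a hypothetical toy grammar, S -> a S | b, records which productions the derivation of each test string uses, and reports whether every production was used at least once. The grammar and all names are assumptions chosen for illustration, not taken from the text.

```java
import java.util.HashSet;
import java.util.Set;

final class ProductionCoverage {
    // Hypothetical toy grammar:  S -> a S | b   (it generates zero or more a's followed by b)
    static final String[] PRODUCTIONS = { "S->aS", "S->b" };

    // If s is in the language, adds the productions of its derivation to `used`
    // and returns true; otherwise leaves `used` unchanged and returns false.
    static boolean derive(String s, Set<String> used) {
        Set<String> local = new HashSet<>();
        int i = 0;
        while (i < s.length() && s.charAt(i) == 'a') {
            local.add("S->aS");
            i++;
        }
        if (i != s.length() - 1 || s.charAt(i) != 'b') {
            return false;
        }
        local.add("S->b");
        used.addAll(local);
        return true;
    }

    // True if the test strings together use every production at least once.
    static boolean covers(String[] tests) {
        Set<String> used = new HashSet<>();
        for (String t : tests) {
            derive(t, used);
        }
        return used.size() == PRODUCTIONS.length;
    }

    public static void main(String[] args) {
        System.out.println(covers(new String[] { "b" }));   // false
        System.out.println(covers(new String[] { "aab" })); // true
    }
}
```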