(Last modified Tue Jun 03 15:15 2008)

In4matx 115
Software Specification and Quality Engineering
Spring 2008
Testing

What is testing?

Testing is: 

The goals of testing are, optimistically: 

The results of testing are, one hopes: 

The fundamental problems with testing are ...

Rather than always saying "system or component", from here on I will just say "system" but mean that it may be either an entire system or a component of a system that is being discussed.  Similarly, instead of "context or environment" I will just say "context". 

The basics

Test case
A test case is a context and input data over a period of time that will cause a system or component to produce a result that can be compared against expectations. 
Test set or test suite
A test set or suite is a group of test cases. 
Test script
A sequence of actions that puts a system through a test case.  The actions may be done manually or automatically.  The script may be: 

Load testing is something of a special case; for this, automated support is needed to provide a suitable collection of virtual users and virtual events for the system to deal with. 

Oracle
A person or program that classifies the result of a test case as either acceptable or unacceptable.  Usually we think of an oracle as being a program, but often a person is the oracle for a test suite — people are smarter than programs.  Oracles that are programs are preferable, especially if the tests will need to be run many times.  A human oracle gets bored and careless, and often tries to avoid re-running the tests. 

An automated oracle may compare a result against known good results (this is efficient but inflexible), or may calculate whether a result is acceptable (this is often inefficient but is flexible). 
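
The two kinds of automated oracle can be sketched roughly as follows.  This is only an illustration under assumptions of my own:  the system under test (`system_sort`), the table of known good results (`KNOWN_GOOD`), and both oracle functions are hypothetical names invented for the example.

```python
from collections import Counter


def system_sort(items):
    """Hypothetical system under test: returns the items in ascending order."""
    return sorted(items)


# Oracle 1: compare the result against stored known good results.
# Cheap to evaluate, but it only knows the inputs recorded in the table.
KNOWN_GOOD = {
    (3, 1, 2): [1, 2, 3],
    (): [],
    (5, 5, 1): [1, 5, 5],
}


def comparing_oracle(test_input, result):
    """Acceptable iff the result equals the recorded expected result."""
    return KNOWN_GOOD[tuple(test_input)] == result


# Oracle 2: calculate whether the result is acceptable.
# Works for any input, at the cost of more computation per check.
def calculating_oracle(test_input, result):
    """Acceptable iff the result is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(result) == Counter(test_input)
    return ordered and same_items


if __name__ == "__main__":
    for case in [[3, 1, 2], [], [5, 5, 1]]:
        out = system_sort(case)
        print(case, comparing_oracle(case, out), calculating_oracle(case, out))
```

The comparing oracle raises `KeyError` for any input that is not in its table, which is the inflexibility mentioned above; the calculating oracle accepts any input but does more work per result.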

How do a system and an (automated) oracle to test it differ? 

Fault-error-failure
Fault
The thing that's wrong with the code (or other development artifact) is a fault.  A fault is there whether the program is running or not.  Also called a defect. 
Error
An error is an undesired program state.  The program state is embodied in the values of variables or attributes, the set of objects that have been constructed, the contents of scratch files or other data repositories, etc. 
Failure
A failure is a behavior or output of a system that is incorrect. 

So in order to observe a failure:

  1. (Reachability) the location(s) containing the fault must be reachable,
  2. (Infection) the program must enter an incorrect state after executing the location, and
  3. (Propagation) the program must propagate the "infected" state to a location that produces an incorrect output as a result. 

Ordinarily, only failures are visible, and testing is geared towards identifying failures.  Once you've found a failure, you then have to work backwards:  usually identifying the error(s) that resulted in the failure, and then the fault(s) that caused the error in the first place. 
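
A small example may make reachability, infection, and propagation concrete.  The sketch below is hypothetical:  the function, its deliberately seeded fault (`n * 2` where `n + 2` was intended), and the `diagnose` helper are all invented for illustration.  Each function returns its internal state alongside its output so that an error can be told apart from a failure.

```python
def classify_correct(n):
    """Reference version: returns (internal total, output label)."""
    if n < 0:
        return None, "negative"
    total = n + 2
    return total, ("big" if total > 10 else "small")


def classify_faulty(n):
    """Same function with a seeded fault on the line that computes total."""
    if n < 0:
        return None, "negative"
    total = n * 2  # FAULT (deliberate): should be n + 2
    return total, ("big" if total > 10 else "small")


def diagnose(n):
    """Report how far the fault gets for input n."""
    good_state, good_output = classify_correct(n)
    bad_state, bad_output = classify_faulty(n)
    if n < 0:
        return "fault not reached"   # the faulty line never executes
    if bad_state == good_state:
        return "reached, no error"   # executed, but the state is still correct
    if bad_output == good_output:
        return "error, no failure"   # wrong state that never shows in the output
    return "failure"                 # wrong state propagated to the output


if __name__ == "__main__":
    for n in (-1, 2, 3, 6):
        print(n, diagnose(n))
```

Only the last of the four inputs produces a visible failure, even though the fault is present in every run.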

It is clear that

To pick an extreme case, faults in unreachable code never cause errors.  There is a wide range of opinions on how frequently faults lead to errors and errors lead to failures; one estimate is that on average the rate is roughly 10%:  only about 10% of faults ever cause errors, and only about 10% of errors ever cause failures.  (Other researchers' estimates are much higher, as much as 90% or more.) 

Black-box testing
Testing against a specification, without knowledge of how the system is implemented.  Also called specification-based testing. 

Black-box testing may allow greater test efficiency (because it can direct testing effort to the requirements stakeholders care most about), and it facilitates test reuse, since black-box tests don't depend on how the system is implemented and can be reused unchanged if the implementation changes. 

White-box or glass-box testing
Testing based on partial or full knowledge of how the system is implemented.  A common type is code-based testing, in which test cases are selected to cover the code in various ways. 

White-box or glass-box testing may allow greater test effectiveness (because it can direct testing effort based on the implementation, and after all the implementation is where the faults lie). 

Effectiveness
A test case or suite is effective to the extent that it identifies faults and gives confidence. 
Efficiency
A test case or suite is efficient to the extent that it takes little time, money, or other resources. 
Exhaustive testing
Exhaustive testing is the testing of every possible combination of context and inputs over time that a system can have.  Except for the simplest systems, this is impossible to do because there is an infinite number of possible inputs and contexts and we only have a finite time for testing.  Even for simple systems with finite input domains and contexts (such as Dijkstra's example of multiplying two integers), it is rarely practical to test exhaustively because it takes so long.  Unfortunately, exhaustive testing is the only kind that is guaranteed to show that a system works as it should, and to uncover all the system's faults.  Consequently, we know that testing cannot show that a system works as it should (except for the very simplest systems for which exhaustive testing is possible), and testing cannot guarantee that a system has no more faults. 
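
A back-of-the-envelope calculation shows why even a finite input domain is usually out of reach.  The figures here are assumptions for illustration, not taken from Dijkstra's example:  two 32-bit operands and a test rate of one billion test cases per second.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def count_input_combinations(bits_per_operand, operand_count):
    """Number of distinct inputs when each operand is a fixed-width integer."""
    return 2 ** (bits_per_operand * operand_count)


def years_to_test_exhaustively(bits_per_operand, operand_count, tests_per_second):
    """Time needed to run every input once, in years (assumed constant rate)."""
    cases = count_input_combinations(bits_per_operand, operand_count)
    return cases / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed figures: two 32-bit operands, 10**9 tests per second.
    print(count_input_combinations(32, 2))
    print(years_to_test_exhaustively(32, 2, 10**9))
```

Under these assumptions the multiplication has 2^64 input pairs and would take roughly 585 years to test exhaustively, whereas the same calculation for two 16-bit operands comes to under five seconds.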
Selection criterion
Since we can't test everything, we need a selection criterion to help us choose the cases we will test.  A criterion C is: 

It has been shown that there can be no algorithm to find a reliable, valid test set for a system (which is too bad, because that's exactly the kind of test set we want). 

In practice, there are two main ways to select test cases: 

In practice, no one method of selecting test cases has proved to be best.  Each method has its strengths and weaknesses.  Even random selection of test cases has been shown to be competitively effective.  The best results are obtained by selecting test cases using more than one method. 
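
Random selection is simple to automate when an automated oracle is available.  The following is a minimal sketch under assumptions of my own:  the system under test (`system_abs`), its input domain, the oracle, and the seed value are all hypothetical.

```python
import random


def select_random_cases(domain, how_many, seed):
    """Pick test inputs uniformly at random from a finite input domain."""
    rng = random.Random(seed)  # seeded so that a failing run can be repeated
    return [rng.choice(domain) for _ in range(how_many)]


def system_abs(x):
    """Hypothetical system under test: absolute value."""
    return x if x >= 0 else -x


def oracle(x, result):
    """Acceptable iff the result is non-negative and equals x or -x."""
    return result >= 0 and result in (x, -x)


def run_random_tests(how_many=100, seed=2008):
    """Return the randomly selected inputs whose results the oracle rejects."""
    domain = range(-1000, 1001)
    cases = select_random_cases(domain, how_many, seed)
    return [x for x in cases if not oracle(x, system_abs(x))]


if __name__ == "__main__":
    print("failing inputs:", run_random_tests())
```

Fixing the seed is a design choice that makes the selection repeatable, so the same cases can be re-run after a fault is repaired.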

Test requirement
A test requirement specifies a particular element of a system artifact that must be satisfied or covered by some test case. 
Coverage criterion
A coverage criterion is a selection criterion based on coverage of the code, the design, the specification, or other artifact of the system.  A coverage criterion imposes a set of test requirements.  A test set satisfies a coverage criterion iff each of its test requirements is satisfied by some test case in the set. 
Subsumption (of one criterion by another)
Criterion C subsumes criterion c iff every test set that satisfies C also satisfies c. 
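
The definitions of satisfying a coverage criterion and of subsumption can be sketched with sets.  The coverage data below is hypothetical:  it describes an imagined function with three statements (`s1` to `s3`) and one two-way branch (`b_true`, `b_false`), and three made-up test cases.  Note that the subsumption check only enumerates test sets drawn from this small pool, so it illustrates the definition rather than proving subsumption in general.

```python
from itertools import combinations

# Hypothetical record of which elements each test case covers.
COVERED_BY = {
    "x=-1": {"s1", "s2", "s3", "b_true"},
    "x=0": {"s1", "s3", "b_false"},
    "x=5": {"s1", "s3", "b_false"},
}

# Each criterion is represented by the set of test requirements it imposes.
STATEMENT_COVERAGE = {"s1", "s2", "s3"}
BRANCH_COVERAGE = {"b_true", "b_false"}


def satisfies(test_set, requirements):
    """True iff every requirement is covered by some test case in the set."""
    covered = set().union(*(COVERED_BY[t] for t in test_set))
    return requirements <= covered


def subsumes_over_pool(strong, weak, pool):
    """True iff every test set from the pool that satisfies strong also satisfies weak."""
    for size in range(len(pool) + 1):
        for test_set in combinations(pool, size):
            if satisfies(test_set, strong) and not satisfies(test_set, weak):
                return False
    return True


if __name__ == "__main__":
    pool = sorted(COVERED_BY)
    print(subsumes_over_pool(BRANCH_COVERAGE, STATEMENT_COVERAGE, pool))
    print(subsumes_over_pool(STATEMENT_COVERAGE, BRANCH_COVERAGE, pool))
```

For this data, the test set containing only `x=-1` satisfies statement coverage but not branch coverage, so statement coverage does not subsume branch coverage, while every set that satisfies branch coverage also satisfies statement coverage.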

Levels of testing and the "V" diagram

Figure 4.  The "V" diagram:  levels of testing

Acceptance testing
Testing of the entire system against the stakeholders' requirements.  Compare system testing, which tests that the parts of the system interact as specified by the design.
Alpha testing
Some ordinary users try out the system at the development site.
Beta testing
Some ordinary users try out the system at their own site(s).
Integration testing
Testing of two or more parts of the system, testing that their interactions are consistent with the system design; assumes that the parts have already passed their individual tests.  Integration testing may be done at as many levels as is convenient; lower-level integration testing is sometimes called component testing, and the highest level of integration testing, integrating the entire system, is sometimes called system testing.
Regression testing
Testing a new system with the tests developed for an older version, to show that the new system has the properties of the older one (behavior or reliability).  Regression test selection is its own specialized and important area.  Since it is often the case that the new system mostly behaves like the previous version did, it is highly advantageous to reuse as many test cases as possible, which means one must identify which ones can be reused.  It also may be desirable, if time is short, to identify many of the unchanged test cases as skippable in order to concentrate on the new behavior. 
Unit testing
Individual testing of each of the system's smallest units, often by the developer who created it.
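
Regression test selection, mentioned under regression testing above, can take many forms.  One simple possibility, sketched here purely as an assumption and not as an established method, is to record which units each test case exercises and to re-run only the test cases that touch a changed unit.  The test names, unit names, and dependency table are hypothetical.

```python
def select_regression_tests(test_dependencies, changed_units):
    """Split test cases into those to re-run and those that may be skipped.

    test_dependencies maps a test name to the set of units it exercises;
    changed_units is the set of units modified in the new version.
    """
    rerun, skippable = [], []
    for name, units in sorted(test_dependencies.items()):
        if units & changed_units:
            rerun.append(name)      # exercises at least one changed unit
        else:
            skippable.append(name)  # exercises only unchanged units
    return rerun, skippable


if __name__ == "__main__":
    # Hypothetical dependency table for three test cases.
    dependencies = {
        "test_login": {"auth", "session"},
        "test_report": {"report", "format"},
        "test_logout": {"session"},
    }
    print(select_regression_tests(dependencies, {"session"}))
```

A table like this is only as trustworthy as the dependency information behind it, so skipped tests are a calculated risk taken when time is short.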

Stopping criteria

Regardless of how the test cases are selected, one may choose individual test cases one after another and continue testing until some criterion is met.  The criterion may be:

Testing concurrent systems

The number of possible states of a group of concurrent systems is very high, due to the large number of possible interleavings of the actions of each system.  In addition, it is usually difficult to set up a test case that will cause a specific interleaving.  It is not uncommon for designers of concurrent systems to depend more heavily on analysis using model checking and other techniques than on testing.  Such issues are much more difficult than those that ordinarily arise for single-process systems. 
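
To get a feel for how quickly interleavings multiply, one can count them.  The sketch below assumes that each process performs a fixed sequence of atomic actions and that only the relative order between processes varies; under that assumption the count is the multinomial coefficient (n1 + n2 + ... + nk)! / (n1! n2! ... nk!).  The function name is my own.

```python
from math import factorial


def interleavings(*action_counts):
    """Number of ways to interleave processes with the given action counts.

    Assumes each action is atomic and each process keeps its own actions
    in program order.
    """
    total = factorial(sum(action_counts))
    for n in action_counts:
        total //= factorial(n)
    return total


if __name__ == "__main__":
    print(interleavings(2, 2))    # two processes, two actions each
    print(interleavings(10, 10))  # two processes, ten actions each
    print(interleavings(3, 3, 3)) # three processes, three actions each
```

Two processes of two actions each have only 6 interleavings, but two processes of ten actions each already have 184756, and a test case normally exercises just one of them.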

In conclusion, some challenges

  1. Putting the system into the necessary initial state in order to run a test case.
  2. Duplicating the necessary interleaving of concurrency for multi-threaded systems.
  3. Deciding how many test cases are enough.
  4. Choosing the best test cases.
  5. Distinguishing correct results from failures.

Sources

Adrion+Branstad+Cherniavsky1982-vvtc
W. Richards Adrion, Martha A. Branstad, and John C. Cherniavsky.  Validation, Verification, and Testing of Computer Software.  ACM Comput. Surv., 14(2):159-192, 1982.  http://dx.doi.org/10.1145/356876.356879
Amman+Offutt2008-ist
Paul Ammann and Jeff Offutt.  Introduction to Software Testing.  Cambridge University Press, 2008. 
Goodenough+Gerhart1975-tttd-tse
John B. Goodenough and Susan L. Gerhart.  Toward a Theory of Test Data Selection.  IEEE Transactions on Software Engineering, 1(2):156-173, June, 1975. 
Muccini 2002 slides
This handout began from Dr. Henry Muccini's slides for ICS122, 2002 (used with permission). 
Thomas A. Alspaugh, UC Irvine
Assistant Professor, Informatics Dept.
School of Information and Computer Sciences