Software Quality Assurance and Organizational Processes
(Overview)
Project Memo UIUC-2003-09, 18 March 2003
Prof. Les Gasser
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
gasser@uiuc.edu
Errors, like straws, upon the surface flow;
He who would search for pearls must dive below.
-- John Dryden [All for Love, Prologue.]
Description:
Mistakes, errors, and problems are a common element of working
life, and a part of all settings at one time or another. In software
production, the work of identifying and resolving errors, bugs and
mistakes plays a large part. Operating system idiosyncrasies or
seemingly random "glitches" in program runs keep designers from
refining programs which are "almost perfect". In the worst cases,
buggy software can present major threats to security, economic health,
and even lives. In the best case, it is annoying and
time-consuming. People who work with software tools often need to
rationalize difficulties to others, repeat work, and invent ways to
circumvent problems they face as a result of computing
errors. Developers themselves need to define and negotiate which
issues are significant and which are not, how to allocate limited
resources, and how to please multiple clients through many overlapping
balancing processes. These efforts sap time, energy, and resources and
often even positive sentiment, both in provider-client relationships
and internally, in software development teams. As complex software
artifacts proliferate and become more central to, and even ubiquitous
in, people's lives, the social cost of software errors and bugs may
increase. Since errors and bugs reduce the effectiveness with which
we can build and use software systems, we're interested in
understanding how bugs come about, how they can best be managed, and
how people who build and use advanced software systems can organize
their work to prevent, overcome, deal with, and accommodate problems.
Most accounts of software problems focus on flaws in technical design
and usability. Surely better design, prototyping, and needs analyses
can help. But there's clearly much more to the issue. Specifically,
the reliability of a software artifact is related to the structure of
the technical and organizational processes that produce it and to the
technical and organizational infrastructures and constraints under
which it is built and maintained. This research is probing several
aspects of this mix. We're examining questions such as:
- What is the detailed character of the practice of quality
assurance in software teams? What kinds of activities occur? (For
example, our previous research has identified activities such as
defining, redefining, and negotiating the meaning and importance of
bugs; replicating the causes and effects of bugs; and attributing
causes and responsibilities. It has also begun to explore the
relationships of these activities to social organization and project
infrastructure.)
- Is there a "normal practice" of software construction and software
use? What is the "normal" role of errors, mistakes, problems,
insecurity, and unreliability? How prevalent are problems? Can
problems be eliminated in principle?
- Why do some software problems persist indefinitely in some
contexts, while others get resolved quickly?
- How do specific software design methods and techniques help or
hinder in managing reliability?
- When and how do people creatively redefine and work around
problems, errors, mistakes, reliability, security, etc. as strategies
of management, control, and social integration?
- How do people in teams decide which problems are critical, which
problems should and should not be resolved, and how to allocate
limited software quality assurance resources?
- How do quality assurance processes, decisions, requirements, and
constraints evolve through the lifecycle of projects?
- How can we capture, measure, visualize, assess (etc.) the
relationships between multiple problems, multiple constraints, and
multiple actors in a software artifact and its supporting
organizational infrastructure?
- What do people do when problems can't be resolved? How do social
and technical systems adapt in the face of endemic problems?
- How do reliability problems serve as arenas for social integration
and cohesion? How do varying degrees of social integration and
cohesion impact quality assurance processes and software reliability?
- How can bug-tracking and support tools best support the
organizational and team processes of software quality assurance? Where
do they fail? How can those tools be viewed, analyzed, and optimized
as critical information infrastructures for software development teams
and organizations?
- How are these issues conditioned by the special features of the
distributed work and organizational contexts specific to open source
software projects?
Role of Open Source Bug Repository Data
Our preliminary work on these issues has been done with small- to
moderate-sized datasets of qualitative data in the form of structured
interviews with software developers and users. These have helped
refine our idea of some of the critical problems, and have helped
build some preliminary insights. But it's difficult to analyze issues
such as the ones above without comprehensive, time- and
project-specific data from large projects. With over several hundred
thousand problem reports from a variety of open source projects,
widely accessible open source bug repositories provide an extremely
large and diverse dataset for analyzing such issues. We've
reviewed a number of reported bugs via individual searches and
downloads from repositories including GNOME, Debian GNU/Linux,
OpenOffice.org, and others. These repositories appear to have the
following characteristics that make them ideally suited as datasets
for investigating such issues:
- Significant number of bug reports: The volume of reports offers
enough opportunity to triangulate preliminary hypotheses and to search
for patterns over time, by type, and so on. There is also likely to be
enough variance in bug types, report types, and responses for us to
learn something interesting.
- Longitudinal, life-cycle data: Data has been captured over a
relatively long period (several years), allowing for analysis of
processes and their evolution over time.
- Both qualitative and quantitative data: These repositories
can be analyzed quantitatively (in terms of numbers of events, event
types, response types, dates, timelines, etc.). They can also be
analyzed qualitatively, by comparatively examining the texts of bug
reports, responses, analyses, etc.
- Structured data: The data in these repositories is already
captured in structured form, which allows for more systematic analysis
and for more automated, quantitative analysis, using e.g. data mining,
statistical analysis, and network analysis techniques.
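As a rough illustration of the kind of automated, quantitative analysis that structured bug data permits, the Python sketch below counts event types across reports, measures days from opening to resolution, and lists the age of reports still unresolved at a given date (one crude way to approach why some problems persist while others are resolved quickly). It is a minimal sketch under stated assumptions: the records, the field names (`opened`, `resolved`, `events`), and the helper functions are all hypothetical toy stand-ins, not the schema of any actual repository, which would first have to be exported and normalized.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical, hand-made records standing in for exported bug reports.
# Field names are assumptions, not any real bug tracker's schema.
BUGS = [
    {"id": 1, "opened": date(2001, 3, 1), "resolved": date(2001, 3, 4),
     "events": ["comment", "confirm", "patch", "resolve"]},
    {"id": 2, "opened": date(2001, 6, 10), "resolved": None,
     "events": ["comment", "comment", "reassign"]},
    {"id": 3, "opened": date(2002, 1, 15), "resolved": date(2002, 4, 15),
     "events": ["comment", "confirm", "resolve"]},
]


def event_type_counts(bugs):
    """Count how often each event type occurs across all reports."""
    return Counter(event for bug in bugs for event in bug["events"])


def days_to_resolution(bugs):
    """Days from opening to resolution, for resolved reports only."""
    return [(b["resolved"] - b["opened"]).days
            for b in bugs if b["resolved"] is not None]


def still_open(bugs, as_of):
    """Map each unresolved report's id to its age in days at `as_of`."""
    return {b["id"]: (as_of - b["opened"]).days
            for b in bugs if b["resolved"] is None}


if __name__ == "__main__":
    print(event_type_counts(BUGS))
    print(median(days_to_resolution(BUGS)))
    print(still_open(BUGS, date(2003, 3, 18)))
```

For the toy data, the two resolved reports took 3 and 90 days, so `median(days_to_resolution(BUGS))` is 46.5; the one unresolved report shows up in `still_open`. The same counts, grouped by project or by year, would be a natural starting point for the statistical and network analyses mentioned above.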