(Overview) Software Problems and Organizational Processes

Software Quality Assurance and Organizational Processes

(Overview)

Project Memo UIUC-2003-09, 18 March 2003

Prof. Les Gasser
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
gasser@uiuc.edu

Errors, like straws, upon the surface flow;
He who would search for pearls must dive below.

-- John Dryden [All for Love, Prologue.]

Description:

Mistakes, errors, and problems are a common element of working life and arise in every setting at one time or another. In software production, identifying and resolving errors, bugs, and mistakes makes up a large part of the work. Operating system idiosyncrasies or seemingly random "glitches" in program runs keep designers from refining programs that are "almost perfect". In the worst cases, buggy software can pose major threats to security, economic health, and even lives. In the best case, it is annoying and time-consuming. People who work with software tools often need to rationalize difficulties to others, repeat work, and invent ways to circumvent the problems they face as a result of computing errors. Developers themselves need to define and negotiate which issues are significant and which are not, how to allocate limited resources, and how to please multiple clients in many overlapping balancing processes. These efforts sap time, energy, resources, and often even positive sentiment, both in provider-client relationships and within software development teams. As complex software artifacts proliferate and become more central to (even ubiquitous in) people's lives, the social cost of software errors and bugs may increase. Since errors and bugs reduce the effectiveness with which we can build and use software systems, we're interested in understanding how bugs come about, how they can best be managed, and how people who build and use advanced software systems can organize their work to prevent, overcome, deal with, and accommodate problems.

Most accounts of software problems focus on flaws in technical design and usability. Surely better design, prototyping, and needs analyses can help. But there's clearly much more to the issue. Specifically, the reliability of a software artifact is related to the structure of the technical and organizational processes that produce it, and to the technical and organizational infrastructures and constraints under which it is built and maintained. This research is probing several aspects of this mix. We're examining questions such as:

What is the detailed character of the practice of quality assurance in software teams? What kinds of activities occur? (For example, our previous research has identified activities such as defining, redefining, and negotiating the meaning of bugs and their importance; replicating the causes and effects of bugs; and attributing causes and responsibilities. It has also begun to explore the relationships of these activities to social organization and project infrastructure.)

Is there a "normal practice" of software construction and software use? What is the "normal" role of errors, mistakes, problems, insecurity, and unreliability? How prevalent are problems? Can problems be eliminated in principle?

Why do some software problems persist indefinitely in some contexts, while others get resolved quickly?

How do specific software design methods and techniques help or hinder in managing reliability?

When and how do people creatively redefine and work around problems, errors, mistakes, reliability, security, etc. as strategies of management, control, and social integration?

How do people in teams decide what are critical problems, what problems should and should not be resolved, and how to allocate limited software quality assurance resources?

How do quality assurance processes, decisions, requirements, and constraints evolve through the lifecycle of projects?

How can we capture, measure, visualize, assess (etc.) the relationships between multiple problems, multiple constraints, and multiple actors in a software artifact and its supporting organizational infrastructure?

What do people do when problems can't be resolved? How do social and technical systems adapt in the face of endemic problems?

How are reliability problems arenas for social integration and cohesion? How do varying degrees of social integration and cohesion impact quality assurance processes and software reliability?

How can bug-tracking and support tools best support the organizational and team processes of software quality assurance? Where do they fail? How can those tools be viewed, analyzed, and optimized as critical information infrastructures for software development teams and organizations?

How are these issues conditioned by the special features of the distributed work and organizational contexts specific to open source software projects?

Role of Open Source Bug Repository Data

Our preliminary work on these issues has been done with small to moderate-sized datasets of qualitative data in the form of structured interviews with software developers and users. These have helped refine our idea of some of the critical problems and have helped build some preliminary insights. But it's difficult to analyze issues such as the ones above without comprehensive, time- and project-specific data from large projects. With several hundred thousand problem reports from a variety of open-source projects, widely accessible open source bug repositories provide an extremely large and diverse dataset for analyzing such issues. We've reviewed a number of reported bugs via individual searches and downloads from repositories including Gnome, Debian GNU/Linux, OpenOffice.org, and others. These repositories appear to have the following characteristics that make them ideally suited as datasets for this kind of investigation:

Significant number of bug reports: There is enough opportunity for triangulating preliminary hypotheses, for searching for patterns over time and type, and so on. There is likely to be enough variance in bug types, report types, and responses to learn something interesting.

Longitudinal, life-cycle data: Data has been captured over a relatively long period (several years), allowing for analysis of processes and their evolution over time.

Both qualitative and quantitative data: These repositories can be analyzed quantitatively (in terms of numbers of events, event types, response types, dates, timelines, etc.). They can also be analyzed qualitatively, by comparatively examining the texts of bug reports, responses, analyses, etc.

Structured data: The data in these repositories is already captured in structured form, which allows for more systematic analysis and for more automated, quantitative analysis using, e.g., data mining, statistical analysis, and network analysis techniques.
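
As a rough illustration of the kind of automated, quantitative analysis that structured repository data could permit, the sketch below counts reports by severity, lists reports that remain open, and computes a median time-to-resolution per project. Everything in it is an assumption made for illustration: the record fields (id, project, severity, opened, closed), the project names, and the dates are hypothetical and are not drawn from any real repository or from our data.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical bug records. Field names and values are illustrative
# assumptions only; real repositories use their own schemas.
REPORTS = [
    {"id": 1, "project": "alpha", "severity": "critical",
     "opened": date(2002, 1, 10), "closed": date(2002, 1, 12)},
    {"id": 2, "project": "alpha", "severity": "minor",
     "opened": date(2002, 2, 1), "closed": None},  # still unresolved
    {"id": 3, "project": "beta", "severity": "minor",
     "opened": date(2002, 3, 5), "closed": date(2002, 4, 4)},
    {"id": 4, "project": "beta", "severity": "critical",
     "opened": date(2002, 3, 20), "closed": date(2002, 3, 30)},
]


def severity_counts(reports):
    """Count reports per severity label (a simple event-type tally)."""
    return Counter(r["severity"] for r in reports)


def open_reports(reports):
    """Return the ids of reports that have no closing date."""
    return [r["id"] for r in reports if r["closed"] is None]


def median_resolution_by_project(reports):
    """Median number of days from opening to closing, per project.

    Unresolved reports are skipped, so persistent problems do not
    appear in this figure and need to be examined separately.
    """
    days_by_project = {}
    for r in reports:
        if r["closed"] is not None:
            days = (r["closed"] - r["opened"]).days
            days_by_project.setdefault(r["project"], []).append(days)
    return {p: median(d) for p, d in days_by_project.items()}


if __name__ == "__main__":
    print(severity_counts(REPORTS))
    print(open_reports(REPORTS))
    print(median_resolution_by_project(REPORTS))
```

Summary figures of this kind would only be a starting point. Questions such as why some problems persist indefinitely would still call for qualitative reading of the report texts alongside the counts and timelines.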
