Applying Quantitative Methods to Software Maintenance

ED WELLER
Bull HN Information Systems, Inc.
INTRODUCTION
A major release of Bull Information Systems' GCOS 8 Operating System created the need to understand in detail the operation of its software maintenance team (SMT), the group responsible for software maintenance (maintenance in this discussion refers to corrective maintenance, that is, fixing defects reported by customers). By looking at the SMT process data, the company wanted to find what was working well and duplicate those processes, as well as identify less effective processes and eliminate them. It also wanted to measure the typical process metrics to allow year-to-year comparisons, to evaluate the impact of process changes, and to evaluate the performance of the team against perceptions of its performance.
This article concentrates on the metrics used by the SMT and
shows how quantitative methods can be used to evaluate maintenance work. The evaluations cover three years of data.
Several of the process changes introduced as a result of the
studies are also covered.
DATA CHARACTERISTICS
The GCOS 8 Operating System (exclusive of third-party products) includes about 7 million lines of source code split across about 320 product IDs (PIDs). A major release in 1995 resulted in an expected flow of customer-reported defects. As these defects were repaired, the company monitored a number of process metrics to evaluate the effectiveness of the repair process. The GCOS 8 Operating System is divided into logical
22 SQP VOL. 3, NO. 1/ 2000, ASQ
SMT PROCESS
The SMT was set up as a separate organization to allow a dedicated team of maintenance specialists to focus on customer-reported problems. Key elements of the process include daily reviews of open problems, pairing specialists on problems that have not been resolved within goals (or at the request of the person working the problem), and critical problem reviews (CPR) for especially difficult or longstanding problems. Over several years, the productivity of the SMT, measured by the effort per system technical action request (STAR), more than doubled. Metrics are collected using an internally developed problem tracking system called the problem analysis and solution system (PASS). This system recorded the usual data elements seen in today's problem tracking systems. Some of the 58 fields include date opened/closed, customer, priority, assigned to, closure code, and PID. This system
Once these metrics are selected as the quality measure of the work, as seen by Bull's customers, what questions might one ask about them that will reveal something useful about the SMT process? This article looks at the following:
- Is there a relationship between the volume of STARs and response time?
- Is there a relationship between the PID in which the defect is repaired and response time?
- Is there a relationship between the volume of STARs or PID and quality?
- Does inspection effectiveness affect quality?
- Does response time affect quality?
Answers to these questions should enable the company to answer a number of process-related questions:
- Is the company applying its resources optimally?
- Would a detailed look at the data show obvious areas for process improvement?
- Does the volume of change or length of response time affect quality in any way? (There was a perception in some areas that faster response time was accomplished by rushing the fixes and tacitly accepting lower quality.)
Introducing inspections from 1990 to 1993 had reduced the recidivism ratio (the fraction of shipped fixes that are themselves defective) to one-third of the pre-inspection values (Weller 1993). Over the four years the company gained another 33 percent improvement but had stabilized at a 4 percent ratio. Bull was looking to better understand the underlying processes to see what could be changed to improve further.
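The recidivism ratio described above reduces to a simple computation. The sketch below shows it on hypothetical fix records; the record layout and the sample numbers are illustrative assumptions, not data from Bull's PASS system.

```python
# A minimal sketch of the recidivism-ratio metric: the fraction of
# shipped fixes that are themselves defective and must be re-fixed.
# The record layout and sample numbers are hypothetical, not PASS data.

def recidivism_ratio(fixes):
    """Return defective fixes as a fraction of all fixes shipped."""
    if not fixes:
        return 0.0
    defective = sum(1 for fix in fixes if fix["defective"])
    return defective / len(fixes)

# Example: 2 defective fixes out of 50 shipped -> the 4 percent plateau
# the SMT had reached before the later process changes.
fixes = [{"id": i, "defective": i < 2} for i in range(50)]
print(recidivism_ratio(fixes))  # 0.04
```

Tracked month by month, this one number is what the SMT's later process changes drove toward zero.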
ANALYSIS TECHNIQUES
The data available for analysis were limited to what was collected in the problem reporting system, the site inspection database, and the EP database (emergency patch; while historically a meaningful definition, EP now means an object-code fix shipped between releases) kept by the change control board that manages EPs. The company did not have the luxury of adding additional data to the problem
The useful analysis techniques were:
- Trend lines
- Scatter plots
- Best case/worst case comparisons
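The scatter-plot and trend-line analyses amount to measuring how strongly two of the collected fields move together, typically summarized by Pearson's r. The sketch below computes r on invented monthly figures; the data are illustrative assumptions, not the SMT's records (the article's actual correlations, reported later, were r = .63 and r = .68).

```python
import statistics

# A minimal sketch of the scatter-plot analysis: Pearson's r between
# monthly STAR volume and mean response time. The monthly numbers below
# are invented for illustration; they are not the SMT's data.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

star_volume = [40, 55, 62, 70, 85, 90]    # STARs opened per month (hypothetical)
response_days = [12, 14, 13, 17, 19, 21]  # mean days to close (hypothetical)
print(f"r = {pearson_r(star_volume, response_days):.2f}")
```

A value near 1 or -1 signals a strong linear relationship worth plotting; a value near 0 suggests the two fields vary independently.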
[Figure: plot by Product ID (E, F, G); axis values not recoverable]
Lessons Learned
- Inspections work (this is a no-brainer, but at the time of the grandfathering, a decision not to inspect completed work was clearly wrong. There is always time to do it right.)
- The company needed to define exit criteria for turnover from development to maintenance.
Lessons Learned
- Evaluate source roll (source-code cleanup) frequency to ensure the proper balance between the cost of the source roll and increased maintenance effort.
- Improve product familiarity (another no-brainer; however, if the work force is optimally allocated, moving people around will not work, so product training may be the only way to achieve this end. Ultimately, this may mean on-the-job training or experience.)
DEVELOPMENT IMPACT ON MAINTENANCE
Figure 3 shows a plot of the recidivism ratio by PID. Again, PID A is clearly out of the box, with a high response time and lower quality. The analysis of the problems with this PID indicates the design was not documented and that it was not inspected. Without design documentation, the only way the team could decipher the design was by reverse engineering the code. The high defect ratio that was visible meant the
[Figure 3: Recidivism ratio (1%-5%) by Product ID]
[Figure: Defect ratio (0%-15%) vs. days (10-30) and number of STARs inspected (20-140)]
SUMMARY: YEAR 1
[Figure: Recidivism ratio (0%-10%) scatter plots with trend lines; r = .63 and r = .68]
A Different View
[Figure 8: Defect ratios and mean vs. months in operation (9-45) for the previous, current, and current-new releases]
The lessons learned from the second year of data reinforce those from the first year: low-quality products are difficult to maintain and will have a higher defect ratio in repair than well-designed products. This is not an earth-shattering revelation, but sometimes one must have these data to prove that the "pay me now or pay me later" rule holds for software just as well as for oil filters and auto engines. There were several significant changes in the processes (or enforcement of processes) in the following years.
First, the company realized the existing maintenance processes were not capable of better performance and instituted a major change in the testing process. EPs that were distributed beyond the site-specific correction were subjected to longer testing. This delayed the availability of the correction, but is in agreement with the recognized defect-to-failure ratio relationship published by Adams (1984).
Second, the company reduced the frequency of EPs. By taking a more conservative approach to distributing corrections (again consistent with the Adams work, especially in the second or third year of corrective maintenance), it avoided taking site-specific corrections to a second site where they were not needed. This effectively reduced the recidivism ratio to zero. (Since the middle of 1997, eight or more months of each year have seen a 0 percent recidivism ratio.) The change was so significant that the company was not able to repeat the analysis in the third year. The number of defective fixes was so low that it was not possible (or necessary) to use the same analysis methods.
Third, the company strictly controlled the development processes on the next release. It introduced defect-depletion tracking across the full development life cycle. It developed quality goals for the next release, and predicted and measured conformance to these goals (Weller 2000). The stated objective of development was to put the SMT out of business by reducing
THE PAYBACK
One measure of improvement from release to release is to measure the reported defects against the system-years of operation. This normalizes for different migration rates and numbers of systems. Figure 8 shows the comparison between the release that generated the data in this article and its follow-on release. The follow-on data were subdivided to show the contribution of the new products in the follow-on release. The follow-on release was as stable as its mature predecessor 12 months after initial exposure because the defect ratio of the new product was significantly better than its predecessor's. The current ratio is about 40 times better, but actual exposure of the post-Y2K release is still limited. The company expects the final result to be in the 10-to-20-to-1 range. Note that not all of these gains were due to the results of this study; several key changes, including the absolute need for design and inspection, were rigorously enforced on all products going into the follow-on release.
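The system-years normalization above can be sketched as a short computation. The release figures below are hypothetical, chosen only to illustrate the roughly 40-to-1 ratio mentioned; they are not Bull's actual defect counts or installed-base numbers.

```python
# A minimal sketch of the release-to-release comparison: reported defects
# divided by system-years of operation, so releases with different
# migration rates and installed bases can be compared. All figures are
# hypothetical, not Bull's release data.

def defects_per_system_year(defects, systems, years):
    """Reported defects normalized by (installed systems x years in service)."""
    return defects / (systems * years)

# Hypothetical predecessor vs. follow-on release over the same exposure.
prev = defects_per_system_year(defects=600, systems=100, years=1.0)
curr = defects_per_system_year(defects=15, systems=100, years=1.0)
print(f"improvement: {prev / curr:.0f}x")  # improvement: 40x
```

Because the denominator grows with both migration and time in service, the metric stays comparable even when one release reaches far more sites than the other.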
SUMMARY
What did Bull Information Systems learn? Obviously, the first lesson is "do it right the first time." Looking strictly at the analysis of the SMT process, the company concluded the existing defect removal processes were delivering the best the company could expect, and that any improvements would require major changes to the process.
ACKNOWLEDGMENTS
Without the detailed EP data collected by Jim Hensley, this study would not have been possible. I would also like to thank the reviewers for their many valuable comments.
Much of this work was presented in 1996 and 1997 at the Applications of Software Measurement Conferences. The material is presented here with the benefit of four to five years of hindsight, and the results of some of the improvements made since then.
REFERENCES
Adams, E. 1984. Optimizing preventive service of software products. IBM Journal of Research and Development 28, no. 1.
Bencher, D. L. 1994. Programming quality improvement in IBM. IBM Systems Journal 33, no. 1: 218-219.
Jones, C. 1991. Applied software measurement. New York: McGraw-Hill.
Weinberg, G. 1986. Kill that code. IEEE Tutorial on Software Restructuring. Washington, D.C.: IEEE Computer Society Press.
Weller, E. 1993. Lessons from three years of inspection data. IEEE Software (September).
BIOGRAPHY
Weller received the IEEE Software Best Article of the Year award for his September 1993 article, "Lessons From Three Years of Inspection Data," and was awarded Best Track Presentation at the 1994 Applications of Software Measurement Conference for "Using Metrics to Manage Software Projects." Weller has more than 30 years of experience in hardware, test, software, and systems engineering of large-scale hardware and software projects and is a senior member of IEEE. He has a bachelor's degree in electrical engineering from the University of Michigan and a master's degree from the Florida Institute of Technology. Weller can be reached at Bull Information Systems, Inc., 13430 N. Black Canyon, MS Z68, Phoenix, AZ 85029, or by e-mail at Ed.Weller@Bull.com.