1. Introduction
It is often difficult to know when to stop making incremental improvements in a software
development and maintenance process, and instead to make a bold shift. This case study
examines one concrete example, a particularly difficult year in the history of Tartan Inc., during
which I served as a senior engineer. Over the course of that year, we saw—and failed to see—a
variety of warning signs, reached several decision points, and eventually made a bold process
shift.
The following section describes Tartan’s market constraints, code base, and the software process
that had been in use for over a decade. The next section examines specific issues based on this
author’s experience.
One of the lessons of this experience is that more process is not necessarily better, and that
inappropriate process often exacerbates existing problems. Another lesson of this experience is
that software development processes really do matter, and that failure to keep software process
appropriate to the problem at hand can cause great problems. My point, in other words, is that
simple and lightweight process improvements can provide substantial benefit at reasonable cost.
Page 1 of 20
PRACTICUM: A TALE OF THREE PROCESSES: REFLECTION ON SOFTWARE DEVELOPMENT PROCESS CHANGE AT TARTAN
runtime system available for each target system. The Tartan focus on performance was critical to
market success for the company. Customers, including potential customers, routinely compared
the performance characteristics of the code generated by Tartan compilers to very carefully tuned
hand-written assembly code. Tartan’s emphasis on performance meant that these comparisons
were generally quite favorable. See [12] and the “Ada Outperforms Assembly” sidebar for a
notable example.
The second core focus of Tartan’s business model was customer support and interaction. Product
developers were made available as needed to address customer-reported issues. Customers could
expect to get bug workarounds within one or two business days. If the bug was a “show-stopper,”
they would receive shipment of a new tool incorporating their bug-fix within a week or two. Less
crucial fixes were typically deferred to the next bi-annual tool release.

Sidebar: Ada Outperforms Assembly
With the intent of getting an Ada waiver, a defense contractor assigned a junior programmer to
re-implement a portion of its software in Ada to prove that Ada could not produce real-time code.
The expectation was that the resultant machine code would be too large and too slow to be
effective for a communications application. Two days and sixteen labor hours later, the opposite
was verified. With only minor source code variations, one version of the compiled Ada code was
3x smaller than the original assembly version and 1.01x slower. A slightly modified version of
the Ada was 1.92x faster than the assembly version and 1.01x larger.
-- See “Ada Outperforms Assembly: A Case Study.” [8]

Tartan’s customers expected a major feature upgrade on a yearly basis, along with either one or
two additional bug-fix upgrades each year. Many of these customers
produced safety-critical systems. As a result, they cared deeply about the quality and reliability
of Tartan’s products. Supported customers paid a maintenance fee of 15% of purchase price per
year for upgrades and support. Revenue from support contracts made up 25% of Tartan’s gross
income.[1] Therefore, keeping supported customers happy was crucial for corporate survival.
[1] This percentage was growing steadily, but still fell below the industry average of 30%, due to year-to-year growth in Tartan’s sales.
Software Process
Prior to the emergence of the problems examined in this paper, Tartan had an informal tool-
focused software process. Automated source code control was used without exception. All
changes to the code were accompanied by informative log entries. A bug database tracked
problems uncovered in testing by Tartan or in the field by customers. While most developers did
unit and integration testing, some did not. The entire process was officially ad hoc, with the
details left to each developer's judgment. By 1992, this informal and lightweight process had been
successful for over a decade.
One reason for the success of Tartan’s software process was extremely low turnover of the
technical staff. The compiler group lost an average of one employee per year. Only half of these
losses involved employees leaving the company; the rest were internal transfers to other parts of
the development team. Furthermore, there was essentially no loss of senior engineers, only internal
transfers. This unusually low turnover led to a corporate memory that was both long in duration
and broad in scope of knowledge.
The most formal part of our process was the “known bug list.” This list contained complete
descriptions of the cause and symptoms of each bug, along with a small test case that
demonstrated the bug. Bugs were further categorized by product, host system, target architecture
and development board, and, where relevant, customer. Each bug was given a priority based on
its perceived seriousness, the presence or lack of a reasonable work-around, and its expected
degree of impact on Tartan’s reputation. Priorities were set by a small group that included
representatives from Engineering, Sales & Marketing, and Customer Support.
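As a rough illustration only (the paper does not describe Tartan's actual tooling, and all field names here are hypothetical), a "known bug list" entry carrying the categorization and priority scheme described above might be modeled like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of a "known bug list" entry. The categories mirror
# those in the text: product, host, target, board, and (where relevant)
# customer, plus a priority set jointly by Engineering, Sales & Marketing,
# and Customer Support.
@dataclass
class KnownBug:
    bug_id: int
    cause: str                      # complete description of the cause
    symptoms: str                   # observable symptoms
    test_case: str                  # small test case demonstrating the bug
    product: str
    host_system: str
    target_architecture: str
    development_board: str
    customer: Optional[str] = None  # only where relevant
    priority: int = 3               # 1 = most serious

# Example entry (entirely invented for illustration).
bug = KnownBug(
    bug_id=101,
    cause="register allocator mishandles a large conflict graph",
    symptoms="compiler crash on very large procedures",
    test_case="tests/regress/bug101.ada",
    product="Ada compiler",
    host_system="Sun-4",
    target_architecture="Mil Std 1750a",
    development_board="eval board rev B",
    priority=1,
)
```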
Organization
The relevant groups within the company for the purposes of this study were Senior
Management—whose members had backgrounds in sales, marketing, and finance—and
Engineering. Engineering consisted of the Vice President of Engineering, the senior engineers,
and the rest of the developers. It is significant to note that the VP of Engineering was not
considered to be part of “Senior Management.”
Management decided to fill the engineering leadership gap. They read enough about software
engineering to recognize that we had a process problem, but did not have enough experience to
develop an effective solution. Instead, they mandated that we engage in a comprehensive testing
activity to “find all the bugs.” This well-meaning change was counterproductive. The result was
a tidal wave of newly identified bugs, along with resentment and bad morale in Engineering.
The volume of newly identified bugs was so large that nearly all of Engineering’s effort for a
six-month period was spent on test running, test automation, and failure analysis, leaving little
effort available to actually fix bugs. As a result we made little useful progress toward product
shipment for six months.
At this point, the senior engineers—including the author—finally concluded that an improved
software development process was the only solution that might enable us to ship new products
before the company went out of business. We designed a new process, paying careful attention
to Tartan’s situation and problems as well as to adoptability issues and technical issues. The
new process, about which much more later, focused on identifying and correcting bugs as early
as possible in the development process. After substantial difficulty with adoption, the new
process proved sufficient and Tartan shipped a new and more reliable product.
The remainder of this study consists of narrative interspersed with analysis. The narrative
sections describe events at Tartan as they unfolded, using the viewpoint we had at the time. Each
analysis section examines a decision point. For each, I examine the issue, the available
information, the decision, and the information used to make it. I then consider the outcome of the
decision, the useful information that could have been available if we had thought to look for it,
and suggest some alternative actions we could have chosen given the knowledge of the time but
with the wisdom of the present.
[2] Vendor-H was a large defense contractor.
[3] Byte-sized loads and stores were allowed at any address, 2-byte load/store operations were allowed only at even addresses, 4-byte at (address mod 4) = 0 addresses, etc.
surprises was necessarily large because Tartan’s performance commitment required the
engineers to handle a very large number of special cases for various target hardware. These
special cases made it much harder to correctly anticipate all the implications of any specific
change.
2. Issues
2.1. Process and Scale
2.1.1. Narrative
As noted previously, Tartan had succeeded for the prior decade with an ad hoc development
process. By the beginning of 1992, however, customers were complaining about reliability.
Users complained that products contained too many bugs. Several large customers were so
dissatisfied that they did not renew their maintenance contracts. At the same time, the
engineering staff was struggling to meet development and maintenance milestones—and failing.
Everything in development seemed slow, but no one had a clear idea of the reason.
Tartan’s Senior Management recognized these warning signs as indications of a serious problem
in Engineering. They correctly categorized it as a process problem. The engineering team, up to
and including the VP of Engineering, was completely unreceptive to this diagnosis. We
recognized the individual problems, but discounted Senior Management’s diagnosis. “What do
those guys know? They’re not technical, and our process has worked fine for over a decade!”
These problems were compounded by the need to produce regular feature upgrades. In early
1992, we had committed to a group of aggressive feature upgrades. These new features were
important both to satisfy market demand and to meet contractual requirements. Implementing
them required examining 80% of the compiler source code, changing 10% of it, and introducing
about 50 kSLOC of new code.
TI c3x/c4x family covered four variants; the i960 family covered three. Furthermore, between
1988 and the end of 1991, Tartan had added support for four additional processor variants to the
Motorola 68K family (bringing its total to fourteen), and three additional variants to the Mil Std.
1750a family (bringing its total to seven). The burden of special cases increases both with the
number of different target architecture families and with the number of variants within each
individual family—and Tartan’s goal of industry-leading performance meant that these special
cases had to be embraced, not ignored. The compiler Front End, Middle Pass, and Optimizer were
not immune to the problem of special cases either. These three components were heavily
parameterized so that they could remain essentially identical for all targets. However, the
profusion of special cases led to a steady stream of changes and improvements to that
parameterization. At the beginning of 1987, Tartan supported three target architecture families,
with fifteen distinct variants. By the beginning of 1992 Tartan supported five target architecture
families, with a grand total of twenty-nine distinct variants—a near doubling of targets.
Growth of surrounding tooling. In 1988 Tartan first fielded its own linker/loader. In 1989
Tartan shipped a new Debugger. In 1990 Tartan added a cross-reference tool. In 1991 Tartan
added two performance-profiling tools. Each of these tools required additional support from the
compiler engineers.
Growth of features and optimizations. In addition to the new tools listed above, each major
upgrade included new optimizations and code generation improvements. New compiler features
such as assembly code inserts and “intrinsic functions” also appeared during this period. The key
observation is that each upgrade added new possibilities for unanticipated interactions between
components.
Aggressive New Features. In early 1992, Tartan had committed to a group of aggressive feature
upgrades. Some notable examples were full debug support for optimized code, support for
complete traceability from source code to object code (even in the face of heavy optimization),
and a collection of new language features (as part of an “Ada-9x User-Implementer” study).
These features were important both to satisfy market demand and to meet contractual
requirements. Implementing the upgrades required examining 80% of the compiler source code,
changing 10% of it, and introducing about 50 kSLOC of new code. One key aspect of these new
features is that they required changes to some of the oldest, ugliest, buggiest, and least
understood parts of the compilers.
and development. As a result, we saw a significant increase of problems being found during
integration, or even later testing. Worst of all, there was a noticeable increase in problems that
were found by customers—the most expensive place of all! This shift of two-to-three phases in
problem discovery represents an increase of perhaps 8x (and possibly up to 1000x) in the cost to
fix the problems. With such a large increase in development costs, it is not surprising that the
Engineering staff was no longer able to meet milestones or to keep up with development
schedules. It is also not surprising that Senior Management focused on bug finding. Bug finding
is, after all, easy to measure, easy to understand, and can be addressed with straightforward
changes to the development process.
There were signs of these growth-related problems as early as 1990 and 1991. For example, the
typical project plan’s reserve for surprises increased from 25% of the total schedule in the late
‘80s, to 50% of the total schedule by 1991. Careful attention to this early indicator could have
identified the problem much sooner.
Tartan’s engineering staff saw many of the individual problems and warning signs. Nevertheless,
we failed to recognize that they all stemmed from a single underlying problem: Tartan’s
previously successful ad hoc software development process was no longer adequate for the size
and scope of our products.
There were indications, had we been looking for them, that we were encountering process problems:
Increased customer complaints. During the year prior to summer 1992, we saw a large
increase in customer complaints about reliability. This was reflected in customer bug
reports, in word-of-mouth reports from the sales and marketing staff, and—most
worrisome of all—in a sharp decrease in maintenance contract renewals.
“Everything seems hard.” The engineering staff began complaining that nearly every
task seemed more difficult and took longer than similar tasks had in the past. In
hindsight, this appears to have been due to the rising size and complexity of our code
base exceeding our ability to cope with it. At the time, we had no explanation.
Missed Milestones. We saw a large increase in missed milestones, but had no obvious
explanation. These milestones were both the large externally visible ones, like
implementing a new feature, and the fine-grained internal milestones, such as an
individual engineer’s goal to “get the following four things done over the next two
weeks.” The tasks involved were well within the range of our previous experience. We
had not suffered significant turnover of engineering staff. We weren’t trying to take on a
drastically larger set of problems without increasing staffing.
Tartan’s engineering staff and management saw these signs, and recognized them as problems.
We did not, however, correctly identify them as having a single underlying cause, and we did not
take any effective action to remedy the root problem. Instead, we worked harder—an ineffective
solution.
Senior management, although non-technical, was paying close attention to the few metrics that
were available from Engineering, e.g., missed milestones, as well as to customer complaints, bug
report rates, maintenance renewals, and other non-engineering metrics. This data, along with
their greater distance from the details of engineering, made it easier for them to realize that we
were facing a major problem. In hindsight, the VP of Engineering should have been attentive to
this data and come to the same realization before Senior Management did.
One reasonable question to ask is “Why didn’t the engineers believe Senior Management’s
diagnosis of the problem?” In my opinion, the answer is twofold: we were distracted by the
struggle to get our day-to-day work done and meet the looming deadlines, and we were suffering
from a severe case of “not invented here.”
Engineering was happy with the process we had. We were aware of studies showing that process
must change as the code base grows or the task at hand becomes more complex. However, we
were complacent. Although our code base had grown by nearly an order of magnitude, our
process had been able to handle it for more than a decade. Therefore, we did not consider process
to be part of the problem. Our complacency hampered our ability to recognize a process problem
when we encountered one.
2.2.1. Narrative
Tartan’s Senior Management correctly recognized that the engineering development process
needed to be revised. Because of Engineering’s refusal to recognize or address the problem,
Senior Management “took charge.” In an attempt to improve their understanding of the
Engineering side of the company, they read many articles from the business press on the subject
of software process. In the absence of useful input from Engineering, they mandated a solution:
comprehensive testing. Senior Management also decided to delay any further product shipment
until we returned to our previous level of quality and reliability. Catchphrases of the day
included “Find all the bugs, then fix them” and “We shall ship no compiler before its time.”
Engineering fought the introduction of this process change. Those of us in Engineering sensed
that simply adding additional testing effort would not solve our problems. However,
Engineering had lost much credibility by being unresponsive to Senior Management’s
suggestions, and by failing to correct our own problems. We thus lost the argument against
increased testing. Once the decision to start comprehensive testing was made, the engineering
organization worked quite hard to try to make it work, albeit with a fair amount of grumbling
along the way.
Tartan had a large and comprehensive test suite, comprising 2 MLOC spread across
approximately 5,000 executable programs. It included government-mandated language tests,
benchmarks and stress-test programs from customers, regression tests intended to exercise every
prior bug-fix, and purpose-written tests intended to exercise various compiler features.
The testing effort lagged in the beginning due to insufficient hardware resources (both host
workstations and target boards) and due to weaknesses in our test automation. These problems
were alleviated by spending significant amounts of money on additional hardware, by
substantial improvements to our test automation, and by creating a smaller “overnight” test suite
for frequent use by front-line developers.
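The paper does not show Tartan's automation, but the idea of carving a smaller "overnight" suite out of a much larger one can be sketched as follows. The selection criterion here (greedily taking the cheapest tests under a time budget) is purely illustrative, not Tartan's actual policy:

```python
# Hypothetical sketch: select a fast "overnight" subset from a large suite.
# Each test is a (name, estimated_seconds) pair; we greedily pick the
# cheapest tests until the overnight time budget is exhausted.
def overnight_subset(tests, budget_seconds):
    chosen, used = [], 0
    for name, cost in sorted(tests, key=lambda t: t[1]):
        if used + cost > budget_seconds:
            break
        chosen.append(name)
        used += cost
    return chosen

# Invented test names and timings, for illustration only.
suite = [
    ("acvc_001", 5),          # government-mandated language test
    ("stress_big", 3600),     # customer stress-test program
    ("regress_101", 2),       # regression test for a prior bug-fix
    ("bench_radar", 900),     # customer benchmark
    ("feature_generics", 30), # purpose-written feature test
]
quick = overnight_subset(suite, budget_seconds=60)
# quick -> ["regress_101", "acvc_001", "feature_generics"]
```

The full suite would still run periodically on all host/target/board combinations; the cheap subset only gives front-line developers fast feedback.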
Making a serious effort to run the entire test suite on all supported combinations of host
computer, target architecture, and development board found so many test failures that
Engineering couldn’t keep up with analyzing them, much less actually fixing any of the bugs
that caused the failures. During the second half of 1992, we struggled with this tidal wave of
test failures. Our list of reported test failures awaiting analysis peaked at over 13,000 items.
Although we eventually managed to diagnose failures as quickly as they were discovered, we
made little progress on fixing the underlying software. Meanwhile, the list of known bugs
continued to grow.
As the massive testing effort dragged on, and fall turned to winter with no end in sight, morale
in Engineering plummeted. It wasn’t just that the testing was yielding no useful result; the
engineers also resented having a “solution” forced on us from above.
Around the end of November, the senior engineers began to understand that we had been
correct—the “comprehensive testing” approach alone was not going to solve our problems. We
understood that we needed something more, or perhaps something different. However, we were
so busy analyzing test failures and fixing bugs that we had no time to spend on understanding
the real problem, much less on fixing it.
Meanwhile, the entire engineering staff was losing what little credibility it had left with Senior
Management. After six months under the “improved” process, there was no sign of actual
improvement. Senior Management was beginning to wonder whether they could trust
Engineering to deliver at all.
2.2.2. Analysis
We made two fundamental mistakes at Tartan during this period. First, Senior Management
mandated a specific change to our development process. This was a good-faith effort to do the
right thing. Unfortunately, Senior Management did not have the necessary experience to
recognize that using testing at the end of a process solely to “add quality” to the product being
tested does not work.
The fact that the mandated change turned out to be counterproductive should not come as a
surprise.[4] It is unreasonable to expect non-engineers to make correct engineering decisions
without the necessary background. On the other hand, Senior Management should have
understood that they should not be making detailed engineering decisions. A reasonable
alternative action would have been to insist that the Engineering leadership acknowledge and
solve our process problems. If necessary, this insistence could reasonably have taken the form of
replacing the VP of Engineering.
The Engineering leadership made the second fundamental mistake. Even though we had
predicted that it would fail, we allowed ourselves to be convinced to try the comprehensive
testing approach. Instead, we should have countered with an alternative. This was particularly
bad because we had all read The Mythical Man Month (especially the section on Product
Testing). We all “knew” that testing the quality in at the end does not work. Nevertheless, we
tried that approach for nearly six months before seriously planning an alternative approach.
[4] The surprise is that the Senior Management team ordered us to solve a real and important problem, but it wasn’t the key problem we faced. Tartan’s product testing before that point was clearly both insufficient and ineffective. Many problems that cropped up in the field really should have been found during in-house testing. That said, the added testing was needed to verify that our development process was (or, in this case, was not) producing good results. It could not, however, solve the fundamental problem.
The Engineering leadership is fundamentally responsible for the software development process.
When Senior Management dictated the comprehensive testing approach, the Senior Engineers
had a second chance to uphold that responsibility by taking charge of changes to our
software development process. This responsibility must be discharged by the
engineering leadership, even when it means discounting non-technical management’s specific
process fix.
2.3.1. Narrative
From December 1992 through February 1993, comprehensive testing continued as mandated.
Morale plummeted to new lows. No end to our problems was in sight.
Meanwhile, gripe sessions among the Senior Engineers slowly began to change into
brainstorming and problem-solving sessions. One key part of this change in attitude was our
realization that simply blaming Senior Management for the whole mess was an abdication of our
responsibilities as engineers. We were the people who should have made sure that quality did not
slip in the first place. And we were the only people in the company who might—just maybe—
have the knowledge and experience needed to fix our problems.
We studied the literature, both academic and popular. It is impossible, at this late date, to
reconstruct a complete list of references we consulted. Some that stand out for their contribution
to our eventual process, however, are:
• The Mythical Man Month by Fred Brooks [3]. This masterpiece of Software Engineering
literature began the entire effort. We were inspired by the fact that Brooks and his team
surmounted much greater problems nearly two decades earlier. The guidance he gave helped
to change us from “gripers” to “fixers.”
• Code Complete by Steve McConnell [5] stands out as a comprehensive and valuable
guide, both for its clear exposition of many issues, and especially for its suggestions on
further reading.[5]
• Fagan’s work [6] [7] on Inspections and their effectiveness. Although we knew about
this work before beginning our study, we didn’t really come to appreciate it until after
reading Code Complete. This body of work helped form our strategy in the area of reviewing
each other’s designs and code.
• Structured Walkthroughs by Ed Yourdon [11]. Another book we appreciated only after
reading Code Complete. This book inspired our choices of specific techniques for code
reviews and walk-throughs.
• Studies on the cost of fixing bugs in various phases of development [2] [4] [6]. These
papers convinced us to focus our efforts earlier in the software life-cycle.
[5] We were lucky enough to acquire a number of early copies of this book through contacts at the publisher.
[6] Throughout this document, names have been obfuscated to protect both the innocent and the guilty. That said, “Dave” was our most notorious, most productive, and most senior “coding cowboy.”
[7] A bug cut-down is a small collection of test cases that demonstrate the presence or absence of a specific bug. These test cases were produced during the debugging process, and would be harnessed as regression tests after successful correction of the bug.
COMMANDMENTS to a fellow engineer, and to convince that peer that the testing
performed was indeed sufficient.
Nothing in the new process was novel. It was a careful tailoring of well-known techniques to fit
Tartan’s specific circumstances.
After convincing management that the new development process was the right way to go—as
discussed further in the next section—we briefly shut down the testing process while all
developers analyzed and categorized the remaining unanalyzed test failures. After this point, four
months working within the new process was sufficient to produce a working product release.
The improved process held up over the following three years, until the company was sold in the
summer of 1996. During this period we added several new products, including a new source
language (C++) and several new target architectures, leading to a near doubling of the compiler
code base to 1.5 MSLOC. Comparison of before and after time sheets and other data suggests
roughly an 8x productivity improvement (see box on page 19 for more details).
2.3.2. Analysis
Key Observation: The engineers who designed the new process still did not clearly understand
that our inability to identify bugs in earlier phases of development, combined with the cost of fixing those
bugs later in development, was the key source of our sudden development meltdown. Fortunately,
our decision to focus on catching problems at earlier stages of the development cycle directly
addressed the root problem.
The three changes with the largest impact in the short term were institutionalizing the “Think–
Act–Review” pattern, replacing bug farms with all-new code, and making the rules for source
code check-in a normal part of our daily work.
2.3.2.1. The “Think-Act-Review” Pattern
The general theme of our new process was “Think–Act–Review.” It is self-evident that thinking
before acting is generally a good idea. Writing down what you've thought about serves two
purposes: First, it forces the writer to make a coherent presentation of his thoughts. Second, it
helps to provide convincing evidence that the writer has actually thought carefully about his
plans. Reviewing one’s output—whether it be designs, code, test plans, test results, or
whatever—provides both an opportunity for other eyes to find problems you have missed, and a
check that you actually followed the process. Although this description may sound rather
Waterfall-ish, we used it more as a relatively fine-grained iterative process. We aimed for
design-implement-test cycles that could be finished in about a week. Naturally, we spent much
less time for simple bug fixing, and occasionally more for major upgrades and new features.
The “Think–Act–Review” pattern provided several levels of benefit. In the small, the
requirement for having at least one peer review every change before check-in provided much of
the benefit claimed for “Pair Programming” at a fraction of the cost. This level of the pattern is
where the COMMANDMENTS OF CHECK-IN operated. Many subtle errors were caught during these
peer reviews—along with some embarrassingly blatant mistakes as well.
At a somewhat larger scale, the “Think–Act–Review” pattern fostered clarity of thinking by
requiring developers to explain their planned changes to a peer group. This requirement for
explanation typically led to a certain amount of writing about planned changes, which also
fostered more careful thought. These reviews served as a fine opportunity to check for missing
pieces and muddled thinking.
At the highest level, the “Think–Act–Review” pattern served to institutionalize reflection on the
development process itself. This was not originally a planned part of the process. During some of
our larger reviews, various engineers asked why we weren’t applying the pattern directly to the
process. We soon included a section on “what did we mess up process-wise?” in each major
review.
2.3.2.2. Replacing “Bug Farms”
Replacing bug farms had a huge impact on quality. For example, during the four months from
the start of the new process to actually shipping a new version of the product, we replaced five of
the buggiest packages in the compilers. This change alone fixed one third of the bugs on our list,
and more than half of the “really hard” bugs.[8] The 10% of available staff time we spent replacing
these packages was likely much less than the cost of analyzing and fixing that many bugs—
especially the “really hard” ones.
2.3.2.3. Rules for Source Code Check-in
The rules for source code check-in, known as “THE THREE COMMANDMENTS OF CHECK-IN,”
served to protect us from ourselves. The FIRST COMMANDMENT’s requirement for running the
over-night test suite led to a significant decrease in checked-in “fixes” that actually broke the
system for other developers. The SECOND COMMANDMENT’s requirement for passing purpose-
written tests and/or bug cut-downs helped us avoid checking in changes that did not behave as
expected, such as a “bug fix” that doesn’t actually fix the bug.
In many ways the THIRD COMMANDMENT was the most important. Most obviously, it served as
the penultimate line of defense against cutting corners on the process. Perhaps more importantly,
however, the simple act of explaining a bug-fix to another engineer often led to the bug-fixer
suddenly realizing that there were additional special cases he hadn’t considered, or that some
targets had different requirements. Suddenly trailing off in mid-explanation, saying “Wait a
minute! I forgot about…” was a common occurrence for the entire engineering staff.
The THIRD COMMANDMENT worked, in part, because each engineer knew that being responsive
to review requests from other engineers was in his or her own best interest. Today I may be
reviewing someone else’s code; tomorrow I’ll be looking for a reviewer. It’s always easier to
find someone to help when you’ve been helpful yourself. In addition, we found that it was
important to put the reviewer’s name on the check-in right along with the author’s name. This
practice encouraged reviewers to be careful about checking the appropriateness and correctness
of the change they were reviewing. If the fix doesn’t work, and your name is there as reviewer,
you expect to be asked why you didn’t catch the problem.
“Living righteously” by adhering to these commandments significantly reduced rework of bug
fixes and time wasted when someone else broke the system. It also helped us make the entire
process an ingrained part of our development culture.
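Taken together, the three rules amount to a simple gate on each check-in. The sketch below is a hypothetical illustration of that gate in Python; Tartan enforced these rules socially rather than with a tool, and every name and field here is an assumption made for the example.

```python
# Hypothetical sketch of the THREE COMMANDMENTS OF CHECK-IN as an automated
# gate. Tartan had no such tool; the names and fields below are illustrative
# assumptions, not the company's actual tooling.
from dataclasses import dataclass
from typing import List

@dataclass
class CheckinRequest:
    author: str
    reviewer: str = ""                    # THIRD: a peer who heard the explanation
    overnight_suite_passed: bool = False  # FIRST: full overnight test suite ran clean
    purpose_tests_passed: bool = False    # SECOND: purpose-written tests / bug cut-downs pass

def checkin_violations(req: CheckinRequest) -> List[str]:
    """Return the list of commandments this check-in would break."""
    violations = []
    if not req.overnight_suite_passed:
        violations.append("FIRST: overnight test suite did not pass")
    if not req.purpose_tests_passed:
        violations.append("SECOND: purpose-written tests or bug cut-downs did not pass")
    if not req.reviewer or req.reviewer == req.author:
        violations.append("THIRD: change was not explained to another engineer")
    return violations
```

Recording the reviewer’s name on the request mirrors the practice, described above, of putting the reviewer’s name on the check-in right alongside the author’s.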
8 “Really hard” in this context means either that failure diagnosis didn’t fit in the original time-box, thus moving the
bug to the “really hard” bug list, or that the first attempt to fix the bug failed, so the developer moved it to the “really
hard” bug list and moved on. These bugs were typically much more difficult to fix than the ordinary variety.
Example “really hard” bugs include a register allocation problem that manifested only when there were more than
1024 variables in the conflict graph, and a mysterious compiler crash caused when the Solaris OS occasionally
failed to restore register values correctly after a context switch.
2.3.3. Summary
Software process need not be complex. We obtained a tremendous improvement in useful
productivity by adopting a few simple ideas, all of which, by 1993, had been known for over a
decade. When carefully applied, even very simple approaches can have a large impact. We
increased the amount of process used by our organization, but we did so without adding
significant ceremony or paperwork. In our experience additional paperwork would not have
added value. Instead, we focused on activities that directly improved quality and productivity,
such as having programmers routinely inspect each other’s work.
Inappropriate process does more harm than good. The sections above show the effects of two
different inappropriate processes. The original process simply became inadequate due to growth
in size and complexity of our tasks. The “comprehensive testing” process led to a complete
meltdown of our ability to make forward progress in development. In Tartan’s market, either of these
processes would shortly have led to the failure of the company.
2.4.1. Narrative
It was surprisingly easy to get initial agreement to try the new process. Both management and
engineers signed up promptly when confronted with the likely alternative of going out of
business. The real problems came in the longer term.
Every developer had problems with the switch to the new process. Even those who designed the
new process found it difficult. Many parts of the process were easy to swallow, but one in
particular was very hard: the check-in rules. Even the most vocal proponents of
the check-in rules had trouble following them. Some developers hated the check-in rules so
much that they became process “hold-outs” over that issue alone.
A typical interaction early on might go like this:
Reviewer – “Did you follow the *&^%#*&^%$ check-in rules?”
Reviewee – <sighs> “Yes, I did follow those &^%&^% pain in the %%$# rules” (or perhaps
“No, I didn’t.” <sigh>).
Then one day Dave, the most notorious hold-out in the organization, “got it”: he suddenly
understood that the check-in rules helped all of us to avoid stupid mistakes at low cost. He
described his sudden understanding as being “almost a religious experience,” and asked me why the designers
of the process hadn’t made it clear that the &^%$&^%$& check-in rules were actually
important, and not just bureaucratic BS. I responded that we had tried to do so, but obviously
hadn’t gotten the idea across to him.
The next day, while reviewing some code I wanted to check in, Dave asked me “Did you follow
THE THREE COMMANDMENTS OF CHECK-IN?” (Imagine the “three commandments” part spoken
in a deep pretentious announcer-style voice). When I stopped laughing, I admitted that I actually
had missed a step that time.
Within days, this description of the check-in rules had spread through the entire team. After this
change, a typical interaction went more like:
Reviewer – “Did you FOLLOW THE THREE COMMANDMENTS?”
Reviewee – “Sure did! And here’s the proof…”
This would be a pointless anecdote, except for one surprising thing—compliance with the rules
went up for the entire team. Both the hold-outs and the engineers who were trying to follow the
new process very carefully found it easier to follow “THE THREE COMMANDMENTS OF CHECK-
IN” than it had been to follow “those &^%$&^%$ check-in rules.” The difference was so large
that we went back and changed the official description of the check-in rules to be “THE THREE
COMMANDMENTS OF CHECK-IN” in all the documents describing the process. It may sound
silly, but it worked. Similar application of humorous peer pressure solved nearly all of our other
problems with compliance with the new process. In the few cases where peer pressure was
ineffective, we resorted to management sanction.
The other main point of contention within the engineering organization came over the issue of
standards for coding style and formatting. About one month after starting to work with the new
process we noticed that many code walkthroughs and reviews involved more time arguing about
formatting than actually looking at the code!
We took a solution straight from McConnell’s Code Complete [5]: each group responsible for a
functional area developed their own consensus style guide. We then agreed that all new code
written by members of that group would follow their team’s standard. However, spending time
reformatting existing code was strictly forbidden unless the code in question was undergoing
very substantial change or rework. All debate about formatting was then placed off-limits
during reviews and walk-throughs. The details of the consensus style guides turned out to be far
less important than the consensus itself. This approach ended the “coding style wars” entirely,
and allowed us to focus on more important parts of the job at hand.
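The reformatting rule reduces to a small predicate. The sketch below is purely illustrative; the report gives no numeric threshold for “very substantial change or rework,” so the 50% figure is an assumption.

```python
# Illustrative sketch of the formatting policy: new code follows the team's
# consensus style guide, while existing code may be reformatted only when it
# is already undergoing very substantial change or rework. The 0.5 threshold
# is an assumed value; the report does not quantify "substantial."

def may_reformat(is_new_code: bool, fraction_rewritten: float,
                 rework_threshold: float = 0.5) -> bool:
    if is_new_code:
        return True  # new code: the team's style guide applies
    # old code: hands off unless it is already being substantially reworked
    return fraction_rewritten >= rework_threshold
```

The value of the rule lay less in its exact threshold than in taking formatting off the table during reviews, consistent with the observation that the consensus mattered more than its details.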
The big process problem with management came not during the adoption phase, but much
farther down the road. Management was continually tempted to respond to tight deadlines by
suggesting that we set aside the process for the duration of the “emergency.” Both engineering
line managers and the senior management team fell victim to this error at various times.
2.4.2. Analysis
Without management support, a new process has no teeth. In particular, it must be possible to
discipline developers who persistently fail to follow the process. When peer pressure fails, this
discipline requires management intervention.
A more difficult aspect of management support for software process is the temptation to set
aside the process to meet short-term goals. Sustaining a disciplined process, even a very lightweight
one, requires significant effort. There were similar temptations for the developers. Doing
more coding or debugging is much more fun than sitting in a code walk-through or reviewing
each other’s work prior to check-in. Having management offer the excuse of a short-term crisis
only adds to the developers’ temptation to backslide. Our only successful response was for the
engineering staff to patiently point out that the “overhead” activities that management was
suggesting we set aside were key parts of a process that was yielding tremendous productivity
improvements. “Skipping those activities makes things go slower, not faster!” In the occasional
cases where that approach didn’t work, we simply refused to comply with management’s
instruction to “temporarily” drop the process. This rather unsatisfactory approach has obvious
risks associated with it.
2.4.3. Summary
Process change is hard. Even after the initial resistance to change is overcome, it remains
difficult to change longstanding work habits. We almost didn’t manage to change our process.
Humorous peer pressure was a key enabler for process change in our case.
Humorous peer pressure works surprisingly well. The specific techniques we used may not be
appropriate in other organizations, but they worked far better than we ever expected. The most
successful humorous approaches were initiated by “process hold-outs,” usually as they finally
“converted” to the new process.
“Bottom up” process change is easier than “top down.” Tartan’s engineering staff was better
able to accept a process designed by their peers than a process mandated by management. Take
advantage of this tendency by involving the engineering staff early and fully in any process
change activity.
3. Process Revisited
If I found myself in a similar situation in the future, I would:
• Investigate “Extreme Programming” (XP). My reading suggests that many of the ideas
espoused by XP advocates are rather similar to those we considered important at Tartan,
though with somewhat different emphasis. For example, proponents of XP
recommend that all programming be performed by pairs of developers. This practice is
reminiscent of, but appears more expensive than, Tartan’s system of reviews. It would be
interesting to know whether the added expense is worthwhile. XP may well be better than
the process we designed at Tartan for at least some organizations and problems.
• Try harder to reduce the overhead of the new process. Examination of time sheets about
18 months into the new process showed that we were expending 40% of staff time on
“process overhead.” That overhead included reviews both formal and informal, code
walk-throughs, replacement of bug farms, and preparation time for reviews. We also
found that we’d achieved roughly an 8x improvement in completed work, with a
substantial decrease in needed rework as well. Finding a way to cut the 40% “process
overhead” to a smaller number would yield still greater improvement.
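As a rough back-of-the-envelope illustration (the linear-scaling assumption is mine, not the report’s): if the 60% of staff time spent on direct work produced the observed 8x improvement, the potential gain from trimming overhead scales with the direct-work fraction.

```python
# Back-of-the-envelope sketch, not from the report: assume output scales
# linearly with the fraction of staff time spent on direct work, and that
# a lighter process would preserve the same per-hour productivity gains.

def projected_speedup(current_speedup: float,
                      current_overhead: float,
                      new_overhead: float) -> float:
    """Scale the observed speedup by the change in direct-work fraction."""
    return current_speedup * (1.0 - new_overhead) / (1.0 - current_overhead)

# Cutting overhead from 40% to 30% of staff time:
print(round(projected_speedup(8.0, 0.40, 0.30), 1))  # ≈ 9.3
```

Whether such linear scaling would hold in practice is exactly the open question, since the “overhead” activities were themselves the source of much of the productivity gain.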
If asked to select pieces from Tartan’s new process in priority order, my first choice would be the
“Think–Act–Review” model in all its varied incarnations. My second choice would be THE
THREE COMMANDMENTS OF CHECK-IN. If the problem at hand involved a substantial body of old
code, I’d consider walk-throughs and bug farm replacement to have equal claim on third place.
4. Conclusion
This report has presented issues and lessons learned, from the author’s point of view, from
difficult changes in the software development process at Tartan Inc. between 1992 and 1993. Our
products out-grew our initial development process, leading to a sharp increase in defects and in
missed milestones in development. Senior management responded by imposing a modified
process with a very heavy emphasis on testing. This changed process only made our situation
worse. Finally, the senior engineers designed a new development process carefully tailored to
Tartan’s specific situation and needs. Working within the new process yielded enough
improvement to get development back on track.
5. References
[1] B. Boehm. “A Spiral Model of Software Development and Enhancement.” IEEE Computer
21, no. 5 (1988): 61–72.
[2] B. Boehm and P. N. Papaccio. “Understanding and Controlling Software Costs.” IEEE
Transactions on Software Engineering SE-14, no. 10 (October 1988): 1462–77.
[3] Frederick P. Brooks, Jr. The Mythical Man-Month. Reading, MA: Addison-Wesley, 1975.
[4] Robert H. Dunn. Software Defect Removal. New York: McGraw-Hill, 1984.
[5] Steve McConnell. Code Complete. Redmond, WA: Microsoft Press, 1993.
[6] Michael E. Fagan. “Design and Code Inspections to Reduce Errors in Program
Development.” IBM Systems Journal 15, no. 3 (1976): 182–211.
[7] Michael E. Fagan. “Advances in Software Inspections.” IEEE Transactions on Software
Engineering SE-12, no. 7 (July 1986): 744–51.
[8] P. K. Lawlis and T. W. Elam. “Ada Outperforms Assembly: A Case Study.” Proceedings of
TRI-Ada, 1992. Also available at http://www.seas.gwu.edu/~adagroup/sigada-website/lawlis.html
as of Mar. 2004.
[9] M. Lehman and L. Belady. Program Evolution: Processes of Software Change. Academic
Press, 1985.
[10] Winston W. Royce. “Managing the Development of Large Software Systems: Concepts and
Techniques.” Proceedings, IEEE WESCON, August 1970: 1–9.
[11] Edward Yourdon. Structured Walkthroughs. Yourdon Press Computing Series. Prentice
Hall, 1988.
[12] “Ada is Good for Real-Time” at “Ada Home: Home of the Brave Ada Programmers,”
http://www.adahome.com/Ammo/Stories/Tartan-Realtime.html. Current as of Feb. 2004.