4, JULY 1978
I. INTRODUCTION
Current State of Knowledge
THE earliest efforts at software cost estimation arose from
the standard industrial practice of measuring average
productivity rates for workers. Then an estimate of the total
job was made, usually in machine language instructions. Machine language instructions were used because this was the way
machines were coded in the early years and because it also
related to memory capacity, which was a severe constraint with
early machines. The total product estimate was then usually
divided by the budgeted manpower to determine a time to
completion. If this was unsatisfactory, the average manpower
level (and budget) was increased until the time to do the job
met the contract delivery date. The usual assumption was
that the work to be done was a simple product (a constant productivity rate multiplied by scheduled time) and that these
terms could be manipulated at the discretion of managers.
Brooks showed in his book, The Mythical Man-Month [1], that
this is not so: that manpower and time are not interchangeable,
that productivity rates are highly variable, and that there is no
Manuscript received June 15, 1977; revised December 15, 1977 and
March 6, 1978.
The author is with the Space Division, Information Systems Programs,
General Electric Company, Arlington, VA 22202.
0098-5589/78/0700-0345$00.75
© 1978 IEEE
[Plot: SIDPERS; horizontal axis FY (fiscal year).]
Fig. 1. Manpower loading as a function of time for the Army's Standard Installation-Division Personnel System (SIDPERS).
[Plot: vertical axis MY/YR; horizontal axis TIME.]
Fig. 2. Typical Computer Systems Command application of manpower to a software development project. Ordinates of
the individual cycles are added to obtain the project life-cycle effort at various points in time.
[Plot: two panels against TIME, CUMULATIVE EFFORT (TOTAL PEOPLE) and MANPOWER (PEOPLE/YR), with td marked on the time axis.]
Fig. 4. Large scale software application system life cycle with subcycles and empirically determined milestones added.
(average) behavior over time, and note the statistical fluctuations which tell us something about the random or stochastic
aspects of the process. Conceptually the process looks like the
representation shown in Fig. 3.
The data points are shown on the manpower figure to indicate that there is scatter or "noise" involved in the process.
Empirical evidence suggests that the "noise" component may
be up to 25 percent of the expected manpower value during
the rising part of the manpower curve which corresponds to
the development effort. td denotes the time of peak effort
and is very close to the development time for the system. The
falling part of the manpower curve corresponds to the operations and maintenance phase of the system life-cycle. The
principal work during this phase is modification, minor enhancement, and remedial repair (fixing "bugs").
Fig. 4 shows the life-cycle with its principal component
cycles and primary milestones. Note that all the subcycles
(except extension) have continuously varying rates and have
long tails indicating that the final 10 percent of each phase
of effort takes a relatively long time to complete.
III. BASIC CHARACTERISTICS OF THE NORDEN/RAYLEIGH
MODEL AS FORMULATED BY NORDEN [7], [8]
It has been empirically determined that the overall life-cycle
Norden/Rayleigh form:
\[ \dot{y} = 2Kat\,e^{-at^2} \ \text{MY/YR} \qquad (1) \]
where \(a = 1/(2t_d^2)\), \(t_d\) is the time at which \(\dot{y}\) is a maximum, and K is
the area under the curve from t = 0 to infinity and represents
the nominal life-cycle effort in man-years.
The definite integral of (1) is
\[ y = K\left(1 - e^{-at^2}\right) \ \text{MY} \qquad (2) \]
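As a rough illustration of equations (1) and (2), the following Python sketch evaluates the manpower rate and the cumulative effort of the Norden/Rayleigh curve. The function names are hypothetical, and the sample values K = 700 MY and td = 3.65 years are the SIDPERS figures quoted later in the paper; nothing here is taken from the author's own calculator programs.

```python
import math


def rayleigh_manpower(t, K, td):
    """Manpower rate y'(t) = 2*K*a*t*exp(-a*t^2), with a = 1/(2*td^2)."""
    a = 1.0 / (2.0 * td ** 2)
    return 2.0 * K * a * t * math.exp(-a * t ** 2)


def rayleigh_cumulative(t, K, td):
    """Cumulative effort y(t) = K*(1 - exp(-a*t^2)), equation (2)."""
    a = 1.0 / (2.0 * td ** 2)
    return K * (1.0 - math.exp(-a * t ** 2))


# Sample values (SIDPERS parameters quoted later in the paper).
K, td = 700.0, 3.65

peak_manpower = rayleigh_manpower(td, K, td)            # MY/YR at t = td
fraction_at_td = rayleigh_cumulative(td, K, td) / K     # share of K spent by td

print(f"peak manpower at td: {peak_manpower:.1f} MY/YR")
print(f"fraction of K expended by td: {fraction_at_td:.4f}")
```

With a = 1/(2 td^2), the rate peaks exactly at t = td, and the fraction of the life-cycle effort expended by td is 1 - exp(-1/2), roughly two fifths of K, whatever the values of K and td.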
[Figure residue: plots of the cumulative effort \(y = K(1 - e^{-at^2})\) and of the manpower curve against elapsed time from start (in months), drawn for several values of the shape parameter a and for K = 500, 1000, and 1500. One panel carries the development-cost annotation \(\$DEV = \$COST/MY \int_0^{t_d} \dot{y}\,dt \approx \$COST/MY \cdot (0.3945\,K)\).]
[Figure residue: a plot of \(\ln(\dot{y}/t)\) against \(t^2\) (years\(^2\)), with the intercept \(\ln(K/t_d^2)\) and the slope marked; block labels SOFTWARE DEVELOPMENT PROCESS (TRANSFORMATION PROCESS), MANPOWER, DEVELOPMENT TIME, and LOSSES; and the Fig. 9 plot of PRODUCTIVITY against DIFFICULTY, \(D = K/t_d^2\), with data points labeled by system.]
Fig. 9. Productivity versus difficulty for a number of U.S. Army Computer Systems Command systems. Lines 1, 2, and 3
represent different development environments. Line 1 represents a continental United States environment, batch job
submission, a consistent set of standards, and standard Army systems that would be used at many installations. Line 2
is typical of a Pacific development environment, (initially) a different set of standards from 1, and an intended application at one (or only a few) installations. Line 3 is typical of the European environment, different standards, single
installation application. The two points shown for lines 2 and 3 are insufficient in themselves to infer the lines. Other
more comprehensive data [14] show relations like line 1 with the same basic slope (-0.62 to -0.72). Based on this corroborating evidence, lines 2 and 3 are sketched in to show the effect of different environment, standards, tools, and
machine constraints.
[Figure residue (input/output diagram): INPUT is the manpower \(\dot{y} = f_1(K, t_d, t)\) applied over the development time (what we pay for); OUTPUT is \(S_s = PR \int \dot{y}_1(t)\,dt\) source statements (what we get: the end product); losses include aborted compiles.]
develop others.
Difficulty Gradient
We can study the rate of change of difficulty by taking the
gradient of D. The unit vector \(\hat{i}\) points in the \(t_d\) direction. The
unit vector \(\hat{j}\) points in the K (and $COST) direction.
grad D points almost completely in the \(-\hat{i}\) (\(-t_d\)) direction.
The importance of this fact is that shortening the development
time (by management guidance, say) dramatically increases
the difficulty, usually to the impossible level.
Plotting of U.S. Army Computer Systems Command systems
on the difficulty surface shows that they fall on three distinct
lines:
\(K/t_d^3 = 8\), \(K/t_d^3 = 15\), and \(K/t_d^3 = 27\).
The expression for grad D is
\[ \operatorname{grad} D = -\frac{2K}{t_d^3}\,\hat{i} + \frac{1}{t_d^2}\,\hat{j} \qquad (7) \]
\(K/t_d^3\).
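To make the size of the two gradient components concrete, here is a small Python sketch that evaluates them for one system. It assumes the difficulty is D = K/td^2 (which reproduces the SIDPERS value D = 52.54 quoted later); the function names are hypothetical.

```python
def difficulty(K, td):
    """Difficulty D = K / td^2 (assumed definition, in MY per year squared)."""
    return K / td ** 2


def grad_difficulty(K, td):
    """Partial derivatives of D with respect to td and K."""
    dD_dtd = -2.0 * K / td ** 3   # component along the td direction
    dD_dK = 1.0 / td ** 2         # component along the K direction
    return dD_dtd, dD_dK


# Sample values (SIDPERS parameters quoted later in the paper).
K, td = 700.0, 3.65
g_td, g_K = grad_difficulty(K, td)
ratio = abs(g_td) / abs(g_K)      # equals 2*K/td

print(f"dD/dtd = {g_td:.2f}, dD/dK = {g_K:.4f}, ratio = {ratio:.0f}")
print(f"K/td^3 = {K / td ** 3:.2f}")
```

For a large system the td component outweighs the K component by a factor of 2K/td, which is the sense in which grad D points almost entirely along the negative td direction.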
Study of all U.S. Army Computer Systems Command systems shows that 1) if the system is entirely new (has many
interfaces and interactions with other systems), C ≈ 8; 2) if the
system is a new stand-alone system, C ≈ 15; and 3) if the system is a rebuild or composite built up from existing systems
where large parts of the logic and code already exist, then
C ≈ 27. (These values may vary slightly from software house
to software house depending upon the average skill level of
the analysts, programmers, and management. They are, in a
sense, figures of merit or "learning curves" for a software
house doing certain classes of work.)
but its dimensions are source statements per year (\(\dot{S}_s\)). The
area under this coding rate curve is the total quantity of final
end product source statements (\(S_s\)) that will be produced by
time t. Thus, if we include all the area under the coding rate
curve we obtain \(S_s = 2.49 \cdot PR \cdot K/6\) source statements, defined as delivered lines of source code. From our empirical
relation relating the productivity PR to the difficulty, we can
complete the linkage from product (source statements) to the
management parameters.
Now the design and coding curve \(\dot{y}_1\) has to match the overall manpower curve \(\dot{y}\) initially, but it must be nearly complete
by \(t_d\); thus, \(\dot{y}_1\) proceeds much faster, but has the same form
as \(\dot{y}\).
parameters.
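Recall that we found that \(PR = C_n D^{-2/3}\) by fitting the data.
The constant appears to take on only quantized values. Substituting,
\[ S_s = 2.49 \cdot PR \cdot K/6 = 2.49\, C_n D^{-2/3} K/6 = \frac{2.49\, C_n}{6} \left(\frac{K}{t_d^2}\right)^{-2/3} K = \frac{2.49\, C_n}{6}\, K^{1/3} t_d^{4/3} \]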
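The substitution can be checked numerically. The sketch below assumes D = K/td^2 and takes Cn = 12009, the constant that appears in the SIDPERS coding-rate calculation; the function names are hypothetical. It confirms that the productivity form and the power-law form give the same size, and that the combined constant 2.49 Cn/6 rounds to the 4984 quoted for Fig. 12(a).

```python
def technology_constant(Cn):
    """Combined constant Ck = 2.49 * Cn / 6 (assumed relation between Cn and Ck)."""
    return 2.49 * Cn / 6.0


def source_statements(Ck, K, td):
    """Software equation Ss = Ck * K^(1/3) * td^(4/3)."""
    return Ck * K ** (1.0 / 3.0) * td ** (4.0 / 3.0)


Cn = 12009.0
Ck = technology_constant(Cn)

# Sample values (SIDPERS parameters quoted later in the paper).
K, td = 700.0, 3.65
D = K / td ** 2                       # difficulty
PR = Cn * D ** (-2.0 / 3.0)           # productivity from the fitted relation

size_from_productivity = 2.49 * PR * K / 6.0
size_from_power_law = source_statements(Ck, K, td)

print(f"Ck = {Ck:.1f}")
print(f"Ss via PR:        {size_from_productivity:,.0f}")
print(f"Ss via power law: {size_from_power_law:,.0f}")
```

Both routes are algebraically identical, so any difference in the printed sizes is floating-point noise.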
[Figure residue: SIZE - TIME - EFFORT TRADE-OFF CHART, panels (a) and (b); development effort (MY) plotted against SYSTEM SIZE Ss (000).]
Fig. 12. (a) Size-time-effort tradeoff for function \(S_s = 4984\, K^{1/3} t_d^{4/3}\). This is representative of about 5 years ago: batch
development environment, unstructured coding, somewhat "fuzzy" requirements, and severe test bed machine constraint.
(b) Size-time-effort tradeoff for function \(S_s = 10040\, K^{1/3} t_d^{4/3}\). This is reasonably representative of a contemporary
development environment with on-line interactive development, structured coding, less "fuzzy" requirements, and machine access fairly unconstrained. In comparing the two sets of curves note the significant improvement in performance
from the introduction of improved technology, which is manifested in the constant \(C_k\).
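A tradeoff chart of this kind can be approximated by solving the size function for K at several development times. The Python sketch below uses the Fig. 12(b) constant 10040; the system size of 100,000 source statements, the function names, and the reading of the gradient condition as "K/td^3 must not exceed C" are illustrative assumptions rather than values or rules taken from the paper.

```python
def effort_for_size(ss, ck, td):
    """Life-cycle effort K (MY) solving Ss = Ck * K^(1/3) * td^(4/3) for K."""
    return (ss / (ck * td ** (4.0 / 3.0))) ** 3


def within_gradient_limit(k, td, c):
    """True if K/td^3 does not exceed the gradient constant C (assumed reading)."""
    return k / td ** 3 <= c


CK = 10040.0      # constant quoted for Fig. 12(b)
SS = 100_000.0    # hypothetical system size in source statements
C_LIMIT = 15.0    # gradient value quoted for a new stand-alone system

for td in (1.5, 2.0, 2.5, 3.0):
    k = effort_for_size(SS, CK, td)
    feasible = within_gradient_limit(k, td, C_LIMIT)
    print(f"td = {td:.1f} yr   K = {k:7.1f} MY   "
          f"development effort ~ {0.4 * k:6.1f} MY   "
          f"K/td^3 = {k / td ** 3:6.2f}   feasible = {feasible}")
```

For a fixed size, K scales as 1/td^4, so shortening the schedule from 2.0 to 1.5 years multiplies the effort by (2.0/1.5)^4, a little over three.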
A constant number of source statements implies \(K t_d^4\) constant.
So \(K = \text{constant}/t_d^4\), or proportionally, development effort =
constant/\(t_d^4\), is the effort-development time tradeoff law.
Note particularly that development time is compressible
only down to the governing gradient condition (see Fig. 12);
the software house is not capable of doing a system in less
time than this because the system becomes too difficult for
the time available.
Accordingly, the set of curves relating development effort
(0.4 K), development time (\(t_d\)), and the product (\(S_s\)) permits
project managers to play "what if" games and to trade off
cost versus time without exceeding the capacity of the software organization to do the work.
Effect of Constant Productivity
One other relation is worth obtaining: the one where the average productivity remains constant.
\(PR = C_n D^{-2/3}\) = constant implies that D = constant.
So the productivity for different projects will be the same
only if the difficulty is the same. This does not seem reasonable to expect very frequently since the difficulty is a measure
of the software work to be done, i.e., \(K/t_d^2 = D\), which is a
function of the number of files, the number of reports, and
the number of programs the system has. Thus, planning a new
project based on using the same productivity a previous project had is fallacious unless the difficulty is the same.
The significance of this relation for an individual system is
that when the PR is fixed, the difficulty \(K/t_d^2\) remains constant
during development. However, we know that the difficulty is
a function of the system characteristics, so if we change the
system characteristics during development, say the number of
files, the number of reports, and the number of subprograms,
then the difficulty will change and so will the average productivity. This in turn will change our instantaneous rate of
code production (\(\dot{S}_s\)), which is made up of both parameter
terms and time-varying terms.
In summary then, \(S_s = C_k K^{1/3} t_d^{4/3}\) appears to be the management equation for software system building. As technology
improves, the exponents should remain fixed since they relate to the process. \(C_k\) will increase (in quantum jumps) with
new technology because this constant relates to the overall information throughput capacity of the system and (tentatively) seems to be more heavily dependent on machine
\[ \dot{y} = 2at\,(K - y) \]
where \(\dot{y}\) is the rate of accomplishment, 2at is the "pace" and
(K - y) is the work remaining to be done. Making the explicit
substitution for \(a = 1/(2t_d^2)\) we have \(\dot{y} = (t/t_d^2)(K - y)\). \(t/t_d^2\) is
Norden's linear learning law.
We differentiate once more with respect to time, rearrange,
and obtain
\[ \ddot{y} + \frac{t}{t_d^2}\,\dot{y} + \frac{y}{t_d^2} = \frac{K}{t_d^2} = D. \qquad (9) \]
This is a second derivative form of the software equation.
The Norden/Rayleigh integral is its solution, which can be
verified by direct substitution. Note that this equation is
similar to the nonhomogeneous 2nd order differential equations frequently encountered in mechanical and electrical
systems. There are two important differences. The forcing
function \(D = K/t_d^2\) is a constant rather than the more usual
sinusoid, and the \(\dot{y}\) term has a variable coefficient \(t/t_d^2\), proportional to the 1st power of time.
Uses of the Software Equation
The differential equation \(\ddot{y} + (t/t_d^2)\,\dot{y} + y/t_d^2 = K/t_d^2\) is very
useful because it can be solved step-by-step using the Runge-Kutta
solution. The solution can be perturbed at any point
by changing \(K/t_d^2 = D\), the difficulty. This is just what happens
in the real world when the customer changes the requirements
or specifications while development is in process. If we have
an estimator for D (which we do) that relates \(K/t_d^2\) to the system characteristics, say the number of files, the number of
reports, and the number of application programs, then we can
calculate the change in D and add it to our original D, continue our Runge-Kutta solution from that point in time and
thus study the time slippage and cost growth consequences of
such requirements changes. Several typical examples are given
in [13].
When we convert this differential equation to the design and
coding curve \(\dot{y}_1\) by substituting \(t_d/\sqrt{6}\) for \(t_d\) and \(K_1 = K/6\) for K, we
obtain
\[ \ddot{y}_1 + \frac{6t}{t_d^2}\,\dot{y}_1 + \frac{6}{t_d^2}\,y_1 = \frac{K}{t_d^2} \qquad (10) \]
and multiplying this by the PR and conversion factor as before
we obtain an expression that gives us the coding rate and cumulative code produced at time t.
\[ \ddot{S}_s + \frac{t}{(t_d/\sqrt{6})^2}\,\dot{S}_s + \frac{1}{(t_d/\sqrt{6})^2}\,S_s = 2.49 \cdot PR \cdot \frac{K}{t_d^2} = 2.49\, C_n \left(\frac{K}{t_d^2}\right)^{1/3} \]
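The statement that the Norden/Rayleigh integral solves the second-derivative form of the software equation can be checked by direct substitution. The following worked derivation is a sketch, assuming \(a = 1/(2t_d^2)\) and writing \(E = e^{-t^2/(2t_d^2)}\) as a shorthand introduced here for convenience:

```latex
\begin{align*}
y        &= K\,(1 - E), \qquad E = e^{-t^2/(2t_d^2)} \\
\dot{y}  &= \frac{K t}{t_d^2}\,E \\
\ddot{y} &= \frac{K}{t_d^2}\,E\left(1 - \frac{t^2}{t_d^2}\right) \\[4pt]
\ddot{y} + \frac{t}{t_d^2}\,\dot{y} + \frac{y}{t_d^2}
         &= \frac{K}{t_d^2}\left[E - \frac{t^2}{t_d^2}E + \frac{t^2}{t_d^2}E + 1 - E\right]
          = \frac{K}{t_d^2} = D .
\end{align*}
```

The two terms in \(t^2 E / t_d^2\) cancel, as do the two terms in \(E\), leaving the constant forcing function \(D\). The initial conditions \(y(0) = 0\) and \(\dot{y}(0) = 0\) are satisfied as well.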
TABLE I
RUNGE-KUTTA SOLUTION TO CODING RATE DIFFERENTIAL EQUATION FOR SIDPERS

t (years)     Coding Rate (Ss/year) (000)     Cumulative Code (Ss) (000)
0                  0                               0
0.5               52.8                            13.6
1.0               89.2                            50.0
1.5              101.0                            98.6
2.0               90.8                           147.0
2.5               68.4                           187.0
3.0               44.2                           215.0
3.5               24.9                           232.0
3.65 (td)         20.33                          236.0
4.0               12.3                           241.0
4.5                5.36                          246.0
5.0                2.09                          247.0

Actual size at extension is 256,000 Ss, which is pretty close to this [annotation pointing into the cumulative code column].

SIDPERS parameters
K  = 700 MY
td = 3.65 years
D  = 52.54 MY/yr^2
PR = 914 Ss/MY (burdened)

DIFFERENTIAL EQUATION
\[ \ddot{S}_s + \frac{t}{(3.65/\sqrt{6})^2}\,\dot{S}_s + \frac{1}{(3.65/\sqrt{6})^2}\,S_s = 2.49 \cdot PR \cdot D = 2.49 \cdot (12009\, D^{-2/3}) \cdot D = 2.49\,(12009)\,(D)^{1/3} = 29902\,(52.54)^{1/3} = 111785 \]
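A step-by-step solution of this kind can be sketched with a classical fourth-order Runge-Kutta integrator. The code below is an illustrative reconstruction, not the author's calculator program: it assumes zero initial conditions (no code and zero coding rate at t = 0), a step size of 0.01 year, and the forcing value 111785 and td = 3.65 years from the SIDPERS calculation; the function names are hypothetical.

```python
def solve_coding_rate(forcing, td, t_end=5.0, h=0.01):
    """Integrate S'' + (6t/td^2) S' + (6/td^2) S = forcing with classical RK4.

    Returns a list of (t, coding_rate, cumulative_code) tuples,
    starting from S(0) = 0 and S'(0) = 0 (assumed initial conditions).
    """
    c = 6.0 / td ** 2  # equals 1 / (td / sqrt(6))**2

    def derivs(t, s, v):
        # First-order system: dS/dt = v, dv/dt = forcing - c*t*v - c*S
        return v, forcing - c * t * v - c * s

    s, v = 0.0, 0.0
    rows = [(0.0, v, s)]
    n_steps = int(round(t_end / h))
    for i in range(n_steps):
        t = i * h
        k1s, k1v = derivs(t, s, v)
        k2s, k2v = derivs(t + h / 2, s + h / 2 * k1s, v + h / 2 * k1v)
        k3s, k3v = derivs(t + h / 2, s + h / 2 * k2s, v + h / 2 * k2v)
        k4s, k4v = derivs(t + h, s + h * k3s, v + h * k3v)
        s += h / 6 * (k1s + 2 * k2s + 2 * k3s + k4s)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        rows.append(((i + 1) * h, v, s))
    return rows


H = 0.01
rows = solve_coding_rate(forcing=111785.0, td=3.65, t_end=5.0, h=H)

print(" t     rate(000)  cumulative(000)")
for t_print in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0):
    t, rate, total = rows[int(round(t_print / H))]
    print(f"{t:4.1f}  {rate / 1000:9.1f}  {total / 1000:9.1f}")
```

To study a requirements change, one could stop the loop at the time of the change, add the estimated change in difficulty to the forcing term (scaled by the same productivity and conversion factors), and continue integrating from the current state; that extension is left out here to keep the sketch short.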
time t in the Runge-Kutta solution and changes in requirements studied relative to their impact on code production.
These two conditions establish a probable region on the tradeoff chart (Fig. 12(b), say). The gradient condition established
the limiting constraint on development time. One can then
heuristically pick most probable values for development time
and effort without violating constraints, or one can even simulate the behavior in this most probable region to generate expected values and variances for development time and effort.
The author uses this technique on a programmable pocket
calculator to scope projects very early in their formulation.
Another empirical approach is to use a combination of prior
history and regression analysis that relates the management
parameters to a set of system attributes that can be determined before coding starts. This approach has produced good
results.
Unfortunately, K and td are not independently linearly related to such system attributes as number of files, number of
reports, and number of application programs. However, \(K/t_d^2 = D\)
is quite linear with number of files, number of reports, and
number of application subprograms, both individually and
jointly. Statistical tests show that number of files and number
of reports are highly redundant so that we can discard one and
get a good estimator from the number of application subprograms and either of the other two.
There are relationships other than the difficulty that have
significantly high correlation with measures of the product.
These relationships are
TABLE II

System        K (MY)    Development Time td (YRS)    Files x1    Rpts. x2    Appl. Progs. x3
MPMIS           73.6      2.28                          94          45          52
MRM             84        1.48                          36          44          31
ACS             33        1.67                          11          74          39
SPBS            70        2.00                          (two of the three counts survive: 34, 23)
COMIS           27.5      1.44                          14          41          35
AUDIT           10        2.00                          (one of the three counts survives: 11)
CABS             7.74     1.95                          22          14          12
MARDIS          91        2.50                          (two of the three counts survive: 10, 27)
MPAS           101        2.10                          25          95         109
CARMOCS        153        2.64                          13         109         229
SIDPERS        700        3.65                         172         179         256
VTAADS         404        3.50                         155         101         144
BASOPS-SUP     591        2.73                          81         192         223
SAILS AB/C    1028        4.27                         540         215         365
SAILS ABX     1193        3.48                         670         200         398
STARCIPS       344        3.48                         151          59          75
STANFINS       741        3.30                         270         228         241
SAAS           118        2.12                         131         152         120
COSCOM         214        4.25                          33         101         130
ments is zero.
A numerical procedure to obtain estimators for the various
Ktd, Ktd, K ... K/td terms is to use multiple regression analysis and data determined from the past history of the software
house. A set of data from U.S. Army Computer Systems
Command is shown in Table II.
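We will determine regression coefficients from x2, x3 to
estimate K/td as an example of the procedure. We write an
[Equations (13) and (14) are not recoverable from this copy; the surviving fragments show the coefficient a2 and the matrix elements C22, C23.]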
[Plot: life-cycle size (MY) against NO. OF APPLICATION PROGRAMS, logarithmic axes.]
Fig. 13. Life-cycle size as a function of number of application programs and number of reports.
\[
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ -3 & -2 & -1 & 0 & 1 & 4 \end{bmatrix}
\begin{bmatrix} 1 & -3 \\ 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} \ln K \\ \ln t_d \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ -3 & -2 & -1 & 0 & 1 & 4 \end{bmatrix}
\begin{bmatrix} 2.6391 \\ 4.0540 \\ 5.2494 \\ 6.4809 \\ 7.7441 \\ 11.6614 \end{bmatrix}
\]
which yields
ln K = 6.5189; K = 677.86 MY
ln td = 1.2772; td = 3.59 YRS.
The data used for these examples are that of SIDPERS (K =
700, td = 3.65, as of 1976). The estimators perform quite
well in this general development time and size regime. They
are less reliable and less valid for small systems (K < 100 MY;
td < 1.5 years) because the data on the larger systems are more
consistent. Extrapolation outside the range of data used to
generate the estimators is very tenuous and apt to give bad
results.
Once the estimating relationships have been developed for
the software house as just outlined above, then they can be
run through the range of likely input values and system types
to construct some quick estimator graphs. An example of an
early set of quick estimator charts for U.S. Army Computer
Systems Command is shown in Figs. 13-15. These figures give
considerable insight into the way large scale software projects
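The overdetermined system can be solved by ordinary least squares. The sketch below assumes that each of the six numbers is an estimate of ln(K td^e) for the exponents e = -3, -2, -1, 0, 1, 4, which is one reading of the design matrix; the function name is hypothetical. It forms the 2 x 2 normal equations by hand so that no external library is needed.

```python
import math

# Exponents of td in each estimated term K * td**e, and the log estimates.
exponents = [-3, -2, -1, 0, 1, 4]
log_estimates = [2.6391, 4.0540, 5.2494, 6.4809, 7.7441, 11.6614]


def solve_ln_k_ln_td(exps, logs):
    """Least-squares fit of logs[i] = ln K + exps[i] * ln td via normal equations."""
    n = len(exps)
    s_e = sum(exps)
    s_ee = sum(e * e for e in exps)
    s_b = sum(logs)
    s_eb = sum(e * b for e, b in zip(exps, logs))
    det = n * s_ee - s_e * s_e          # determinant of the 2x2 normal matrix
    ln_k = (s_ee * s_b - s_e * s_eb) / det
    ln_td = (n * s_eb - s_e * s_b) / det
    return ln_k, ln_td


ln_k, ln_td = solve_ln_k_ln_td(exponents, log_estimates)
K_est, td_est = math.exp(ln_k), math.exp(ln_td)

print(f"ln K  = {ln_k:.4f}   K  = {K_est:.1f} MY")
print(f"ln td = {ln_td:.4f}   td = {td_est:.2f} years")
```

The fit lands close to the values printed in the text (K near 678 MY, td near 3.59 years); differences in the last digits are to be expected from rounding in the six input estimates.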
[Plots: K (MY) against NO. OF APPLICATION PROGRAMS, logarithmic axes; and td (YRS) against number of application programs.]
Fig. 15. Development time as a function of number of application programs and number of reports.
The software state variables are
1) the state of technology \(C_n\) or \(C_k\);
2) the applied effort K;
3) the development time \(t_d\); and
4) the independent variable time t.
The software equation relates the product to the state
variables:
\[ S_s = PR \int_0^{\infty} \dot{y}_1\, dt = C_k K^{1/3} t_d^{4/3} \]
REFERENCES
[1] F. P. Brooks, Jr., The Mythical Man-Month. Reading, MA: Addison-Wesley, 1975.
[2] L. H. Morin, "Estimation of resources for computer programming
projects," M.S. thesis, Univ. North Carolina, Chapel Hill, NC,
1973.
[3] P. F. Gehring and U. W. Pooch, "Software development management," Data Management, pp. 14-18, Feb. 1977.
[4] P. F. Gehring, "Improving software development estimates of
time and cost," presented at the 2nd Int. Conf. Software Engineering, San Francisco, CA, Oct. 13, 1976.
[5] -, "A quantitative analysis of estimating accuracy in software
development," Ph.D. dissertation, Texas Univ., College Station,
TX, Aug. 1976.
[6] J. D. Aron, "A subjective evaluation of selected program development tools," presented at the Software Life Cycle Management
Workshop, Airlie, VA, Aug. 1977, sponsored by U.S. Army
Computer Systems Command.
[7] P. V. Norden, "Useful tools for project management," in Management of Production, M. K. Starr, Ed. Baltimore, MD: Penguin,
1970, pp. 71-101.
[8] -, "Project life cycle modelling: Background and application
of the life cycle curves," presented at the Software Life Cycle
Management Workshop, Airlie, VA, Aug. 1977, sponsored by
U.S. Army Computer Systems Command.
[9] L. H. Putnam, "A macro-estimating methodology for software
development," in Dig. of Papers, Fall COMPCON '76, 13th IEEE
Computer Soc. Int. Conf., pp. 138-143, Sept. 1976.
[10] -, "ADP resource estimating: A macro-level forecasting methodology for software development," in Proc. 15th Annu. U.S.
Army Operations Res. Symp., Fort Lee, VA, pp. 323-327, Oct.
26-29, 1976.
[11] -, "A general solution to the software sizing and estimating
problem," presented at the Life Cycle Management Conf. Amer.
Inst. of Industrial Engineers, Washington, DC, Feb. 8, 1977.
[12] -, "The influence of the time-difficulty factor in large scale
Lawrence H. Putnam was born in Massachusetts. He attended the University of Massachusetts for one year and then attended the U.S.
Military Academy, West Point, NY, graduating in 1952. He was commissioned in Armor and served for the next 5 years in Armor and
Infantry troop units. After attending the Armor Officer Advanced
Course, he attended the U.S. Naval Post Graduate School, Monterey,
CA, where he studied nuclear effects engineering for 2 years, graduating in 1961 with the M.S. degree in physics. He then went to the
Office of the Director of Management Information Systems and Assistant Secretary of the Army at Headquarters, DA. His final tour of
active service was with the Army Computer Systems Command as
Special Assistant to the Commanding General. He retired in the grade
of Colonel. He is now with the Space Division, Information Systems
Programs, General Electric Company, Arlington, VA. His principal
interest for the last 4 years has been in developing methods and techniques for estimating the life-cycle resource requirements for major
software applications.