
COMPUTING PRACTICES

Edgar H. Sibley
Panel Editor

Software engineering is a discipline in search of objective measures for factors that contribute to software quality. NPATH, which counts the acyclic execution paths through a function, is an objective measure of software complexity related to the ease with which software can be comprehensively tested.

NPATH: A MEASURE OF EXECUTION PATH COMPLEXITY AND ITS APPLICATIONS

Perhaps the "biggest bang for the buck" in software testing comes from assuring quality at the function or unit level. This is because detecting and correcting unit-level defects during integration and system testing can be very costly. Assuring quality at the function level depends, at least in part, on how thoroughly a function can be tested. Thus, the thoroughness with which functions can be tested is an important design concept. Designers and developers can increase the quality of their software systems by making sure the functions that constitute their systems can be comprehensively tested.

In addition to factors such as modularity and input space size that affect the testability of a software system, an equally important factor that affects testability is the number of execution paths through its functions. Functions with more execution paths are more difficult to test than functions with fewer execution paths. The number of execution paths in a function has received considerable attention in software measurement research [16, 17]. A difficulty of this research is that functions that contain any looping construct can have an infinite number of execution paths. Any meaningful measure of the number of paths in a function must be based on some finite subset of the (usually) infinite set of execution paths. A major criticism of existing software complexity metrics based on the number of acyclic execution paths is that there is a poor relationship between the finite subsets of paths selected by various measures and the set of all execution paths. Thus, the accurate measurement of the acyclic execution path complexity for functions must be addressed.

SHORTCOMINGS OF MCCABE'S MEASURE OF PATH COMPLEXITY
Perhaps the most widely discussed complexity metric is McCabe's cyclomatic complexity metric [11]. McCabe's metric "attempts to determine the number of execution paths in a function" [11, p. 309]. The cyclomatic complexity number, V(G), is the number of logical conditions in the function plus 1. McCabe argued that V(G) represents the number of fundamental circuits in the flow-graph representation of a function. (Evangelist [5] points out that this formula is correct only when each predicate vertex in the flow graph has outdegree 2.)

The author is currently a senior member of the technical staff at the Software Productivity Consortium. The author's present address is B. Nejmeh, SPC, 1880 North Campus Commons Drive, Reston, VA 22091.

© 1988 ACM 0001-0782/88/0200-0188 $1.50

Communications of the ACM February 1988 Volume 31 Number 2



Finally, McCabe argued that the number of fundamental circuits acts as an index of testing effort.

There are several problems with McCabe's metric. First, the number of acyclic paths in a flow graph varies from a linear to an exponential function of V(G) [5]. Thus, to assert that V(G) is a reasonable estimate of the number of execution paths through a function is unjustified. Moreover, the number of acyclic execution paths that may not be tested by a methodology based on McCabe's metric varies from 0 to 2N, where N is the number of vertices in the flow graph [5]. The poor relationship between the number of acyclic execution paths and the number of execution paths tested, based on McCabe's metric, suggests that McCabe's assumption that total testing effort is proportional to V(G) should not be accepted.

A second problem with McCabe's metric is that it fails to distinguish between different kinds of control flow structures (i.e., the measure treats the if, while, for, etc., structures the same). Certain control flow structures, however, are more difficult to understand and use properly.

Finally, Curtis [2] argues that McCabe's metric does not consider the level of nesting of various control structures (e.g., three for-loops in succession will result in the same metric value as three nested for-loops). Curtis argues that nesting may influence the psychological complexity of the function. Moreover, many researchers [4] argue that psychological complexity has a large impact on software quality.

In light of the problems with McCabe's metric, a new and intuitively more appealing measure of software complexity has been developed. The new metric of software complexity, NPATH, overcomes the shortcomings associated with McCabe's metric.

BACKGROUND DEFINITIONS
The following definitions are pertinent to the discussion of NPATH.

• A control flow graph is a graph in which each vertex represents either a basic block of code (statement sequence that contains no branches) or a branch point in the function, and each edge represents possible flow of control. More formally, the control flow graph of a function can be represented as a directed graph four-tuple (V, E, Ventry, Vexit), where
  - V is a set of vertices representing basic blocks of code or branch points in the function;
  - E is a set of edges representing flow of control in the function;
  - Ventry, an element of V, is the unique function entry vertex; and
  - Vexit, an element of V, is the unique function exit vertex.

• A path P in a control flow graph G is a sequence of vertices (V0, V1, ..., Vk), such that there is an edge from Vi to Vi+1 for i = 0, ..., k - 1.

• A possible execution path is any path from Ventry to Vexit in a flow graph.

• An elementary cycle is any path P from vertices V1, V2, ..., Vk such that V1 = Vk and Vi <> Vj for 1 < i < j <= k. In short, an elementary cycle is a cycle that contains no other cycles within it.

• A loop control vertex is a vertex V with the following two properties: (1) V has an out-edge that lies on at least one elementary cycle that begins and ends at V, and (2) V has a second out-edge that lies on a path leading out of the loop. An appealing property of determining the loop control vertices in a control flow graph is that any execution path through a function can be constructed once the loop vertices are known. This happens when zero or more of the cycles that comprise the loop for a vertex are substituted until the entire execution path is constructed.

• A loop is a cycle that begins and ends at a given loop control vertex.

• A range of a statement V is the set of statements whose execution may be determined by the truth value of the expression in statement V.

BACKGROUND OF THE NPATH MEASURE
To keep the execution path measure finite and eliminate redundant information, a measure of execution path complexity should not reflect every possible iteration of a loop. This suggests that a good characterization of the number of execution paths in a function should only count a single iteration of each loop. Such a metric counts all paths where a loop is not iterated more than twice. It is a count of the number of acyclic execution paths through a function.

This approach has initiated the development of NPATH, a metric that counts the number of execution paths through functions written in the C programming language [9]. Although the NPATH metric is defined for the C programming language, most of the control structures available in C are similar to the control structures in other high-level languages (e.g., Pascal, PL/1, Fortran). Thus, the NPATH approach is applicable to other programming languages.

THE EXECUTION PATH COMPLEXITY OF C CONTROL FLOW STRUCTURES
The acyclic execution path complexity expressions for each of the control flow structures in the C programming language are defined in the following subsections.

if Statement (Figure 1)
Syntax
    if (<expr>)
        <if-range>
    S;

The semantics of the if statement are as follows: If the expression <expr> is True, then the statement comprising the <if-range> is executed; otherwise, the statement following the if statement is executed.

FIGURE 1. Flow Graph for the if Statement

The acyclic execution path complexity (NP) for the if statement is
    NP(if) = NP(<if-range>) + NP(<expr>) + 1.
This expression is derived from the flow-graph representation of the statement. In particular, the number of acyclic execution paths through the if statement is the number of paths through the <if-range> plus 1 for the case when the <expr> is False. The complexity of the logical expression <expr> is also added to the complexity of the if statement. A definition for the complexity of the logical expression <expr> appears below. The same reasoning applies to each of the following acyclic execution path complexity expressions.

if-else Statement (Figure 2)
Syntax
    if (<expr>)
        <if-range>
    else
        <else-range>
    S;

FIGURE 2. Flow Graph for the if-else Statement

The semantics of the if-else statement are as follows: If the expression <expr> is True, then the statement comprising the <if-range> is executed; otherwise, the statement comprising the <else-range> is executed.

The acyclic execution path complexity for the if-else statement is
    NP(if-else) = NP(<if-range>) + NP(<else-range>) + NP(<expr>).

while Statement (Figure 3)
Syntax
    while (<expr>)
        <while-range>
    S;

FIGURE 3. Flow Graph for the while Statement

The semantics of the while statement are as follows: If the expression <expr> is True, then the statement comprising the <while-range> is executed, and control branches back to the <expr> logical evaluation; otherwise, the statement following the while statement is executed.

The acyclic execution path complexity for the while statement is
    NP(while) = NP(<while-range>) + NP(<expr>) + 1.

do while Statement (Figure 4)
Syntax
    do
        <do-range>
    while (<expr>);
    S;

FIGURE 4. Flow Graph for the do while Statement

The semantics of the do while statement are as follows: The statement comprising the <do-range> is executed, and then, if the expression <expr> is True, control branches back, and the statement comprising the <do-range> is reexecuted; otherwise, the statement following the do while statement is executed.

The acyclic execution path complexity for the do while statement is
    NP(do) = NP(<do-range>) + NP(<expr>) + 1.

for Statement (Figure 5)
Syntax
    for (<expr1>; <expr2>; <expr3>)
        <for-range>
    S;

FIGURE 5. Flow Graph for the for Statement

The semantics of the for statement are as follows: The expression <expr1> is used to initialize a loop control variable, <expr2> is used as the termination condition for the loop, and <expr3> is used as the increment/decrement value for the loop variable upon each iteration of the loop. The sequence of statements denoted by the <for-range> is executed as long as the expression <expr2> is True.

The acyclic execution path complexity for the for statement is
    NP(for) = NP(<for-range>) + NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 1.

switch Statement (Figure 6)
Syntax
    switch (<expr>)
    {
        <case-range_1>
        ...
        <case-range_n>
        <default-range>
    }

The semantics of the switch statement are as follows: The switch statement transfers control to one of several statements depending on the value of the expression <expr>. When the switch statement is executed, <expr> is evaluated and compared with the value of each case. If a case value is equal to the value of <expr>, then control is transferred to the statement following the matched case value. If there is neither a case match nor a default, then the statements in the switch are not executed. Note that a <case-range> is delimited by either another <case-range> or a break statement.

The acyclic execution path complexity for the switch statement is
    NP(switch) = NP(<expr>) + NP(<default-range>) + \sum_{i=1}^{n} NP(<case-range_i>).
In the case of a null <case-range_i>, a situation where the case statement falls through to the next case or <case-range_(i+1)>, the complexity of <case-range_i> is 1.

? Operator (Figure 7)
Syntax
    <expr1> ? <expr2> : <expr3>

The semantics of the ? operator are as follows: The expression <expr1> is evaluated first. If <expr1> is nonzero (True), then the expression <expr2> is evaluated and returned as the result of the expression; otherwise, <expr3> is evaluated. Only one of <expr2> and <expr3> is evaluated.

FIGURE 6. Flow Graph for the switch Statement

The acyclic execution path complexity for the ? operator is
    NP(?) = NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 2.
For our purposes, the ? operator can be treated similarly to the if-else statement. The 2 that is included in the NP(?) expression reflects the execution path complexity resulting from this statement (i.e., one path is traversed if <expr1> is True, and another path is traversed if <expr1> is False).

FIGURE 7. Flow Graph for the ? Operator Statement

goto Statement
When the statement goto label is executed, transfer of control goes to the "labeled" statement, where program execution continues. A goto statement is referred to as forward referencing when the "labeled" statement being referenced appears textually after the goto statement. Similarly, a goto statement is referred to as backward referencing when the "labeled" statement being referenced appears textually before the goto statement.

The acyclic execution path complexity expression for the goto statement is difficult to define. In the case of a forward referencing goto, accounting for the complexity of the code beginning at the target of the goto may overstate the complexity of the code between the goto statement and the target statement. On the other hand, a backward referencing goto would create a cycle in the program flow graph; it would thereby enable the execution path complexity to be infinite. Given the inherent ambiguity and difficulty in accounting for the execution path complexity created by the goto statement, our path complexity metric does not account for the execution path complexity introduced by the goto statement. Although the number of acyclic paths resulting from the use of the goto statement could be significant in theory, in practice the goto statement is rarely used. Moreover, the use of the goto statement is generally considered poor programming practice [3].

break Statement
A break statement causes exit from the innermost enclosing loop (while, do, for) or switch statement in which it appears. If and when the break statement is reached, it ends the execution of statements within the basic block of code where it occurs. In the context of execution path complexity analysis, the break statement can be thought of as the last statement on the execution path containing the basic block of code in which it occurs. As such, the execution path complexity of the break statement is 1.

Expressions
The syntax for a logical expression is as follows:
    <expr1> op1 <expr2> op2 ... op(N-1) <exprN>,
where <expr1>, <expr2>, ..., <exprN> are expressions and op1, op2, ..., op(N-1) are any one of the logical operators and (&&) or or (||).

The complexity of logical expressions can have a tremendous impact on the number of execution paths in a function. This is because of the way logical expressions are evaluated in C. In particular, logical expressions are evaluated only until the final truth value of the expression can be determined. Consider the two logical expression operators && (and) and || (or). In the case of the and operator, the truth value of the logical expression <expr1> && <expr2> is determined as follows: If <expr1> is False, then the value of the entire logical expression is False, and the evaluation of the logical expression is terminated; otherwise, <expr2> is evaluated. If <expr2> is True, then the value of the entire logical expression is True; otherwise, the value of the logical expression is False. In the case of the or operator, the truth value of the logical expression <expr1> || <expr2> is determined as follows: If <expr1> is True, then the value of the entire logical expression is True, and the evaluation of the logical expression is terminated; otherwise, <expr2> is evaluated. If <expr2> is True, then the value of the entire logical expression is True; otherwise, the value of the logical expression is False.

The number of expressions that may conditionally be executed in a logical expression grows linearly with the number of && and || operators in the logical expression. That is, every expression within a logical expression may have to be evaluated in order to determine the truth value of the entire logical expression. Therefore, the number of acyclic execution paths added as a result of each logical operator in a logical expression is 1.

To illustrate, consider the first function segment below, as well as the more explicit but logically equivalent form that follows it (also see Figure 8):

    if ( (A && B) && C )
        S1;
    else
        S2;
    SN;

which is equivalent to

    if ( A )
        if ( B )
            if ( C )
                S1;
            else
                S2;
        else
            S2;
    else
        S2;
    SN;

FIGURE 8. A Logical Expression and Its Corresponding Flow Graph

The flow-graph representation of the statement indicates that there are four different acyclic execution paths through this flow graph (assuming S1 and S2 are sequential statements). The path complexity expressions defined in this article lead to the same conclusion about the number of acyclic execution paths in this statement. That is, the complexity of the if-else statement, NP(if-else), has been previously defined to be NP(<if-range>) + NP(<else-range>) + NP(<expr>). In the above case, NP(<if-range>) and NP(<else-range>) are each 1 since both S1 and S2 are sequential statements. The complexity of the logical expression (A && B) && C is 2 (the number of && and || operators in the expression). Thus, NP(if-else) = 1 + 1 + 2 = 4.

The acyclic execution path complexity for any logical expression is
    NP(expression) = number of && and || operators in the expression.

continue Statement
The continue statement forces the next iteration of an enclosing loop (for, while, do) to begin. Thus, the continue statement represents a back edge in the control flow graph of a function. NPATH does not account for the complexity of this construct.

return Statement
The return statement terminates the execution of a C function. A return statement can also contain an expression. Therefore, the complexity of the return statement is NP(<expr>).
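The per-statement expressions defined above can be sketched in C as follows. This is a minimal illustration, not the algorithm of Appendix A: the function names (np_expr, np_if, np_if_else, np_loop, np_for, np_conditional, np_switch) are hypothetical, and np_expr assumes the expression is plain source text with no string or character literal that contains "&&" or "||".

```c
#include <assert.h>
#include <stddef.h>

/* NP(expression): the number of && and || operators in the expression.
   Assumption: expr holds plain C source text; "&&" or "||" inside a
   string or character literal would be miscounted by this sketch. */
unsigned long np_expr(const char *expr)
{
    unsigned long count = 0;
    size_t i;

    assert(expr != NULL);
    for (i = 0; expr[i] != '\0' && expr[i + 1] != '\0'; i++) {
        if ((expr[i] == '&' && expr[i + 1] == '&') ||
            (expr[i] == '|' && expr[i + 1] == '|')) {
            count++;
            i++; /* skip the second character of the operator */
        }
    }
    return count;
}

/* NP(if) = NP(<if-range>) + NP(<expr>) + 1 */
unsigned long np_if(unsigned long if_range, unsigned long expr)
{
    return if_range + expr + 1;
}

/* NP(if-else) = NP(<if-range>) + NP(<else-range>) + NP(<expr>) */
unsigned long np_if_else(unsigned long if_range, unsigned long else_range,
                         unsigned long expr)
{
    return if_range + else_range + expr;
}

/* NP(while) and NP(do) share one form: NP(<range>) + NP(<expr>) + 1 */
unsigned long np_loop(unsigned long range, unsigned long expr)
{
    return range + expr + 1;
}

/* NP(for) = NP(<for-range>) + NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 1 */
unsigned long np_for(unsigned long range, unsigned long e1,
                     unsigned long e2, unsigned long e3)
{
    return range + e1 + e2 + e3 + 1;
}

/* NP(?) = NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 2 */
unsigned long np_conditional(unsigned long e1, unsigned long e2,
                             unsigned long e3)
{
    return e1 + e2 + e3 + 2;
}

/* NP(switch) = NP(<expr>) + NP(<default-range>) + sum of NP(<case-range_i>).
   A null (fall-through) case range is passed in as 1 by the caller. */
unsigned long np_switch(unsigned long expr, unsigned long default_range,
                        const unsigned long *case_ranges, size_t n)
{
    unsigned long total = expr + default_range;
    size_t i;

    assert(case_ranges != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += case_ranges[i];
    return total;
}
```

With these helpers, the if-else statement of Figure 8 evaluates as np_if_else(1, 1, np_expr("(A && B) && C")), which reproduces the value 1 + 1 + 2 = 4 derived in the text.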

Sequential Statements and Function Calls
The execution path complexity for the sequential statement is 1 because there is only one path created by consecutive sequential statements. Note that function calls are treated as sequential statements. That is, it is assumed that the code within the function being called has been unit tested and is functioning properly. Therefore, the execution path complexity of function calls is also 1.

THE EXECUTION PATH COMPLEXITY OF C FUNCTIONS
The composite acyclic execution path complexity for a C function, NPATH, is
    NPATH = \prod_{i=1}^{N} NP(Statement_i),
where N denotes the number of statements in the body of the function and NP(Statement_i) denotes the acyclic execution path complexity of statement i. Note that the complexity of any statement range is the product of complexities of the statements in the range. The complete algorithm to compute NPATH is listed in Appendix A.
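The product rule above can be sketched in C as follows. This is an illustration under stated assumptions rather than the Appendix A algorithm: the name npath_product is hypothetical, an empty statement list is assumed to have complexity 1 (the empty product), and the result saturates at ULONG_MAX instead of overflowing.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* NPATH of a statement sequence: the product of the NP values of the
   statements in it. np[i] is the already-computed NP of statement i.
   Assumptions of this sketch: an empty sequence yields 1, and the
   result saturates at ULONG_MAX rather than wrapping around. */
unsigned long npath_product(const unsigned long *np, size_t n)
{
    unsigned long result = 1;
    size_t i;

    assert(np != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (np[i] != 0 && result > ULONG_MAX / np[i])
            return ULONG_MAX; /* product would overflow */
        result *= np[i];
    }
    return result;
}
```

For instance, four consecutive statements that each have NP = 2 give npath_product(np, 4) = 16. The same routine applies to any statement range, since a range's complexity is also the product of its statements' complexities.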
An Example of NPATH
A segment of C source code and its corresponding NPATH measure follows (also see Figure 9):

    if ( ch == 'a' )
    {
        actr++;
        Func_a( );
    }
    if ( ch == 'b' )
    {
        bctr++;
        Func_b( );
    }
    if ( ch == 'c' )
    {
        cctr++;
        Func_c( );
    }
    if ( ch == 'd' )
    {
        dctr++;
        Func_d( );
    }

    NPATH = 16.

FIGURE 9. Example C Code Segment with NPATH = 16

The NPATH value of 16 is obtained as follows:
    NP(if) = NP(<if-range>) + NP(<expr>) + 1.
In the above example, NP(<expr>) = 0 for each if statement. Also note that <if-range> for each of the above if statements is a sequence of sequential statements. Therefore, NP(<if-range>) = 1 for each if statement. Thus, NP(if) = 1 + 0 + 1 = 2 for each if statement. NP(<code segment>) = 2 × 2 × 2 × 2 = 16.

Characteristics of NPATH
We now demonstrate that NPATH overcomes the shortcomings of McCabe's measure. It was noted earlier that the number of acyclic execution paths in a function varies from a linear to an exponential function of V(G). NPATH is a measure that is more closely related to the number of acyclic execution paths through a function. In particular, the NPATH measure differs from the actual number of acyclic execution paths by the number of acyclic execution paths resulting from goto statements. Although the number of acyclic paths resulting from the use of the goto statement could be significant in theory, in practice the use of the goto statement is minimal and generally not thought to be good programming practice [3].

McCabe's measure fails to distinguish between different kinds of control flow structures. NPATH, on the other hand, is based on unique expressions of acyclic execution path complexity for each C control flow structure. Thus, the NPATH measure clearly distinguishes between different kinds of control flow structures.

Another criticism of McCabe's measure is that it does not account for nesting levels within a function, whereas the NPATH measure does. In particular, the number of acyclic execution paths through a function is dependent, in part, on the level of nesting among statements in the function. That is, acyclic execution path complexity is additive if one statement is nested within another; acyclic execution path complexity is multiplicative if statements are consecutive.

Anomaly of the NPATH Measure
An anomaly arises in the NPATH definition when a certain class of control flow structure sequences appears in a function. Consider the following segment of C source code and its corresponding flow graph (also see Figure 10):

    if ( A == B )
        S0;
    if ( A == B )
        S1;

FIGURE 10. Multiple if Flow Graph with NPATH = 4

The NPATH measure for this segment of code is 4. In the above segment of code, S1 is executed if and only if S0 is executed. Therefore, the above segment of code is equivalent to the following segment of code provided S0 does not alter A or B (also see Figure 11):

    if ( A == B )
    {
        S0;
        S1;
    }

FIGURE 11. Single if Flow Graph with NPATH = 2

The NPATH measure for this segment of code is 2. Thus, there are only two unique execution paths possible through the first code segment. The NPATH definition, however, does not detect this anomaly.

Given any two logical expressions <expr1> and <expr2> governing the execution of two different sequential sequences of code, if <expr1> is identical to <expr2> or <expr1> is not logically equivalent to <expr2>, and the values of the control variables in <expr1> and <expr2> are the same, then the NPATH measure overstates the acyclic execution path complexity by a factor of 2.

Comparing NCSL, TOKENS, V(G), and NPATH
In order to assess whether traditional measures of software complexity are closely related to execution path complexity, we computed the following measures of software complexity for 821 functions in a UNIX C software application. (UNIX is a registered trademark of AT&T Bell Laboratories.)

• NCSL is the number of noncommentary source lines of code in a function; that is, any line of program text that is not a blank or comment.

• TOKENS is the number of lexical tokens in a function. TOKENS for the C programming language include
  - keywords (e.g., while and if),
  - operator symbols (e.g., + and <=),
  - identifiers (e.g., X and Msg), and
  - punctuation symbols (e.g., ( , ), and ;).
  The TOKENS metric is the basis for the Halstead collection of metrics referred to as Software Science [6].

• V(G), McCabe's [11] cyclomatic complexity number, represents the number of fundamental circuits in the flow-graph representation of a function. It is the number of logical conditions (if, while, for, case, default, &&, ||, ?) in a function plus 1.

• NPATH is the number of acyclic execution paths through a function.

The correlation matrix shown in Table I summarizes the R² correlations among the metrics. R² represents the percentage of variance in one variable that is explained by the other variable. Note that the three metrics NCSL, TOKENS, and V(G) are highly correlated; the correlation between these metrics is at least 0.97. These correlations show that NCSL, TOKENS, and V(G) appear to be measuring the same thing, namely, lexical complexity. These three measures do not measure the semantic content of code. Also, when NPATH is correlated to NCSL, TOKENS, and V(G), the resulting correlations are 0.57, 0.53, and 0.56, respectively. These correlations show that NPATH is somewhat independent of the NCSL, TOKENS, and V(G) measures, with at least 40 percent of the variance in NPATH not accounted for by any one of the other measures. Because NPATH is measuring different factors than those measured by NCSL, TOKENS, and V(G), the lower correlations between NPATH and the other measures are expected. These correlations do not suggest that any of the measures is superior to the others, but they do show that NPATH is measuring different factors of complexity than the other measures. The correlation measures show that NCSL, TOKENS, and V(G), which are measuring the lexical content, are not particularly sensitive to the number of execution paths through a function. Thus, if we accept the premise that an important property of software to be measured is the number of execution paths through a function, then these data highlight the importance of the NPATH measure.

TABLE I. R² Correlation Matrix for NCSL, TOKENS, V(G), and NPATH

              NCSL    TOKENS    V(G)    NPATH
    NCSL      1.00    0.99      0.97    0.57
    TOKENS    0.99    1.00      0.97    0.53
    V(G)      0.97    0.97      1.00    0.56
    NPATH     0.57    0.53      0.56    1.00

PRACTICAL USES OF NPATH
Many software complexity metrics lack practical value. Measures are often designed without any particular use in mind [8]. Such is not the case with NPATH. Software developers are using NPATH to

• select functions for thorough walk-through/inspection,
• allocate functional testing resources, and
• define module design criteria.

Walk-throughs/Inspections
Code walk-throughs and inspections have become an integral part of the software development process. When schedule and resource constraints preclude the comprehensive review of all functions, it is important to identify the functions that would benefit most from thorough walk-throughs and inspections. Since NPATH measures the functional complexity and testability of code, it follows that NPATH can be used in determining the level of review/inspection of a function. Functions with high (i.e., in the top 25 percent) NPATH values are candidates for thorough review and/or inspection. Thus, NPATH is aiding in the process of deciding which functions should be thoroughly reviewed and/or inspected.

Testing
Software development organizations have limited testing resources. To make good use of limited resources, testing effort might best be allocated to functions in proportion to the testability of each function. A widely used criterion in deciding how to apportion testing resources among functions in a software system is the number of NCSL in each function; the larger the NCSL count of a function, the greater the resources allocated to it. Testing a function in proportion to its NCSL count, however, could lead to an inappropriate use of testing resources. This is because there is a weak relationship between the NCSL of a function and the testability of the function, as evidenced by the correlation matrix.

A more appropriate criterion of relevance when apportioning testing resources is the number of unique execution paths through the function. A relatively large number suggests that a relatively large proportion of testing resources should be allocated to the function. The rank order of the functions based on NPATH is being used by developers to allocate testing resources using this criterion. The amount of testing resources allocated to a function is proportional to the number of acyclic execution paths through the function. In short, we agree with others [8] that the extent to which a complexity measure can be used as a guide in testing effort depends on how well the measure specifies what is contributing to the complexity of a program. NPATH is used as a guide in the testing process because it characterizes a significant factor contributing to the complexity of functional testing: the number of acyclic execution paths through a function. Note that NCSL, TOKENS, and V(G) do not capture this property of a function.

Design Criterion
NPATH is also being used to establish a functional design criterion and identify functions appropriate for redesign early in the development process. The NPATH value for a detailed design specification can be computed provided the specification uses the control flow structures of C, such as in the program design language PDL-C [15].
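The selection and allocation rules described under Walk-throughs/Inspections and Testing can be sketched in C as follows. This is only an illustration: the FuncMetric type and both function names are hypothetical, "top 25 percent" is assumed to mean the first quarter of the functions (rounded up) after sorting by NPATH, and the testing share is assumed to be a simple fraction of total NPATH.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: one function and its measured NPATH value. */
typedef struct {
    const char *name;
    unsigned long npath;
} FuncMetric;

/* qsort comparator: orders records by NPATH, largest first. */
static int by_npath_desc(const void *a, const void *b)
{
    const FuncMetric *fa = (const FuncMetric *)a;
    const FuncMetric *fb = (const FuncMetric *)b;

    if (fa->npath < fb->npath) return 1;
    if (fa->npath > fb->npath) return -1;
    return 0;
}

/* Sorts funcs in place by descending NPATH and returns how many leading
   entries fall in the top 25 percent (rounded up; an assumption). */
size_t select_review_candidates(FuncMetric *funcs, size_t n)
{
    if (n == 0)
        return 0;
    assert(funcs != NULL);
    qsort(funcs, n, sizeof funcs[0], by_npath_desc);
    return (n + 3) / 4;
}

/* Fraction of the testing budget for funcs[i], proportional to its
   share of the total NPATH over all n functions. */
double testing_share(const FuncMetric *funcs, size_t n, size_t i)
{
    double total = 0.0;
    size_t k;

    assert(funcs != NULL && i < n);
    for (k = 0; k < n; k++)
        total += (double)funcs[k].npath;
    return total == 0.0 ? 0.0 : (double)funcs[i].npath / total;
}
```

After select_review_candidates returns k, the first k entries of the array are the review candidates. A real process would also weigh the criticality of each function, which this sketch ignores.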

196 Communications of the ACM February 1988 Volume 3;’ Number 2


Computing Practices

Software quality can be increased by designing software that requires manageable levels of functional testing to assure its correctness. Thus, the testability of each function in a software system is an important design criterion. Along these lines, an NPATH threshold value of 200 per function has been established to define a functional design criterion and to identify candidate functions for redesign. The value 200 is based on studies done at AT&T Bell Laboratories [14]. For functions that exceed the threshold value of 200, methods to reduce NPATH complexity are provided to developers.

Additional Considerations
Any decision to allocate inspections, testing, or design effort based on NPATH must also take into account

• the criticality of the function: Whereas even moderately high NPATH values in a heavily used function would identify the function for thorough inspection and testing, and possibly for redesign, a noncritical function of similar NPATH complexity might not warrant the same level of attention; and
• factors other than complexity that affect software quality: Requirements volatility, the software development environment, developer experience, reuse, and the use of code generators all affect software quality.

Thus, NPATH cannot provide absolute principles for software development, but it is a useful adjunct to traditional and intuitive measures of software complexity.

METHODS TO REDUCE COMPLEXITY
If a method is to be useful in controlling software complexity, then it must index a function's complexity level, as well as suggest ways to reduce complexity [8]. Such is the case with NPATH. Many strategies to reduce the NPATH complexity of functions are being used by software developers. Some of the most effective methods of reducing the NPATH value include

• distributing functionality,
• implementing multiple if statements as a switch statement, and
• creating a separate function for logical expressions with a high count of and (&&) and or (||) operators.

Distributing Functionality
To reduce NPATH for a function, divide the function into blocks of code that logically belong together. Create a new function for each block of code. Then, replace each block of code with a call to the appropriate newly created function. The original functionality is thus distributed, reducing the NPATH value for the original function because function calls are treated as sequential statements.
Generally, the more functions defined, the greater the reduction in NPATH. This is not true in all cases: for example, in the case of sequential code, NPATH is not changed by making the sequential code a function and then calling the function.

Multiple if Statements
Another way to reduce NPATH for a function is to implement multiple if statements via the switch and case statements. The following simple example illustrates this strategy: The original sequence of if statements

    if ( c == 'a' )
        ca++;
    if ( c == 'b' )
        cb++;
    if ( c == 'c' )
        cc++;
    if ( c == 'C' )
        cc++;
    if ( c != 'a' && c != 'b' && c != 'c' &&
         c != 'C' )
        cOther++;

has an NPATH value of 80 (2 × 2 × 2 × 2 × (2 + 3)). An equivalent, less complex case statement implementation of the same sequence

    switch ( c )
    {
    case 'a':
        ca++;
        break;
    case 'b':
        cb++;
        break;
    case 'c':
        cc++;
        break;
    case 'C':
        cc++;
        break;
    default:
        cOther++;
        break;
    }

has an NPATH value of 5 (1 + 1 + 1 + 1 + 1).

Operators per Logical Expression
NPATH can be reduced for a function with a high count of and (&&) and or (||) operators in a logical expression by creating a separate function for the logical expression. Suppose the following logical expression occurs several times in a function:

    if ((v1 && v2) || ((v3 || v4) && (v5 && v6)))

The new separate function for the logical expression looks like the following:


    if ( v_check() )

    int v_check(void)
    {
        /* assuming v1, v2, v3, v4, v5, v6 are
           global to the module */
        return ( (v1 && v2) || ((v3 || v4)
                 && (v5 && v6)) );
    }

Although strategies to reduce the NPATH complexity of functions are important, care must be taken not to distort the logical clarity of the software by applying a strategy to reduce the complexity of functions. That is, there is a point of diminishing return beyond which a further attempt at reduction of complexity distorts the logical clarity of the system structure.

FUTURE DIRECTIONS AND SUMMARY
NPATH counts the acyclic execution paths through a C function. The practical applications of NPATH and methods to reduce NPATH have been discussed.
The success of NPATH suggests other possible applications. For example, monitoring the coverage of code during the testing process has proved to be an effective method of improving and verifying testing. Most coverage monitors report either (1) the percentage of NCSL executed during code execution, or (2) the percentage of branches executed during code execution. Although both measures of code coverage are useful, in practice, systems that have close to 100 percent NCSL and branch coverage may not be adequately tested. The reason for this is that, although every line of code and branch point in a program might be executed, there could be a substantial number of execution paths in a program that have not been. Path coverage is much more difficult to achieve than branch coverage; branch coverage is much more difficult to achieve than code coverage. Monitoring path coverage would more accurately reveal the completeness of software testing. Future work on developing an NPATH-based coverage monitor is an important next step of this research.
Another potentially useful application of NPATH is in the area of software reliability modeling. A vast majority of software reliability models [13] require a priori estimates for the model parameters $T_0$ and R. $T_0$ is the MTTF (mean time to failure) at the start of testing; R is the rate at which MTTF is assumed to increase over time. To date, no acceptable means of estimating $T_0$ and R prior to entering system test has been established. NPATH could be used to develop initial approximations for $T_0$ and R. The basic approach would be to define $T_0$ complexity classes based on NPATH; as NPATH increases, $T_0$ would decrease. Table II illustrates this point. Obviously, NPATH ranges and $T_0$ values need to be found through empirical study.

TABLE II. $T_0$ Complexity Classes Based on NPATH

NPATH range    $T_0$
1-1000         100
1001-2500      75
2501-5000      60

The MTTF rate (R) changes over time; it does not monotonically increase or decrease. Therefore, reliability models need to take into account the dynamic nature of MTTF rates. NPATH and NPATH-coverage monitors could be used to estimate the way MTTF changes over time.

• Let $NP_{total}$ denote the total NPATH for a system.
• Let $NP_{exec}$ denote the number of unique acyclic execution paths executed thus far.
• Let $NP_{fail}$ denote the total number of failures covered thus far.
• The path failure rate (PFR) at time T is defined as $PFR = NP_{fail} / NP_{exec}$.
• Let $NP_{est\text{-}fail}$ denote the estimated number of failures remaining in the software; then $NP_{est\text{-}fail} = PFR \times (NP_{total} - NP_{exec})$.

Both PFR and $NP_{est\text{-}fail}$ could be extremely valuable in predicting how MTTF changes over time. Note that, as $(NP_{total} - NP_{exec})$ approaches 0, the software is more completely tested, and the reliability of the system should behave more consistently (i.e., in a near monotonic fashion).
There are also several useful extensions to the NPATH measure. First, the NPATH measure considers any call to a function as a sequential statement; function calls do not add complexity. An extension to the current model of software complexity would be to capture the complexity of subsystems and entire systems by accounting for the acyclic execution path complexity of the calling sequences within each function making up the system. Another extension to the model would be to apply the proposed notion of acyclic execution path complexity to hardware, and to define the notion of system complexity as a function of the number of hardware and software execution paths through a system.
Finally, in comparison to code, there is little known about measuring the complexity of requirements and designs. This is partially because programming languages provide a formal notation on which to base measures, whereas formal notations for expressing requirements and designs have only emerged recently. Formal notations such as Structure Charts [18], Booch Diagrams [1], and Data Flow Diagrams [12] now provide computational models and notations on which to base measures of requirement and design complexity. Future software measurement research should focus on defining and analyzing measures of requirements and design complexity based on such notations.


APPENDIX A. NPATH Algorithm


• Next V is either (1) the first statement in the compound statement that follows V, or (2) LAST if V is the last statement in some compound statement.
• Bool-Comp of V is the complexity of the expressions in statement V.

NPATH ( V )
statement V;
{
    if ( V is LAST )
        return ( 1 );
    else
    {
        switch ( statement type of V )
        {
        case IFST:    /* if statement */
            return ( (NPATH(if-range of V) + Bool-Comp of V + 1) * NPATH(Next V) );
        case IFEST:   /* if-else statement */
            return ( (NPATH(if-range of V) + NPATH(else-range of V) + Bool-Comp of V)
                     * NPATH(Next V) );
        case WHST:    /* while statement */
            return ( (NPATH(while-range of V) + Bool-Comp of V + 1) * NPATH(Next V) );
        case DOST:    /* do statement */
            return ( (NPATH(do-range of V) + Bool-Comp of V + 1) * NPATH(Next V) );
        case FORST:   /* for statement */
            return ( (NPATH(for-range of V) + Bool-Comp of V + 1) * NPATH(Next V) );
        case SWST:    /* switch statement */
            CompSW = Bool-Comp of V;
            for ( each case and default range in switch )
                CompSW = CompSW + NPATH(case-range);
            return ( CompSW * NPATH(Next V) );
        case QUESST:  /* ? statement */
            return ( (Bool-Comp of V + 2) * NPATH(Next V) );
        case GOTO:    /* goto statement */
            return ( NPATH(Next V) );   /* skip to next statement */
        case RET:     /* return or exit statement */
        case BRST:    /* break statement */
        case CONTST:  /* continue statement */
            if ( Bool-Comp > 0 )
                return ( Bool-Comp );
            else
                return ( 1 );
        case SEQ:     /* sequential statement */
            if ( Bool-Comp > 0 )
                return ( Bool-Comp * NPATH(Next V) );
            else
                return ( NPATH(Next V) );
        }   /* end of switch statement */
    }   /* end of else */
}   /* end of NPATH function */



APPENDIX B. A Summary of Execution Path Expressions
Structure       Complexity expression
if              NP(<if-range>) + NP(<expr>) + 1
if-else         NP(<if-range>) + NP(<else-range>) + NP(<expr>)
while           NP(<while-range>) + NP(<expr>) + 1
do while        NP(<do-range>) + NP(<expr>) + 1
for             NP(<for-range>) + NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 1
switch          NP(<expr>) + $\sum_{i=1}^{n}$ NP(<case-range_i>) + NP(<default-range>)
?               NP(<expr1>) + NP(<expr2>) + NP(<expr3>) + 2
goto label      1
break           1
Expressions     Number of && and || operators in expression
continue        1
return          1
sequential      1
Function call   1
C function      $\prod_{i=1}^{n}$ NP(Statement_i)
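As a worked instance of the table, consider the logical expression from the section on operators per logical expression, which contains five && and || operators. Suppose, purely for illustration, that it guards three consecutive if statements whose ranges are simple sequential code. The count of three and the simple ranges are assumptions of this example, not figures from the study.

```latex
\begin{align*}
NP(\text{if with inline expression}) &= NP(\langle\text{if-range}\rangle) + NP(\langle\text{expr}\rangle) + 1 = 1 + 5 + 1 = 7 \\
NP(\text{function with inline expressions}) &= 7 \times 7 \times 7 = 343 \\
NP(\text{if with call to } \texttt{v\_check()}) &= 1 + 0 + 1 = 2 \\
NP(\text{function with calls to } \texttt{v\_check()}) &= 2 \times 2 \times 2 = 8
\end{align*}
```

Under these assumptions the inline version exceeds the threshold of 200, while the version that calls v_check() stays well below it. The complexity has not vanished: it has moved into v_check(), whose return statement the algorithm in Appendix A scores as 5.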

Acknowledgments. The author is grateful for the many stimulating conversations he had with Christopher Fox of the Quality Software Technology Group at AT&T Bell Laboratories about NPATH.

REFERENCES
1. Booch, G. Software Engineering With Ada. Benjamin/Cummings, Menlo Park, Calif., 1987.
2. Curtis, B., et al. Measuring the psychological complexity of software maintenance tasks with the Halstead and McCabe metrics. IEEE Trans. Softw. Eng. SE-5, 3 (Mar. 1979), 96-104.
3. Dijkstra, E.W. GO TO statement considered harmful. Commun. ACM 11, 3 (Mar. 1968), 147-148.
4. Dunn, R. Software Defect Removal. McGraw-Hill, New York, 1984.
5. Evangelist, M. An analysis of control flow complexity. In Proceedings of IEEE COMPSAC '84, 1984, pp. 388-396.
6. Halstead, M. Elements of Software Science. Elsevier North-Holland, New York, 1977.
7. Johnson, S. YACC: Yet Another Compiler Compiler. Bell Laboratories, Murray Hill, N.J., 1975.
8. Kearney, J.K., Sedlmeyer, R.L., Thompson, W.B., Gray, M.A., and Adler, M.A. Software complexity measurement. Commun. ACM 29, 11 (Nov. 1986), 1044-1050.
9. Kernighan, B., and Ritchie, D. The C Programming Language. Prentice-Hall, Englewood Cliffs, N.J., 1978.
10. Lesk, M., and Schmidt, E. LEX: A Lexical Analysis Generator. Bell Laboratories, Murray Hill, N.J., 1975.
11. McCabe, T. A complexity measure. IEEE Trans. Softw. Eng. SE-2, 4 (Apr. 1976), 308-320.
12. Mellor, S., and Ward, P.T. Structured Development for Real-Time Systems. Yourdon Press, New York, 1986.
13. Musa, J. Software reliability modeling. In Handbook of Software Engineering, C. Vick and C. Ramamoorthy, Eds. Van Nostrand Reinhold, New York, 1984.
14. Nejmeh, B. Software complexity metrics study summary. Tech. Memo., AT&T Bell Laboratories, Holmdel, N.J., Oct. 1986.
15. Nejmeh, B., and Dunsmore, H. A survey of program design languages (PDLs). In Proceedings of IEEE COMPSAC '86, 1986, pp. 447-456.
16. Paige, M. An analytical approach to program testing. In Proceedings of IEEE COMPSAC '80, 1980, pp. 527-531.
17. Rapps, S., and Weyuker, E. Selecting software test data using data flow information. IEEE Trans. Softw. Eng. SE-11, 4 (Apr. 1985), 367-375.
18. Stevens, W.P. Using Structured Design. Wiley-Interscience, New York, 1981.

CR Categories and Subject Descriptors: D.2.2 [Software Engineering]: Tools and Techniques--modules and interfaces; D.2.4 [Software Engineering]: Program Verification--reliability; D.2.5 [Software Engineering]: Testing and Debugging--monitors; D.2.7 [Software Engineering]: Distribution and Maintenance--restructuring; D.2.8 [Software Engineering]: Metrics--complexity measures; D.2.9 [Software Engineering]: Management--software quality assurance (SQA); F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs--control primitives
General Terms: Algorithms, Design, Management, Measurement, Reliability
Additional Key Words and Phrases: Execution path complexity, NPATH, software testing

Author's Present Address: Brian A. Nejmeh, SPC, 1880 North Campus Commons Drive, Reston, VA 22091.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
