
Code Coverage Analysis

This paper gives a complete description of code coverage analysis (test coverage analysis), a software testing technique. By Steve Cornett. Copyright Bullseye Testing Technology 1996-2011. All rights reserved. Redistribution in whole or in part is prohibited without permission.

Contents

Introduction
Structural Testing and Functional Testing
The Premise
Basic Metrics
  o Statement Coverage
  o Decision Coverage
  o Condition Coverage
  o Multiple Condition Coverage
  o Condition/Decision Coverage
  o Modified Condition/Decision Coverage
  o Path Coverage
Other Metrics
  o Function Coverage
  o Call Coverage
  o Linear Code Sequence and Jump (LCSAJ) Coverage
  o Data Flow Coverage
  o Object Code Branch Coverage
  o Loop Coverage
  o Race Coverage
  o Relational Operator Coverage
  o Weak Mutation Coverage
  o Table Coverage
Comparing Metrics
Coverage Goal for Release
Intermediate Coverage Goals
Summary
References

Introduction
Code coverage analysis is the process of:

Finding areas of a program not exercised by a set of test cases,
Creating additional test cases to increase coverage, and
Determining a quantitative measure of code coverage, which is an indirect measure of quality.

An optional aspect of code coverage analysis is:

Identifying redundant test cases that do not increase coverage.

A code coverage analyzer automates this process. You use coverage analysis to assure the quality of your set of tests, not the quality of the actual product. You do not generally use a coverage analyzer when running your set of tests through your release candidate. Coverage analysis requires access to test program source code and often requires recompiling it with a special command.

This paper discusses the details you should consider when planning to add coverage analysis to your test plan. Coverage analysis has certain strengths and weaknesses. You must choose from a range of measurement methods. You should establish a minimum percentage of coverage, to determine when to stop analyzing coverage. Coverage analysis is one of many testing techniques; you should not rely on it alone.

Code coverage analysis is sometimes called test coverage analysis. The two terms are synonymous. The academic world more often uses the term "test coverage" while practitioners more often use "code coverage". Likewise, a coverage analyzer is sometimes called a coverage monitor. I prefer the practitioner terms.

Structural Testing and Functional Testing


Code coverage analysis is a structural testing technique (AKA glass box testing and white box testing). Structural testing compares test program behavior against the apparent intention of the source code. This contrasts with functional testing (AKA black-box testing), which compares test program behavior against a requirements specification. Structural testing examines how the program works, taking into account possible pitfalls in the structure and logic. Functional testing examines what the program accomplishes, without regard to how it works internally.

Structural testing is also called path testing since you choose test cases that cause paths to be taken through the structure of the program. Do not confuse path testing with the path coverage metric, explained later.

At first glance, structural testing seems unsafe because it cannot find errors of omission. However, requirements specifications sometimes do not exist, and are rarely complete. This is especially true near the end of the product development time line when the requirements specification is updated less frequently and the product itself begins to take over the role of the specification. The difference between functional and structural testing blurs near release time.

The Premise
The basic assumptions behind coverage analysis tell us about the strengths and limitations of this testing technique. Some fundamental assumptions are listed below.

Bugs relate to control flow and you can expose bugs by varying the control flow [Beizer1990 p.60]. For example, a programmer wrote "if (c)" rather than "if (!c)".

You can look for failures without knowing what failures might occur and all tests are reliable, in that successful test runs imply program correctness [Morell1990]. The tester understands what a correct version of the program would do and can identify differences from the correct behavior. Other assumptions include achievable specifications, no errors of omission, and no unreachable code.

Clearly, these assumptions do not always hold. Coverage analysis exposes some plausible bugs but does not come close to exposing all classes of bugs. Coverage analysis provides more benefit when applied to an application that makes a lot of decisions than to a data-centric application, such as a database application.

Basic Metrics
A large variety of coverage metrics exist. This section contains a summary of some fundamental metrics and their strengths, weaknesses and issues. The U.S. Department of Transportation Federal Aviation Administration (FAA) has formal requirements for structural coverage in the certification of safety-critical airborne systems [DO-178B]. Few other organizations have such requirements, so the FAA is influential in the definitions of these metrics.

Statement Coverage
This metric reports whether each executable statement is encountered. Declarative statements that generate executable code are considered executable statements. Control-flow statements, such as if, for, and switch, are covered if the expression controlling the flow is covered as well as all the contained statements. Implicit statements, such as an omitted return, are not subject to statement coverage.

Also known as: line coverage, segment coverage [Ntafos1988], C1 [Beizer1990 p.75] and basic block coverage. Basic block coverage is the same as statement coverage except the unit of code measured is each sequence of non-branching statements. I highly discourage using the non-descriptive name C1. People sometimes incorrectly use the name C1 to identify decision coverage. Therefore this term has become ambiguous.

The chief advantage of this metric is that it can be applied directly to object code and does not require processing source code. Performance profilers commonly implement this metric. The chief disadvantage of statement coverage is that it is insensitive to some control structures. For example, consider the following C/C++ code fragment:
int* p = NULL;
if (condition)
    p = &variable;
*p = 123;

Without a test case that causes condition to evaluate false, statement coverage rates this code fully covered. In fact, if condition ever evaluates false, this code fails. This is the most serious shortcoming of statement coverage. If-statements are very common.

Statement coverage does not report whether loops reach their termination condition, only whether the loop body was executed. With C, C++, and Java, this limitation affects loops that contain break statements. Since do-while loops always execute at least once, statement coverage considers them the same rank as non-branching statements. Statement coverage is completely insensitive to the logical operators (|| and &&). Statement coverage cannot distinguish consecutive switch labels.

Test cases generally correlate more to decisions than to statements. You probably would not have 10 separate test cases for a sequence of 10 non-branching statements; you would have only one test case. For example, consider an if-else statement containing one statement in the then-clause and 99 statements in the else-clause. After exercising one of the two possible paths, statement coverage gives extreme results: either 1% or 99% coverage. Basic block coverage eliminates this problem.

One argument in favor of statement coverage over other metrics is that bugs are evenly distributed through code; therefore the percentage of executable statements covered reflects the percentage of faults discovered. However, one of our fundamental assumptions is that faults are related to control flow, not computations. Additionally, we could reasonably expect that programmers strive for a relatively constant ratio of branches to statements. In summary, this metric is affected more by computational statements than by decisions.

Decision Coverage
This metric reports whether Boolean expressions tested in control structures (such as the if-statement and while-statement) evaluated to both true and false. The entire Boolean expression is considered one true-or-false predicate regardless of whether it contains logical-and or logical-or operators. Additionally, this metric includes coverage of switch-statement cases, exception handlers, and all points of entry and exit. Constant expressions controlling the flow are ignored.

Also known as: branch coverage, all-edges coverage [Roper1994 p.58], C2 [Beizer1990 p.75], decision-decision-path testing [Roper1994 p.39]. I discourage using the non-descriptive name C2 because of the confusion with the term C1.

The FAA makes a distinction between branch coverage and decision coverage, with branch coverage weaker than decision coverage [SVTAS2007]. The FAA definition of a decision is, in part, "A Boolean expression composed of conditions and zero or more Boolean operators." So the FAA definition of decision coverage requires all Boolean expressions to evaluate to both true and false, even those that do not affect control flow. There is no precise definition of "Boolean expression." Some languages, especially C, allow mixing integer and Boolean expressions and do not require Boolean variables be declared as Boolean. The FAA suggests using context to identify Boolean expressions, including whether expressions are used as operands to Boolean operators or tested to control flow. The suggested definition of "Boolean operator" is a built-in (not user-defined) operator with operands and result of Boolean type. The logical-not operator is exempted due to its simplicity. The C conditional operator (?:) is considered a Boolean operator if all three operands are Boolean expressions.

This metric has the advantage of simplicity without the problems of statement coverage. A disadvantage is that this metric ignores branches within Boolean expressions which occur due to short-circuit operators. For example, consider the following C/C++/Java code fragment:
if (condition1 && (condition2 || function1()))
    statement1;
else
    statement2;

This metric could consider the control structure completely exercised without a call to function1. The test expression is true when condition1 is true and condition2 is true, and the test expression is false when condition1 is false. In this instance, the short-circuit operators preclude a call to function1. The FAA suggests that for the purposes of measuring decision coverage, the operands of short-circuit operators (including the C conditional operator) be interpreted as decisions [SVTAS2007].

Condition Coverage
Condition coverage reports the true or false outcome of each condition. A condition is an operand of a logical operator that does not contain logical operators. Condition coverage measures the conditions independently of each other. This metric is similar to decision coverage but has better sensitivity to the control flow. However, full condition coverage does not guarantee full decision coverage. For example, consider the following C++/Java fragment.
bool f(bool e) { return false; }
bool a[2] = { false, false };

if (f(a && b)) ...
if (a[int(a && b)]) ...
if ((a && b) ? false : false) ...

All three of the if-statements above branch false regardless of the values of a and b. However if you exercise this code with a and b having all possible combinations of values, condition coverage reports full coverage.

Multiple Condition Coverage


Multiple condition coverage reports whether every possible combination of conditions occurs. The test cases required for full multiple condition coverage of a decision are given by the logical operator truth table for the decision. For languages with short circuit operators such as C, C++, and Java, an advantage of multiple condition coverage is that it requires very thorough testing. For these languages, multiple condition coverage is very similar to condition coverage. A disadvantage of this metric is that it can be tedious to determine the minimum set of test cases required, especially for very complex Boolean expressions. An additional disadvantage of this metric is that the number of test cases required could vary substantially among conditions that have similar complexity. For example, consider the following two C/C++/Java conditions.
a && b && (c || (d && e))
((a || b) && (c || d)) && e

To achieve full multiple condition coverage, the first condition requires 6 test cases while the second requires 11. Both conditions have the same number of operands and operators. The test cases are listed below.
a && b && (c || (d && e))

         a  b  c  d  e
     1.  F  -  -  -  -
     2.  T  F  -  -  -
     3.  T  T  F  F  -
     4.  T  T  F  T  F
     5.  T  T  F  T  T
     6.  T  T  T  -  -

((a || b) && (c || d)) && e

         a  b  c  d  e
     1.  F  F  -  -  -
     2.  F  T  F  F  -
     3.  F  T  F  T  F
     4.  F  T  F  T  T
     5.  F  T  T  -  F
     6.  F  T  T  -  T
     7.  T  -  F  F  -
     8.  T  -  F  T  F
     9.  T  -  F  T  T
    10.  T  -  T  -  F
    11.  T  -  T  -  T

As with condition coverage, multiple condition coverage does not include decision coverage. For languages without short circuit operators such as Visual Basic and Pascal, multiple condition coverage is effectively path coverage (described below) for logical expressions, with the same advantages and disadvantages. Consider the following Visual Basic code fragment.

If a And b Then ...

Multiple condition coverage requires four test cases, one for each combination of a and b being true and false. As with path coverage, each additional logical operator doubles the number of test cases required.

Condition/Decision Coverage
Condition/Decision Coverage is a hybrid metric composed of the union of condition coverage and decision coverage. It has the advantage of simplicity but without the shortcomings of its component metrics. BullseyeCoverage measures condition/decision coverage.
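As an illustration (the fragment and test values below are invented for this sketch, not taken from the original text), consider a single decision with two conditions:

// Hypothetical sketch: condition/decision coverage of "a && b" requires the
// whole decision to evaluate both true and false, and each condition a and b
// to evaluate both true and false.
if (a && b)
    doWork();   // doWork is an invented placeholder

// One possible minimal test set (three cases):
//   a=true,  b=true    decision true;  a true,  b true
//   a=true,  b=false   decision false; a true,  b false
//   a=false, b=any     decision false; a false  (b short-circuited, not evaluated)

Note that with short-circuit evaluation the third case does not evaluate b at all; b's true and false outcomes come from the first two cases.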

Modified Condition/Decision Coverage


The formal definition of modified condition/decision coverage is: Every point of entry and exit in the program has been invoked at least once, every condition in a decision has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions [DO-178B].

Also known as MC/DC and MCDC. This metric is stronger than condition/decision coverage, requiring more test cases for full coverage. This metric is specified for safety-critical aviation software by RTCA/DO-178B and has been the subject of much study, debate and clarification for many years. Two difficult issues with MCDC are:

short-circuit operators
multiple occurrences of a condition

There are two competing ideas of how to handle short-circuit operators. One idea is to relax the requirement that conditions be held constant if those conditions are not evaluated due to a short-circuit operator [Chilenski1994]. The other is to consider the condition operands of short-circuit operators as separate decisions [DO-248B]. A condition may occur more than once in a decision. In the expression "A or (not A and B)", the conditions "A" and "not A" are coupled - they cannot be varied independently as required by the definition of MCDC. One approach to this dilemma, called Unique Cause MCDC, is to interpret the term "condition" to mean "uncoupled condition." Another approach, called Masking MCDC, is to permit more than one condition to vary at once, using an analysis of the logic of the decision to ensure that only the condition of interest influences the outcome.
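As a sketch (the decision and test values below are invented for illustration and ignore short-circuit evaluation for simplicity; they are not drawn from DO-178B), a unique-cause MC/DC test set for a decision with three conditions needs only four cases:

// Hypothetical illustration. Decision: (a || b) && c, with conditions a, b, c.
// A minimal unique-cause MC/DC test set (number of conditions + 1 = 4 cases):
//
//   case   a  b  c   decision
//   1      T  F  T   true
//   2      F  F  T   false     pairs with case 1: only a changed, outcome changed
//   3      F  T  T   true      pairs with case 2: only b changed, outcome changed
//   4      T  F  F   false     pairs with case 1: only c changed, outcome changed
//
// Each condition is varied between one pair of cases while the others are held
// fixed, and the decision outcome changes, demonstrating independent effect.
if ((a || b) && c)
    handleEvent();   // handleEvent is an invented placeholder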

Path Coverage
This metric reports whether each of the possible paths in each function has been followed. A path is a unique sequence of branches from the function entry to the exit. Also known as predicate coverage. Predicate coverage views paths as possible combinations of logical conditions [Beizer1990 p.98].

Since loops introduce an unbounded number of paths, this metric considers only a limited number of looping possibilities. A large number of variations of this metric exist to cope with loops. Boundary-interior path testing considers two possibilities for loops: zero repetitions and more than zero repetitions [Ntafos1988]. For do-while loops, the two possibilities are one iteration and more than one iteration.

Path coverage has the advantage of requiring very thorough testing. Path coverage has two severe disadvantages. The first is that the number of paths is exponential in the number of branches. For example, a function containing 10 if-statements has 1024 paths to test. Adding just one more if-statement doubles the count to 2048. The second disadvantage is that many paths are impossible to exercise due to relationships of data. For example, consider the following C/C++ code fragment:
if (success)
    statement1;
statement2;
if (success)
    statement3;

Path coverage considers this fragment to contain 4 paths. In fact, only two are feasible: success=false and success=true. Researchers have invented many variations of path coverage to deal with the large number of paths. For example, n-length sub-path coverage reports whether you exercised each path of length n branches. Basis path testing selects paths that achieve decision coverage, with each path containing at least one decision outcome differing from the other paths [Roper1994 p.48]. Other variations include linear code sequence and jump (LCSAJ) coverage and data flow coverage.

Other Metrics
Here is a description of some variations of the fundamental metrics and some less commonly used metrics.

Function Coverage
This metric reports whether you invoked each function or procedure. It is useful during preliminary testing to assure at least some coverage in all areas of the software. Broad, shallow testing finds gross deficiencies in a test suite quickly. BullseyeCoverage measures function coverage.

Call Coverage

This metric reports whether you executed each function call. The hypothesis is that bugs commonly occur in interfaces between modules. Also known as call pair coverage.
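For example (an invented fragment; the function names are placeholders), call coverage treats the two calls to parse below as separate coverage points, whereas function coverage is satisfied once parse executes at all:

// Hypothetical sketch. Call coverage requires executing each call site.
void loadConfig(const char* path) {
    parse(readFile(path));       // call sites: readFile, parse
}
void loadDefaults() {
    parse(defaultConfigText);    // a separate call site for parse
}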

Linear Code Sequence and Jump (LCSAJ) Coverage


This variation of path coverage considers only sub-paths that can easily be represented in the program source code, without requiring a flow graph [Woodward1980]. An LCSAJ is a sequence of source code lines executed in sequence. This "linear" sequence can contain decisions as long as the control flow actually continues from one line to the next at run-time. Sub-paths are constructed by concatenating LCSAJs. Researchers refer to the coverage ratio of paths of length n LCSAJs as the test effectiveness ratio (TER) n+2. The advantage of this metric is that it is more thorough than decision coverage yet avoids the exponential difficulty of path coverage. The disadvantage is that it does not avoid infeasible paths.

Data Flow Coverage


This variation of path coverage considers only the sub-paths from variable assignments to subsequent references of the variables. The advantage of this metric is the paths reported have direct relevance to the way the program handles data. One disadvantage is that this metric does not include decision coverage. Another disadvantage is complexity. Researchers have proposed numerous variations, all of which increase the complexity of this metric. For example, variations distinguish between the use of a variable in a computation versus a use in a decision, and between local and global variables. As with data flow analysis for code optimization, pointers also present problems.
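A rough sketch of the idea (the fragment is invented): data flow coverage asks whether each pairing of a definition of a variable with a subsequent use of that variable was exercised.

// Hypothetical fragment. x has a definition on each branch of the first
// if-statement and two later uses, giving up to four definition-use pairs
// (d1-u1, d1-u2, d2-u1, d2-u2) to exercise.
int x;
if (condition1)
    x = computeA();   // definition d1 (computeA is a placeholder)
else
    x = computeB();   // definition d2 (computeB is a placeholder)
if (condition2)
    log(x);           // use u1
return x + 1;         // use u2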

Object Code Branch Coverage


This metric reports whether each machine language conditional branch instruction both took the branch and fell through. This metric gives results that depend on the compiler rather than on the program structure since compiler code generation and optimization techniques can create object code that bears little similarity to the original source code structure. Since branches disrupt the instruction pipeline, compilers sometimes avoid generating a branch and instead generate an equivalent sequence of non-branching instructions. Compilers often expand the body of a function inline to save the cost of a function call. If such functions contain branches, the number of machine language branches increases dramatically relative to the original source code. You are better off testing the original source code since it relates to program requirements better than the object code.
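For example (an invented fragment), a compiler may implement this source-level decision with a conditional-move instruction rather than a conditional branch, so the object code may contain no branch to measure even though the source clearly contains a decision:

// Hypothetical sketch. Optimizing compilers often generate branchless code
// (for example, a cmov instruction on x86) for a simple selection like this,
// so object code branch coverage may see nothing to cover here.
int clamped = (value > limit) ? limit : value;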

Loop Coverage

This metric reports whether you executed each loop body zero times, exactly once, and more than once (consecutively). For do-while loops, loop coverage reports whether you executed the body exactly once, and more than once. The valuable aspect of this metric is determining whether while-loops and for-loops execute more than once, information not reported by other metrics. As far as I know, only GCT implements this metric.
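As a sketch (the fragment and inputs are invented), full loop coverage of a while-loop or for-loop needs three kinds of input:

// Hypothetical fragment. Loop coverage calls for executing this body
// zero times, exactly once, and more than once:
//   count == 0   -> zero iterations
//   count == 1   -> exactly one iteration
//   count >= 2   -> more than one iteration
for (int i = 0; i < count; i++)
    process(items[i]);   // process and items are placeholders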

Race Coverage
This metric reports whether multiple threads execute the same code at the same time. It helps detect failure to synchronize access to resources. It is useful for testing multi-threaded programs such as in an operating system. As far as I know, only GCT implements this metric.

Relational Operator Coverage


This metric reports whether boundary situations occur with relational operators (<, <=, >, >=). The hypothesis is that boundary test cases find off-by-one mistakes and uses of the wrong relational operators such as < instead of <=. For example, consider the following C/C++ code fragment:
if (a < b) statement;

Relational operator coverage reports whether the situation a==b occurs. If a==b occurs and the program behaves correctly, you can assume the relational operator is not supposed to be <=. As far as I know, only GCT implements this metric.

Weak Mutation Coverage


This metric is similar to relational operator coverage but much more general [Howden1982]. It reports whether test cases occur which would expose the use of wrong operators and also wrong operands. It works by reporting coverage of conditions derived by substituting (mutating) the program's expressions with alternate operators, such as "-" substituted for "+", and with alternate variables substituted. This metric interests the academic world mainly. Caveats are many; programs must meet special requirements to enable measurement. As far as I know, only GCT implements this metric.
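A rough sketch of the mechanism (the fragment is invented): the analyzer derives a condition that is true exactly when the original expression and a mutated expression produce different values, and reports whether any test case made that condition true.

// Hypothetical illustration. For the original expression a + b, one mutant
// substitutes the operator, giving a - b. The derived coverage condition is
// (a + b) != (a - b), which simplifies to b != 0. The mutant counts as
// covered (weakly killed) only if some test reaches this point with b != 0;
// otherwise the test suite could not tell + from - here.
int total = a + b;                        // original expression
bool mutantExposed = (a + b) != (a - b);  // condition the analyzer tracks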

Table Coverage
This metric indicates whether each entry in a particular array has been referenced. This is useful for programs that are controlled by a finite state machine.
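For example (an invented fragment), a finite state machine driven by a transition table is fully covered under this metric only when every entry of the table has been referenced:

// Hypothetical sketch. Table coverage asks whether each element of
// transition[][] was referenced during testing, i.e. whether every
// state/event combination actually occurred.
enum State { Idle, Running, Stopped, StateCount };
enum Event { Start, Stop, Reset, EventCount };
static const State transition[StateCount][EventCount] = {
    /* Idle    */ { Running, Idle,    Idle },
    /* Running */ { Running, Stopped, Idle },
    /* Stopped */ { Stopped, Stopped, Idle },
};
state = transition[state][event];   // each element is a coverage point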

Comparing Metrics
You can compare relative strengths when a stronger metric includes a weaker metric.

Decision coverage includes statement coverage since exercising every branch must lead to exercising every statement. This relationship only holds when control flows uninterrupted to the end of all basic blocks. For example, a C/C++ function might never return to finish the calling basic block because it uses throw, abort, the exec family, exit, or longjmp.
Condition/decision coverage includes decision coverage and condition coverage (by definition).
Path coverage includes decision coverage.
Predicate coverage includes path coverage and multiple condition coverage, as well as most other metrics.

Academia says the stronger metric subsumes the weaker metric. Coverage metrics cannot be compared quantitatively.

Coverage Goal for Release


Each project must choose a minimum percent coverage for release criteria based on available testing resources and the importance of preventing post-release failures. Clearly, safety-critical software should have a high goal. You might set a higher coverage goal for unit testing than for system testing since a failure in lower-level code may affect multiple high-level callers. Using statement coverage, decision coverage, or condition/decision coverage you generally want to attain 80%-90% coverage or more before releasing. Some people feel that setting any goal less than 100% coverage does not assure quality. However, you expend a lot of effort attaining coverage approaching 100%. The same effort might find more bugs in a different testing activity, such as formal technical review. Avoid setting a goal lower than 80%.

Intermediate Coverage Goals


Choosing good intermediate coverage goals can greatly increase testing productivity. Your highest level of testing productivity occurs when you find the most failures with the least effort. Effort is measured by the time required to create test cases, add them to your test suite and run them. It follows that you should use a coverage analysis strategy that increases coverage as fast as possible. This gives you the greatest probability of finding failures sooner rather than later. Figure 1 illustrates the coverage rates for high and low productivity. Figure 2 shows the corresponding failure discovery rates.

One strategy that usually increases coverage quickly is to first attain some coverage throughout the entire test program before striving for high coverage in any particular area. By briefly visiting each of the test program features, you are likely to find obvious or gross failures early. For example, suppose your application prints several types of documents, and a bug exists which completely prevents printing one (and only one) of the document types. If you first try printing one document of each type, you probably find this bug sooner than if you thoroughly test each document type one at a time by printing many documents of that type before moving on to the next type. The idea is to first look for failures that are easily found by minimal testing. The sequence of coverage goals listed below illustrates a possible implementation of this strategy.

1. Invoke at least one function in 90% of the source files (or classes).
2. Invoke 90% of the functions.
3. Attain 90% condition/decision coverage in each function.
4. Attain 100% condition/decision coverage.

Notice we do not require 100% coverage in any of the initial goals. This allows you to defer testing the most difficult areas. This is crucial to maintaining high testing productivity: achieve maximum results with minimum effort. Avoid using a weaker metric for an intermediate goal combined with a stronger metric for your release goal. Effectively, this allows the weaknesses in the weaker metric to decide which test cases to defer. Instead, use the stronger metric for all goals and allow the difficulty of the individual test cases to help you decide whether to defer them.

Summary
Coverage analysis is a structural testing technique that helps eliminate gaps in a test suite. It helps most in the absence of a detailed, up-to-date requirements specification. Condition/decision coverage is the best general-purpose metric for C, C++, and Java. Setting an intermediate goal of 100% coverage (of any type) can impede testing productivity. Before releasing, strive for 80%-90% or more coverage of statements, branches, or conditions.

References
Beizer1990: Beizer, Boris, "Software Testing Techniques", 2nd edition, New York: Van Nostrand Reinhold, 1990.
Chilenski1994: Chilenski, John Joseph and Miller, Steven P., "Applicability of Modified Condition/Decision Coverage to Software Testing", Software Engineering Journal, September 1994, Vol. 9, No. 5, pp. 193-200.
DO-178B: "Software Considerations in Airborne Systems and Equipment Certification", RTCA, December 1992, pp. 31, 74.
DO-248B: "Final Report for Clarification of DO-178B 'Software Considerations in Airborne Systems and Equipment Certification'", RTCA, October 2001.
SVTAS2007: Software Verification Tools Assessment Study, FAA, June 2007.
Howden1982: Howden, W.E., "Weak Mutation Testing and Completeness of Test Sets", IEEE Trans. Software Eng., Vol. SE-8, No. 4, July 1982, pp. 371-379.
McCabe1976: McCabe, Tom, "A Software Complexity Measure", IEEE Trans. Software Eng., Vol. 2, No. 6, December 1976, pp. 308-320.
Morell1990: Morell, Larry, "A Theory of Fault-Based Testing", IEEE Trans. Software Eng., Vol. 16, No. 8, August 1990, pp. 844-857.
Ntafos1988: Ntafos, Simeon, "A Comparison of Some Structural Testing Strategies", IEEE Trans. Software Eng., Vol. 14, No. 6, June 1988, pp. 868-874.
Roper1994: Roper, Marc, "Software Testing", London: McGraw-Hill Book Company, 1994.
Woodward1980: Woodward, M.R., Hedley, D. and Hennell, M.A., "Experience with Path Analysis and Testing of Programs", IEEE Trans. Software Eng., Vol. SE-6, No. 3, May 1980, pp. 278-286.

What is Wrong with Statement Coverage

This paper presents an in-depth discussion of the risks and misconceptions of a commonly used code coverage metric. By Steve Cornett. Copyright Bullseye Testing Technology 1999-2011. All rights reserved. Redistribution in whole or in part is prohibited without permission.

Summary
Software developers and testers commonly use statement coverage because of its simplicity and availability in object code instrumentation technology. Of all the structural coverage criteria, statement coverage is the weakest, requiring the fewest test cases. Bugs can easily occur in the cases that statement coverage cannot see. The most significant shortcoming of statement coverage is that it fails to measure whether you test simple if statements with a false decision outcome. Experts generally recommend using statement coverage only if nothing else is available. Any other metric is better.

Introduction
Statement coverage is a code coverage metric that tells you whether the flow of control reached every executable statement of source code at least once. Attaining coverage on every source statement seems like a good objective. But statement coverage does not adequately take into account the fact that many statements (and many bugs) involve branching and decision-making. Statement coverage's insensitivity to control structures tends to contradict the assumption of code coverage testing itself: thorough testing requires exercising many combinations of branches and conditions. In particular, statement coverage does not call for testing the following:

Simple if statements
Logical operators (&&, ||, and ?:)
Consecutive switch labels
Loop termination decisions
Do-while loops

Statement coverage has three characteristics that make it seem like a good coverage metric. Upon close inspection, they all become questionable. Statement coverage is:

Simple and fundamental
Measurable by object code instrumentation
Sensitive to the size of the code

Experts agree. A number of software testing books and papers give descriptions of statement coverage that range from "the weakest measure" to "not nearly enough". Line coverage, basic block, and segment coverage are variations of statement coverage. They all have similar characteristics and this document applies equally to all of them, except where noted.

Code Coverage Testing is Really Path Testing


The fundamental assumption of code coverage testing is that to expose bugs, you should exercise as many paths through your code as possible. The more paths you exercise, the more likely your testing is to expose bugs. A path is a sequence of branches (decisions), or conditions (logical predicates). A path corresponds to a test case, or a set of inputs. In code coverage testing, branches have more importance than the blocks they connect. Bugs are often sensitive to branches and conditions. For example, incorrectly writing a condition such as i<=n rather than i<n may cause a boundary error bug.

Statement coverage encourages a view of source code as relatively important blocks of statements, incidentally connected by branches. When using statement coverage, you can easily focus on testing the blocks of code and forget about testing the logic that binds them. If you were testing a brick wall, you would focus on the mortar as much as the bricks.

Specific Issues
Simple If-Statements
Statement coverage does not call for testing simple if statements. A simple if statement has no else-clause. To attain full statement coverage requires testing with the controlling decision true, but not with a false outcome. No source code exists for the false outcome, so statement coverage cannot measure it. If you only execute a simple if statement with the decision true, you are not testing the if statement itself. You could remove the if statement, leaving the body (that would otherwise execute conditionally), and your testing would give the same results. Since simple if statements occur frequently, this shortcoming presents a serious risk. See Simple If-Statement Example.

Logical Operators
Statement coverage does not call for testing logical operators. In C++ and C these operators are &&, ||, and ?:. Statement coverage cannot distinguish the code separated by logical operators from the rest of the statement. Executing any part of the code in a statement causes statement coverage to declare the whole statement fully covered. When logical operators avoid unnecessary evaluation (by short circuit), statement coverage gives an inflated coverage measurement. This problem often occurs even when logical operators occur on different source code lines. Some compilers, such as Microsoft C++, only provide one debug line number for a decision, even if it spans multiple source lines. See Logical Operator Example.

Consecutive Switch Labels


Statement coverage does not show the need to test separate consecutive switch statement labels. Consecutive switch labels have no statements between them. Statement coverage only calls for testing the code following the labels. This pitfall leads to incomplete testing because statement coverage assumes the value checking done by switch statements (the object code) is irrelevant. In fact, the different values in a switch controlling expression may reflect different test scenarios even if the values are handled by the same code. See Consecutive Switch Labels Example.

Loop Termination Decisions


Statement coverage does not call for testing loop termination decisions. Statement coverage only calls for executing loop bodies. In a loop that stops with a C++/C break statement, this deficiency hides test cases needed to expose bugs related to boundary checking and off-by-one mistakes. See Loop Termination Decision Example.

Do-While Loops
Statement coverage does not call for testing iteration of do-while loops. Since do-while loops always execute at least once, statement coverage sees them as fully covered whether or not they repeat. If you only execute a do-while without repeating the loop, you are not testing the loop. You could remove the do-while, leaving the statements that would otherwise execute repetitively, and your testing would give the same results. See Do-While Loop Example.

Common Misconceptions
Simple and Fundamental
Statement coverage is the simplest structural coverage metric in that it calls for the least testing in order to achieve full coverage. Additionally, statement coverage is a fundamental metric in that most other structural coverage metrics include statement coverage. However, statement coverage is not the simplest metric to understand and statement coverage is not fundamental to good testing.

Some coverage metrics other than statement coverage are fairly simple. Condition/decision coverage calls for exercising all decisions and logical conditions with both true and false outcomes. This metric is simple to understand and leads to more complete testing than statement coverage.

Testing experts often describe statement coverage as a basic or primary level of coverage. Most other structural coverage metrics subsume, or include, statement coverage. However, this only holds for full coverage, which rarely occurs in practice even with statement coverage. The difficulty of attaining additional coverage increases exponentially with all types of coverage. Rather than spend your time on the most difficult part of statement coverage, you make better progress using a more sensitive coverage metric that offers more test cases, some of which may require relatively little effort. Even if you do achieve 100% statement coverage, you have not necessarily exercised all your object code even though it appears you have exercised all your source code. The object code corresponding to branches is still vulnerable. Statement coverage may be the most basic metric, but it is not part of good testing.

Measurable By Object Code Instrumentation


Compared to source code instrumentation, object code instrumentation typically operates more quickly and supports multiple programming languages. However, the reason object code instrumentation coverage analyzers measure statement coverage is that statement coverage is the only metric they can implement. Stronger coverage metrics require source code instrumentation.

A statement coverage analyzer usually results from leveraging an existing product line that is based on object code instrumentation. The instrumentation needed for statement coverage analysis shares similarities with the technology needed for profiling, debugging and run-time error checking. Rarely does anyone develop object code instrumentation for the sole purpose of making a coverage analyzer. Typically, a company develops other code analysis tools, and then applies the technology to coverage analysis later. Conversely, coverage analyzers that use source code instrumentation invariably support coverage metrics stronger than statement coverage. Choosing statement coverage because your profiler supports it is like using locking pliers as a wrench. They will work, but if you are going to tighten more than a few nuts, you want to get a wrench.

Sensitivity To Basic Block Length


At first, sensitivity to basic block length might seem like a benefit. If you assume an even distribution of bugs through code, it makes sense to expect the percentage of statements covered to reflect the percentage of bugs discovered. See Sensitivity To Basic Block Length Example 1.

However, if you assume bugs occur more often due to interactions with control structures than in isolated computations, statement coverage's insensitivity to control structures is a drawback. Path testing fundamentally assumes that you must exercise many paths through your code to find bugs. It makes more sense to expect the number of tested branches and conditions to reflect the percentage of bugs discovered. See Sensitivity To Basic Block Length Example 2.

Sensitivity to basic block length is not beneficial since it comes at the expense of sensitivity to paths and test cases. Basic block coverage is not sensitive to basic block length. Basic block coverage is the same as statement coverage except the unit of code measured is each sequence of non-branching statements. Segment coverage is another name for basic block coverage.

Code Examples
Simple If-Statement Example
The C++ code fragment below contains a simple if statement.
int* p = NULL;
if (condition) {
    p = &variable;
    *p = 1;
}
*p = 0; // Oops, possible null pointer dereference

Without a test case that causes condition to evaluate false, statement coverage declares this code fully covered. In fact, if condition ever evaluates false, this code dereferences a null pointer.

Logical Operator Example


The C++ function below contains a statement with a logical-or operator that may circumvent executing the rest of the statement.
void function(const char* string1, const char* string2 = NULL);
...
void function(const char* string1, const char* string2)
{
    if (condition || strcmp(string1, string2) == 0)
        // Oops, possible null pointer passed to strcmp
        ...
}

Statement coverage declares this code fragment fully covered when condition is true. With condition false, the call to strcmp gets an invalid argument, a null pointer.

Consecutive Switch Labels Example

The C++ code fragment below uses a switch statement to convert error codes to strings.
message[EACCES] = "Permission denied";
message[ENODEV] = "No such device";
message[ENODEV] = "No such file or directory"; // Oops, should be ENOENT
...
switch (errno) {
case EACCES:
case ENODEV:
case ENOENT:
    printf("%s\n", message[errno]);
    break;
...

This program clearly anticipates three different errors. You can satisfy statement coverage with just one error, errno=EACCES. Statement coverage says that testing with this error is just as good as testing with another. However, this code incorrectly initializes message for ENODEV twice, but does not initialize message for ENOENT. Testing with either of these errors exposes the problem, but statement coverage does not call for them.

Loop Termination Decision Example


The C++ code fragment below copies a string from one buffer to another.
char output[100];
for (int i = 0; i <= sizeof(output); i++) { // Oops, buffer overrun; comparison should be <
    output[i] = input[i];
    if (input[i] == '\0') {
        break;
    }
}

The main loop termination decision, i <= sizeof(output), intends to prevent overflowing the output buffer. You can achieve full statement coverage without testing this condition. The overflow decision correctly ought to use operator < rather than operator <=. You get full statement coverage of this code with any input string of length less than 100, without exposing the bug.

Do-While Loop Example


Consider the C++ function below, which initializes a string buffer with an optional input string.
void initString(char* output, const char* input = "")
{
    int i = 0;
    do {
        output[i] = input[i];
    } while (input[i] != '\0'); // Oops, loop variable not incremented
}

You can achieve full statement coverage without repeating this loop. Testing with a zero-length input string is sufficient for statement coverage. The problem is the programmer forgot to increment the index. Any non-zero length input string causes an infinite loop.

Sensitivity To Basic Block Length Example 1


The C++ if-else statement below contains a lot of code in the then-clause, but very little in the else-clause.
if (condition) {
    // 99 statements
    statement1;
    statement2;
    ...
    statement99;
} else {
    // 1 statement
    statement100;
}

With condition true, you obtain 99% statement coverage. With a successful test, you can conclude that 99% of the code has no bugs. In the reverse scenario with condition false, you obtain just 1% statement coverage. Statement coverage seems to measure the relative importance of the two test cases proportionately.

Sensitivity To Basic Block Length Example 2


You can achieve 100% statement coverage of the C++ code fragment below with one test case, without exposing any bugs. The test case is { condition=true, errno=EACCES, input="" }. However, there are many other feasible paths through this code which expose one of the five bugs. Statement coverage is a very poor indicator of the number of bugs.
int* p = NULL;
if (condition) {
    p = &variable;
    *p = 1;
}
*p = 0; // Oops, possible null pointer dereference

const char* string2 = NULL;
if (condition || strcmp(string1, string2) == 0) // Oops, possible null pointer dereference
    statement;

message[EACCES] = "Permission denied";
message[ENODEV] = "No such device";
message[ENODEV] = "No such file or directory"; // Oops, should be ENOENT
switch (errno) {
case EACCES:
case ENODEV:
case ENOENT:
    printf("%s\n", message[errno]);
    break;
...
}

char output[100];
for (int i = 0; i <= sizeof(output); i++) { // Oops, buffer overrun; comparison should be <
    output[i] = input[i];
    if (input[i] == '\0') {
        break;
    }
}

int i = 0;
do {
    output[i] = input[i];
} while (input[i] != '\0'); // Oops, loop variable not incremented

What Others Say About Statement Coverage


Testing Computer Software by Cem Kaner, Hung Quoc Nguyen and Jack Falk (1999) compares statement coverage, branch coverage and condition coverage. The book says: "Line coverage is the weakest measure. ... Although line coverage is more than some programmers do, it is not nearly enough."

In the paper Software unit test coverage and adequacy (1997), the authors say: "... statement coverage is so weak that even some control transfers may be missed from an adequate test."

Managing the Software Process by Watts S. Humphrey (1989) says: "The simplest approach is to ensure that every statement is exercised at least once. A more stringent measure is to require coverage of every path within a program. ... A more practical measure is to exercise each condition for each decision statement at least once ..."

The paper Software Negligence and Testing Coverage by Cem Kaner (1996) discusses statement coverage, branch coverage and path coverage. He says: "Line coverage measures the number / percentage of lines of code that have been executed. But some lines contain branches - the line tests a variable and does different things depending on the variable's value."

Software Testing Techniques by Boris Beizer (1996) discusses path coverage, statement coverage and branch coverage. He says: "[Statement coverage] is the weakest measure in the family [of structural coverage criteria]: testing less than this for new software is unconscionable ..."

Brian Marick, a noted expert and author on software testing, said in an e-mail to me: "I'd rather use branch coverage, but if I can't - perhaps I don't have the source to instrument, ... line coverage is better than nothing."


Minimum Acceptable Code Coverage


This paper discusses how to decide what percentage of code coverage you need. By Steve Cornett. Copyright Bullseye Testing Technology 2006-2011. All rights reserved. Redistribution in whole or in part is prohibited without permission.

Summary
Code coverage of 70-80% is a reasonable goal for system test of most projects with most coverage metrics. Use a higher goal for projects specifically organized for high testability or that have high failure costs. Minimum code coverage for unit testing can be 10-20% higher than for system testing.

Introduction
Empirical studies of real projects found that increasing code coverage above 70-80% is time consuming and therefore leads to a relatively slow bug detection rate. Your goal should depend on the risk assessment and economics of the project. Consider the following factors.

Cost of failure. Raise your goal for safety-critical systems or where the cost of a failure is high, such as products for the medical or automotive industries, or widely deployed products.
Resources. Lower your goal if testers are spread thin or inadequately trained. If your testers are unfamiliar with the application, they may not recognize a failure even if they cover the associated code.
Testable design. Raise your goal if your system has special provisions for testing such as a method for directly accessing internal functionality, bypassing the user interface.
Development cycle status. Lower your goal if you are maintaining a legacy system where the original design engineers are no longer available.

Many projects set no particular minimum percentage required code coverage. Instead they use code coverage analysis only to save time. Measuring code coverage can quickly find those areas overlooked during test planning. Defer choosing a code coverage goal until you have some measurements in hand. Before measurements are available, testers often overestimate their code coverage by 20-30%.

Full Coverage Generally Impractical


Although 100% code coverage may appear like a best possible effort, even 100% code coverage is estimated to only expose about half the faults in a system. Low code coverage indicates inadequate testing, but high code coverage guarantees nothing. In a large system, achieving 100% code coverage is generally not cost effective. Some reasons are listed below.

Some test cases are expensive to reproduce but are highly improbable. The cost to benefit ratio does not justify repeating these tests simply to record the code coverage.
Checks may exist for unexpected error conditions. Layers of code might obscure whether errors in low-level code propagate up to higher level code. An engineer might decide that handling all errors creates a more robust solution than tracing the possible errors.
Unreachable code in the current version might become reachable in a future version. An engineer might address uncertainty about future development by investing a little more effort to add some capability that is not currently needed.
Code shared among several projects is only partially utilized by the project under test.

Generally, the tester should stop increasing code coverage when the tests become contrived. When you focus more and more on making the coverage numbers better, your motivation shifts away from finding bugs.

Unit, Integration and System Testing


You can attain higher code coverage during unit testing than in integration testing or system testing. During unit testing, the tester has more facilities available, such as a debugger to manipulate data and conditional compilation to simulate error conditions. Likewise, higher code coverage is possible during integration testing than in system testing. During integration testing, the test harness often provides more precise control and capability than the system user interface. Therefore it makes sense to set progressively lower goals for unit testing, integration testing, and system testing. For example, 90% during unit testing, 80% during integration testing, and 70% during system testing.

Coverage Metrics
The information in this paper applies to code coverage metrics that consider control structures independently. Specifically, these are:

statement coverage (line coverage)
basic block coverage
decision coverage (branch coverage)
condition/decision coverage
modified condition/decision coverage (MCDC)

Although some of these metrics are less sensitive to control flow than others, they all correlate statistically at a large scale.

Formal Standards
DO-178B
The aviation standard DO-178B requires 100% code coverage for safety critical systems. This standard specifies progressively more sensitive code coverage metrics for more critical systems.

System Level  Effect of Failure  Example                       Required Code Coverage
A             Catastrophic       Crash                         100% modified condition/decision coverage and 100% statement coverage
B             Hazardous          Passenger fatality            100% decision coverage and 100% statement coverage
C             Major              Passenger injury              100% statement coverage
D             Minor              Flight plan change            No code coverage requirement
E             No effect          Entertainment system failure  No code coverage requirement

These requirements consider neither the probability of a failure nor the cost of performing test cases.

IEC 61508
The standard IEC 61508:2010 "Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems" recommends 100% code coverage of several metrics, but the strenuousness of the recommendation relates to the criticality.

Safety Integrity Level  100% entry points   100% statements     100% branches       100% conditions, MC/DC
1 (least critical)      Highly Recommended  Recommended         Recommended         Recommended
2                       Highly Recommended  Highly Recommended  Recommended         Recommended
3                       Highly Recommended  Highly Recommended  Highly Recommended  Recommended
4 (most critical)       Highly Recommended  Highly Recommended  Highly Recommended  Highly Recommended

The safety integrity level (SIL) relates to the probability of unsafe failure. Determining the SIL involves a lengthy risk analysis. This standard recommends but does not require 100% coverage. It specifies you should explain any uncovered code. This standard does not define the coverage metrics and does not distinguish between condition coverage and MC/DC.

ANSI/IEEE 1008-1987
The IEEE Standard for Software Unit Testing section 3.1.2 specifies 100% statement coverage as a completeness requirement. Section A9 recommends 100% branch coverage for code that is critical or has inadequate requirements specification. Although this document is quite old, it was reaffirmed in 2002.

FDA requirements for Medical Devices


The United States Food and Drug Administration (FDA) document General Principles of Software Validation recommends structural testing but does not specify any particular code coverage requirement. The document says only: "The amount of structural coverage should be commensurate with the level of risk posed by the software."

References
Efficient use of code coverage in large-scale software development, Yong Woo Kim, 2003
Code coverage, what does it mean in terms of quality?, Williams et al, 2001
An Empirical Study of the Branch Coverage of Different Fault Classes, Melissa Cline and Linda Werner, 1994
Coverage measurement experience during function test, Paul Piwowarski et al, 1993
How Do You Know When You Are Done Testing?, Richard Bender, 2000
How to Misuse Code Coverage, Brian Marick, 1997
Comparing the Effectiveness of Software Testing Strategies, Basili and Selby, 1987
DO-178B, Software Considerations in Airborne Systems and Equipment Certification, 1992
Understanding the Use, Misuse and Abuse of Safety Integrity Levels, Felix Redmil, 2000
