
A SECOND REPLICATED QUANTITATIVE ANALYSIS OF FAULT DISTRIBUTIONS IN COMPLEX SOFTWARE

Tihana Galinac Grbac, Per Runeson, Darko Huljenic

INTRODUCTION
Software Engineering
Importance of replication
Pareto Principle of fault distributions
Effects of difference in time on hypotheses

PARETO PRINCIPLE
80% of effects due to 20% of causes

BACKGROUND
Hypotheses analyzed in four groups:
Related to Pareto principle of fault distribution
Related to persistence of faults
About effects of module size and complexity on fault proneness
About the quality in terms of fault densities

CONTEXT OF STUDY
Ericsson's product
Empirical data from five projects
Sequential releases of a complex, large-scale telecommunication product
Analyzed part: the application part
Written in Programming Language for Exchanges (PLEX)

TESTING ACTIVITIES
Function Test (FT): performed locally
System Test (ST): performed by the System Integration and Verification Center
Site Test: performed by the Network Integration and Verification Organization
Operation: failures during product operation

DATA COLLECTION
Data collected passively from several sources
Information about modules: quality reports
Information recorded for each module:
o Module name
o Identity and Revision
o Modified and Total size of code
o Number of faults during unit verification
Trouble Reports

DATA ANALYSIS AND RESULTS
Each hypothesis analyzed
Results discussed for each group of hypotheses
Relation to other studies elaborated

TERMINOLOGIES
Rel n, Rel n+1, Rel n+2, Rel n+3, Rel n+4: projects in sequential releases
Number of units
Number of faults
Type of study: original, previous replicated study, this replicated study

HYPOTHESES RELATED TO
PARETO PRINCIPLE
Hypothesis: a small number of modules contain most of the faults detected
during pre-release testing

Figure 1 - Modules vs. % of pre-release faults
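
The curve behind this kind of figure can be summarized by a single number: the smallest share of modules that accounts for a given share of the faults. The sketch below is only an illustration of that calculation, not the authors' procedure; the function name and the fault counts are hypothetical and are not data from the study.

```python
def module_share_for_fault_share(faults_per_module, target=0.8):
    """Smallest fraction of modules whose faults reach `target` of all faults.

    Modules are ranked from most to least faulty (an assumption about how
    the Pareto curve is built).
    """
    ranked = sorted(faults_per_module, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no faults recorded")
    running = 0
    for count, faults in enumerate(ranked, start=1):
        running += faults
        if running >= target * total:
            return count / len(ranked)
    return 1.0


# Hypothetical pre-release fault counts for ten modules (not study data).
example_faults = [40, 25, 20, 5, 4, 3, 2, 1, 0, 0]
share = module_share_for_fault_share(example_faults, target=0.8)
print(f"{share:.0%} of modules hold at least 80% of the faults")
```

A small returned share supports the hypothesis; a share close to the target itself would mean faults are spread evenly over the modules.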

PARETO PRINCIPLE
HYPOTHESIS 2
Hypothesis: if a small number of modules contain most of the post-release
faults, it is because these modules constitute most of the code size.
100% of post-release faults: in modules constituting 50, 88, 92, 50, and 88%
of system size
80% of faults: in modules constituting 26, 39, 43, 28, and 22% of system size
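
One way to arrive at percentages like these is to rank modules by post-release faults, take modules until the chosen share of faults is covered, and report how much of the total code size those modules represent. The sketch below assumes exactly that ranking; the slides do not state how the study selected the modules, and the function name and sample values are hypothetical.

```python
def size_share_for_fault_share(modules, target=0.8):
    """Fraction of total size held by the most faulty modules that together
    reach `target` of all faults.

    `modules` is a list of (size, post_release_faults) pairs.
    """
    total_size = sum(size for size, _ in modules)
    total_faults = sum(faults for _, faults in modules)
    if total_size == 0 or total_faults == 0:
        raise ValueError("need non-zero total size and total faults")
    ranked = sorted(modules, key=lambda m: m[1], reverse=True)
    running_faults = 0
    running_size = 0
    for size, faults in ranked:
        if running_faults >= target * total_faults:
            break
        running_faults += faults
        running_size += size
    return running_size / total_size


# Hypothetical (size, post-release faults) pairs, not study data.
example_modules = [(500, 6), (300, 3), (1000, 1), (200, 0)]
print(size_share_for_fault_share(example_modules, target=0.8))
print(size_share_for_fault_share(example_modules, target=1.0))
```

If the faulty modules hold a much smaller share of the size than of the faults, size alone does not explain the concentration of faults.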

HYPOTHESIS RELATED TO
PERSISTENCE OF FAULTS
Hypothesis: a higher incidence of faults in FT implies a higher incidence of
faults in ST
Scatter plots show the relation between FT faults and ST faults
Pearson correlation coefficients r = 0.86, 0.82, 0.96, 0.83, 0.94 indicate
strong correlations
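
For reference, the Pearson coefficient between per-module FT and ST fault counts can be computed as in the sketch below. It uses the standard textbook formula; the function name and the sample counts are hypothetical and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-module fault counts (not study data).
ft_faults = [1, 2, 3, 4, 5]
st_faults = [2, 4, 6, 8, 10]
r = pearson_r(ft_faults, st_faults)
print(round(r, 2))
```

Values of r close to 1, as reported for all five releases, mean that modules with many FT faults also tend to have many ST faults.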

HYPOTHESIS ABOUT
EFFECTS OF MODULE SIZE
Hypotheses that failed
1. Smaller modules are less likely to be failure-prone than larger ones
   No correlation between total number of faults and total volume
2. Size metrics are good predictors of pre-release faults in a module
   Correlation coefficients of LOC vs. pre-release faults are low
3. Size metrics are good predictors of post-release faults in a module
   Scatter plots of LOC vs. post-release faults do not reveal any pattern
4. Size metrics are good predictors of a module's pre-release fault density
   No linear relationship between size and fault count is observable

HYPOTHESES ABOUT
QUALITY IN TERMS OF FAULT
DENSITY
Hypothesis: fault densities at corresponding phases of testing and
operation remain roughly constant between subsequent major releases of
the software system
Fault density = total number of faults / total volume of code
Fault densities remain approximately the same
Consistent results indicate that the process is stable and repeatable
Fault densities decrease as the system matures
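
The fault density formula above can be applied per release and per phase to compare releases side by side. The sketch below is a minimal illustration; the function name, the fault totals, and the use of KLOC as the volume unit are all assumptions, since the slides do not state the unit of code volume.

```python
def fault_density(total_faults, total_volume):
    """Fault density = total number of faults / total volume of code."""
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return total_faults / total_volume


# Hypothetical (faults, volume in KLOC) totals for one test phase,
# keyed by release label; these are not figures from the study.
releases = {
    "Rel n": (120, 400.0),
    "Rel n+1": (110, 440.0),
}
densities = {
    name: fault_density(faults, volume)
    for name, (faults, volume) in releases.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.2f} faults per KLOC")
```

Comparing the resulting values across releases for the same phase shows whether density stays roughly constant or drifts downward as the system matures.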

STRENGTHS
Real time experiments
Hypotheses based on general metrics
Most hypotheses turned out to be true
Data analyzed in detail

WEAKNESSES
Size-related predictors are not good enough
Not all factors were considered when calculating fault densities
All hypotheses related to module size failed
Programming languages not considered

QUESTIONS?
