
Software Measurement

UCLA Computer Science Department CS130 Winter, 2002

Reference
Material in this lecture is taken from chapters 1-3 of Software Metrics: A Rigorous and Practical Approach (2nd ed.), Norman E. Fenton and Shari Lawrence Pfleeger, 1997, PWS Publishing Company, Boston, MA, ISBN 0534954251

Overview
1. Measurement: what is it and why do we do it?
2. Measurement basics
3. A goal-based software measurement framework

Measurement: What Is It and Why Do We Do It?

1. Measurement in Everyday Life
2. Measurement in Software Engineering
3. The Scope of Software Metrics

Measurement in Everyday Life


Measurement governs many aspects of everyday life:
- Economic indicators determine prices and pay raises.
- Medical system measurements enable diagnosis of specific illnesses.
- Measurements in atmospheric systems are the basis of weather prediction.

How do we use measurement in our lives?
- In a shop, price is a measure of the value of an item, and we calculate the bill to make sure we get the correct change.
- Height and size measurements ensure clothing will fit correctly.
- When traveling, we calculate distance, choose a route, measure speed, and predict when we'll arrive.

Measurement helps us to:


- Understand our world
- Interact with our surroundings
- Improve our lives


What is Measurement?
- Common thread in the previous examples: some aspect of a thing is assigned a descriptor that allows us to compare it with other things.
- More formally, measurement is the process by which:
  - Numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them
  - According to clearly defined rules


The definition of the measurement process is far from clear-cut. To understand measurement, we must ask questions that are difficult to answer:
- In a room with blue walls, is blue a measure of the color of the room?
- A person's height is a commonly understood attribute that can be easily measured. What about other attributes of people, such as intelligence?
- Some measurements (e.g., intelligence, wine quality) may have wide error margins. Is this a reason to reject them? How do we decide which error margins are acceptable and which are not?
- When is a measurement scale acceptable for the purpose to which it is put (e.g., is it appropriate to measure a person's height in kilometers)?
- What types of manipulations can we apply to the results of measurement?

The material in the next section (Measurement Basics) will allow us to answer these questions.


Making Things Measurable
"What is not measurable, make measurable" (Galileo Galilei)
- One aim of science is to find ways of measuring attributes of things we're interested in.
- Measurement makes concepts more visible, and therefore more understandable and controllable.
- Attributes previously thought to be unmeasurable now form the basis for decisions affecting our lives (e.g., air quality, inflation index).

Measuring the unmeasurable improves understanding of particular entities and attributes:
- The act of proposing a particular measure can open a discussion that leads to greater understanding.
- Making a new measurement may require modifying the environment or practices (e.g., using a new tool, adding a step in a process).


Measurement in Software Engineering
In many instances, measurement is considered a luxury. For many projects:
- Measurable targets are not set (e.g., products are supposed to be user-friendly, reliable, and maintainable, but we don't quantify what that means).
- The component costs of projects are not quantified or understood.
- Product quality is not quantified.
- There is too much reliance on anecdotal evidence (e.g., "try our product and you'll improve your productivity by 50%!"). Most of the time, there's no measurable basis for the claims.


Measurement in Software Engineering (cont'd)
When measurements are made, they tend to be:
- Incomplete
- Inconsistent
- Infrequent

Most of the time, we're not told anything about:
- How experiments were designed
- What was measured and how
- Realistic error margins

Without this information, we can't decide whether to apply results to a development effort, and we can't do an objective study to repeat the measurements.

The lack of measurement in software engineering is compounded by the lack of a rigorous approach.


Software Measurement Objectives
Assessing status
- Projects
- Products for a specific project or projects
- Processes
- Resources

Identifying trends
- We need to be able to differentiate between a healthy project and one that's in trouble.

Determining corrective action
- Measurements should indicate the appropriate corrective action, if any is required.


Types of information required to understand, control, and improve projects:
Managers
- What does the process cost?
- How productive is the staff?
- How good is the code?
- Will the customer/user be satisfied?
- How can we improve?

Engineers
- Are the requirements testable?
- Have all the faults been found?
- Have the product or process goals been met?
- What will happen in the future?


The Scope of Software Metrics
- Cost and effort estimation
- Productivity measures and models
- Data collection
- Quality models and measures
- Reliability models
- Performance evaluation and models
- Structural and complexity metrics
- Capability-maturity assessment
- Management by metrics
- Evaluation of methods and tools


The Scope of Software Metrics: some details
Cost and effort estimation
- Motivation: accurately predict costs early in the development life cycle.
- Numerous empirical cost models have been developed:
  - COCOMO, COCOMO 2
  - Putnam's model (see Pressman Ch 3)
  - ...


Productivity models and measures
- Estimate staff productivity to determine how much specified changes will cost.
- Naive measure: size divided by effort. This doesn't take into account things like defects, functionality, and reliability.
- More comprehensive models have been developed; the next slide illustrates a possible model.


Possible productivity model
[Figure: tree diagram of a possible productivity model. Node labels: Productivity; Value, Cost; Quality, Quantity, Personnel, Resources, Complexity; Reliability, Defects, Size, Functionality, Time, Money, HW, SW, Env Cnstrst, Problem difficulty.]


Software quality model
[Figure: software quality model diagram with columns Use, Factor, Criteria, and Metrics. Uses: Product Operation, Product Revision. Factors: Usability, Reliability, Efficiency, Reusability, Maintainability, Portability, Testability. Criteria: Communicativeness, Accuracy, Consistency, Device Efficiency, Accessibility, Completeness, Structuredness, Conciseness, Device Independence, Legibility, Self-descriptiveness, Traceability.]


Measurement Basics
1. Overview
2. The representational theory of measurement
3. Measurement and models
4. Measurement scales and scale types
5. Meaningfulness in measurement
Overview
- Our understanding of software attributes is not as deep as our understanding of attributes of non-software entities (e.g., length, weight, temperature).
- Questions that are relatively easy to answer for non-software entities are difficult for software:
  - How much must we know about an attribute before it's reasonable to consider measuring it (e.g., program complexity)?
  - How do we know if we've really measured the attribute we want to measure? Does a count of the number of defects found in a system measure its quality, or does it measure something else?
  - Using measurement, what meaningful statements can we make about an attribute and the entities that possess it (e.g., can we talk about doubling a design's quality)?
  - What meaningful operations can we perform on measures (e.g., can we compute the average productivity of a group of developers, or the average quality of a set of modules)?
- Answering these questions requires developing a theory of measurement.
The representational theory of measurement
- Developed as a classical discipline from the physical sciences.
- Provides rules for:
  - Making consistent measurements
  - Interpreting data resulting from measurement
- The representational theory of measurement formalizes intuition about the way the world works.
Empirical relations
- Data obtained as measures should represent attributes of observed entities.
- Manipulating data should preserve observed relationships.
- Example: "taller than"
  - A binary relation defined on the set of pairs of people. For a given pair, either A is taller than B, or B is taller than A.
- Empirical relations are not restricted to binary relations; they can be unary (e.g., A is tall), ternary (A sitting on B's shoulders is taller than C), etc.
Empirical relations (cont'd)
Empirical relations are mappings from the empirical, real world to a formal mathematical world.
- Height maps a set of people to the set of real numbers.
- "Greater functionality" (from survey results). Entry (x,y), in row x and column y, is the percentage of respondents who judged x to have greater functionality than y:

         A      B      C      D
    A    -     80%    10%    80%
    B   20%     -      5%    50%
    C   90%    95%     -     96%
    D   20%    50%     4%     -

- x has greater functionality than y if (x,y) > 60%. The relation is (C,A), (C,B), (C,D), (A,B), (A,D).
- Surveys can help gain a preliminary understanding of relationships.
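As a minimal sketch of how the survey turns into a relation, the percentages can be held in a nested dictionary and filtered with the 60% threshold. The names `survey` and `greater_functionality` are illustrative, and the row/column reading of the table (row product preferred over column product) is an assumption inferred from the stated relation.

```python
# survey[x][y]: percentage of respondents who judged product x to have
# greater functionality than product y (assumed reading of the table).
survey = {
    "A": {"B": 80, "C": 10, "D": 80},
    "B": {"A": 20, "C": 5, "D": 50},
    "C": {"A": 90, "B": 95, "D": 96},
    "D": {"A": 20, "B": 50, "C": 4},
}


def greater_functionality(survey, threshold=60):
    """Return the set of pairs (x, y) whose percentage exceeds the threshold."""
    return {
        (x, y)
        for x, row in survey.items()
        for y, pct in row.items()
        if pct > threshold
    }


relation = greater_functionality(survey)
print(sorted(relation))
```

Changing the threshold changes the empirical relation, which is one reason the rule for building the relation has to be stated explicitly.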

Empirical relations (cont'd)
Definitions
- Measurement: a mapping from the empirical world to the formal, relational world.
- Measure: the number or symbol assigned to an entity by the mapping in order to characterize an attribute.
Rules of Mapping
Measures must specify the domain and range as well as the rule for performing the mapping:
- Domain: the real world is the domain of the mapping that defines the measurement.
- Range: the mathematical world into which real-world attributes are mapped.

Examples
- Measuring height:
  - Is height measured in inches, centimeters, or feet?
  - Are people measured sitting or standing?
  - Are shoes allowed to be worn during the measurement?
- Measuring lines of code:
  - Are lines of code reused without change counted?
  - Are non-executable lines counted?
    - Declarations
    - Compiler directives
    - Comments
    - Blank lines
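The lines-of-code questions can be made concrete by writing the counting rule down as code. This is only a sketch under simplifying assumptions: the function name `count_loc` and its flags are hypothetical, and a comment is taken to be any line whose first non-blank character is `#`.

```python
def count_loc(source, count_blank=False, count_comments=False):
    """Count lines of code under an explicit, stated counting rule.

    Assumption for this sketch: a comment is a line whose first
    non-blank character is '#'.
    """
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            # Blank line: counted only if the rule says so.
            if count_blank:
                total += 1
        elif stripped.startswith("#"):
            # Comment line: counted only if the rule says so.
            if count_comments:
                total += 1
        else:
            total += 1
    return total


sample = "# add two numbers\n\ndef add(a, b):\n    return a + b\n"
print(count_loc(sample))
print(count_loc(sample, count_blank=True, count_comments=True))
```

The same source text yields different values under different rules, so a reported LOC figure is only interpretable together with the rule that produced it.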

The representation condition
- The behavior of measures in the number system needs to be the same as that of the corresponding elements in the real world.
- Formally, a measurement mapping M must map entities into numbers and empirical relations into numerical relations in such a way that:
  - Empirical relations preserve numerical relations
  - Empirical relations are preserved by numerical relations
The representation condition: example
"Taller than":
- A is taller than B iff M(A) > M(B), where M is a mapping from the empirical world to the real numbers.
  - Whenever Joe is taller than Frank, M(Joe) must be a bigger number than M(Frank).
  - Jane can be mapped to a bigger number than John only if Jane is taller than John.

The representation condition: example 2
Criticality of software failures. Three types of failures are examined:
- Delayed response
- Incorrect output
- Data loss

At this point, we have a relation system consisting of 3 unary relations:
- R1 for delayed response
- R2 for incorrect output
- R3 for data loss

With this information, we can't yet judge the relative criticality of these types of failures.
The representation condition: example 2 (cont'd)
- We can find a representation in the set of real numbers by choosing three distinct numbers:
  - M(delayed response) = 6
  - M(incorrect output) = 4
  - M(data loss) = 50
- Further investigation of criticality reveals that data loss is more critical than incorrect output, which in turn is more critical than a delayed response.
- To develop a real-number representation for this enriched relation, we must be more careful in assigning numbers. Using > to mean "more critical than", data-loss failures must be mapped to a higher number than incorrect-output failures, which in turn must be mapped to a higher number than delayed responses. The assignment above no longer qualifies, because it maps delayed response (6) above incorrect output (4).
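The representation condition for the criticality example can be checked mechanically. The sketch below is illustrative: the function name is made up, the relation is written out with its transitive pair included (an assumption), and the `revised` values are hypothetical numbers chosen only to satisfy the ordering.

```python
from itertools import permutations


def satisfies_representation_condition(measure, more_critical_than):
    """True iff x is empirically more critical than y exactly when M(x) > M(y)."""
    return all(
        ((x, y) in more_critical_than) == (measure[x] > measure[y])
        for x, y in permutations(measure, 2)
    )


# Empirical relation: data loss > incorrect output > delayed response
# (the transitive pair is listed explicitly).
more_critical_than = {
    ("data loss", "incorrect output"),
    ("incorrect output", "delayed response"),
    ("data loss", "delayed response"),
}

# The three distinct numbers chosen before the ordering was known.
first_try = {"delayed response": 6, "incorrect output": 4, "data loss": 50}
# Hypothetical reassignment that respects the ordering.
revised = {"delayed response": 3, "incorrect output": 4, "data loss": 50}

print(satisfies_representation_condition(first_try, more_critical_than))
print(satisfies_representation_condition(revised, more_critical_than))
```

The first assignment was a valid representation while only the three unary relations were known; it stops being valid once the ordering is added to the empirical relation system.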
The representation condition (cont'd)
- There may be many different measures for a given attribute (e.g., length in inches, centimeters, or furlongs).
  - Any measure satisfying the representation condition is a valid measure.
- The richer the empirical relation system, the fewer the valid measures.
  - Relational systems are rich if they have a large number of relations that can be defined.
  - As the number of empirical relations increases, so does the number of conditions a measurement mapping must satisfy in its representation condition.
Measurement and models
- Model: an abstraction of reality allowing us to:
  - Strip away unnecessary detail
  - View an entity or concept from a particular perspective
- The representation condition requires every measure to be associated with a model of how the measure maps real-world entities and attributes to elements of a numerical system. These models are essential in:
  - Understanding how the measure is derived
  - Interpreting the behavior of numerical elements when we return to the real world
Defining Attributes
- There is always a temptation to focus too much on the formal, mathematical system rather than on the empirical system.
- Before we set out to measure something (e.g., program complexity), we need to:
  - Identify a set of characteristics of the thing we're trying to measure
  - Define a model that associates the characteristics
- We can then define measures for each characteristic, and use the representation condition to help understand the relationships.
Direct and Indirect Measurement
- Direct measure: relates an attribute to a number or symbol without reference to any other object or attribute (e.g., height).
- Indirect measure:
  - Used when an attribute must be measured by combining several of its aspects (e.g., density)
  - Requires a model of how the measures are related to each other
Direct and Indirect Measures for Software: examples
Direct
- Length of source code (lines of code)
- Duration of testing process
- Number of defects discovered during test
- Time a developer spends on a project

Indirect
- Programmer productivity (LOC / work-months of effort)
- Module defect density (number of defects / module size)
- Defect detection efficiency (number of defects detected / total number of defects)
- Requirements stability (initial number of requirements / total number of requirements)
- Test effectiveness ratio (number of items covered / total number of items)
- System spoilage (effort spent fixing faults / total project effort)
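Three of the indirect measures listed above can be sketched as functions of direct measures. The record type `ModuleData`, its field names, and all the numbers are hypothetical and serve only to show how the ratios are formed.

```python
from dataclasses import dataclass


@dataclass
class ModuleData:
    """Direct measures for one module (all values used below are hypothetical)."""
    loc: int               # lines of code
    effort_months: float   # work-months of effort
    defects_found: int     # defects discovered during test
    defects_total: int     # all defects eventually attributed to the module


def productivity(m):
    """Programmer productivity: LOC per work-month."""
    return m.loc / m.effort_months


def defect_density(m):
    """Module defect density: defects per line of code."""
    return m.defects_found / m.loc


def detection_efficiency(m):
    """Defect detection efficiency: defects detected / total defects."""
    return m.defects_found / m.defects_total


module = ModuleData(loc=2000, effort_months=4.0, defects_found=30, defects_total=40)
print(productivity(module), defect_density(module), detection_efficiency(module))
```

Each function encodes a small model of how the direct measures combine, which is exactly what an indirect measure requires.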
Measurement for prediction
- So far we've talked about measuring some entity that already exists.
  - Useful for assessing the current situation or understanding what has happened in the past
- In many cases, we want to predict an attribute of an entity that doesn't yet exist (e.g., project cost, reliability of a fielded system).
  - Requires a model relating measurements that can be taken now to the attributes to be predicted:
    - Empirical cost models
    - Software reliability models
- A model is not sufficient by itself to perform the required prediction. We need a prediction system including:
  - A model relating the measurements to the desired attribute
  - A procedure for determining model parameters
  - Procedures for interpreting model results
Measurement for prediction (cont'd)
- Accurate predictive measurement is always based on measurement in the assessment sense.
- Everyone wants to predict key determinants of success (e.g., effort to build a new system, operational reliability), but there are no magic models. They all depend on:
  - High-quality measurements of past projects
  - High-quality measurements of the current project
Measurement scales and scale types
- A measurement scale is our mapping, M, together with the empirical and numerical relation systems.
  - If the relation systems (domain and range) are obvious from context, sometimes M alone is referred to as the scale.
- Three important questions concerning representations and scales:
  - How do we determine when one numerical relation system is preferable to another?
  - How do we know if a particular empirical relation system has a representation in a given numerical relation system?
  - What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?
Measurement scales and scale types (cont'd)
Answers to the three questions:
1. How do we determine when one numerical relation system is preferable to another?
   Answer: We can map the scale to a symbolic relational system, but in practice this can be unwieldy (symbolic vs. numerical manipulation). We try to use real numbers whenever possible.
2. How do we know if a particular empirical relation system has a representation in a given numerical relation system?
   Answer: This is known as the representation problem, one of the basic problems of measurement theory. It is a solved problem for various types of relation systems characterized by specific axioms. Discussion is beyond the scope of this course, but solutions can be found in texts on measurement theory.
3. What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?
   Answer: This is the uniqueness problem. The following slides address this question.
Measurement scale types
- Nominal
- Ordinal
- Interval
- Ratio
- Absolute

One relational system is richer than another if all relationships in the second system are contained in the first. The scale types above are listed in order of increasing richness.
Measurement scale types (cont'd): why is this important?
- If we have a satisfactory measure for an attribute with respect to an empirical relation system, we want to know what other measures exist that are acceptable.
- A mapping from one acceptable measure to another is called an admissible transformation.
  - Example: when considering length, admissible transformations are of the form M' = aM with a > 0. Transformations of the form M' = b + aM (with b ≠ 0) or M' = aM^b (with b ≠ 1) are not admissible.
- The more restrictive the class of admissible transformations, the more sophisticated the measurement scale.
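A small numeric sketch of the length example: rescaling by a positive constant keeps the ratio between two lengths, while adding an offset does not. The object names and lengths are hypothetical; the only outside fact used is that one inch is 2.54 cm.

```python
import math

# Hypothetical lengths in centimeters.
lengths_cm = {"desk": 150.0, "shelf": 75.0}


def rescale(measure, a):
    """Ratio transformation M' = a*M with a > 0 (e.g., centimeters to inches)."""
    return {k: a * v for k, v in measure.items()}


def shift(measure, a, b):
    """Transformation M' = a*M + b; not admissible for length when b != 0."""
    return {k: a * v + b for k, v in measure.items()}


def ratio(measure):
    """How many times longer the desk is than the shelf."""
    return measure["desk"] / measure["shelf"]


lengths_in = rescale(lengths_cm, 1 / 2.54)
shifted = shift(lengths_cm, 1.0, 10.0)

print(ratio(lengths_cm), ratio(lengths_in), ratio(shifted))
```

In centimeters and in inches the desk is twice as long as the shelf; after the shift the ratio changes, so the shifted numbers no longer represent length.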
Nominal scale
- Most primitive form of measurement: define classes or categories, and place each entity in a particular class or category.
- Two major characteristics:
  - The empirical relation system consists only of different classes; there is no notion of ordering.
  - Any distinct numbering or symbolic representation of the classes is an acceptable measure; there is no notion of magnitude associated with the numbers or symbols.
- Any two mappings, M and M', are related in that M' can be obtained from M by a one-to-one mapping.
- Example: software faults can belong to one of the following classes, according to where they were first introduced during development:
  - Specification
  - Design
  - Code
Measurement type and scale
Ordinal scale
- Augments the nominal scale with ordering information.
- Three major characteristics:
  - The empirical relation system consists of classes that are ordered with respect to the attribute.
  - Any mapping preserving the ordering (i.e., a monotonic function) is acceptable.
  - Numbers represent ranking only, so arithmetic operations have no meaning.
- The set of admissible transformations is the set of all monotonic mappings.
- Example: software complexity, with two valid measures:

    Meaning            Value (measure 1)   Value (measure 2)
    Trivial            1                   2
    Simple             2                   4
    Moderate           3                   6
    Complex            4                   9
    Incomprehensible   5                   12
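A brief sketch of what "ranking only" means, using the two value assignments from the complexity example as read from the slide (1 to 5, and 2, 4, 6, 9, 12). The helper names are illustrative.

```python
from itertools import permutations

measure_1 = {"Trivial": 1, "Simple": 2, "Moderate": 3, "Complex": 4, "Incomprehensible": 5}
measure_2 = {"Trivial": 2, "Simple": 4, "Moderate": 6, "Complex": 9, "Incomprehensible": 12}


def same_ordering(m1, m2):
    """True iff both measures rank every pair of classes the same way."""
    return all(
        (m1[x] < m1[y]) == (m2[x] < m2[y])
        for x, y in permutations(m1, 2)
    )


def gap(measure, low, high):
    """Numeric difference between two classes under a given measure."""
    return measure[high] - measure[low]


print(same_ordering(measure_1, measure_2))
print(gap(measure_1, "Simple", "Moderate"), gap(measure_1, "Moderate", "Complex"))
print(gap(measure_2, "Simple", "Moderate"), gap(measure_2, "Moderate", "Complex"))
```

Both measures agree on the ordering, but they disagree on whether the steps between classes are equal, which is why differences and other arithmetic carry no meaning on an ordinal scale.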

Interval scale
- Captures information about the size of the intervals that separate classes.
- Three characteristics:
  - Preserves order
  - Preserves differences, but not ratios
  - Addition and subtraction are acceptable, but not multiplication and division
- The class of admissible transformations is the set of affine transformations: M' = aM + b, where a > 0.
- Example: software complexity. Suppose the difference in complexity between a trivial and a simple system is the same as that between a simple and a moderate system. Where this equal step applies to each class, we have an attribute measurable on an interval scale. Three valid measures:

    Meaning            Value (measure 1)   Value (measure 2)   Value (measure 3)
    Trivial            1                   0                   1.1
    Simple             2                   2                   2.2
    Moderate           3                   4                   3.3
    Complex            4                   6                   4.4
    Incomprehensible   5                   8                   5.5
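The interval-scale example can be checked numerically. This sketch takes two of the value assignments from the slide (1 through 5, assuming the last value is 5, and 0, 2, 4, 6, 8) and shows that one is an affine image of the other, that ratios of differences agree, and that ratios of values do not. Function names are illustrative.

```python
classes = ["Trivial", "Simple", "Moderate", "Complex", "Incomprehensible"]
m1 = dict(zip(classes, [1, 2, 3, 4, 5]))   # last value assumed to be 5
m2 = dict(zip(classes, [0, 2, 4, 6, 8]))


def affine(measure, a, b):
    """Apply M' = a*M + b with a > 0."""
    return {k: a * v + b for k, v in measure.items()}


def interval_ratio(m, w, x, y, z):
    """Ratio of two differences: (M(x) - M(w)) / (M(z) - M(y))."""
    return (m[x] - m[w]) / (m[z] - m[y])


def value_ratio(m, x, y):
    """Ratio of two scale values: M(x) / M(y)."""
    return m[x] / m[y]


print(affine(m1, 2, -2) == m2)
print(interval_ratio(m1, "Trivial", "Moderate", "Complex", "Incomprehensible"),
      interval_ratio(m2, "Trivial", "Moderate", "Complex", "Incomprehensible"))
print(value_ratio(m1, "Complex", "Simple"), value_ratio(m2, "Complex", "Simple"))
```

Statements about how intervals compare survive the change of scale; statements such as "twice as complex" do not.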

Ratio scale
- Most useful scale, common in the physical sciences; captures information about ratios.
- Four characteristics:
  - Preserves ordering, the size of intervals between entities, and ratios between entities
  - There is a zero element, representing total lack of the attribute
  - The measurement mapping must start at 0 and increase at equal intervals (units)
  - All arithmetic can be meaningfully applied to classes in the range of the mapping
- Acceptable transformations are ratio transformations: M' = aM, where a is a positive scalar.
- Example: program length can be measured by lines of code, number of characters, etc. The number of characters may be obtained by multiplying the number of lines by the average number of characters per line.
Absolute scale
- Most restrictive in terms of admissible transformations. For any two measures, M and M', there's only one admissible transformation (the identity transformation), since there's only one way to make the measurement.
- Four characteristics:
  - Measurement is made simply by counting the number of elements in the entity set
  - The attribute always takes the form "number of occurrences of x in the entity"
  - There is only one possible measurement mapping, namely the actual count
  - All arithmetic analysis of the resulting count is meaningful
- Example: lines of code in a module is an absolute scale measure of the attribute "number of lines of code" (as a measure of program length, it is only on the ratio scale).

Measurement type and scale: summary

    Scale type   Admissible transformations      Examples
    Nominal      1-1 mapping                     Labeling, classifying entities
    Ordinal      Monotonic increasing function   Preference, hardness, air quality, intelligence tests (raw scores)
    Interval     M' = aM + b, a > 0              Relative time, temperature (Fahrenheit, Celsius), intelligence tests (standardized scores)
    Ratio        M' = aM, a > 0                  Time interval, length, temperature (Kelvin)
    Absolute     M' = M                          Counting entities

Meaningfulness in measurement
- After making measurements, the key question is: can we deduce meaningful statements about the entities being measured?
- This is harder to answer than it first appears. Consider these statements:
  1. The number of errors discovered during the integration testing of program X was at least 100.
  2. The cost of fixing each error in program X is at least 100.
  3. A semantic error takes twice as long to fix as a syntactic error.
  4. A semantic error is twice as complex as a syntactic error.
Meaningfulness in measurement (cont'd)
- The first statement seems to make sense.
- The second statement doesn't make sense: the number of errors may be specified without reference to a particular scale, but the cost to fix them must be.
- Statement 3 seems sensible: the ratio of time taken is the same, whether time is measured in seconds, hours, or fortnights.
- Statement 4 does not appear to be meaningful and requires clarification:
  - If complexity means time to understand the error, then it makes sense.
  - Other definitions of complexity may not admit measurement on a ratio scale (e.g., the examples in previous slides), in which case statement 4 is meaningless.
Definition: a statement involving measurement is meaningful if its truth value is invariant under admissible transformations of the scales involved.
Meaningfulness in measurement: examples
- "John is twice as tall as Fred"
  - Implies measures are at least on the ratio scale. It's meaningful because no matter what transformation we use (and all we have are ratio transformations), the truth or falsity of the statement remains constant.
- "Temperature in Tokyo today is twice that in London"
  - Implies a ratio scale, but is not meaningful, because temperature in °F or °C is measured on an interval scale. If Tokyo is 40 °C and London is 20 °C, then the statement is true, but the same temperatures are 104 °F and 68 °F, and the statement is no longer true.
- "Failure x is twice as critical as failure y"
  - Not meaningful if we only have an ordinal scale for criticality (a common scale for software failures is catastrophic, significant, moderate, minor, and insignificant).
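The temperature example can be replayed in a few lines. The sketch converts the Celsius readings to Fahrenheit with the standard affine formula and tests the "twice" claim and the "warmer than" claim on both scales; the function and variable names are illustrative.

```python
def c_to_f(celsius):
    """Affine (interval-scale) transformation from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def is_twice(a, b):
    """Does reading a equal twice reading b?"""
    return a == 2 * b


tokyo_c, london_c = 40.0, 20.0
tokyo_f, london_f = c_to_f(tokyo_c), c_to_f(london_c)

print(is_twice(tokyo_c, london_c), is_twice(tokyo_f, london_f))
print(tokyo_c > london_c, tokyo_f > london_f)
```

The truth value of "twice as hot" flips under the change of scale, so the statement is not meaningful; "Tokyo is warmer than London" keeps its truth value, so it is.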
Note that our notion of meaningfulness says nothing about:
- Usefulness
- Practicality
- Whether the measurement is worthwhile
- Ease of measurement
Statistical operations on measures
- Analyses don't have to be sophisticated, but we want to know something about how a set of data is distributed.
- What types of statistical analysis are relevant to a given measurement scale?

    Nominal
      Defining relations: Equivalence
      Examples of appropriate statistics: Mode, Frequency
    Ordinal
      Defining relations: Equivalence, Greater than
      Examples of appropriate statistics: Median, Percentile, Spearman r, Kendall tau, Kendall W
    Interval
      Defining relations: Equivalence, Greater than, Known ratio of any intervals
      Examples of appropriate statistics: Mean, Standard deviation, Pearson product-moment correlation, Multiple product-moment correlation
    Ratio
      Defining relations: Equivalence, Greater than, Known ratio of any intervals, Known ratio of any two scale values
      Examples of appropriate statistics: Geometric mean, Coefficient of variation
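To see why the median suits ordinal data while the mean does not, the sketch below rates five hypothetical modules on an ordinal complexity scale and relabels the scale with a different monotonic numbering. The ratings, the second numbering, and the helper names are all assumptions made for illustration.

```python
from statistics import mean, median

labels = ["Trivial", "Simple", "Moderate", "Complex", "Incomprehensible"]
scale_a = dict(zip(labels, [1, 2, 3, 4, 5]))
scale_b = dict(zip(labels, [1, 2, 3, 4, 50]))  # hypothetical monotonic relabeling

# Hypothetical complexity ratings for five modules.
ratings = ["Simple", "Simple", "Moderate", "Complex", "Incomprehensible"]


def median_label(scale, ratings):
    """Median computed on the numbers, reported back as a class label."""
    inverse = {v: k for k, v in scale.items()}
    return inverse[median([scale[r] for r in ratings])]


def classes_below_mean(scale, ratings):
    """Labels whose scale value lies below the mean rating."""
    avg = mean([scale[r] for r in ratings])
    return [label for label in labels if scale[label] < avg]


print(median_label(scale_a, ratings), median_label(scale_b, ratings))
print(classes_below_mean(scale_a, ratings))
print(classes_below_mean(scale_b, ratings))
```

The median lands on the same class under both numberings, whereas the position of the mean relative to the classes shifts, so conclusions drawn from the mean depend on an arbitrary choice of numbers.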

Indirect measurement and meaningfulness
- Done when measuring a complex attribute in terms of simpler sub-attributes.
- The scale type for an indirect measure M is generally no stronger than the weakest of the scale types of the sub-attributes.
  - Example: testing efficiency = defects / effort. Defects is on the absolute scale, while effort is on the ratio scale. Therefore testing efficiency is on the ratio scale.
  - What is the scale type of E = 2.7v + 121w + 26x + 12y + 22z - 497, where
    - v is the number of program instructions
    - x and y are the number of internal and external documents
    - z is the program size in words
    - w is a subjective measure of complexity?


A Goal-Based Software Measurement Framework


1. Classifying software measures
2. Determining what to measure


Classifying software measures
- Three types of software entities to measure:
  - Processes: collections of software-related activities
  - Products
  - Resources: entities required by a process activity
- Within each class, we have:
  - Internal attributes: measured purely in terms of the entity itself
  - External attributes: measured with respect to how the entity relates to its environment; the behavior of the entity is important
- Managers want to be able to measure and predict external attributes.
  - However, external attributes are more difficult to measure than internal ones, and are measured late in the development process.
  - The desire is to predict external attributes in terms of more easily measured internal attributes.


Determining what to measure
- Measurement is useful only if it helps us understand the underlying process or one of its resultant products.
- The Goal-Question-Metric (GQM) approach has proven effective in selecting and implementing metrics:
  - List the major goals of the development project.
  - Derive from each goal the questions that must be answered to determine if the goals are being met.
  - Decide what must be measured in order to answer the questions adequately.


GQM example: goal is to evaluate the effectiveness of a coding standard
[Figure: GQM tree.
 Goal: evaluate effectiveness of coding standard.
 Questions: Who is using the standard? What is coder productivity? What is code quality?
 Metrics: proportion of coders (using standard, using language); experience of coders (with standard, with language, with environment, etc.); code size (lines of code, function points, etc.); effort; errors.]
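A GQM tree is naturally a small nested data structure. The sketch below uses the goal, questions, and metric names from the coding-standard example, but the assignment of metrics to individual questions is an assumption made for illustration, since the flattened figure does not show which arrows connect them. The class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: list = field(default_factory=list)

    def metrics_to_collect(self):
        """Distinct metrics needed to answer all questions, in first-seen order."""
        seen = []
        for question in self.questions:
            for metric in question.metrics:
                if metric not in seen:
                    seen.append(metric)
        return seen


# Question-to-metric links below are assumed, not taken from the slide.
goal = Goal(
    "Evaluate effectiveness of coding standard",
    [
        Question("Who is using standard?",
                 ["Proportion of coders", "Experience of coders"]),
        Question("What is coder productivity?",
                 ["Experience of coders", "Code size", "Effort"]),
        Question("What is code quality?",
                 ["Code size", "Errors", "Effort"]),
    ],
)

print(goal.metrics_to_collect())
```

Walking the tree from goal to questions to metrics gives the list of data to collect, and every metric on that list can be traced back to the question it helps answer.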

GQM example 2: AT&T goals, questions, metrics

Goal: Plan
- Question: How much does the inspection process cost?
  - Metrics: Average effort per KLOC; Percentage of reinspections
- Question: How much calendar time does the inspection process take?
  - Metrics: Average effort per KLOC; Total KLOC inspected

Goal: Monitor and control
- Question: What is the quality of the inspected software?
  - Metrics: Average faults detected per KLOC; Average inspection rate; Average preparation rate
- Question: To what degree did the staff conform to the procedures?
  - Metrics: Average inspection rate; Average preparation rate; Average lines of code inspected
- Question: What is the status of the inspection process?
  - Metrics: Total KLOC inspected

Goal: Improve
- Question: How effective is the inspection process?
  - Metrics: Defect removal efficiency; Average number of faults detected per KLOC; Average inspection rate; Average preparation rate; Average lines of code inspected
- Question: What is the productivity of the inspection process?
  - Metrics: Average effort per fault detected; Average inspection rate; Average preparation rate; Average lines of code inspected

Templates for goal definition
- Purpose: to (characterize, evaluate, predict, motivate, etc.) the (process, product, model, metric, etc.) in order to (understand, assess, manage, engineer, learn, improve, etc.) it.
  - Example: to evaluate the maintenance process in order to improve it.
- Perspective: examine the (cost, effectiveness, correctness, defects, changes, product measures, etc.) from the viewpoint of the (developer, manager, customer, user, etc.).
  - Example: examine the cost from the viewpoint of the manager.
- Environment: the environment consists mainly of the following: process factors, people factors, problem factors, methods, tools, constraints, etc.
  - Example: the maintenance staff are poorly motivated programmers who have limited access to tools.
