
Performance metrics

- What is a performance metric?
- Characteristics of good metrics
- Standard processor and system metrics
- Speedup and relative change

Copyright 2004 David J. Lilja

What is a performance metric?

- Count: of how many times an event occurs
- Duration: of a time interval
- Size: of some parameter
- Or a value derived from these fundamental measurements



Time-normalized metrics

Rate metrics
- Normalize a metric to a common time basis:
  rate = (number of events) / (time interval over which the events occurred)
- Examples: transactions per second, bytes per second
- Also called throughput
- Useful for comparing measurements made over different time intervals
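As a minimal sketch of this formula in C (the function name and the numbers are illustrative, not from the slides):

  #include <stdio.h>

  /* rate = (number of events) / (time interval over which they occurred) */
  double rate(double num_events, double interval_seconds)
  {
      return num_events / interval_seconds;
  }

  int main(void)
  {
      /* e.g., 5000 transactions completed during a 2.5-second window */
      printf("%.0f transactions per second\n", rate(5000.0, 2.5));
      return 0;
  }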



What makes a good metric?


- Allows accurate and detailed comparisons
- Leads to correct conclusions
- Is well understood by everyone
- Has a quantitative basis

A good metric helps avoid erroneous conclusions.


Good metrics are

Linear

- Fits with our intuition: if the metric increases 2x, performance should increase 2x
- Not an absolute requirement, but very appealing
- Counterexample: the dB scale used to measure sound is nonlinear


Good metrics are

Reliable

- If metric A > metric B, then performance A > performance B
- Seems obvious! However, it can fail:
  e.g., MIPS(A) > MIPS(B), but execution time (A) > execution time (B)


Good metrics are

Repeatable

- The same value is measured each time the experiment is performed
- The measurement must be deterministic
- Seems obvious, but not always true
  - e.g., time-sharing changes the measured execution time


Good metrics are

Easy to use

- No one will use a metric that is hard to measure
- Hard to measure or derive -> less likely to be measured correctly
- A wrong value is worse than a bad metric!


Good metrics are

Consistent

- Units and definition are constant across systems
- Seems obvious, but often not true (e.g., MIPS, MFLOPS)
- An inconsistent metric makes it impossible to compare systems


Good metrics are

Independent

- A lot of $$$ riding on performance results
- Pressure on manufacturers to optimize for a particular metric
- Pressure to influence the definition of a metric
- But a good metric is independent of this pressure


Good metrics are


- Linear -- nice, but not necessary
- Reliable -- required
- Repeatable -- required
- Easy to use -- nice, but not necessary
- Consistent -- required
- Independent -- required


Clock rate

Faster clock == higher performance?
- Is a 2 GHz processor always better than a 1 GHz processor?
- Is the increase proportional? What about architectural differences?
  - Actual operations performed per cycle
  - Clocks per instruction (CPI)
  - Penalty on branches due to pipeline depth
- What if the processor is not the bottleneck?
  - Memory and I/O delays



Clock rate

(Faster clock) -> (better performance)? A good first-order metric:
- Repeatable, easy to measure, consistent, and independent
- But, as the previous slide suggests, neither strictly linear nor reliable



MIPS

- Measure of computation speed: millions of instructions executed per second
- MIPS = n / (Te * 1000000), where
  - n = number of instructions
  - Te = execution time
- Physical analog: distance traveled per unit time
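A small C sketch of this formula (the instruction count and execution time are made-up values for illustration):

  #include <stdio.h>

  /* MIPS = n / (Te * 1000000), with n and Te as defined above */
  double mips(double n, double te)
  {
      return n / (te * 1.0e6);
  }

  int main(void)
  {
      /* hypothetical run: 4 billion instructions in 2 seconds */
      printf("%.0f MIPS\n", mips(4.0e9, 2.0));   /* prints 2000 MIPS */
      return 0;
  }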


MIPS

- But how much actual computation is done per instruction?
  - e.g., CISC vs. RISC; clocks per instruction (CPI)
- Scorecard: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- MIPS = Meaningless Indicator of Performance



MFLOPS

- A better definition of distance traveled: 1 unit of computation (~distance) = 1 floating-point operation
- Millions of floating-point operations per second
- MFLOPS = f / (Te * 1000000), where
  - f = number of floating-point operations
  - Te = execution time
- Similarly: GFLOPS, TFLOPS, ...
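The same pattern in C for MFLOPS, again with hypothetical inputs:

  #include <stdio.h>

  /* MFLOPS = f / (Te * 1000000), with f and Te as defined above */
  double mflops(double f, double te)
  {
      return f / (te * 1.0e6);
  }

  int main(void)
  {
      double m = mflops(3.0e9, 4.0);   /* hypothetical: 3e9 FP ops in 4 s */
      printf("%.0f MFLOPS = %.2f GFLOPS\n", m, m / 1.0e3);
      return 0;
  }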

MFLOPS

- An integer program rates 0 MFLOPS, but it still does useful work (sorting, searching, etc.)
- How do you count a FLOP? (e.g., transcendental ops, roots)
  - Too much flexibility in the definition of a FLOP
  - Not consistent across machines
- Scorecard: repeatable, easy to measure, and independent -- but neither reliable nor consistent



SPEC

- System Performance Evaluation Cooperative: computer manufacturers select representative programs for the benchmark suite
- Standardized methodology:
  1. Measure execution times
  2. Normalize to a standard basis machine
  3. SPECmark = geometric mean of the normalized values
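A C sketch of this SPECmark-style calculation, using made-up timings; the normalization direction (basis-machine time divided by measured time) is one common convention, assumed here rather than taken from the slide:

  #include <stdio.h>
  #include <math.h>   /* link with -lm */

  int main(void)
  {
      /* hypothetical execution times for three benchmark programs */
      double basis[] = { 100.0, 250.0, 80.0 };   /* standard basis machine (s) */
      double meas[]  = {  50.0, 200.0, 20.0 };   /* machine under test (s) */
      int n = 3;

      /* geometric mean of the normalized values, computed via logarithms */
      double log_sum = 0.0;
      for (int i = 0; i < n; i++)
          log_sum += log(basis[i] / meas[i]);

      printf("SPECmark-style rating: %.2f\n", exp(log_sum / n));
      return 0;
  }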

SPEC

- The geometric mean is an inappropriate summary statistic (more later)
- The SPEC rating does not correspond to execution times of non-SPEC programs
- Subject to tinkering:
  - The committee determines which programs should be part of the suite
  - Targeted compiler optimizations
- Scorecard: repeatable and easy to measure -- but not linear, not reliable, and not independent

QUIPS

- Developed as part of the HINT benchmark
- Instead of effort expended, use the quality of the solution
- Quality has a rigorous mathematical definition
- QUIPS = QUality Improvements Per Second


QUIPS

HINT benchmark program:
- Find upper and lower rational bounds for the integral of (1 - x)/(1 + x) over [0, 1]
- Use an interval subdivision technique: divide the x and y axes into intervals that are a power of 2
- Count the number of squares completely above/below the curve to obtain the upper bound u and lower bound l
- Quality = 1/(u - l)
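A simplified C sketch of one such bounding computation, assuming the integrand (1 - x)/(1 + x) described above; it illustrates the square-counting idea, not the actual HINT code:

  #include <stdio.h>
  #include <math.h>   /* link with -lm */

  /* f is decreasing on [0, 1], so each grid column's max/min lie at its edges */
  static double f(double x) { return (1.0 - x) / (1.0 + x); }

  int main(void)
  {
      int n = 1 << 10;             /* split each axis into 2^10 intervals */
      long below = 0, above = 0;

      for (int i = 0; i < n; i++) {
          double fmax = f((double)i / n);
          double fmin = f((double)(i + 1) / n);
          below += (long)floor(fmin * n);       /* squares wholly under the curve */
          above += n - (long)ceil(fmax * n);    /* squares wholly over the curve */
      }

      double l = (double)below / ((double)n * n);        /* lower bound */
      double u = 1.0 - (double)above / ((double)n * n);  /* upper bound */
      printf("l = %f, u = %f, quality = %.1f\n", l, u, 1.0 / (u - l));
      return 0;
  }

Doubling n tightens the bounds, so the quality 1/(u - l) grows; QUIPS tracks how quickly that happens per second of runtime.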

QUIPS

[Graphs from: http://hint.byu.edu/documentation/Gus/HINT/ComputerPerformance.html]

QUIPS

- Primary focus on:
  - Floating-point ops
  - Memory performance
- Good for predicting performance on numerical programs
- But it ignores:
  - I/O performance
  - Multiprogramming
  - Instruction cache
- The quality definition is fixed
- Scorecard: linear, repeatable, easy to measure, consistent, and independent -- but reliable only for numerical workloads

Execution time

- Ultimately you are interested in the time required to execute your program
- Smallest total execution time == highest performance
- Compare times directly, or derive appropriate rates from them
- Time is the fundamental metric of performance:
  "If you can't measure time, you don't know anything"

Execution time

Stopwatch-measured execution time:

  start_count = read_timer();
  /* portion of program to be measured */
  stop_count = read_timer();
  elapsed_time = (stop_count - start_count) * clock_period;

- Measures wall-clock time
  - Includes I/O waits, time-sharing, OS overhead, ...
- CPU time -- includes only processor time
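A runnable C version of the same stopwatch pattern, assuming a POSIX system where clock_gettime is available (read_timer and clock_period above are the slide's abstractions, not real functions):

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      struct timespec start, stop;

      clock_gettime(CLOCK_MONOTONIC, &start);   /* start_count */
      /* ... portion of program to be measured ... */
      clock_gettime(CLOCK_MONOTONIC, &stop);    /* stop_count */

      double elapsed = (stop.tv_sec - start.tv_sec)
                     + (stop.tv_nsec - start.tv_nsec) / 1.0e9;
      printf("elapsed wall-clock time: %.9f s\n", elapsed);
      return 0;
  }

Substituting CLOCK_PROCESS_CPUTIME_ID for CLOCK_MONOTONIC gives the CPU-time variant mentioned above.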



Execution time

- Best to report both wall-clock and CPU times
- Measurements include system noise effects:
  - Background OS tasks
  - Virtual-to-physical page mapping
  - Random cache mapping and replacement
  - Variable system load
- Report both the mean and the variance (more later)
- Scorecard: satisfies all six criteria -- linear, reliable, repeatable (given mean and variance), easy to measure, consistent, and independent

Performance metrics summary


How the metrics fare against the six criteria (linear, reliable, repeatable, easy to measure, consistent, independent):

- Clock rate: repeatable, easy to measure, consistent, and independent -- but not reliable
- MIPS: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- MFLOPS: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- SPEC: repeatable and easy to measure -- but not linear, not reliable, and not independent
- QUIPS: linear, repeatable, easy to measure, consistent, and independent -- but reliable only for numerical workloads
- TIME: satisfies all six criteria

Other metrics

Response time
- Elapsed time from request to response

Throughput
- Jobs or operations completed per unit time (e.g., video frames per second)

Bandwidth
- Bits per second

Ad hoc metrics
- Defined for a specific need

Means vs. ends metrics

Means-based metrics
- Measure what was done, whether or not it was useful!
  - e.g., NOP instructions, multiply by zero, ...
- Produce unreliable metrics

Ends-based metrics
- Measure progress towards a goal
- Count only what is actually accomplished

Means vs. ends metrics


The standard metrics fall along a spectrum between the two:
- Means-based: clock rate, MIPS, MFLOPS
- Ends-based: SPEC, QUIPS, execution time

Speedup

Speedup of System 2 with respect to System 1:
- Define S2,1 such that R2 = S2,1 * R1, where
  - R1 = speed of System 1
  - R2 = speed of System 2
- System 2 is S2,1 times faster than System 1

Speedup

Speed is a rate metric:
- Ri = Di / Ti, where Di ~ distance traveled by System i
- Run the same benchmark program on both systems, so D1 = D2 = D
- The total work done by each system is defined to be execution of the same program
- This makes the comparison independent of architectural differences

Speedup

S2,1 = R2 / R1 = (D2 / T2) / (D1 / T1) = (D / T2) / (D / T1) = T1 / T2

Speedup

S2,1 > 1 => System 2 is faster than System 1
S2,1 < 1 => System 2 is slower than System 1

Relative change

Performance of System 2 relative to System 1, expressed as a percent change:

Δ2,1 = (R2 - R1) / R1

Relative change

Δ2,1 = (R2 - R1) / R1 = (D / T2 - D / T1) / (D / T1) = (T1 - T2) / T2 = S2,1 - 1

Relative change

Δ2,1 > 0 => System 2 is faster than System 1
Δ2,1 < 0 => System 2 is slower than System 1
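A quick worked example using these formulas: if T1 = 10 s and T2 = 4 s, then S2,1 = T1 / T2 = 2.5 and Δ2,1 = S2,1 - 1 = 1.5; System 2 is 2.5 times faster than System 1, a 150% improvement.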


Important Points

Metrics can be

- Counts
- Durations
- Sizes
- Some combination of the above


Important Points

Good metrics are


- Linear
  - More natural
  - Useful for comparison and prediction
- Reliable
- Repeatable
- Easy to use
- Consistent
  - Definition is the same across different systems
- Independent of outside influences

Important Points
- Not-so-good metrics: clock rate, MIPS, MFLOPS
- Better metrics: SPEC, QUIPS, and -- above all -- execution time

Important Points

Speedup: S2,1 = T1 / T2
Relative change: Δ2,1 = S2,1 - 1
