
Performance metrics

- What is a performance metric?
- Characteristics of good metrics
- Standard processor and system metrics
- Speedup and relative change

Copyright 2004 David J. Lilja

What is a performance metric?

- Count: of how many times an event occurs
- Duration: of a time interval
- Size: of some parameter
- Or a value derived from these fundamental measurements



Time-normalized metrics

Rate metrics
- Normalize a metric to a common time basis:
  rate = (number of events) / (time interval over which the events occurred)
- Examples: transactions per second, bytes per second
- Also called throughput
- Useful for comparing measurements made over different time intervals
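As a minimal sketch of this formula in C (the function name and the numbers are illustrative, not from the slides):

  #include <stdio.h>

  /* rate = (number of events) / (time interval over which they occurred) */
  double rate(double num_events, double interval_seconds)
  {
      return num_events / interval_seconds;
  }

  int main(void)
  {
      /* e.g., 5000 transactions completed during a 2.5-second window */
      printf("%.0f transactions per second\n", rate(5000.0, 2.5));
      return 0;
  }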



What makes a good metric?


- Allows accurate and detailed comparisons
- Leads to correct conclusions
- Is well understood by everyone
- Has a quantitative basis

A good metric helps avoid erroneous conclusions.


Good metrics are

Linear

- Fits with our intuition: if the metric increases 2x, performance should increase 2x
- Not an absolute requirement, but very appealing
- Counterexample: the dB scale used to measure sound is nonlinear


Good metrics are

Reliable

- If metric A > metric B, then performance A > performance B
- Seems obvious! However, it can fail:
  e.g., MIPS(A) > MIPS(B), but execution time (A) > execution time (B)


Good metrics are

Repeatable

- The same value is measured each time the experiment is performed
- The measurement must be deterministic
- Seems obvious, but not always true
  - e.g., time-sharing changes the measured execution time


Good metrics are

Easy to use

- No one will use a metric that is hard to measure
- Hard to measure or derive -> less likely to be measured correctly
- A wrong value is worse than a bad metric!


Good metrics are

Consistent

- Units and definition are constant across systems
- Seems obvious, but often not true (e.g., MIPS, MFLOPS)
- An inconsistent metric makes it impossible to compare systems


Good metrics are

Independent

- A lot of $$$ riding on performance results
- Pressure on manufacturers to optimize for a particular metric
- Pressure to influence the definition of a metric
- But a good metric is independent of this pressure


Good metrics are


- Linear -- nice, but not necessary
- Reliable -- required
- Repeatable -- required
- Easy to use -- nice, but not necessary
- Consistent -- required
- Independent -- required


Clock rate

Faster clock == higher performance?
- Is a 2 GHz processor always better than a 1 GHz processor?
- Is the increase proportional? What about architectural differences?
  - Actual operations performed per cycle
  - Clocks per instruction (CPI)
  - Penalty on branches due to pipeline depth
- What if the processor is not the bottleneck?
  - Memory and I/O delays



Clock rate

(Faster clock) -> (better performance)? A good first-order metric:
- Repeatable, easy to measure, consistent, and independent
- But, as the previous slide suggests, neither strictly linear nor reliable



MIPS

- Measure of computation speed: millions of instructions executed per second
- MIPS = n / (Te * 1000000), where
  - n = number of instructions
  - Te = execution time
- Physical analog: distance traveled per unit time
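A small C sketch of this formula (the instruction count and execution time are made-up values for illustration):

  #include <stdio.h>

  /* MIPS = n / (Te * 1000000), with n and Te as defined above */
  double mips(double n, double te)
  {
      return n / (te * 1.0e6);
  }

  int main(void)
  {
      /* hypothetical run: 4 billion instructions in 2 seconds */
      printf("%.0f MIPS\n", mips(4.0e9, 2.0));   /* prints 2000 MIPS */
      return 0;
  }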


MIPS

- But how much actual computation is done per instruction?
  - e.g., CISC vs. RISC; clocks per instruction (CPI)
- Scorecard: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- MIPS = Meaningless Indicator of Performance



MFLOPS

- A better definition of distance traveled: 1 unit of computation (~distance) = 1 floating-point operation
- Millions of floating-point operations per second
- MFLOPS = f / (Te * 1000000), where
  - f = number of floating-point operations
  - Te = execution time
- Similarly: GFLOPS, TFLOPS, ...
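The same pattern in C for MFLOPS, again with hypothetical inputs:

  #include <stdio.h>

  /* MFLOPS = f / (Te * 1000000), with f and Te as defined above */
  double mflops(double f, double te)
  {
      return f / (te * 1.0e6);
  }

  int main(void)
  {
      double m = mflops(3.0e9, 4.0);   /* hypothetical: 3e9 FP ops in 4 s */
      printf("%.0f MFLOPS = %.2f GFLOPS\n", m, m / 1.0e3);
      return 0;
  }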

MFLOPS

- An integer program rates 0 MFLOPS, but it still does useful work (sorting, searching, etc.)
- How do you count a FLOP? (e.g., transcendental ops, roots)
  - Too much flexibility in the definition of a FLOP
  - Not consistent across machines
- Scorecard: repeatable, easy to measure, and independent -- but neither reliable nor consistent



SPEC

- System Performance Evaluation Cooperative: computer manufacturers select representative programs for the benchmark suite
- Standardized methodology:
  1. Measure execution times
  2. Normalize to a standard basis machine
  3. SPECmark = geometric mean of the normalized values
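A C sketch of this SPECmark-style calculation, using made-up timings; the normalization direction (basis-machine time divided by measured time) is one common convention, assumed here rather than taken from the slide:

  #include <stdio.h>
  #include <math.h>   /* link with -lm */

  int main(void)
  {
      /* hypothetical execution times for three benchmark programs */
      double basis[] = { 100.0, 250.0, 80.0 };   /* standard basis machine (s) */
      double meas[]  = {  50.0, 200.0, 20.0 };   /* machine under test (s) */
      int n = 3;

      /* geometric mean of the normalized values, computed via logarithms */
      double log_sum = 0.0;
      for (int i = 0; i < n; i++)
          log_sum += log(basis[i] / meas[i]);

      printf("SPECmark-style rating: %.2f\n", exp(log_sum / n));
      return 0;
  }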

SPEC

- The geometric mean is an inappropriate summary statistic (more later)
- The SPEC rating does not correspond to execution times of non-SPEC programs
- Subject to tinkering:
  - The committee determines which programs should be part of the suite
  - Targeted compiler optimizations
- Scorecard: repeatable and easy to measure -- but not linear, not reliable, and not independent

QUIPS

- Developed as part of the HINT benchmark
- Instead of effort expended, use the quality of the solution
- Quality has a rigorous mathematical definition
- QUIPS = QUality Improvements Per Second


QUIPS

HINT benchmark program:
- Find upper and lower rational bounds for the integral of (1 - x)/(1 + x) over [0, 1]
- Use an interval subdivision technique: divide the x and y axes into intervals that are a power of 2
- Count the number of squares completely above/below the curve to obtain the upper bound u and lower bound l
- Quality = 1/(u - l)
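A simplified C sketch of one such bounding computation, assuming the integrand (1 - x)/(1 + x) described above; it illustrates the square-counting idea, not the actual HINT code:

  #include <stdio.h>
  #include <math.h>   /* link with -lm */

  /* f is decreasing on [0, 1], so each grid column's max/min lie at its edges */
  static double f(double x) { return (1.0 - x) / (1.0 + x); }

  int main(void)
  {
      int n = 1 << 10;             /* split each axis into 2^10 intervals */
      long below = 0, above = 0;

      for (int i = 0; i < n; i++) {
          double fmax = f((double)i / n);
          double fmin = f((double)(i + 1) / n);
          below += (long)floor(fmin * n);       /* squares wholly under the curve */
          above += n - (long)ceil(fmax * n);    /* squares wholly over the curve */
      }

      double l = (double)below / ((double)n * n);        /* lower bound */
      double u = 1.0 - (double)above / ((double)n * n);  /* upper bound */
      printf("l = %f, u = %f, quality = %.1f\n", l, u, 1.0 / (u - l));
      return 0;
  }

Doubling n tightens the bounds, so the quality 1/(u - l) grows; QUIPS tracks how quickly that happens per second of runtime.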

QUIPS

[Graphs from: http://hint.byu.edu/documentation/Gus/HINT/ComputerPerformance.html]

QUIPS

- Primary focus on:
  - Floating-point ops
  - Memory performance
- Good for predicting performance on numerical programs
- But it ignores:
  - I/O performance
  - Multiprogramming
  - Instruction cache
- The quality definition is fixed
- Scorecard: linear, repeatable, easy to measure, consistent, and independent -- but reliable only for numerical workloads

Execution time

- Ultimately you are interested in the time required to execute your program
- Smallest total execution time == highest performance
- Compare times directly, or derive appropriate rates from them
- Time is the fundamental metric of performance:
  "If you can't measure time, you don't know anything"

Execution time

Stopwatch-measured execution time:

  start_count = read_timer();
  /* portion of program to be measured */
  stop_count = read_timer();
  elapsed_time = (stop_count - start_count) * clock_period;

- Measures wall-clock time
  - Includes I/O waits, time-sharing, OS overhead, ...
- CPU time -- includes only processor time
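A runnable C version of the same stopwatch pattern, assuming a POSIX system where clock_gettime is available (read_timer and clock_period above are the slide's abstractions, not real functions):

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      struct timespec start, stop;

      clock_gettime(CLOCK_MONOTONIC, &start);   /* start_count */
      /* ... portion of program to be measured ... */
      clock_gettime(CLOCK_MONOTONIC, &stop);    /* stop_count */

      double elapsed = (stop.tv_sec - start.tv_sec)
                     + (stop.tv_nsec - start.tv_nsec) / 1.0e9;
      printf("elapsed wall-clock time: %.9f s\n", elapsed);
      return 0;
  }

Substituting CLOCK_PROCESS_CPUTIME_ID for CLOCK_MONOTONIC gives the CPU-time variant mentioned above.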



Execution time

- Best to report both wall-clock and CPU times
- Measurements include system noise effects:
  - Background OS tasks
  - Virtual-to-physical page mapping
  - Random cache mapping and replacement
  - Variable system load
- Report both the mean and the variance (more later)
- Scorecard: satisfies all six criteria -- linear, reliable, repeatable (given mean and variance), easy to measure, consistent, and independent

Performance metrics summary


How the metrics fare against the six criteria (linear, reliable, repeatable, easy to measure, consistent, independent):

- Clock rate: repeatable, easy to measure, consistent, and independent -- but not reliable
- MIPS: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- MFLOPS: repeatable, easy to measure, and independent -- but neither reliable nor consistent
- SPEC: repeatable and easy to measure -- but not linear, not reliable, and not independent
- QUIPS: linear, repeatable, easy to measure, consistent, and independent -- but reliable only for numerical workloads
- TIME: satisfies all six criteria

Other metrics

Response time
- Elapsed time from request to response

Throughput
- Jobs or operations completed per unit time (e.g., video frames per second)

Bandwidth
- Bits per second

Ad hoc metrics
- Defined for a specific need

Means vs. ends metrics

Means-based metrics
- Measure what was done, whether or not it was useful!
  - e.g., NOP instructions, multiply by zero, ...
- Produce unreliable metrics

Ends-based metrics
- Measure progress towards a goal
- Count only what is actually accomplished

Means vs. ends metrics


The standard metrics fall along a spectrum between the two:
- Means-based: clock rate, MIPS, MFLOPS
- Ends-based: SPEC, QUIPS, execution time

Speedup

Speedup of System 2 with respect to System 1:
- Define S2,1 such that R2 = S2,1 * R1, where
  - R1 = speed of System 1
  - R2 = speed of System 2
- System 2 is S2,1 times faster than System 1

Speedup

Speed is a rate metric:
- Ri = Di / Ti, where Di ~ distance traveled by System i
- Run the same benchmark program on both systems, so D1 = D2 = D
- The total work done by each system is defined to be execution of the same program
- This makes the comparison independent of architectural differences

Speedup

S2,1 = R2 / R1 = (D2 / T2) / (D1 / T1) = (D / T2) / (D / T1) = T1 / T2

Speedup

S2,1 > 1 => System 2 is faster than System 1
S2,1 < 1 => System 2 is slower than System 1

Relative change

Performance of System 2 relative to System 1, expressed as a percent change:

Δ2,1 = (R2 - R1) / R1

Relative change

Δ2,1 = (R2 - R1) / R1 = (D / T2 - D / T1) / (D / T1) = (T1 - T2) / T2 = S2,1 - 1

Relative change

Δ2,1 > 0 => System 2 is faster than System 1
Δ2,1 < 0 => System 2 is slower than System 1
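A quick worked example using these formulas: if T1 = 10 s and T2 = 4 s, then S2,1 = T1 / T2 = 2.5 and Δ2,1 = S2,1 - 1 = 1.5; System 2 is 2.5 times faster than System 1, a 150% improvement.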


Important Points

Metrics can be

- Counts
- Durations
- Sizes
- Some combination of the above


Important Points

Good metrics are


- Linear
  - More natural
  - Useful for comparison and prediction
- Reliable
- Repeatable
- Easy to use
- Consistent
  - Definition is the same across different systems
- Independent of outside influences

Important Points
- Not-so-good metrics: clock rate, MIPS, MFLOPS
- Better metrics: SPEC, QUIPS, and -- above all -- execution time

Important Points

Speedup: S2,1 = T1 / T2
Relative change: Δ2,1 = S2,1 - 1
