
Computer Architecture

Chapter 1
Fundamentals

Prof. Jerry Breecher


CSCI 240
Fall 2003

Chapter 1 - Fundamentals 1
Introduction

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Art and Architecture

What’s the difference between Art and Architecture?

Lyonel Feininger,
Marktkirche in Halle

Art and Architecture

Notre Dame
de Paris

What’s the difference between Art and Architecture?


What’s Computer Architecture?
The attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data
flows and controls, the logic design, and the physical
implementation.
Amdahl, Blaauw, and Brooks, 1964

What’s Computer Architecture?
• 1950s to 1960s: Computer Architecture Course: Computer Arithmetic.
• 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design,
  especially ISA appropriate for compilers. (What we’ll do in Chapter 2)
• 1990s to 2000s: Computer Architecture Course: Design of CPU, memory system,
  I/O system, Multiprocessors. (All evolving at a tremendous rate!)

The Task of a Computer Designer

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

[Figure: the design cycle. Stages: Evaluate Existing Systems for Bottlenecks; Simulate New Designs and Organizations; Implement Next Generation System. Inputs: Benchmarks, Workloads, Technology Trends, Implementation Complexity.]

Technology and Computer Usage Trends

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

When building a Cathedral numerous very practical considerations need to
be taken into account:
• available materials
• worker skills
• willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
• What will the market buy?
• Cost/Performance
• Tradeoffs in materials and processes

Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of
transistors that could be crammed on a chip doubles every year; in 1975 he
revised the pace to a doubling roughly every two years.
Exponential growth has CONTINUED since then.
[Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) versus year, 1970 to 2005. Labeled points: 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium 3, Power PC 601, Power PC G3.]

Trends
Processor performance, as measured by the SPEC benchmark, has
also risen dramatically.

[Figure: SPEC performance (y-axis 0 to 5000) versus year, 1987 to 2000. Labeled machines: Sun-4/260, MIPS M/2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]
Trends
Memory Capacity (and Cost) have changed dramatically in the last
20 years.

[Figure: DRAM chip size in bits (log scale, 1,000 to 1,000,000,000) versus year, 1970 to 2000.]

Year   Size (Mb)   Cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
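The DRAM figures above can be turned into growth rates. The following is a minimal sketch, assuming steady exponential growth between the first and last rows; the helper name `doubling_time` is an illustration and does not come from the slides.

```python
import math

# (year, capacity in megabits, cycle time in ns), copied from the DRAM table
DRAM = [
    (1980, 0.0625, 250),
    (1983, 0.25, 220),
    (1986, 1, 190),
    (1989, 4, 165),
    (1992, 16, 145),
    (1996, 64, 120),
    (2000, 256, 100),
]


def doubling_time(y0, v0, y1, v1):
    """Years needed for a quantity to double, assuming steady exponential growth."""
    return (y1 - y0) / math.log2(v1 / v0)


first, last = DRAM[0], DRAM[-1]

# Capacity grows, so compare sizes directly.
capacity_doubling = doubling_time(first[0], first[1], last[0], last[1])

# Cycle time shrinks, so measure the growth of its inverse (speed).
speed_doubling = doubling_time(first[0], 1 / first[2], last[0], 1 / last[2])

print(f"capacity doubles about every {capacity_doubling:.2f} years")
print(f"speed doubles about every {speed_doubling:.1f} years")
```

Under that assumption capacity doubles in well under two years, while cycle time needs roughly fifteen years to halve, which is the imbalance the following slides build on.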

Trends
CPU SPEED has increased dramatically, but memory and disk speeds
have increased only a little. This has led to dramatic
changes in architecture, operating systems, and programming
practices.

         Capacity         Speed (latency)
Logic    2x in 3 years    2x in 3 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years
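To see how quickly the speed gap opens up, the rates in the table can be compounded. This is a rough sketch, assuming the rates stay constant; the ten-year horizon and the function name `growth` are chosen only for illustration.

```python
def growth(factor, period_years, elapsed_years):
    """Total improvement after elapsed_years when a quantity
    multiplies by `factor` every `period_years`."""
    return factor ** (elapsed_years / period_years)


YEARS = 10  # illustrative horizon, not from the slides

logic_speedup = growth(2, 3, YEARS)   # logic speed: 2x in 3 years
dram_speedup = growth(2, 10, YEARS)   # DRAM speed: 2x in 10 years
gap = logic_speedup / dram_speedup

print(f"after {YEARS} years: logic {logic_speedup:.1f}x, DRAM {dram_speedup:.1f}x")
print(f"the processor/memory speed gap widens by about {gap:.1f}x")
```

With these numbers the gap grows by roughly a factor of five per decade, which is why caches and other latency-hiding techniques dominate later chapters.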

Measuring and Reporting Performance

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

This section talks about:

1. Metrics – how do we describe in a numerical way the performance of a computer?

2. What tools do we use to find those metrics?

Metrics

Plane               DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747          6.5 hours     610 mph    470          286,700
BAC/Sud Concorde    3 hours       1350 mph   132          178,200

(pmph = passenger-miles per hour)

• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  – Throughput, bandwidth
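The throughput column is simply speed multiplied by passenger count. A minimal sketch that reproduces it, using the figures from the table (the function name `throughput_pmph` is made up for this example):

```python
# (plane, flight time DC to Paris in hours, speed in mph, passengers)
PLANES = [
    ("Boeing 747", 6.5, 610, 470),
    ("Concorde", 3.0, 1350, 132),
]


def throughput_pmph(speed_mph, passengers):
    """Passenger-miles per hour: how much 'work' the plane delivers per hour."""
    return speed_mph * passengers


for name, hours, speed, passengers in PLANES:
    print(f"{name}: latency {hours} h, "
          f"throughput {throughput_pmph(speed, passengers):,} pmph")
```

The Concorde wins on latency (time for one trip) while the 747 wins on throughput, so which plane is "faster" depends on the metric chosen.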
Metrics - Comparisons
"X is n times faster than Y" means

     ExTime(Y)     Performance(X)
n =  ---------  =  ---------------
     ExTime(X)     Performance(Y)

Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde
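Both comparisons can be worked through with the definition of "n times faster". The sketch below assumes the flight times and pmph values from the earlier plane table; the function names are illustrative only.

```python
def times_faster_by_time(extime_y, extime_x):
    """n such that X is n times faster than Y, from execution times."""
    return extime_y / extime_x


def times_faster_by_perf(perf_x, perf_y):
    """n such that X is n times faster than Y, from performance (throughput)."""
    return perf_x / perf_y


# Speed: Concorde (3 hours) versus Boeing 747 (6.5 hours), DC to Paris
concorde_vs_747 = times_faster_by_time(extime_y=6.5, extime_x=3.0)

# Throughput: Boeing 747 (286,700 pmph) versus Concorde (178,200 pmph)
b747_vs_concorde = times_faster_by_perf(perf_x=286_700, perf_y=178_200)

print(f"Concorde is {concorde_vs_747:.2f} times faster in trip time")
print(f"Boeing 747 is {b747_vs_concorde:.2f} times faster in throughput")
```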

Metrics - Comparisons
Pat has developed a new product, "rabbit", whose performance she wishes to
determine. There is special interest in comparing the new product, rabbit, to the
old product, turtle, since the product was rewritten for performance reasons. (Pat
had used Performance Engineering techniques and thus knew that rabbit was
"about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1

Which of the following statements reflect the performance comparison of rabbit and
turtle?

o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
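The statements are easier to check once the ratio is converted into each phrasing. This is only a sketch of the arithmetic, not an answer key: it assumes "n% faster" means (ratio - 1) x 100 and "n% less time" means (1 - 1/ratio) x 100, and the function names are invented here.

```python
def percent_faster(ratio):
    """'X is p% faster than Y' for a speed ratio X/Y."""
    return (ratio - 1) * 100


def percent_less_time(ratio):
    """'X takes p% less time than Y' for a speed ratio X/Y."""
    return (1 - 1 / ratio) * 100


def percent_longer(ratio):
    """'Y takes p% longer than X' for a speed ratio X/Y."""
    return (ratio - 1) * 100


throughput_ratio = 60 / 30   # transactions per second, rabbit / turtle
latency_ratio = 3 / 1        # seconds to process, turtle / rabbit

for label, ratio in [("throughput", throughput_ratio), ("latency", latency_ratio)]:
    print(f"{label}: rabbit is {ratio:.0f}x as fast, "
          f"{percent_faster(ratio):.0f}% faster, "
          f"takes {percent_less_time(ratio):.1f}% less time; "
          f"turtle takes {percent_longer(ratio):.0f}% longer")
```

Note that the two measured columns give different ratios (2 from throughput, 3 from processing time), and that no positive ratio can yield "100% less time" or more.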

Metrics - Throughput
Level                       Throughput metric
Application                 Answers per month; operations per second
Programming Language
Compiler
ISA                         (Millions) of instructions per second: MIPS;
                            (millions) of (FP) operations per second: MFLOP/s
Datapath, Control           Megabytes per second
Function Units
Transistors, Wires, Pins    Cycles per second (clock rate)

Methods For Predicting
Performance
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles

Benchmarks
SPEC: System Performance Evaluation
Cooperative
• First Round 1989
– 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
• Compiler flags unlimited. Example from the March 93 results for the DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
– new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating
point)
– “benchmarks useful for 3 years”
– Single flag setting for all programs: SPECint_base95, SPECfp_base95

Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):

Program        Language   What Is It
164.gzip       C          Compression
175.vpr        C          FPGA Circuit Placement and Routing
176.gcc        C          C Programming Language Compiler
181.mcf        C          Combinatorial Optimization
186.crafty     C          Game Playing: Chess
197.parser     C          Word Processing
252.eon        C++        Computer Visualization
253.perlbmk    C          PERL Programming Language
254.gap        C          Group Theory, Interpreter
255.vortex     C          Object-oriented Database
256.bzip2      C          Compression
300.twolf      C          Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):

Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
Benchmarks: Sample Results for SPECint2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

System: Intel OR840 (1 GHz Pentium III processor)

               Base       Base       Base     Peak       Peak       Peak
Benchmarks     Ref Time   Run Time   Ratio    Ref Time   Run Time   Ratio
164.gzip       1400       277        505*     1400       270        518*
175.vpr        1400       419        334*     1400       417        336*
176.gcc        1100       275        399*     1100       272        405*
181.mcf        1800       621        290*     1800       619        291*
186.crafty     1000       191        522*     1000       191        523*
197.parser     1800       500        360*     1800       499        361*
252.eon        1300       267        486*     1300       267        486*
253.perlbmk    1800       302        596*     1800       302        596*
254.gap        1100       249        442*     1100       248        443*
255.vortex     1900       268        710*     1900       264        719*
256.bzip2      1500       389        386*     1500       375        400*
300.twolf      3000       784        382*     3000       776        387*
SPECint_base2000                     438
SPECint2000                                                         442
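The Ratio column and the summary line can be approximately recomputed from the times. The sketch below assumes each ratio is 100 x reference time / run time and that the composite is the geometric mean of the ratios, which is how SPEC CPU2000 defines its metrics. Because the listed times are rounded, recomputed values can differ from the published ones by a point or so.

```python
import math

# (benchmark, reference time, base run time), copied from the results table
BASE_RUNS = [
    ("164.gzip", 1400, 277),
    ("175.vpr", 1400, 419),
    ("176.gcc", 1100, 275),
    ("181.mcf", 1800, 621),
    ("186.crafty", 1000, 191),
    ("197.parser", 1800, 500),
    ("252.eon", 1300, 267),
    ("253.perlbmk", 1800, 302),
    ("254.gap", 1100, 249),
    ("255.vortex", 1900, 268),
    ("256.bzip2", 1500, 389),
    ("300.twolf", 3000, 784),
]


def spec_ratio(ref_time, run_time):
    """Normalized score: 100 means 'as fast as the reference machine'."""
    return 100 * ref_time / run_time


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: spec_ratio(ref, run) for name, ref, run in BASE_RUNS}
spec_int_base = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:6.0f}")
print(f"geometric mean of base ratios: {spec_int_base:.0f}")
```

The geometric mean is used so that the summary does not depend on which machine is chosen as the reference.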
Benchmarks
Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
  – Good benchmarks
  – Good ways to summarize performance
• Since sales are in part a function of performance relative to the
  competition, companies invest in improving the product as reported by
  the performance summary
• If the benchmarks or summary are inadequate, companies must choose between
  improving the product for real programs and improving the product to get
  more sales; sales almost always wins!
• Execution time is the measure of computer performance!

Benchmarks
How to Summarize Performance
Management would like to have one number.
Technical people want more:
1. They want to have evidence of reproducibility – there should be enough
information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple
times.

How would you report these results?


                     Computer A   Computer B   Computer C
Program P1 (secs)    1            10           20
Program P2 (secs)    1000         100          20
Total Time (secs)    1001         110          40
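One way to explore the question is to compute several candidate summaries and see how they disagree. A small sketch using the times from the table; the choice of summaries (per-program ratio and total time) is an illustration, not the slide's prescribed answer.

```python
# Execution times in seconds, copied from the table
TIMES = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine):
    """Total time to run every program once on the given machine."""
    return sum(TIMES[machine].values())


def times_faster(x, y, program=None):
    """How many times faster machine x is than machine y,
    on one program or, if program is None, on total time."""
    if program is None:
        return total_time(y) / total_time(x)
    return TIMES[y][program] / TIMES[x][program]


print("A vs B on P1:", times_faster("A", "B", "P1"))        # A wins on P1
print("B vs A on P2:", times_faster("B", "A", "P2"))        # B wins on P2
print("B vs A on total time:", round(times_faster("B", "A"), 2))
print("C vs B on total time:", round(times_faster("C", "B"), 2))
```

Per-program comparisons contradict each other, while total time gives one consistent ranking for a workload that runs each program once.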

Quantitative Principles of Computer Design

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Make the common case fast.

Amdahl’s Law:
Relates total speedup of a system to the speedup of some portion of that system.

Quantitative Design: Amdahl's Law

Speedup due to enhancement E:

$$
\text{Speedup}(E) = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}} = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}}
$$

[Figure: execution-time bar with one segment marked "This fraction enhanced".]

Suppose that enhancement E accelerates a fraction F
of the task by a factor S, and the remainder of the
task is unaffected.
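Quantitative Design: Amdahl's Law

$$
\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]
$$

$$
\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}
$$

[Figure: bars comparing ExTime_old and ExTime_new, with one segment marked "This fraction enhanced".]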

Quantitative Design: Amdahl's Law
• Floating point instructions improved to run 2X faster, but only
  10% of actual instructions are FP

$$
\text{ExTime}_{new} = \text{ExTime}_{old} \times \left( 0.9 + \frac{0.1}{2} \right) = 0.95 \times \text{ExTime}_{old}
$$

$$
\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053
$$
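The floating point example can be checked numerically. A minimal sketch of Amdahl's Law as stated on these slides; the function name `amdahl_speedup` and the input validation are additions for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # New time relative to the old time: untouched part plus accelerated part.
    relative_new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / relative_new_time


# FP instructions run 2x faster, but only 10% of the work is FP.
fp_speedup = amdahl_speedup(0.10, 2)
print(f"overall speedup: {fp_speedup:.3f}")

# Even an (almost) infinitely fast FP unit cannot beat 1 / (1 - 0.10).
upper_bound = amdahl_speedup(0.10, 1e12)
print(f"limit as the FP speedup grows: {upper_bound:.3f}")
```

The second call shows why the common case matters: speeding up 10% of the work can never yield more than about an 11% overall gain.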

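Quantitative Design: Cycles Per Instruction

$$
\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}
$$

$$
\text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i
$$

where I_i is the number of instructions of type i.

$$
\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}
$$

F_i is the “Instruction Frequency”.

Invest Resources where time is Spent!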

Quantitative Design: Cycles Per Instruction

Suppose we have a machine where we can count the frequency with which
instructions are executed. We also know how many cycles it takes for
each instruction type.

Base Machine (Reg / Reg)

Op          Freq   Cycles   CPI(i)   (% Time)
ALU         50%    1        .5       (33%)
Load        20%    2        .4       (27%)
Store       10%    2        .2       (13%)
Branch      20%    2        .4       (27%)
Total CPI                   1.5

How do we get CPI(i)?

How do we get %time?
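Both questions can be answered by recomputing the table. The sketch below assumes CPI(i) is frequency times cycles for each instruction type, and %time is that contribution divided by the total CPI; variable names are illustrative.

```python
# (operation, frequency, cycles per instruction), copied from the table
MIX = [
    ("ALU", 0.50, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 2),
]

# CPI(i): contribution of each instruction type to the overall CPI
cpi_contribution = {op: freq * cycles for op, freq, cycles in MIX}

# Total CPI is the sum of the contributions
total_cpi = sum(cpi_contribution.values())

# % time: share of all cycles spent in each instruction type
time_share = {op: c / total_cpi for op, c in cpi_contribution.items()}

for op, _, _ in MIX:
    print(f"{op:6s} CPI(i) = {cpi_contribution[op]:.1f}  "
          f"time = {100 * time_share[op]:.0f}%")
print(f"Total CPI = {total_cpi:.1f}")
```

ALU operations are half of all instructions but only a third of the time, so frequency alone is a poor guide to where to invest.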
Quantitative Design: Locality of Reference

Programs access a relatively small portion of the address space at
any instant of time.

There are two different types of locality:

Temporal Locality (locality in time): If an item is referenced, it will
tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): If an item is referenced,
items whose addresses are close by tend to be referenced soon
(straight line code, array access, etc.)
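Spatial locality can be made concrete by looking at the addresses an array traversal generates. The following is a toy sketch under stated assumptions: a 4 x 8 array of 8-byte elements stored in row-major order and a 64-byte cache line. All of these sizes and names are hypothetical and chosen only so that one row fits exactly in one line.

```python
ROWS, COLS = 4, 8      # hypothetical array shape
ELEM_SIZE = 8          # bytes per element (assumption)
LINE_SIZE = 64         # bytes per cache line (assumption)


def address(row, col):
    """Byte offset of element [row][col] in a row-major array."""
    return (row * COLS + col) * ELEM_SIZE


def row_order_trace():
    """Addresses touched when the inner loop walks along a row."""
    return [address(r, c) for r in range(ROWS) for c in range(COLS)]


def column_order_trace():
    """Addresses touched when the inner loop walks down a column."""
    return [address(r, c) for c in range(COLS) for r in range(ROWS)]


def line_changes(trace):
    """Count consecutive accesses that land in different cache lines."""
    lines = [addr // LINE_SIZE for addr in trace]
    return sum(1 for prev, cur in zip(lines, lines[1:]) if prev != cur)


print("row order:   ", line_changes(row_order_trace()), "line changes")
print("column order:", line_changes(column_order_trace()), "line changes")
```

Both loops touch the same 32 elements, but the row-order walk stays inside one cache line for eight accesses in a row, while the column-order walk jumps to a different line on every access.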

The Concept of Memory Hierarchy

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Fast memory is expensive.

Slow memory is cheap.

The goal is to minimize the price/performance for a particular price point.

Memory Hierarchy

                         Registers         Level 1 Cache   Level 2 Cache   Memory          Disk
Typical Size             4 - 64            <16K bytes      <2 Mbytes       <16 Gigabytes   >5 Gigabytes
Access Time              1 nsec            3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth (in MB/sec)    10,000 - 50,000   2000 - 5000     500 - 1000      500 - 1000      100
Managed By               Compiler          Hardware        Hardware        OS              OS/User

Memory Hierarchy
• Hit: data appears in some block in the upper level (example:
  Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of
    RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level
  (Block Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty: time to replace a block in the upper level +
    time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)

Memory Hierarchy

Registers | Level 1 Cache | Level 2 Cache | Memory | Disk

What is the cost of executing a program if:

• Stores are free (there’s a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache.
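One possible way to set up this exercise is sketched below. It rests on several assumptions that the slide does not state: access times are taken from the earlier hierarchy table (3 ns for Level 1, 15 ns for Level 2, 150 ns for memory), a load costs only the access time of the level where it is found, and the Level 2 figure may be read either as cumulative (97% of all loads are found in Level 1 or Level 2) or as local (97% of Level 1 misses hit in Level 2). Both readings are computed.

```python
# Access times in ns, taken from the hierarchy table (assumed to apply here)
L1_NS, L2_NS, MEM_NS = 3, 15, 150

LOAD_FRACTION = 0.20   # loads are 20% of all instructions
L1_HIT = 0.80          # 80% of loads hit in Level 1
L2_HIT = 0.97          # Level 2 hit figure; interpretation is an assumption


def avg_load_ns(l1_hit, l2_hit, l2_rate_is_cumulative=True):
    """Average time per load under a simplified model where each load
    pays only the access time of the level that supplies the data."""
    if l2_rate_is_cumulative:
        found_in_l2 = l2_hit - l1_hit            # share of all loads served by L2
    else:
        found_in_l2 = (1 - l1_hit) * l2_hit      # share of L1 misses served by L2
    found_in_memory = 1 - l1_hit - found_in_l2
    return l1_hit * L1_NS + found_in_l2 * L2_NS + found_in_memory * MEM_NS


cumulative = avg_load_ns(L1_HIT, L2_HIT, l2_rate_is_cumulative=True)
local = avg_load_ns(L1_HIT, L2_HIT, l2_rate_is_cumulative=False)

print(f"average load time (cumulative reading): {cumulative:.2f} ns")
print(f"average load time (local reading):      {local:.2f} ns")
print(f"load time per instruction (cumulative): {LOAD_FRACTION * cumulative:.2f} ns")
```

Under either reading, the small share of loads that reach main memory accounts for a large part of the average, which illustrates why Hit Time << Miss Penalty matters.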

Wrap Up

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
