
Advanced Computer Architecture

Unit I Fundamentals Of Computer Design

Introduction
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design

What's Computer Architecture?


The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.

SOFTWARE

What's Computer Architecture?

1950s to 1960s: Computer Architecture Course: Computer Arithmetic.
1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2.)
1990s to 2000s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)

The Task of a Computer Designer


[Figure: the design cycle. Evaluate Existing Systems for Bottlenecks, Simulate New Designs and Organizations, Implement Next Generation System; annotated with Benchmarks, Workloads, Technology Trends, and Implementation Complexity.]

Technology and Computer Usage Trends


When building a Cathedral, numerous very practical considerations need to be taken into account:
available materials
worker skills
willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
What will the market buy?
Cost/Performance
Tradeoffs in materials and processes

Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip was doubling every year. In 1975 he revised the rate to a doubling roughly every two years, and the trend has broadly continued since then.
[Figure: Transistors per chip (log scale, 1.E+03 to 1.E+08) versus year, 1970 to 2005. Labeled processors: 4004, 8086, 80286, 386, 486, Pentium, Power PC 601, Pentium Pro, Pentium II, Power PC G3, Pentium 3.]

Trends
Processor performance, as measured by the SPEC benchmark, has also risen dramatically.

[Figure: SPEC performance (0 to 5000) versus year, 1987 to 2000. Labeled systems include Sun-4/260, MIPS M2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, and Alpha 6/833.]

Trends
Memory Capacity (and Cost) have changed dramatically in the last 20 years.
[Figure: DRAM size in bits (log scale, 1,000 to 1,000,000,000) versus year, 1970 to 2000.]

year   size (Mb)   cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns

Trends
CPU speed has increased dramatically, but memory and disk speeds have increased only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

        Capacity         Speed (latency)
Logic   2x in 3 years    2x in 3 years
DRAM    4x in 3 years    2x in 10 years
Disk    4x in 3 years    2x in 10 years
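To compare the rates above on a common footing, each can be converted to a per-year figure. The following is a minimal Python sketch that assumes steady compound growth, which is a simplification and not something the slide states. The names `annual_growth` and `trends` are introduced here for illustration only.

```python
def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate for a `factor`-fold improvement over `years` years."""
    return factor ** (1.0 / years) - 1.0


# (improvement factor, years) for each technology, copied from the trends above.
trends = {
    ("Logic", "capacity"): (2, 3),
    ("Logic", "speed"): (2, 3),
    ("DRAM", "capacity"): (4, 3),
    ("DRAM", "speed"): (2, 10),
    ("Disk", "capacity"): (4, 3),
    ("Disk", "speed"): (2, 10),
}

for (tech, metric), (factor, years) in trends.items():
    rate = annual_growth(factor, years)
    print(f"{tech:5s} {metric:8s} {factor}x in {years:2d} years = {rate:.1%} per year")
```

Under this assumption, "4x in 3 years" is about 59% per year, "2x in 3 years" about 26% per year, and "2x in 10 years" about 7% per year, which makes the gap between capacity growth and latency improvement for DRAM and disk easier to see.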

Measuring And Reporting Performance


This section talks about:
1. Metrics: how do we describe in a numerical way the performance of a computer?
2. What tools do we use to find those metrics?

Metrics
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

Time to run the task (ExTime): execution time, response time, latency

Tasks per day, hour, week, sec, ns (Performance): throughput, bandwidth

Metrics - Comparisons
"X is n times faster than Y" means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$

Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
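The two comparisons can be worked out directly from the plane figures given earlier. This is a small Python sketch; the dictionary layout and the helper names `throughput_pmph` and `times_faster` are assumptions made for illustration.

```python
# Figures from the Metrics table: trip time (hours), speed (mph), passengers.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane: dict) -> float:
    """Passenger-miles per hour: passengers carried times speed."""
    return plane["passengers"] * plane["mph"]


def times_faster(perf_x: float, perf_y: float) -> float:
    """n such that X is n times faster than Y, given performance values (higher is better)."""
    return perf_x / perf_y


b747, concorde = planes["Boeing 747"], planes["Concorde"]

speed_ratio = times_faster(concorde["mph"], b747["mph"])
time_ratio = b747["hours"] / concorde["hours"]  # ExTime(Y) / ExTime(X)
tput_ratio = times_faster(throughput_pmph(b747), throughput_pmph(concorde))

print(f"Concorde is {speed_ratio:.2f} times faster than the 747 by speed")
print(f"Concorde is {time_ratio:.2f} times faster than the 747 by trip time")
print(f"The 747 has {tput_ratio:.2f} times the throughput of Concorde")
```

The speed-based and time-based ratios come out close but not identical (about 2.2 in both cases), presumably because the table values are rounded. By throughput, the 747 is ahead by a factor of about 1.6, so which plane is "faster" depends on the metric chosen.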

Metrics - Comparisons
Pat has developed a new product, "rabbit", about which she wishes to determine performance. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1

Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
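One way to check the candidate statements is to compute each percentage from the measurements. The sketch below is illustrative, not a definitive answer key: the helper names are hypothetical, and it evaluates the throughput column and the processing-time column separately because the two columns give different ratios.

```python
# Measurements from the Performance Comparisons table.
turtle = {"tps": 30, "process_secs": 3}
rabbit = {"tps": 60, "process_secs": 1}


def percent_faster(rate_new: float, rate_old: float) -> float:
    """Percent by which the new rate exceeds the old rate (higher rate is better)."""
    return (rate_new / rate_old - 1.0) * 100.0


def percent_less_time(time_new: float, time_old: float) -> float:
    """Percent reduction in time going from old to new."""
    return (1.0 - time_new / time_old) * 100.0


def percent_longer(time_slow: float, time_fast: float) -> float:
    """Percent by which the slow time exceeds the fast time."""
    return (time_slow / time_fast - 1.0) * 100.0


# By throughput (transactions per second).
print("rabbit faster by", percent_faster(rabbit["tps"], turtle["tps"]), "%")
print("turtle is", turtle["tps"] / rabbit["tps"] * 100.0, "% as fast as rabbit")

# By seconds to process a transaction.
print("rabbit takes", rabbit["process_secs"] / turtle["process_secs"], "as long")
print("rabbit takes", percent_less_time(rabbit["process_secs"], turtle["process_secs"]), "% less time")
print("turtle takes", percent_longer(turtle["process_secs"], rabbit["process_secs"]), "% longer")
```

Note that no product can take 100% or 200% less time than another, since a 100% reduction would mean zero time. Statements phrased that way cannot be correct whichever column is used.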

Metrics - Throughput
System levels, from top to bottom:
Application
Programming Language
Compiler
ISA
Datapath, Control
Function Units
Transistors, Wires, Pins

Throughput metrics, from the application level down to the hardware:
Answers per month
Operations per second
(Millions) of instructions per second: MIPS
(Millions) of (FP) operations per second: MFLOP/s
Megabytes per second
Cycles per second (clock rate)

Methods For Predicting Performance


Benchmarks, Traces, Mixes
Hardware: Cost, delay, area, power estimation
Simulation (many levels): ISA, RT, Gate, Circuit
Queuing Theory
Rules of Thumb
Fundamental Laws/Principles

Benchmarks
SPEC: System Performance Evaluation Cooperative

First Round 1989
  10 programs yielding a single number (SPECmarks)

Second Round 1992
  SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  Compiler flags unlimited. March 93 of DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)= memcpy(b,a,c)
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

Third Round 1995
  New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  Benchmarks useful for 3 years
  Single flag setting for all programs: SPECint_base95, SPECfp_base95

Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
Program       Language   What Is It
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/

Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):
Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution

Benchmarks
Sample Results For SpecINT2000
Intel OR840 (1 GHz Pentium III processor)

http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

                   Base       Base       Base     Peak       Peak       Peak
Benchmarks         Ref Time   Run Time   Ratio    Ref Time   Run Time   Ratio
164.gzip           1400       277        505*     1400       270        518*
175.vpr            1400       419        334*     1400       417        336*
176.gcc            1100       275        399*     1100       272        405*
181.mcf            1800       621        290*     1800       619        291*
186.crafty         1000       191        522*     1000       191        523*
197.parser         1800       500        360*     1800       499        361*
252.eon            1300       267        486*     1300       267        486*
253.perlbmk        1800       302        596*     1800       302        596*
254.gap            1100       249        442*     1100       248        443*
255.vortex         1900       268        710*     1900       264        719*
256.bzip2          1500       389        386*     1500       375        400*
300.twolf          3000       784        382*     3000       776        387*
SPECint_base2000                         438
SPECint2000                                                             442
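The ratio columns and the summary figures can be approximately reproduced from the reference and run times. The sketch below assumes the SPEC CPU2000 convention that each ratio is 100 times reference time over run time and that the overall figure is the geometric mean of the per-benchmark ratios. The function names are illustrative. Because the run times in the listing are rounded, the computed values can differ from the reported ones by a point or so.

```python
import math

# (benchmark, reference time, base run time) in seconds, from the sample results.
base_results = [
    ("164.gzip", 1400, 277), ("175.vpr", 1400, 419), ("176.gcc", 1100, 275),
    ("181.mcf", 1800, 621), ("186.crafty", 1000, 191), ("197.parser", 1800, 500),
    ("252.eon", 1300, 267), ("253.perlbmk", 1800, 302), ("254.gap", 1100, 249),
    ("255.vortex", 1900, 268), ("256.bzip2", 1500, 389), ("300.twolf", 3000, 784),
]


def spec_ratio(ref_time: float, run_time: float) -> float:
    """Normalized score: 100 * reference time / measured run time."""
    return 100.0 * ref_time / run_time


def geometric_mean(values) -> float:
    """n-th root of the product, computed via logarithms for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: spec_ratio(ref, run) for name, ref, run in base_results}
overall_base = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:6.0f}")
print(f"geometric mean of base ratios: {overall_base:.1f}")
```

A geometric mean is used rather than an arithmetic mean so that the summary does not depend on which machine is chosen as the reference.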

Benchmarks
Performance Evaluation
For better or worse, benchmarks shape a field.
Good products are created when we have:
Good benchmarks
Good ways to summarize performance
Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks or summary are inadequate, then the choice is between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
Execution time is the measure of computer performance!

Benchmarks
How to Summarize Performance
Management would like to have one number. Technical people want more:
1. They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.

How would you report these results?

                    Computer A   Computer B   Computer C
Program P1 (secs)   1            10           20
Program P2 (secs)   1000         100          20
Total Time (secs)   1001         110          40
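The table can be summarized in more than one way, and the choice changes the conclusion. The following Python sketch compares the machines by total time and by individual program. The `times` layout and the helpers `total_time` and `times_faster` are assumptions made for this example.

```python
from typing import Optional

# Execution times in seconds, from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine: str) -> float:
    """Sum of the execution times of all programs on one machine."""
    return sum(times[machine].values())


def times_faster(x: str, y: str, program: Optional[str] = None) -> float:
    """n such that machine x is n times faster than machine y: ExTime(y) / ExTime(x).

    With no program given, total time over all programs is compared.
    """
    if program is None:
        return total_time(y) / total_time(x)
    return times[y][program] / times[x][program]


for machine in times:
    print(f"Computer {machine}: total {total_time(machine)} s")

print(f"By total time, B is {times_faster('B', 'A'):.1f} times faster than A")
print(f"By total time, C is {times_faster('C', 'B'):.2f} times faster than B")
print(f"On P1 alone, A is {times_faster('A', 'B', 'P1'):.0f} times faster than B")
print(f"On P2 alone, B is {times_faster('B', 'A', 'P2'):.0f} times faster than A")
```

By total time C is the fastest machine, yet A is the fastest on P1 by a wide margin. A single number therefore hides which workload the ranking applies to.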

Quantitative Principles of Computer Design


Make the common case fast.
Amdahl's Law: relates the total speedup of a system to the speedup of some portion of that system.

Quantitative Design

Amdahl's Law

Speedup due to enhancement E:

$$ \text{Speedup}(E) = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}} = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}} $$

[Figure: the fraction of execution time that is enhanced.]

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Quantitative Design

Amdahl's Law
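$$ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right] $$

$$ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} $$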

Quantitative Design

Amdahl's Law

Floating point instructions improved to run 2X; but only 10% of actual instructions are FP.

$$ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}} $$

$$ \text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053 $$
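The floating point example can be checked with a few lines of Python. This is a minimal sketch of Amdahl's Law as stated in these slides. The function name `amdahl_speedup` and its argument checks are choices made here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution time
    is accelerated by a factor of `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# The example above: FP runs 2x faster, but FP is only 10% of the work.
print(f"{amdahl_speedup(0.10, 2.0):.3f}")

# Even an enormous FP speedup is capped by the 90% that is unaffected.
print(f"{amdahl_speedup(0.10, 1e9):.3f}")
```

The second call shows why the common case matters: with only 10% of the task enhanced, the overall speedup can never exceed 1 / 0.9, about 1.11, no matter how fast the enhanced part becomes.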

Quantitative Design

Cycles Per Instruction


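$$ \text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}} $$

$$ \text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i $$

where $I_i$ is the number of instructions of type $i$.

Instruction Frequency

$$ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$

Invest Resources where time is Spent!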

Quantitative Design

Cycles Per Instruction

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

Base Machine (Reg / Reg)
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total CPI                1.5

How do we get CPI(i)? How do we get % time?
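Both questions can be answered from the frequency and cycle columns. In the Python sketch below, each CPI(i) is the instruction frequency times its cycle count, and each % time is that contribution divided by the total CPI. The variable and function names are illustrative.

```python
# (operation, frequency, cycles per instruction) for the base machine above.
mix = [
    ("ALU", 0.50, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 2),
]


def cpi_contributions(instruction_mix):
    """CPI(i) = F_i * CPI_i for each instruction type."""
    return {op: freq * cycles for op, freq, cycles in instruction_mix}


contrib = cpi_contributions(mix)
total_cpi = sum(contrib.values())
percent_time = {op: 100.0 * c / total_cpi for op, c in contrib.items()}

for op in contrib:
    print(f"{op:7s} CPI(i) = {contrib[op]:.1f}   time = {percent_time[op]:.0f}%")
print(f"Total CPI = {total_cpi:.1f}")
```

Although ALU operations take only one cycle each, they still account for the largest share of time because they are half of all instructions, which is the kind of observation that tells a designer where to invest.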

Quantitative Design

Locality of Reference

Programs access a relatively small portion of the address space at any instant of time.

There are two different types of locality:
Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)
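The two kinds of locality can be made concrete by measuring them on an address trace. The sketch below is only an illustration under stated assumptions: the two metrics, the window and distance parameters, and both traces are hypothetical and are not taken from the slides.

```python
def temporal_reuse_fraction(trace, window):
    """Fraction of references whose address also appears among the
    previous `window` references (a rough measure of temporal locality)."""
    if not trace:
        return 0.0
    hits = 0
    for i, addr in enumerate(trace):
        if addr in trace[max(0, i - window):i]:
            hits += 1
    return hits / len(trace)


def spatial_near_fraction(trace, distance):
    """Fraction of consecutive reference pairs whose addresses differ by
    at most `distance` but are not identical (a rough measure of spatial locality)."""
    if len(trace) < 2:
        return 0.0
    near = sum(1 for prev, cur in zip(trace, trace[1:]) if 0 < abs(cur - prev) <= distance)
    return near / (len(trace) - 1)


# Hypothetical traces: a sequential walk over an array, and a loop
# that keeps touching the same three addresses.
array_walk = list(range(100, 116))
loop_reuse = [200, 204, 208] * 5

print("array walk: temporal", temporal_reuse_fraction(array_walk, 8),
      "spatial", spatial_near_fraction(array_walk, 4))
print("loop reuse: temporal", temporal_reuse_fraction(loop_reuse, 8),
      "spatial", round(spatial_near_fraction(loop_reuse, 4), 2))
```

In these made-up traces the array walk never revisits an address but always touches a neighbor of the previous one, while the loop revisits recent addresses on most references. Real programs typically show a mix of both behaviors.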
