
Computer Architecture I

Lecture Notes

Dr. Ali Muhtaroğlu


Fall 2009
METU Northern Cyprus Campus

References:
- Patterson & Hennessy, “Computer Organization and Design”, 4th Ed., Morgan Kaufmann, 2008.
- Stallings, “Computer Organization & Architecture”, 7th Ed., Pearson, 2006.
- Mano & Kime, “Logic and Computer Design Fundamentals”, 4th Ed., Prentice Hall, 2008.
- Brown & Vranesic, “Fundamentals of Digital Logic with VHDL Design”, 2nd Ed., McGraw Hill, 2005.
- Dr. Patterson’s and Dr. Mary Jane Irwin’s (Penn State) lecture notes
Introduction

Lecture 1

Ali Muhtaroğlu 2
What computers were…

EDSAC, University of Cambridge, UK, 1949


What computers are…

Sensor nets, cameras, games, media players, laptops, servers, robots, routers, smartphones, automobiles, supercomputers
What is Computer Architecture?

Application

   (Gap too large to bridge in one step;
    but there are exceptions, e.g. the magnetic compass)

Physics

In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
Abstraction Layers
in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

Original domain of the computer architect (’50s-’80s): the ISA.
Domain of recent computer architecture (’90s): ISA down through gates/RTL.

Computer Architecture
vs. Computer Organization
• Architecture is those attributes visible to the
programmer
– Instruction set, number of bits used for data
representation, I/O mechanisms, addressing techniques.
– e.g. Is there a multiply instruction?

• Organization is how features are implemented


– Control signals, interfaces, memory technology.
– e.g. Is there a hardware multiply unit or is it done by
repeated addition?
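The multiply example can be made concrete. The sketch below (Python used purely as illustration, not as the course's notation) shows one possible *organization* for a multiply instruction that has no dedicated multiplier unit: the architectural result is identical, only the implementation differs.

```python
def multiply_repeated_addition(a, b):
    """One possible organization for a multiply instruction:
    no hardware multiplier, just an adder applied |b| times."""
    product = 0
    for _ in range(abs(b)):
        product += a
    return product if b >= 0 else -product

# Architecturally indistinguishable from a native multiply:
assert multiply_repeated_addition(6, 7) == 6 * 7
assert multiply_repeated_addition(-3, 4) == -3 * 4
```

A programmer writing against the ISA cannot tell the two organizations apart except by timing.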

Caution: You may hear these terms used interchangeably in industry.
Computer Architecture
vs. Computer Organization
• All Intel x86 family share the same basic architecture

• The IBM System/370 family share the same basic architecture

• This provides backwards code compatibility


– Not necessarily forward compatibility

• Organization differs between different versions of computers


• Hardware gets cheaper and more compact → do more in hardware
• Performance, power dissipation, size factors also drive
changes in organization

Historical: ENIAC - background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly, University of Pennsylvania
• Trajectory tables for weapons
• Started 1943, finished 1946
  – Too late for the war effort
• Used until 1955
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons, 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
von Neumann/Turing
• Stored-program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by the control unit
• Princeton Institute for Advanced Studies
  – IAS machine
• Completed 1952
IAS - details
• 1000 x 40-bit words
  – Binary number
  – 2 x 20-bit instructions
• Set of registers (storage in CPU)
  – Memory Buffer Register
  – Memory Address Register
  – Instruction Register
  – Instruction Buffer Register
  – Program Counter
  – Accumulator
  – Multiplier Quotient
Components of a Computer
• 5 classic components of a computer
• Input, Output, Memory, Datapath, Control
• Independent of hardware technology
• Represents the past and the present

Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
– Up to 100 devices on a chip
• Medium scale integration - to 1971
– 100-3,000 devices on a chip
• Large scale integration - 1971-1977
– 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
– 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
– Over 100,000,000 devices on a chip

Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Predicted in 1965 that the number of transistors on a chip would double every year
• Since the 1970s growth has slowed a little
  – Number of transistors doubles roughly every 18 months
• Cost of a chip has remained almost unchanged

For the same structure/function:
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
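The 18-month doubling compounds quickly. A quick projection (Python; using the roughly 2,300-transistor Intel 4004 of 1971 as an illustrative starting point):

```python
def projected_transistors(start, years, doubling_period=1.5):
    """Project transistor count under a fixed doubling period (in years)."""
    return start * 2 ** (years / doubling_period)

# Starting from ~2,300 transistors in 1971, thirty years of 18-month
# doublings (20 doublings) lands in the billions of transistors:
count_2001 = projected_transistors(2_300, 30)
assert 1e9 < count_2001 < 1e10
```

This matches the trajectory from SSI parts to the 100,000,000+ devices of the ULSI era listed on the previous slide.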
Growth in CPU Transistor Count

Speeding it up

• Pipelining
• On-board cache
• On-board L1 & L2 caches
• Branch prediction
• Data flow analysis
• Speculative execution

Performance Balance

• Processor speed has increased
• Memory capacity has increased
• Memory speed lags behind processor speed
Logic and Memory Performance Gap

Solutions to Logic/Memory Gap
• Increase number of bits retrieved at one time
  – Make DRAM “wider” rather than “deeper”
• Change DRAM interface
  – Cache
• Reduce frequency of memory access
  – More complex caches and cache on chip
• Increase interconnection bandwidth
  – High-speed buses
  – Hierarchy of buses
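The payoff of reducing memory-access frequency can be quantified with the standard average-memory-access-time (AMAT) model; the numbers below are illustrative, not taken from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: cache hit time plus the
    miss-rate-weighted cost of going out to slower DRAM."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle cache hit, 100-cycle DRAM access.
no_cache = 100                      # every access pays full DRAM latency
with_cache = amat(1, 0.02, 100)     # 98% of accesses hit the cache
assert with_cache == 3.0
assert no_cache / with_cache > 30   # cache cuts average latency > 30x
```

Even a small on-chip cache with a high hit rate hides most of the logic/memory gap, which is why the cache bullets above dominate the solution list.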
Typical I/O Device Data Rates

I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle the computation; the problem is moving the data
• Solutions:
  – Caching
  – Buffering
  – Higher-speed interconnection buses
  – More elaborate bus structures
  – Multiple-processor configurations

Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced
• Increase size and speed of caches
  – Dedicating part of processor chip
    • Cache access times drop significantly
• Change processor organization and architecture
  – Increase effective speed of execution
  – Parallelism
Problems with Clock Speed and Logic Density
• Power
– Power density increases with density of logic and clock speed
– Dissipating heat
• RC delay
– Speed at which electrons flow limited by resistance and capacitance of
metal wires connecting them
– Delay increases as RC product increases
– Wire interconnects thinner, increasing resistance
– Wires closer together, increasing capacitance
• Memory latency
– Memory speeds lag processor speeds
• Solution:
– More emphasis on organizational and architectural approaches

Intel Microprocessor Performance

Increased Cache Capacity

• Typically two or three levels of cache between processor and main memory
• Chip density increased
  – More cache memory on chip
    • Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%

More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
– Different stages of execution of different instructions at
same time along pipeline
• Superscalar allows multiple pipelines within single
processor
– Instructions that do not depend on one another can be
executed in parallel
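The assembly-line analogy can be made concrete with a simple cycle-count model (an idealized pipeline with no stalls is assumed here; real pipelines lose some of this to hazards):

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction runs start to finish before the next begins."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: after the first instruction fills the pipe,
    one instruction completes every cycle (no stalls assumed)."""
    return n_stages + (n_instructions - 1)

# For long instruction streams the speedup approaches the stage count:
n, stages = 1_000, 5
speedup = unpipelined_cycles(n, stages) / pipelined_cycles(n, stages)
assert 4.9 < speedup < 5.0
```

Superscalar execution multiplies this further by running several such pipelines side by side on independent instructions.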

Diminishing Returns
• Internal organization of processors is complex
  – Can get a great deal of parallelism
  – Further significant increases likely to be relatively modest
• Benefits from cache are reaching a limit
• Increasing clock rate runs into the power dissipation problem
  – Some fundamental physical limits are being reached

Uniprocessor Performance

[Chart: performance relative to the VAX-11/780, 1978 to 2006, log scale. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006.]

• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
• RISC + x86: ??%/year, 2002 to present
Uniprocessor Performance
The End of the Uniprocessor Era for High-Performance Computing!

(Same chart and growth rates as the previous slide.)
Déjà vu all over again?
• Multiprocessors imminent in 1970s, ‘80s, ‘90s, …
• “… today’s processors … are nearing an impasse as technologies approach the speed of light.”
  David Mitchell, The Transputer: The Time Is Now (1989)
• Transputer was premature
⇒ Custom multiprocessors tried to beat uniprocessors
⇒ Procrastination rewarded: 2X seq. perf. / 1.5 years
• “We are dedicating all of our future product development to multicore
designs. … This is a sea change in computing”
Paul Otellini, President, Intel (2004)
• Difference is all microprocessor companies have switched to
multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2+ CPUs)
⇒ Procrastination penalized: 2X sequential perf. / 5 yrs
⇒ Biggest programming challenge: from 1 to 2 CPUs

Multiple Cores
• Multiple processors on single chip
– Large shared cache
• Within a processor, increase in performance proportional to
square root of increase in complexity
• If software can use multiple processors, doubling number of
processors almost doubles performance
• So, use two simpler processors on the chip rather than one
more complex processor
• With two processors, larger caches are justified
– Power consumption of memory logic less than processing logic
• Example: IBM POWER4
– Two cores based on PowerPC
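The square-root rule of thumb above is exactly why two simpler cores win. A sketch of the comparison (the perfect two-core scaling is the slide's idealization; real software rarely achieves it):

```python
import math

def perf_one_complex_core(complexity_factor):
    """Slide's rule of thumb: single-core performance grows only
    as the square root of the increase in complexity."""
    return math.sqrt(complexity_factor)

def perf_two_simple_cores(parallel_efficiency=1.0):
    """Two unmodified cores, assuming software can use both."""
    return 2 * parallel_efficiency

# Doubling one core's complexity buys ~1.41x;
# two simple cores (ideally) buy ~2x.
assert perf_one_complex_core(2.0) < perf_two_simple_cores()
```

Even at 75% parallel efficiency the dual-core option still beats the doubled-complexity single core in this model.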

Some physics…
• Dynamic power dissipation is CV²f, where
  – C is the effective switching capacitance while running an application
  – V is the operating voltage
  – f is the switching frequency
• Also, f is roughly proportional to V

Some math…
• Assume a single-core processor with C = 10 nF, V = 1.2 V, f = 2 GHz
  Power = CV²f = 28.8 W
• What if we get to 2.4 GHz by increasing V to 1.3 V?
  Power = 40.6 W
• What if we have a 2-core processor operating at 1.6 GHz and 1.1 V?
  Power = 38.7 W
• What if we now build the original chip to be less complex, i.e. with smaller C?
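The three power figures above can be checked directly from P = CV²f (values taken from the slide; Python used only for the arithmetic):

```python
def dynamic_power(c_farads, v_volts, f_hertz, n_cores=1):
    """Dynamic power dissipation P = C * V^2 * f, summed over cores."""
    return n_cores * c_farads * v_volts ** 2 * f_hertz

single = dynamic_power(10e-9, 1.2, 2.0e9)      # baseline single core
faster = dynamic_power(10e-9, 1.3, 2.4e9)      # higher f forces higher V
dual   = dynamic_power(10e-9, 1.1, 1.6e9, 2)   # two slower, lower-V cores

assert round(single, 1) == 28.8
assert round(faster, 1) == 40.6
assert round(dual, 1) == 38.7
```

The V² term is the lever: because frequency scales roughly with voltage, chasing clock rate costs power cubically, while adding a second slower core costs only linearly.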

POWER4 Chip Organization

Problems with “sea change”?
• Algorithms, programming languages, compilers, operating systems, architectures, libraries, … not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs/chip
• Architectures not ready for 1000 CPUs/chip
  – Unlike instruction-level parallelism, this cannot be solved by computer architects and compiler writers alone, but also cannot be solved without the participation of architects
• Need a reworking of all the abstraction layers in the computing system stack
Abstraction Layers
in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

Original domain of the computer architect (’50s-’80s): the ISA.
Domain of recent computer architecture (’90s): ISA down through gates/RTL.
New demands from above: parallel computing, security, …
New constraints from below: reliability, power, …
Reinvigoration of computer architecture, mid-2000s onward.
EEE-445
Our goal will be to acquire a good understanding of:
- Computer system components
- Instruction Set Architecture (ISA) design
- Single-cycle and multi-cycle hardware organization/design to support an ISA
- We will do limited exercises on defining hardware through:
  - Software emulators
  - VHDL/schematic-capture description and simulation

EEE-446 (next term) will complete the missing theoretical pieces to obtain a solid computer architecture background:
- Bit slicing
- Arithmetic algorithms and how they are implemented in hardware
- More advanced memory and I/O concepts
- Pipelining and parallel-processing concepts

In addition, you will get some practical experience in the lab by applying what you learnt in the EEE-445/446 sequence to design your own CPU.
