Processor technology
The architecture of the computation engine used to implement a system's desired functionality
A processor does not have to be programmable
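[Figure: three processor implementations of the same functionality, each built from a controller and a datapath. General-purpose (software): control logic and state register, IR, PC, register file, general ALU, program memory holding the assembly code, data memory. Application-specific: control logic and state register, IR, PC, registers, custom ALU, program memory, data memory. Single-purpose (hardware): control logic, state register, index and total registers, data memory, no program memory.]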
2004 (jinsoo@cs.kaist.ac.kr)
Processor technology
Desired functionality:

    total = 0
    for i = 1 to N loop
       total += M[i]
    end loop

Implementation options: general-purpose processor, application-specific processor, single-purpose processor
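The desired functionality is the same whichever processor implements it. As a reference point, here is a minimal Python sketch of the computation as a general-purpose processor would run it in software; the function name `total_of` is hypothetical, and because Python lists are 0-indexed, `M[0]` plays the role of the loop's `M[1]`.

```python
def total_of(M):
    """Sum the elements of M, as the loop above does (hypothetical helper)."""
    total = 0
    for value in M:      # for i = 1 to N loop
        total += value   #     total += M[i]
    return total         # end loop


print(total_of([3, 1, 4, 1, 5]))  # 14
```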
General-purpose processors
Programmable device used in a variety of
applications
Features
Program memory
General datapath with large register file and
general ALU
User benefits
Low time-to-market and NRE (non-recurring engineering) costs
High flexibility
[Figure: general-purpose processor: controller (control logic and state register, IR, PC), datapath (register file, general ALU), program memory holding the assembly code for the summation loop, data memory]
Single-purpose processors
Digital circuit designed to execute exactly one
program
Features
Contains only the components needed to
execute a single program
No program memory
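[Figure: single-purpose processor: controller (control logic, state register) and datapath (index and total registers) connected to data memory; no program memory]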
Benefits
Fast
Low power
Small size
Application-specific processors
Programmable processor optimized for a
particular class of applications having
common characteristics
Features
Program memory
Optimized datapath
Special functional units
Benefits
Some flexibility, good performance, size and
power
[Figure: application-specific processor: controller (control logic and state register, IR, PC), datapath (registers, custom ALU), program memory holding the assembly code, data memory]
Processor technology: general-purpose (software), application-specific, single-purpose (hardware)
IC technology: full-custom/VLSI, semi-custom ASIC (gate array, standard cell), PLD
Custom single-purpose processors: Hardware
Introduction
[Figure: example digital camera chip built from processors: lens, CCD, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, display controller, UART, LCD controller]
[Figure: single-purpose processor structure. A controller (next-state and control logic, state register) and a datapath (registers, functional units). The controller receives external control inputs and datapath control outputs, and drives datapath control inputs and external control outputs; the datapath receives external data inputs and produces external data outputs.]
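Example: greatest common divisor (GCD)
(a) Black-box view: a GCD unit with inputs go_i, x_i, y_i and output d_o
(c) State diagram (state: action → next state):
1: 1 → 2 (the !1 exit is never taken)
2: !go_i → 2-J; !(!go_i) → 3
2-J: → 2
3: x = x_i → 4
4: y = y_i → 5
5: x!=y → 6; !(x!=y) → 9
6: x<y → 7; !(x<y) → 8
7: y = y - x → 6-J
8: x = x - y → 6-J
6-J: → 5-J
5-J: → 5
9: d_o = x → 1-J
1-J: → 1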
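Read as software, the state diagram is an ordinary subtraction-based GCD. The Python sketch below labels each line with the state it corresponds to; it assumes go_i has already been asserted (state 2 has exited) and that both inputs are positive, since a zero input would make the subtraction loop run forever. The name `gcd_fsmd` is hypothetical.

```python
def gcd_fsmd(x_i: int, y_i: int) -> int:
    """Software rendering of the GCD state diagram (hypothetical name)."""
    if x_i <= 0 or y_i <= 0:
        raise ValueError("inputs must be positive")
    x = x_i                 # state 3
    y = y_i                 # state 4
    while x != y:           # state 5: x != y ?
        if x < y:           # state 6: x < y ?
            y = y - x       # state 7
        else:
            x = x - y       # state 8
        # states 6-J and 5-J: join, then back to state 5
    d_o = x                 # state 9
    return d_o              # 1-J: return to state 1 and wait for go_i again


print(gcd_fsmd(12, 18))  # 6
```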
State diagram templates for common statements:
Assignment statement:
   a = b
   next statement
   State "a = b", followed by the state for the next statement.
Loop statement:
   while (cond) {
      loop-body-statements
   }
   next statement
   State C: cond → loop-body-statements → J: → back to C; !cond → next statement.
Branch statement:
   if (c1)
      c1 stmts
   else if (c2)
      c2 stmts
   else
      other stmts
   next statement
   State C: c1 → c1 stmts; !c1*c2 → c2 stmts; !c1*!c2 → other stmts; all three paths meet at J: → next statement.
Creating the datapath
Create a register for each declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
[Figure: GCD datapath. Registers x and y are each loaded through an n-bit 2x1 multiplexer: x_sel = 0 selects x_i, x_sel = 1 selects x - y; y_sel = 0 selects y_i, y_sel = 1 selects y - x; x_ld and y_ld enable the loads. A != comparator (state 5: x!=y) produces x_neq_y; a < comparator (state 6: x<y) produces x_lt_y; two subtractors compute x-y (state 8) and y-x (state 7). Register d (state 9) is loaded by d_ld and drives d_o.]
Controller
Replace actions/conditions with datapath configurations (state encoding, state: datapath configuration → next state):
0000 1: → 0001
0001 2: !go_i → 0010; !(!go_i) → 0011
0010 2-J: → 0001
0011 3: x_sel = 0, x_ld = 1 → 0100
0100 4: y_sel = 0, y_ld = 1 → 0101
0101 5: x_neq_y → 0110; !x_neq_y → 1011
0110 6: x_lt_y → 0111; !x_lt_y → 1000
0111 7: y_sel = 1, y_ld = 1 → 1001
1000 8: x_sel = 1, x_ld = 1 → 1001
1001 6-J: → 1010
1010 5-J: → 0101
1011 9: d_ld = 1 → 1100
1100 1-J: → 0000
[Figure: the controller state diagram drawn beside the GCD datapath it configures]
[Figure: controller implementation model. A state register holds the current state Q3 Q2 Q1 Q0. Combinational logic takes Q3..Q0 and the inputs go_i, x_neq_y, x_lt_y and produces the next state I3 I2 I1 I0 plus the control outputs x_sel, y_sel, x_ld, y_ld, d_ld, which drive (b) the datapath. The controller state diagram with encodings 0000 to 1100 is shown alongside.]
[Table: controller state table. Inputs: Q3 Q2 Q1 Q0, x_neq_y, x_lt_y, go_i (* marks a don't-care input). Outputs: I3 I2 I1 I0, x_sel, y_sel, x_ld, y_ld, d_ld.]
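The state table is what the combinational logic computes each cycle. The following is a minimal cycle-level Python sketch of the finished GCD processor under stated assumptions: go_i is held at 1, inputs are positive, and all registers load on a single clock edge from their old values. `controller` and `run_gcd` are hypothetical names, and Python stands in for the HDL a real design would use. States are numbered 0 to 12 to match the encodings 0000 to 1100.

```python
# State encodings 0000..1100 from the controller above (range(13) = 0..12).
(S1, S2, S2J, S3, S4, S5, S6, S7, S8,
 S6J, S5J, S9, S1J) = range(13)


def controller(state, go_i, x_neq_y, x_lt_y):
    """Next-state and control logic: returns (next_state, control signals)."""
    c = {"x_sel": 0, "y_sel": 0, "x_ld": 0, "y_ld": 0, "d_ld": 0}
    if state == S1:
        nxt = S2
    elif state == S2:
        nxt = S3 if go_i else S2J
    elif state == S2J:
        nxt = S2
    elif state == S3:
        c.update(x_sel=0, x_ld=1)
        nxt = S4
    elif state == S4:
        c.update(y_sel=0, y_ld=1)
        nxt = S5
    elif state == S5:
        nxt = S6 if x_neq_y else S9
    elif state == S6:
        nxt = S7 if x_lt_y else S8
    elif state == S7:
        c.update(y_sel=1, y_ld=1)
        nxt = S6J
    elif state == S8:
        c.update(x_sel=1, x_ld=1)
        nxt = S6J
    elif state == S6J:
        nxt = S5J
    elif state == S5J:
        nxt = S5
    elif state == S9:
        c.update(d_ld=1)
        nxt = S1J
    else:  # S1J
        nxt = S1
    return nxt, c


def run_gcd(x_i, y_i, max_cycles=10_000):
    """Clock controller + datapath until d is loaded; returns (d_o, cycles)."""
    state, x, y, d = S1, 0, 0, 0
    for cycle in range(1, max_cycles + 1):
        # Datapath status outputs (the != and < comparators).
        x_neq_y, x_lt_y = int(x != y), int(x < y)
        nxt, c = controller(state, go_i=1, x_neq_y=x_neq_y, x_lt_y=x_lt_y)
        # Clock edge: every register samples its input from the old values.
        old_x, old_y = x, y
        if c["x_ld"]:
            x = old_x - old_y if c["x_sel"] else x_i   # 2x1 mux for x
        if c["y_ld"]:
            y = old_y - old_x if c["y_sel"] else y_i   # 2x1 mux for y
        if c["d_ld"]:
            d = old_x
            return d, cycle
        state = nxt
    raise RuntimeError("no result within max_cycles")


print(run_gcd(12, 18)[0])  # 6
```

Keeping `controller` free of side effects mirrors the hardware split: it models only the combinational block, while `run_gcd` plays the role of the state register and datapath registers.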
[Figure: completed single-purpose processor: controller (next-state and control logic, state register) connected to the datapath (registers, functional units)]
Summary
Custom single-purpose processors
General-Purpose Processors: Software
Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because the manufacturer spreads NRE over
large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996
alone
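Spreading NRE over volume can be made concrete with the usual amortization formula, cost per unit = NRE / volume + recurring cost per unit. The figures in this Python sketch are hypothetical, chosen only to show how quickly the NRE share shrinks as volume grows.

```python
def cost_per_unit(nre: float, unit_cost: float, volume: int) -> float:
    """Total cost per unit when a one-time NRE cost is spread over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + unit_cost


# Hypothetical figures: $1,000,000 NRE, $5 recurring cost per unit.
print(cost_per_unit(1_000_000, 5.0, 1_000))      # 1005.0
print(cost_per_unit(1_000_000, 5.0, 1_000_000))  # 6.0
```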
Power
Custom logic is a clear winner for low power devices.
Modern microprocessors offer features to help control power
consumption.
Software design techniques can help reduce power
consumption.
Basic Architecture
Control unit and datapath
Note the similarity to the single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is programmed into the memory
[Figure: processor consisting of a control unit (controller, control/status, PC, IR) and a datapath (ALU, registers), connected to I/O and memory]
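To see the key difference in action (the algorithm lives in memory rather than in the controller), here is a minimal Python sketch of a fetch-decode-execute loop running the summation example. The instruction set (`LOADI`, `LOAD`, `ADD`, `ADDI`, `JZ`, `JMP`, `HALT`), the four-register file and the helper names are all hypothetical. They do not correspond to any real ISA.

```python
def sum_program(n):
    """Hypothetical program computing total = M[0] + ... + M[n-1].

    Registers: r0 = total, r1 = address i, r2 = items remaining, r3 = temp.
    """
    return [
        ("LOADI", 0, 0),       # 0: total = 0
        ("LOADI", 1, 0),       # 1: i = 0
        ("LOADI", 2, n),       # 2: remaining = n
        ("JZ", 2, 9),          # 3: if remaining == 0 goto 9
        ("LOAD", 3, 1),        # 4: r3 = M[i]
        ("ADD", 0, 0, 3),      # 5: total += r3
        ("ADDI", 1, 1, 1),     # 6: i += 1
        ("ADDI", 2, 2, -1),    # 7: remaining -= 1
        ("JMP", 3),            # 8: goto 3
        ("HALT",),             # 9: stop
    ]


def run(program, data, max_steps=100_000):
    """Fetch-decode-execute loop; returns r0 when HALT is reached."""
    regs = [0] * 4   # general register file
    pc = 0           # program counter
    for _ in range(max_steps):
        ir = program[pc]          # fetch: IR <- program memory[PC]
        pc += 1
        op, *args = ir            # decode
        if op == "LOADI":         # execute
            rd, imm = args
            regs[rd] = imm
        elif op == "LOAD":
            rd, ra = args
            regs[rd] = data[regs[ra]]
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "ADDI":
            rd, ra, imm = args
            regs[rd] = regs[ra] + imm
        elif op == "JZ":
            ra, target = args
            if regs[ra] == 0:
                pc = target
        elif op == "JMP":
            (pc,) = args
        elif op == "HALT":
            return regs[0]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("program did not halt")


print(run(sum_program(5), [3, 1, 4, 1, 5]))  # 14
```

Changing the computation means changing `program`, not `run`. That is the flexibility the general datapath buys, compared with the GCD controller, whose algorithm is fixed in its state logic.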
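Pipelining
[Figure: throughput of non-pipelined vs. pipelined execution over time, shown with a laundry analogy (wash, dry) and with instruction execution; each instruction (instruction 1, ...) passes through the stages fetch instruction, decode, fetch operands, execute, store result]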
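The throughput gain can be worked out with a simple count. The sketch below assumes an ideal pipeline: one cycle per stage and no stalls or hazards. Under that assumption, n instructions through k stages take n·k cycles without pipelining but k + (n − 1) cycles with it. It uses the five stages named in the figure; the function names are hypothetical.

```python
def non_pipelined_cycles(n_instr: int, n_stages: int) -> int:
    """Each instruction runs all stages before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr: int, n_stages: int) -> int:
    """Ideal pipeline: filling takes n_stages cycles, then one instruction completes per cycle."""
    if n_instr == 0:
        return 0
    return n_stages + (n_instr - 1)


STAGES = 5  # fetch instr., decode, fetch ops., execute, store res.
print(non_pipelined_cycles(8, STAGES))  # 40
print(pipelined_cycles(8, STAGES))      # 12
```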
Princeton vs. Harvard
Princeton: fewer memory wires
Harvard: simultaneous program and data memory access
[Figure: Princeton processor connected to a single memory (program and data); Harvard processor connected to separate program memory and data memory]
Cache Memory
Memory access may be slow
Cache is small but fast memory close to the processor
[Figure: cache placed between the processor and memory]
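A cache pays off when accesses reuse nearby addresses. To illustrate, here is a minimal Python sketch that counts hits and misses for a direct-mapped cache. The direct-mapped organization, the tag-only model (no data is stored) and the parameter values are assumptions chosen for brevity; real caches are often set-associative (the processor table later lists a 2-way set-associative cache).

```python
def simulate_direct_mapped(addresses, num_lines, block_size):
    """Count (hits, misses) for a direct-mapped cache, tracking tags only."""
    tags = [None] * num_lines          # one tag per cache line; None = invalid
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size     # memory block containing this address
        index = block % num_lines      # the only line this block may occupy
        tag = block // num_lines       # identifies which block sits in that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1                # go to (slow) memory, replace the line
            tags[index] = tag
    return hits, misses


print(simulate_direct_mapped(range(16), num_lines=4, block_size=4))       # (12, 4)
print(simulate_direct_mapped([0, 16, 0, 16], num_lines=4, block_size=4))  # (0, 4)
```

The second call shows a conflict pattern: addresses 0 and 16 map to the same line and keep evicting each other.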
Application-Specific Instruction-Set Processors (ASIPs)
General-purpose processors
Sometimes too general to be effective in demanding applications
e.g., video processing requires huge video buffers and operations on large arrays of data, which is inefficient on a general-purpose processor (GPP)
But a single-purpose processor has high NRE and is not programmable
Microprocessor varieties
Microcontroller: includes I/O devices, on-board memory.
Digital signal processor (DSP): microprocessor optimized for
digital signal processing.
Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
Embedded Processors
Past: microprocessor, microcontroller, DSP, graphics processor
Now / future: network processor, sensor processor, cryptoprocessor, game processor, wearable processor, mobile processor
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
On-chip program and data memory
Direct programmer access to many of the chip's pins
Specialized instructions for bit-manipulation and other low-level
operations
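Bit-manipulation instructions let a microcontroller set, clear or test a single I/O or control bit in one instruction; the 8051, for example, has SETB and CLR for individual bits. The Python sketch below shows the mask arithmetic such an instruction performs on a hypothetical 8-bit port register. The helper names are illustrative only.

```python
def set_bit(reg: int, n: int) -> int:
    """Return reg with bit n forced to 1 (8-bit register)."""
    return (reg | (1 << n)) & 0xFF


def clear_bit(reg: int, n: int) -> int:
    """Return reg with bit n forced to 0."""
    return reg & ~(1 << n) & 0xFF


def toggle_bit(reg: int, n: int) -> int:
    """Return reg with bit n inverted."""
    return (reg ^ (1 << n)) & 0xFF


def bit_is_set(reg: int, n: int) -> bool:
    """True if bit n of reg is 1."""
    return ((reg >> n) & 1) == 1


port = 0x00
port = set_bit(port, 3)      # 0x08
port = toggle_bit(port, 0)   # 0x09
port = clear_bit(port, 3)    # 0x01
print(hex(port), bit_is_set(port, 0))  # 0x1 True
```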
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instructions
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.
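Multiply-accumulate is the inner operation of most filtering code. Here is a minimal Python sketch of a direct-form FIR filter, chosen as an assumed typical DSP workload, in which each inner-loop iteration is one MAC. A DSP would issue each of those as a single-cycle instruction, whereas a general-purpose processor typically needs separate multiply, add and loop-control steps.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0                                   # accumulator register
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]         # one multiply-accumulate (MAC)
        out.append(acc)
    return out


print(fir([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```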
Reconfigurable SoC
Other Examples
Atmel's FPSLIC (AVR + FPGA)
Altera's Nios (configurable RISC on a PLD)
Triscend's A7 CSoC
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
Processors
Processor         Clock speed  Periph.                  Bus width  MIPS   Power  Trans.  Price
General-purpose processors
Intel PIII        1 GHz        2x16K L1, 256K L2, MMX   32         ~900   97W    ~7M     $900
IBM PowerPC 750X  550 MHz      2x32K L1, 256K L2        32/64      ~1300  5W     ~7M     $900
MIPS R5000        250 MHz      2x32K, 2-way set assoc.  32/64      NA     NA     3.6M    NA
StrongARM SA-110  233 MHz      None                     32         268    1W     2.1M    NA
Microcontrollers
Intel 8051        12 MHz                                           ~1     ~0.2W  ~10K    $7
Motorola 68HC811  3 MHz                                            ~.5    ~0.1W  ~10K    $5
Digital signal processors
TI C5416          160 MHz                                                 NA     NA      $34
Lucent DSP32C     80 MHz                                32         40     NA     NA      $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Summary
General-purpose processors
Good performance, low NRE, flexible
ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs
Instruction Sets
CISC
Intel
Year  Processor    Transistors   Notes
1971  4004         2,250         Busicom
1972  8008         2,500         Mark-8
1974  8080         5,000         Altair
1978  8086/8088    29,000        IBM-PC XT
1982  80286        120,000       IBM-PC AT
1985  80386        275,000       32-bit
1989  80486        1,180,000
1993  Pentium      3,100,000
1995  Pentium Pro  5,500,000     Dynamic Execution
1997  Pentium 2    7,500,000     MMX
1999  Pentium 3    24,000,000    SIMD
2001  Itanium      25,000,000
2002  Pentium 4    55,000,000
2003  Itanium 2    410,000,000
CISC - History
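ARM data-processing instruction formats (bit fields, most significant first):
cond [31:28] | 000 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | shift amount [11:7] | shift [6:5] | 0 [4] | Rm [3:0]
cond [31:28] | 000 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | Rs [11:8] | 0 [7] | shift [6:5] | 1 [4] | Rm [3:0]
cond [31:28] | 001 [27:25] | opcode [24:21] | S [20] | Rn [19:16] | Rd [15:12] | rotate [11:8] | immediate-8 [7:0]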
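The formats above are fixed bit fields, so encoding and decoding reduce to shifts and masks. This Python sketch packs the immediate form and unpacks the common fields. In the immediate form, the 8-bit value is rotated right by twice the `rotate` field. The constants are the standard ARM values for the "always" condition (1110), ADD (0100) and MOV (1101); the function names are hypothetical.

```python
COND_AL = 0b1110   # "always" condition
OP_ADD = 0b0100
OP_MOV = 0b1101


def encode_dp_imm(cond, opcode, s, rn, rd, rotate, imm8):
    """Pack cond | 001 | opcode | S | Rn | Rd | rotate | immediate-8."""
    return ((cond << 28) | (0b001 << 25) | (opcode << 21) | (s << 20)
            | (rn << 16) | (rd << 12) | (rotate << 8) | imm8)


def decode_dp(word):
    """Unpack the fields shared by the three data-processing formats."""
    f = {
        "cond": (word >> 28) & 0xF,
        "I": (word >> 25) & 0x1,        # 1 = immediate operand
        "opcode": (word >> 21) & 0xF,
        "S": (word >> 20) & 0x1,        # 1 = update condition flags
        "Rn": (word >> 16) & 0xF,
        "Rd": (word >> 12) & 0xF,
    }
    if f["I"]:
        rotate, imm8 = (word >> 8) & 0xF, word & 0xFF
        amount = 2 * rotate             # rotate right by twice the field value
        f["imm"] = ((imm8 >> amount) | (imm8 << (32 - amount))) & 0xFFFFFFFF
    else:
        f["Rm"] = word & 0xF
        if (word >> 4) & 1:             # bit 4 = 1: shift amount comes from Rs
            f["Rs"] = (word >> 8) & 0xF
    return f


word = encode_dp_imm(COND_AL, OP_ADD, 0, rn=1, rd=0, rotate=0, imm8=5)
print(hex(word))  # 0xe2810005  (ADD R0, R1, #5)
print(hex(decode_dp(encode_dp_imm(COND_AL, OP_MOV, 0, 0, 2, 4, 0xFF))["imm"]))  # 0xff000000
```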
Programming model
Programming model: registers visible to the programmer.
Some registers are not visible (IR).
Multiple implementations
Successful architectures have several implementations.
IC Technology
IC technology
The manner in which a digital (gate-level) implementation is
mapped onto an IC
[Figure: an IC package containing the IC; transistor cross-section showing source, gate, oxide, channel and drain on a silicon substrate]
IC technology
Three types of IC technologies
Full-custom/VLSI
Semi-custom ASIC (gate array and standard cell)
PLD (Programmable Logic Device)
Outline
Anatomy of integrated circuits
CMOS transistor
Source, Drain
Gate
Oxide
NMOS Inverter
[Figure: NMOS inverter cross-section; source and drain are n+ regions]
NMOS Inverter
[Figure: NMOS inverter layout with n+ regions, contacts and aluminum wiring; dimensions given in multiples of λ (e.g., 2λ)]
Length unit: λ (in microns)
NAND
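The NAND slide's layout does not survive in the text. As a complement, here is a minimal switch-level Python sketch of how a static CMOS NAND works, assuming ideal transistors that act as switches. The slide's own layout may show a different realization (for example, NMOS with a load). The pull-up network is two PMOS transistors in parallel, and the pull-down network is two NMOS transistors in series.

```python
def cmos_nand(a: int, b: int) -> int:
    """Switch-level model of a static CMOS NAND gate with ideal transistors."""
    # Pull-up network: two PMOS in parallel between VDD and the output;
    # a PMOS conducts when its gate input is 0.
    pull_up = a == 0 or b == 0
    # Pull-down network: two NMOS in series between the output and ground;
    # an NMOS conducts when its gate input is 1.
    pull_down = a == 1 and b == 1
    # In static CMOS exactly one network conducts for any valid input.
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b))
```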
Tape out
Spin
Photolithography
Full Custom
Very Large Scale Integration (VLSI)
Placement
Place and orient transistors
Routing
Connect transistors
Sizing
Make fat, fast wires or thin, slow wires
May also need to size buffers
Design Rules
Full Custom
Best size, power, performance
Hand design
Horrible time-to-market/flexibility/NRE cost
Reserve for the most important units in a processor
ALU, Instruction fetch
Semi-Custom
Gate Array
Array of prefabricated gates; place and route
Higher density, faster time-to-market
Does not integrate as well with full-custom
Standard Cell
Semi-Custom
Most popular design style
Master of none
Integrate with full custom for critical
regions of design
Benefits
Drawback
Xilinx FPGA
I/O Block
Full-custom/VLSI
All layers are optimized for an embedded system's particular digital implementation
Placing transistors
Sizing transistors
Routing wires
Benefits
Excellent performance, small size, low power
Drawbacks
High NRE cost (e.g., $300k), long time-to-market
Semi-custom
Lower layers are fully or partially built
Designers are left with routing of wires and maybe placing some
blocks
Benefits
Good performance, good size, less NRE cost than a full-custom
implementation (perhaps $10k to $100k)
Drawbacks
Still requires weeks to months to develop
PLD (Programmable Logic Device)
Benefits
Low NRE costs, almost instant IC availability
Drawbacks
Bigger, expensive (perhaps $30 per unit), power hungry, slower
Structured ASIC
From the paper "Paradigm shift in ASIC technology: In Standard Metal, Out Standard Cell", Zvi Or-Bach, eASIC founder and CEO
Structured ASIC
About 20 years ago: full-custom design → standard cell
Design cost of full custom: $10 million
Today: standard cell design cost exceeds $10 million
Definition
Structured ASIC
Key to reducing design cost and complexity
Reducing the number of custom mask and via layers
Typically, two or three (sometimes 5) user-modifiable metal
layers
Multiple input lookup tables, F/Fs, and MUXs
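A multiple-input lookup table (LUT) is a small memory: a k-input LUT stores a 2^k-entry truth table, and the inputs simply form the address. FPGA logic blocks use the same idea. In the Python sketch below, any k-input function becomes "configuration bits". Treating input 0 as the least significant address bit is an arbitrary choice, and the helper names are hypothetical.

```python
def make_lut(func, k):
    """Return the 2**k-bit configuration of a k-input LUT implementing func."""
    config = 0
    for idx in range(2 ** k):
        inputs = [(idx >> pos) & 1 for pos in range(k)]  # input 0 = address LSB
        if func(*inputs):
            config |= 1 << idx
    return config


def lut_eval(config, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored table."""
    idx = sum(bit << pos for pos, bit in enumerate(inputs))
    return (config >> idx) & 1


xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
print(hex(xor3))                # 0x96
print(lut_eval(xor3, 1, 0, 1))  # 0
```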
Interconnection
At 100 nm, interconnect switching energy is 3 times the transistor switching energy
At 35 nm, it is 30 times greater
Crosstalk
Paradigm Shift
Transistor sizing (full custom) → gate sizing (standard cell)
Move to an even coarser building block
A natural solution for this yield problem: use repetitive patterns, just as in SRAM
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
Customized, providing improved: power efficiency, performance, size, cost (high volume)
Processor technology, from general to customized: general-purpose processor, ASIP, single-purpose processor
IC technology, from general to customized: PLD, semi-custom, full-custom