
Designing an Instruction Set

Nerd Chef at work:

move flour,bowl
add milk,bowl
add egg,bowl
move bowl,mixer
rotate mixer
...

Handouts: Lecture Notes, Lab 4, PS4 solutions


6.004 - Spring 2000

3/9/00


Factorial Engine

[Figure: data path with registers A and B (load enables ALE and BLE, labeled "L.E. A" and "L.E. B"), input selects ASEL and BSEL, inputs 1 and N, a -1 decrementer, an =0? test, and output ANSWER. A control FSM drives ASEL, BSEL, ALE and BLE.]

Today's big idea: general purpose computer architecture

One set of UNIVERSAL data paths

Control FSM:

Z S | S' Asel Bsel Ale Ble
----+---------------------
- 0 | 1  1    0    1   1
0 1 | 1  0    1    1   1
1 1 | 2  0    1    1   1
- 2 | 2  0    0

The same data paths could compute N*(N-1), Factorial, ...; the only difference is the information in the control ROM.

The ENCODED sequence of operational steps dictates the specific function to be performed... the PROGRAM!

Control sequence for Factorial: "A=1, B=N", then "A=A*B, B=B-1" (repeated), then "done".

New issue: HOW to encode the program?
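
To make the "only difference is the control ROM" point concrete, here is a minimal Python sketch of one fixed data path driven by two different ROMs. Several details are assumptions, not taken from the slides: the mux wiring (asel=1 picks the constant 1, bsel=0 picks N), the Z input being the zero test on the decremented value B-1, the restriction to N >= 1, and the whole second ROM, which is invented for illustration.

```python
def run_data_path(rom, n, done_state):
    """Drive one fixed data path (registers A and B) from a control ROM.

    rom maps (state, z) -> (next_state, asel, bsel, ale, ble).
    Assumed wiring: asel=1 picks the constant 1, asel=0 picks A*B;
    bsel=0 picks the input N, bsel=1 picks B-1.
    """
    a = b = state = 0
    while state != done_state:
        z = int(b - 1 == 0)           # assumed: zero test on the decrementer output
        next_state, asel, bsel, ale, ble = rom[(state, z)]
        new_a = 1 if asel else a * b
        new_b = b - 1 if bsel else n
        if ale:
            a = new_a                 # both registers load on the same clock edge
        if ble:
            b = new_b
        state = next_state
    return a


def dont_care(state, row):
    """Expand a row whose Z input is a don't-care into both ROM entries."""
    return {(state, 0): row, (state, 1): row}


# Rows from the control table: state 0 loads A=1, B=N; state 1 multiplies and decrements.
FACTORIAL_ROM = {
    **dont_care(0, (1, 1, 0, 1, 1)),
    (1, 0): (1, 0, 1, 1, 1),
    (1, 1): (2, 0, 1, 1, 1),
}

# Hypothetical second ROM for the same data path: computes N*(N-1) in two multiply steps.
N_TIMES_N_MINUS_1_ROM = {
    **dont_care(0, (1, 1, 0, 1, 1)),
    **dont_care(1, (2, 0, 1, 1, 1)),
    **dont_care(2, (3, 0, 1, 1, 0)),
}

factorial_of_5 = run_data_path(FACTORIAL_ROM, 5, done_state=2)
product_for_5 = run_data_path(N_TIMES_N_MINUS_1_ROM, 5, done_state=3)
```

The function `run_data_path` never changes between the two runs; only the table passed in does, which is the sense in which the ROM contents are the program.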


Machine Language Abstraction


General Purpose Approach 1:
Documentation: circuit diagram
Programming tools: data path diagram, device specs, scope probe

General Purpose Approach 2:
ABSTRACT description of machine:
set of scratch-pad locations: R0, R1, ...
set of operations: ADD, MPY, XOR, ...
Coded sequence of INSTRUCTIONS:
R1 ← R2 + R3, etc.
Choose Approach 2; view hardware as an interpreter for coded instructions!

Anatomy of an Interpreter

[Figure: MEMORY holds data and instructions. The Control Unit contains the PC (with a +1 incrementer) and exchanges control and status signals with the Data Paths. Data path labels: internal storage, dest, asel, bsel, fn, ALU, Ccs.]

INSTRUCTIONS coded as binary data, e.g. 1101000111011 for R1 ← R2+R3

PROGRAM COUNTER or PC: address of next instruction to be executed

Control unit: logic to translate instructions into control signals for data path

Questions to be answered:
Data path questions:
how much internal storage?
what are the ALU functions?
provision for constant operands?
how does data get to/from memory?
how wide (in bits) are the registers/ALU?
Control unit questions:
how should instructions be encoded?
low-level (e.g., control signals for data path: next, fn, dest, asel, bsel)
high-level (e.g., fill polygon)
Huffman encoded (so commonly-used insts are short)
etc., etc., etc.


Long-term Goals
don't get steamrollered by changes in technology
density (2x every 1.5 years)
speed (2x every 3 years)
access time (30% in 10 yr)

allow for many possible implementations
slow/inexpensive
fast/not-so-cheap

good target for compilers
people don't write instructions, compilers write instructions

find right level for instruction set semantics
too low-level: useful execution info compiled out (e.g., parallelism)
too high-level: hardware interpreter is slow/complex/expensive
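
The technology trends listed above compound very differently over the lifetime of an instruction set. A small sketch, assuming steady exponential growth at the quoted rates (the 10-year horizon is chosen to match the access-time figure):

```python
def growth_factor(doubling_period_years, years):
    """Improvement factor after `years` if a quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


YEARS = 10
density = growth_factor(1.5, YEARS)   # density: 2x every 1.5 years
speed = growth_factor(3.0, YEARS)     # speed: 2x every 3 years

print(f"density grows about {density:.0f}x in {YEARS} years")
print(f"speed grows about {speed:.0f}x in {YEARS} years")
```

Over the same decade, density improves by roughly two orders of magnitude and speed by roughly one, while memory access time moves by only 30%, so a design tuned to today's ratios will not stay tuned.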

NREGS = 0?
Why not fetch operands and store results directly from/to main memory?
memories are large ⇒ sizeof(address) = sizeof(data)
For example, in a 32-bit architecture C = A + B would require 96+ bits of instruction to produce a single 32-bit result!
memories have long access latency
A 1GHz processor can execute 40 instructions in the time it takes to get data from a memory with 40ns access time
most memories aren't designed for parallel operation
N operands ⇒ N memory accesses
values are often used shortly after they are computed, then discarded
Don't bother to hang up the clothes you're about to put on
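
The two numbers quoted above follow from simple arithmetic. A short sketch, assuming the processor completes one instruction per clock cycle (the slide does not state this, but the 40-instruction figure implies it):

```python
ADDRESS_BITS = 32                 # 32-bit architecture: addresses are as wide as data
OPERANDS = 3                      # C = A + B names three memory locations

# Bits spent on operand addresses alone; the opcode accounts for the "+" in "96+".
operand_bits = OPERANDS * ADDRESS_BITS

CLOCK_HZ = 1e9                    # 1 GHz processor
MEMORY_ACCESS_S = 40e-9           # 40 ns memory access time

cycle_s = 1 / CLOCK_HZ
cycles_per_access = round(MEMORY_ACCESS_S / cycle_s)
```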


NREGS = 1?
An accumulator is a specially-designated register that supplies
one instruction operand and receives the result.
implicit use of accumulator saves instruction bits
On an accumulator machine C = A + B might be implemented as
LOAD(A)     // load memory locn A into accumulator
ADD(B)      // add memory locn B to accumulator
STORE(C)    // store accumulator into memory locn C

saves one or two memory accesses (operand fetch, result store)
Result is ready for immediate reuse, but has to be saved in memory if next computation doesn't use it right away
all but first operand must still come from memory
Memory latency problem improved but not eliminated
many computations generate more than one intermediate result
Need more than one scratch-pad location, e.g., a*b + c*d
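
The a*b + c*d case can be sketched with a tiny accumulator-machine simulator. The tuple encoding, the MUL opcode and the `tmp` location are illustrative assumptions rather than anything defined in the lecture:

```python
def run_accumulator(program, memory):
    """Execute (opcode, address) pairs on a machine with a single accumulator."""
    acc = 0
    accesses = 0                      # data-memory reads and writes
    for op, addr in program:
        accesses += 1                 # every instruction here touches memory once
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return accesses


memory = {"a": 2, "b": 3, "c": 4, "d": 5, "tmp": 0, "result": 0}
program = [
    ("LOAD", "a"), ("MUL", "b"),
    ("STORE", "tmp"),                 # a*b is spilled: the accumulator is needed for c*d
    ("LOAD", "c"), ("MUL", "d"),
    ("ADD", "tmp"),                   # the spilled value comes back from memory
    ("STORE", "result"),
]
accesses = run_accumulator(program, memory)
```

Two of the seven memory accesses exist only to park and recover the first product, which is the cost of having a single scratch-pad location.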


NREGS = ∞?

A stack is a data structure that implements a last-in, first-out (LIFO) access policy
Add an entry to the stack with a PUSH(value)
Remove an entry from the stack with a POP()
stacks are an elegant abstraction
On a stack machine C = A + B might be implemented as
PUSH(A)     // overhead!
PUSH(B)
ADD         // operators POP operand(s), PUSH result(s)
POP(C)

simple implementation (memory array and a stack pointer) suffers from latency problems mentioned before. Faster implementations (using registers to hold top stack locations) too complex?
unlimited capacity but direct access is limited to top of stack
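
A minimal stack-machine sketch, assuming a tuple encoding and a MUL operator that are not part of the slides. It runs the C = A + B sequence from above and then a*b + c*d, which needs no named temporary because intermediate results simply stay on the stack:

```python
def run_stack(program, memory):
    """Execute stack-machine instructions; operators pop operands and push results."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "POP":
            memory[args[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


memory = {"A": 2, "B": 3, "C": 0, "a": 2, "b": 3, "c": 4, "d": 5, "result": 0}

run_stack([("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")], memory)

leftover = run_stack(
    [("PUSH", "a"), ("PUSH", "b"), ("MUL",),
     ("PUSH", "c"), ("PUSH", "d"), ("MUL",),
     ("ADD",), ("POP", "result")],
    memory,
)
```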

NREGS = small integer?


An extension of the accumulator idea: use a set of general-purpose registers which must be explicitly named by the instruction.
flexibility of multiple operand sources, but names are short
On a register-memory machine C = A + B might be implemented as
LOAD(R1,A)      // load memory locn A into R1
ADD(R1,B)       // add memory locn B to R1
STORE(C,R1)     // store R1 into memory locn C

registers can be used to hold variables
Placing a variable in a register reduces memory traffic, speeds up execution and improves code density (register names are shorter than memory addresses).
direct access to memory data by operations
Instruction format easy to encode and yields good code density. Implementation complicated since memory operands are inherently slower to access than register operands.

Reg-mem vs. load-store


Load-store (aka register-register) machines use special instructions to access memory. Other operations only have register operands.
On a load-store machine C = A + B might be implemented as
LOAD(R1,A)      // load memory locn A into R1
LOAD(R2,B)      // load memory locn B into R2
ADD(R3,R1,R2)   // separate destination preserves loaded values
STORE(C,R3)     // store R3 into memory locn C

simple, fixed-length instruction encoding, easy code generation
operate instructions take fixed time to execute
Since all operands come from registers, timing of operate instructions is independent of memory performance and all operands can be fetched in parallel.
separate memory addressing from access to other operands
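
The load-store sequence above can be traced with a small simulator. This is a sketch under assumed conventions: instructions are tuples in the same operand order as the slide's notation, and the register file size of four is arbitrary.

```python
def run_load_store(program, memory, num_regs=4):
    """Only LOAD and STORE touch memory; ADD works on registers alone."""
    regs = [0] * num_regs
    memory_accesses = 0
    for op, *args in program:
        if op == "LOAD":              # LOAD(Rdest, addr)
            dest, addr = args
            regs[dest] = memory[addr]
            memory_accesses += 1
        elif op == "STORE":           # STORE(addr, Rsrc)
            addr, src = args
            memory[addr] = regs[src]
            memory_accesses += 1
        elif op == "ADD":             # ADD(Rdest, Rsrc1, Rsrc2)
            dest, src1, src2 = args
            regs[dest] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, memory_accesses


memory = {"A": 2, "B": 3, "C": 0}
program = [
    ("LOAD", 1, "A"),
    ("LOAD", 2, "B"),
    ("ADD", 3, 1, 2),
    ("STORE", "C", 3),
]
regs, memory_accesses = run_load_store(program, memory)
```

After the run, R1 and R2 still hold A and B, so a later instruction can reuse them without going back to memory.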


Constant Operands

[Figure from Hennessy & Patterson: percentage of the operations that use a constant operand]


Constant Operands: how big?

[Figure from Hennessy & Patterson: number of bits needed for an immediate value]



Memory Operands: layout

Byte addressing: each 32-bit word holds four 8-bit bytes.

[Figure: memory map of 32-bit words at byte addresses 0x100 through 0x130, showing struct field a and array elements a[0] to a[3]]

Choosing order of bytes in a word: little endian vs. big endian
We'll choose little endian: least-significant byte first

struct { char a; short b; int c }: fixed offset from base
a <base>
b <base + 2>
c <base + 4>
Alignment restrictions: a 2^K-byte datum must start at address j*2^K for some j

int a[100]: calculated offset from base
a[0] <base>
a[2] <base + 8>
a[i] <base + 4*i>
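
The offsets above follow mechanically from the alignment rule. A minimal sketch, assuming 1-, 2- and 4-byte sizes for char, short and int (the helper names are made up for this example), with Python's standard `struct` module used to show the little-endian byte order:

```python
import struct


def layout(fields):
    """Assign offsets so each 2**K-byte field starts at a multiple of its size."""
    offsets, offset = {}, 0
    for name, size in fields:
        offset = (offset + size - 1) // size * size   # round up to the field's alignment
        offsets[name] = offset
        offset += size
    return offsets


def element_offset(index, element_size=4):
    """Byte offset of a[index] from the array base, for 4-byte ints by default."""
    return element_size * index


# struct { char a; short b; int c }
offsets = layout([("a", 1), ("b", 2), ("c", 4)])

# Little endian: the least-significant byte sits at the lowest address.
little = struct.pack("<I", 0x11223344)
big = struct.pack(">I", 0x11223344)
```

The one-byte gap between a and b is padding forced by the rule that a 2-byte datum starts on an even address.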


Memory Operands: addressing


Absolute: (constant)
Value = Mem[constant]
Use: accessing static data

Indirect (aka Register deferred): (Rx)
Value = Mem[Reg[x]]
Use: pointer accesses

Displacement: constant(Rx)
Value = Mem[Reg[x] + constant]
Use: access to local variables

Indexed: (Rx + Ry)
Value = Mem[Reg[x] + Reg[y]]
Use: array accesses (base+index)

Memory indirect: @(Rx)
Value = Mem[Mem[Reg[x]]]
Use: access thru pointer in mem

Autoincrement: (Rx)+
Value = Mem[Reg[x]]; Reg[x]++
Use: sequential pointer accesses

Autodecrement: -(Rx)
Value = Reg[x]--; Mem[Reg[x]]
Use: stack operations

Scaled: constant(Rx)[Ry]
Value = Mem[Reg[x] + c + d*Reg[y]]
Use: array accesses (base+index)

Argh! Need a cost/benefit analysis!
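
For comparison, each mode's "Value =" rule can be written as one branch of a single function. This is only a sketch: the mode names, the dictionary memory and the argument names are choices made for this example.

```python
def operand(mode, mem, reg, x=0, y=0, c=0, d=1):
    """Return the operand value for one addressing mode.

    mem maps addresses to values; reg is a list of register values.
    Autoincrement and autodecrement update reg[x] in place.
    """
    if mode == "absolute":
        return mem[c]
    if mode == "indirect":
        return mem[reg[x]]
    if mode == "displacement":
        return mem[reg[x] + c]
    if mode == "indexed":
        return mem[reg[x] + reg[y]]
    if mode == "memory_indirect":
        return mem[mem[reg[x]]]
    if mode == "autoincrement":
        value = mem[reg[x]]
        reg[x] += 1               # follows the slide's Reg[x]++ literally
        return value
    if mode == "autodecrement":
        reg[x] -= 1               # decrement first, then access
        return mem[reg[x]]
    if mode == "scaled":
        return mem[reg[x] + c + d * reg[y]]
    raise ValueError(f"unknown mode {mode}")


mem = {100: 7, 104: 200, 108: 9, 200: 42}
reg = [0, 100, 4, 104, 2]

static_value = operand("absolute", mem, reg, c=100)
local_value = operand("displacement", mem, reg, x=1, c=8)
array_value = operand("scaled", mem, reg, x=1, y=4, d=4)
```

Every branch is a different combination of the same few ingredients (a register, a constant, an extra memory read), which is why a small set of modes plus ordinary instructions can stand in for the rest.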



Memory Operands: usage

[Figure from Hennessy & Patterson: usage of different memory operand modes]



Memory Operands: displacements

[Figure from Hennessy & Patterson: number of bits needed for a displacement value]



Choices for our architecture


32-bit data paths and storage elements
32 registers, Reg[31] always reads as 0

operate instructions:
3-register format: Reg[c] ← Reg[a] op Reg[b]
choose functions that can be performed in a single cycle, use multiple instructions to implement complex operations
second operand can be sign-extended 16-bit constant

load/store instructions:
displacement addressing: address = Reg[a] + 16-bit constant
specify Reg[31] to get absolute addressing
specify constant of 0 to get indirect addressing
other address modes synthesized using operate instructions


Sneak preview: data path

6.004 BETA Processor: Basic Implementation

[Figure: Beta data path. Labeled blocks: PC with +4 incrementer, Instruction Memory (A, D), Register File (RA1, RA2, WA, WD, RD1, RD2, WE), ALU (A, B, FN), Data Memory (Adr, WD, RD, R/W), Control logic. Instruction fields: Ra <20:16>, Rb <15:11>, Rc <25:21>, C <15:0>, Op Fn <29:26>. Control signals: PCSEL, RA2SEL, ASEL, BSEL, WDSEL, WERF. Other labels: JT, <PC>+4.]


Model of Computation

Processor State: PC; registers r0, r1, r2, ..., r31, each 32 bits; r31 is always 0
Main Memory: 32-bit words (4 bytes each); the PC points at the next instr

Fetch/Execute Loop:
Fetch Mem[PC]
PC ← PC + 4
Execute instruction (may change PC!)
Repeat!
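
The fetch/execute loop can be sketched directly. In this illustration instructions are kept as Python tuples rather than encoded words, and the BR and HALT opcodes are hypothetical stand-ins used only to show the PC being changed and the loop being stopped:

```python
def run_fetch_execute(memory, max_steps=100):
    """Fetch/execute loop; memory maps byte addresses (multiples of 4) to instructions."""
    pc = 0
    regs = [0] * 32
    for _ in range(max_steps):
        instr = memory[pc]            # Fetch Mem[PC]
        pc = pc + 4                   # PC <- PC + 4
        op, *args = instr             # Execute instruction
        if op == "ADD":
            ra, rb, rc = args
            regs[rc] = regs[ra] + regs[rb]
        elif op == "ADDC":
            ra, const, rc = args
            regs[rc] = regs[ra] + const
        elif op == "BR":              # hypothetical jump: execution may change PC
            pc = args[0]
        elif op == "HALT":            # hypothetical: stop the simulation
            break
        else:
            raise ValueError(f"unknown opcode {op}")
        regs[31] = 0                  # r31 always reads as 0
    return regs, pc


program = {
    0: ("ADDC", 31, 5, 1),            # r1 = 5
    4: ("ADDC", 31, 7, 2),            # r2 = 7
    8: ("BR", 16),                    # skip the instruction at address 12
    12: ("ADDC", 31, 99, 3),
    16: ("ADD", 1, 2, 3),             # r3 = r1 + r2
    20: ("HALT",),
}
regs, pc = run_fetch_execute(program)
```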


Instruction Format: ALU ops

Basic instruction format: 32-bit instruction word
OP | rc | ra | rb | unused

ADD(ra, rb, rc)
Reg[c] ← Reg[a] + Reg[b]
Add the contents of ra to the contents of rb; store the result in rc

Alternative instruction format:
OP | rc | ra | const (16 bits, signed)

ADDC(ra, const, rc)
Reg[c] ← Reg[a] + const
Add the contents of ra to const; store the result in rc

Arithmetic/logical ops:
ADD, ADDC, SUB, SUBC, MUL, MULC, DIV, DIVC
AND, ANDC, OR, ORC, XOR, XORC
SHL, SHR, SAR (shift left/right/right arithmetic)
What? No SAL?
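
The two formats can be packed and unpacked with shifts and masks. The sketch below takes the rc, ra, rb and const bit positions from the data path figure (<25:21>, <20:16>, <15:11>, <15:0>). Placing OP in the remaining top six bits, and the opcode value used in the example, are assumptions made for illustration.

```python
def encode_op(opcode, rc, ra, rb):
    """Pack OP | rc | ra | rb | unused into a 32-bit word."""
    return (opcode << 26) | (rc << 21) | (ra << 16) | (rb << 11)


def encode_opc(opcode, rc, ra, const):
    """Pack OP | rc | ra | 16-bit signed constant into a 32-bit word."""
    return (opcode << 26) | (rc << 21) | (ra << 16) | (const & 0xFFFF)


def decode(word):
    """Split a 32-bit instruction word into its fields."""
    const = word & 0xFFFF
    if const & 0x8000:
        const -= 0x10000              # sign-extend the 16-bit constant
    return {
        "opcode": (word >> 26) & 0x3F,
        "rc": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        "rb": (word >> 11) & 0x1F,    # only meaningful in the 3-register format
        "const": const,               # only meaningful in the constant format
    }


HYPOTHETICAL_OPCODE = 0x30            # placeholder value, not a documented encoding
word = encode_opc(HYPOTHETICAL_OPCODE, rc=3, ra=1, const=-4)
fields = decode(word)
```

Because rb and const overlap in bits <15:0>, the opcode has to tell the control logic which interpretation applies.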

Instruction Format: LD/ST

OP | rc | ra | const (16 bits, signed)

LD(ra, C, rc)
Reg[c] ← Mem[Reg[a] + C]
Fetch into rc the contents of the memory location whose address is C plus the contents of ra
Abbreviation: LD(C, rc) for LD(R31, C, rc)

ST(rc, C, ra)
Mem[Reg[a] + C] ← Reg[c]
Store the contents of rc into the memory location whose address is C plus the contents of ra
Abbreviation: ST(rc, C) for ST(rc, C, R31)

BYTE ADDRESSES, but only 32-bit word accesses to word-aligned addresses are supported. Low two address bits are ignored!
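
A short sketch of how the single displacement mode covers absolute and indirect addressing, and what ignoring the low two address bits means in practice. Memory is modeled as a dictionary keyed by word-aligned byte address, and the function names are made up for this example:

```python
def effective_address(regs, ra, const):
    """Reg[a] + C as a 32-bit address, with the low two bits ignored."""
    base = 0 if ra == 31 else regs[ra]        # Reg[31] always reads as 0
    return (base + const) & 0xFFFFFFFC


def ld(regs, mem, ra, const, rc):
    """LD(ra, C, rc): Reg[c] <- Mem[Reg[a] + C]."""
    if rc != 31:                              # writes to R31 have no visible effect
        regs[rc] = mem[effective_address(regs, ra, const)]


def st(regs, mem, rc, const, ra):
    """ST(rc, C, ra): Mem[Reg[a] + C] <- Reg[c]."""
    mem[effective_address(regs, ra, const)] = 0 if rc == 31 else regs[rc]


regs = [0] * 32
regs[1] = 0x100
mem = {0x20: 33, 0x100: 11, 0x104: 22}

ld(regs, mem, 1, 4, 2)        # displacement: Mem[0x100 + 4]
ld(regs, mem, 31, 0x20, 3)    # absolute: base register is R31
ld(regs, mem, 1, 0, 4)        # indirect: constant of 0
ld(regs, mem, 1, 6, 5)        # 0x106 is not word-aligned; it is treated as 0x104
st(regs, mem, 2, 8, 1)        # Mem[0x108] <- Reg[2]
```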

Instruction Classes
Memory access:
explicit: load/store insts.
implicit: memory operands
automatic address sequencing? Vectors, raster-scan, buffers

Flow of control:
conditional branches
unconditional jumps
call/return

Number crunching:
integer (32 or 64 bits)
floating point (IEEE-754)
logical
multimedia: MMX, Streaming SIMD, Altivec, graphics pipeline

Miscellaneous:
security/OS support
input/output
synchronization of parallel execution
saving/restoring machine state


Instruction Mix

                     Gcc    espresso   Spice   Nasa7
Data transfer        42%    31%        34%     32%
Integer arithmetic   28%    32%        34%     36%
Logic operations      7%    16%         4%      3%
FP data transfer      0%     0%         7%     13%
FP arithmetic         0%     0%         6%     13%
Flow of control      23%    21%        14%      4%


ISA Pitfalls (from H&P):


Pitfall: Ignoring Amdahl's law and optimizing uncommon operations

Amdahl's Law:
Speedup = (Execution time for entire task w/o using the enhancement) / (Execution time for entire task using enhancement where possible)

Pitfall: Advertising a high-level instruction set feature specifically oriented to support a high-level language structure
"by giving too much semantic content to the instruction, the machine designer made it possible to use the instruction only in limited contexts"
Bill Wulf, 1981, speaking about the VAX calls instruction (but what about MMX?)
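
Amdahl's law is often written in terms of the fraction of execution time that the enhancement affects. The sketch below uses that standard form; the two scenarios and their numbers are made up to illustrate the first pitfall.

```python
def amdahl_speedup(fraction_enhanced, enhancement_speedup):
    """Overall speedup when only part of the original execution time is enhanced."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / enhancement_speedup
    return 1 / new_time


# Hypothetical choice between two enhancements:
rare_but_dramatic = amdahl_speedup(0.05, 10)    # 10x faster on 5% of the time
common_but_modest = amdahl_speedup(0.50, 2)     # 2x faster on 50% of the time
```

Here the modest improvement to the common case yields the larger overall speedup, because the dramatic one leaves 95% of the execution time untouched.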


ISA Fallacies (from H&P)


Fallacy: There is such a thing as a typical program
Avoid the temptation to optimize your implementation using a small number of benchmarks.

Fallacy: An architecture with flaws cannot be successful
80x86: panned by critics, commercial success

Fallacy: One can design a flawless architecture
All architecture design involves trade-offs; as technology changes, the correct design decision will also change

Direct-execute ISAs
MIPS = Clock Frequency (in MHz) / Clocks per Instruction

2 ways to improve instructions/second execution rate:
Decrease Clocks/Instruction
Raise clock frequency (fast logic; pipelining)

Contemporary RISC processors:
Simple instruction set for speed, pipelining
Single clock period (or less!)/instruction
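
A quick numeric check of the two levers, using made-up clock rates and CPI values:

```python
def mips(clock_hz, clocks_per_instruction):
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_hz / clocks_per_instruction / 1e6


baseline = mips(500e6, 2.0)       # hypothetical 500 MHz machine at 2 clocks/instruction
lower_cpi = mips(500e6, 1.0)      # same clock, CPI cut to 1
faster_clock = mips(1e9, 2.0)     # same CPI, clock doubled
```

Halving the clocks per instruction and doubling the clock frequency both double the rate; the design question is which of the two the instruction set makes easier to achieve.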

High-level ISAs
ISA is not directly executed:
translated before execution
Java byte codes by just-in-time compilers
Pentium/Athlon convert x86 on-the-fly to RISC
emulation
Portability (eg, Scheme)
Compact implementation (eg, embedded environments)
High-level functionality (eg, Postscript)
Support for legacy architectures (eg, Symbolics Lispms)

Blur distinction between instruction set and language
