
Chapter 5 A Closer Look at Instruction Set Architectures


Lecturer: Hao Zheng
Comp Sci & Eng, USF

Chapter 5 Objectives
Wouldn't it be more convenient to have Add 1 in MARIE (in addition to Add One)?
Why does an ISA include some instructions but not others?

Survey the factors involved in instruction set architecture design.
Gain familiarity with more memory addressing
modes.

5.2 Instruction Formats


Instruction sets are differentiated by the following:
Number of bits per instruction.
Operand locations (how CPU accesses data).
Internal storage: Stack-based or register-based.
Number of explicit operands per instruction.
Types of operations.
Type and size of operands.
Memory addresses, registers, or actual data

5.2 Instruction Formats


Instruction set architectures are measured according
to:
Main memory space occupied by a program.
Code density

Instruction complexity.
Affects the amount of decoding, and thus performance.

Instruction length (in bits): variable or fixed.


Total number of instructions in the instruction set.

5.2 Instruction Formats

Instruction layout: Opcode | Operand 1 | Operand 2 | Operand 3
Each operand can be:
Actual data.
A memory address of the actual data.
A register holding the actual data.
A register or memory address holding the address of the actual data.

5.2.1 Instruction Formats


In designing an instruction set, consideration is given
to:
Instruction length relative to word length.
Short/long, or fixed/variable.

Number of operands for each instruction.
Number of registers addressable by instructions.
Memory organization.
Whether byte- or word-addressable.

Addressing modes.
Choose any or all: direct, indirect or indexed, etc.

5.2.2 Instruction Formats


Byte ordering, or endianness, is another major
architectural consideration.
How multi-byte words are stored in byte-addressable memory.

If we have a two-byte integer, it may be stored so that the least significant byte is followed by the most significant byte, or vice versa.
In little endian machines, the least significant byte is stored first (at the lower address), followed by the most significant byte.
Big endian machines store the most significant byte first (at the lower address).

5.2.2 Instruction Formats


As an example, suppose we have the
hexadecimal number 0A0B0C0D.
The little endian and big endian arrangements of
the bytes are shown below.

[Figure: little endian and big endian byte layouts of 0A0B0C0D. Source: wiki]
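The two byte orders for 0A0B0C0D can also be checked in a few lines of Python. This is only an illustrative sketch; the helper name hex_bytes is made up for this example, and index 0 of each result stands for the lowest memory address.

```python
value = 0x0A0B0C0D

# int.to_bytes lays the integer out in the requested byte order;
# index 0 of the result corresponds to the lowest memory address.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex pairs, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)


print("big endian:   ", hex_bytes(big))     # 0A 0B 0C 0D
print("little endian:", hex_bytes(little))  # 0D 0C 0B 0A
```

Reading the little endian bytes back with int.from_bytes(little, "little") recovers the original value, which is why the byte order must be agreed on whenever data moves between machines.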

5.2.3 Internal Storage in the CPU


The next consideration for architecture design
concerns how the CPU will store data internally.
This matters because RAM is much slower than registers.
We have three choices:
1. A stack architecture
2. An accumulator architecture
3. A general purpose register architecture.
In choosing one over the other, the tradeoff is between the simplicity (and cost) of hardware design and execution speed and ease of use.

5.2.3 Internal Storage in the CPU


In an accumulator architecture, one operand of a
binary operation is implicitly in the accumulator.
The other operand is in memory, creating lots of bus traffic.


5.2.3 Internal Storage in the CPU


In a stack architecture, operands are implicitly taken from the stack.
A stack can only be accessed through its top.
[Figure: stack architecture datapath with the stack, registers BR1 and BR2, ALU, MAR, PC, IR, control unit, and main memory.]

5.2.3 Internal Storage in the CPU


General purpose register (GPR) architecture.
- Faster than accumulator architecture.
- Efficient implementation for compilers.
- Results in longer instructions, as all operands must be named.

[Figure: GPR architecture datapath with ALU, register file, MAR, PC, IR, control unit, and main memory.]

5.2.3 Internal Storage in the CPU


Most systems today are GPR systems.
There are three types:
Memory-memory gets all operands from memory.
Register-memory gets operands from register/memory.
Load-store requires all operands loaded into registers first.

The number of operands and the number of available registers have a direct effect on instruction length.
The more operands specified in an instruction, the longer it is.
The more registers, the longer each register operand field.

Instruction layout: Opcode | Operand 1 | Operand 2 | Operand 3
5.2.4 Num. of Operands and Inst Len.


Instruction length issue
Fixed length: wastes memory space, but leads to better performance.
Ex.: Clear in MARIE takes no operand, yet still occupies a full 16-bit word.

Variable length: the length of the opcode or the number of operands varies.
More memory efficient, but requires complex decoding.

Question: can all instructions have zero operands?

5.2.4 Num. of Operands and Inst Len.


In general, an instruction includes 0, 1, 2, or 3 operands.
Stack machines use one- and zero-operand instructions.
A stack is a first-in-last-out data structure.

PUSH and POP operations involve only the stack's top element.

Instructions use operands from the stack implicitly.

push X loads the data at memory location X onto the top of the stack.
pop X stores the top of the stack at memory location X.

Binary instructions (e.g., ADD, MULT) use the top two items on the stack.

5.2.4 Recall the Stack-Based Arch.

[Figure: stack architecture datapath with the stack, registers BR1 and BR2, ALU, MAR, PC, IR, control unit, and main memory.]

5.2.4 Num. of Operands and Inst Len.


Stack architectures require us to think about
arithmetic expressions a little differently.
We are accustomed to writing expressions using
infix notation, such as: Z = X + Y.
Stack arithmetic requires that we use postfix notation: Z = X Y +.
This is also called reverse Polish notation, (somewhat) in honor of its Polish inventor, Jan Łukasiewicz (1878–1956).

5.2.4 Num. of Operands and Inst Len.


The principal advantage of postfix notation is that
parentheses are not used.
For example, the infix expression,

Z = (X × Y) + (W × U),
becomes:

Z = X Y × W U × +
in postfix notation.

5.2.4 Num. of Operands and Inst Len.


Example: Convert the infix expression (2+3) - 6/3 to postfix:
The sum 2 + 3 in parentheses takes precedence; we replace the term with 2 3 +.

2 3 + - 6/3

5.2.4 Num. of Operands and Inst Len.


Example: Convert the infix expression (2+3) - 6/3 to postfix:
The division operator takes next precedence; we replace 6/3 with 6 3 /.

2 3 + - 6 3 /

5.2.4 Num. of Operands and Inst Len.


Example: Convert the infix expression (2+3) - 6/3 to postfix:
The quotient 6/3 is subtracted from the sum of 2 + 3, so we move the - operator to the end.

2 3 + 6 3 / -
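The manual conversion can also be automated. The sketch below uses the shunting-yard algorithm, which the slides do not name; it is one common way to do this, not necessarily the method intended here. It assumes space-separated tokens, left-associative binary operators, and well-formed input.

```python
# Higher number means the operator binds more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Convert a list of infix tokens to a list of postfix tokens."""
    output, operators = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit pending operators of equal or higher precedence first
            # (this makes the operators left-associative).
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                output.append(operators.pop())
            operators.append(tok)
        elif tok == "(":
            operators.append(tok)
        elif tok == ")":
            # Emit everything back to the matching "(" and discard it.
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()
        else:
            output.append(tok)  # operand
    while operators:
        output.append(operators.pop())
    return output


print(" ".join(to_postfix("( 2 + 3 ) - 6 / 3".split())))  # 2 3 + 6 3 / -
```

The same function turns ( X * Y ) + ( W * U ) into X Y * W U * +, with no parentheses left in the result.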

5.2.4 Num. of Operands and Inst Len.


Example: Use a stack to evaluate the postfix expression 2 3 + 6 3 / - :

Scanning the expression from left to right, push operands onto the stack until an operator is found.

Stack (top to bottom): 3, 2

5.2.4 Num. of Operands and Inst Len.


Example: Use a stack to evaluate the postfix expression 2 3 + 6 3 / - :

Pop the two operands and carry out the operation indicated by the operator. Push the result back on the stack.

Stack (top to bottom): 5

5.2.4 Num. of Operands and Inst Len.


Example: Use a stack to evaluate the postfix expression 2 3 + 6 3 / - :

Push operands until another operator is found.

Stack (top to bottom): 3, 6, 5

5.2.4 Num. of Operands and Inst Len.


Example: Use a stack to evaluate the postfix expression 2 3 + 6 3 / - :

Carry out the operation and push the result.

Stack (top to bottom): 2, 5

5.2.4 Num. of Operands and Inst Len.


Example: Use a stack to evaluate the postfix expression 2 3 + 6 3 / - :

Finding another operator, carry out the operation and push the result.
The answer is at the top of the stack.

Stack (top to bottom): 3
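The evaluation procedure walked through on these slides can be written as a short loop. This is a minimal sketch that assumes space-separated tokens, numeric operands, and only the four binary operators; the names OPS and eval_postfix are illustrative.

```python
import operator

# Map each operator token to the function that implements it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(expression: str) -> float:
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for tok in expression.split():
        if tok in OPS:
            # The top of the stack is the right-hand operand.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()  # the answer is at the top of the stack


print(eval_postfix("2 3 + 6 3 / -"))  # 3.0
```

The order of the two pops matters for - and /: the first value popped is the right-hand operand, since it was pushed last.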

5.2.4 Num. of Operands and Inst Len.


Example: now you evaluate the following expression using a stack.

5.2.4 Num. of Operands and Inst Len.


Let's see how different instruction formats affect program complexity.
With a three-address ISA (e.g., mainframes), the infix expression,

Z = X × Y + W × U
might look like this:
MULT R1, X, Y
MULT R2, W, U
ADD Z, R1, R2

5.2.4 Num. of Operands and Inst Len.


In a two-address ISA (e.g., Intel, Motorola), the infix expression,

Z = X × Y + W × U
might look like this:
LOAD R1, X
MULT R1, Y
LOAD R2, W
MULT R2, U
ADD R1, R2
STORE Z, R1

Note: Two-address ISAs usually require one operand to be a register.

5.2.4 Num. of Operands and Inst Len.


In a one-address ISA, like MARIE, the infix expression,

Z = X × Y + W × U
looks like this:
LOAD X
MULT Y
STORE TEMP
LOAD W
MULT U
ADD TEMP
STORE Z
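To see why the one-address program needs TEMP, it helps to trace it on a toy accumulator machine. The interpreter below is a hypothetical sketch, not MARIE itself: it models memory as a Python dict, supports only the four opcodes used in this program, and the sample values for X, Y, W, and U are made up.

```python
def run_accumulator(program, memory):
    """Run (opcode, address) pairs on a single-accumulator machine."""
    ac = 0  # the accumulator is the implicit operand of every instruction
    for opcode, addr in program:
        if opcode == "LOAD":
            ac = memory[addr]
        elif opcode == "STORE":
            memory[addr] = ac
        elif opcode == "ADD":
            ac += memory[addr]
        elif opcode == "MULT":
            ac *= memory[addr]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


program = [
    ("LOAD", "X"), ("MULT", "Y"), ("STORE", "TEMP"),
    ("LOAD", "W"), ("MULT", "U"), ("ADD", "TEMP"), ("STORE", "Z"),
]
# Sample values chosen only for illustration.
memory = {"X": 2, "Y": 3, "W": 4, "U": 5, "TEMP": 0, "Z": 0}
run_accumulator(program, memory)
print(memory["Z"])  # 2*3 + 4*5 = 26
```

With only one accumulator, the first product has to be saved to memory before the second one is computed, which costs an extra STORE and an extra memory operand in the ADD.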

5.2.4 Num. of Operands and Inst Len.


In a stack ISA, the postfix expression,

Z = X Y × W U × +
might look like this:
PUSH X
PUSH Y
MULT
PUSH W
PUSH U
MULT
ADD
POP Z

Would this program require more execution time than the corresponding (shorter) program that we saw in the 3-address ISA?
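The stack program can be traced with a toy interpreter. This is an illustrative sketch under stated assumptions: memory is a Python dict, the sample values are made up, and the final instruction is written as POP Z, following the earlier definition that pop X stores the top of the stack at memory location X.

```python
def run_stack_machine(program, memory):
    """Run a list of stack-machine instructions; operands are implicit."""
    stack = []
    for instr in program:
        opcode, *args = instr.split()
        if opcode == "PUSH":
            stack.append(memory[args[0]])   # memory -> top of stack
        elif opcode == "POP":
            memory[args[0]] = stack.pop()   # top of stack -> memory
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MULT":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


program = ["PUSH X", "PUSH Y", "MULT", "PUSH W", "PUSH U", "MULT", "ADD", "POP Z"]
# Sample values chosen only for illustration.
memory = {"X": 2, "Y": 3, "W": 4, "U": 5, "Z": 0}
run_stack_machine(program, memory)
print(memory["Z"])    # 2*3 + 4*5 = 26
print(len(program))   # 8 instructions, versus 3 in the three-address version
```

Instruction count alone does not settle the execution-time question: the stack instructions are shorter and simpler to decode, while each three-address instruction here names three operands.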

5.3 Instruction types


Instructions fall into several broad categories that you should be familiar with:
Data movement.
Arithmetic (integer, real, etc).
Boolean.
Bit manipulation.
I/O.
Control transfer.
Special purpose.
Can you think of some examples of each of these?

An ISA should not contain non-essential instructions, as supporting additional instructions adds to circuit complexity.

5.4 Addressing
Addressing modes specify where an operand is located.
An operand in an instruction can be the actual data, a register or memory location holding the data, or a register or memory location holding the address of a memory location that holds the actual data.

That is, an operand can specify a constant, a register, or a memory location.
The actual address of the memory location of an operand is its effective address.
Certain addressing modes allow us to determine the address of an operand dynamically.

5.4.2 Addressing Mode


Immediate addressing is where the data is part of the instruction.
Ex.: Load 008, where 008 is the actual data.

Direct addressing is where the address of the data is given in the instruction.
Ex.: Load 008, where 008 is the memory address of the data.

Register addressing is where the data is located in a register.
Suppose there are 8 general-purpose registers, R0-R7.
Ex.: Add R0, R1, R2, where the operands are in R1 and R2, and R0 is the destination register for the result of the Add operation.

5.4.2 Addressing Mode


Indirect addressing gives the address of the address of the data in the instruction.
Ex.: LoadI 008, where the location at 008 stores the address of the actual operand.

Register indirect addressing uses a register to store the address of the data.
Ex.: LoadI R1, where R1 stores the address of the actual operand.

5.4.2 Addressing Mode


Indexed addressing uses an index register (implicitly or
explicitly) as an offset, which is added to the address in the
operand to determine the effective address of the data.
Suppose R1 is the index register.
Ex.: Load X loads the operand at address X + R1.
Based addressing is similar, except that a base register is used instead of an index register.
The base register holds the base address.

Both modes are useful for accessing arrays.

5.4.2 Addressing Mode


In stack addressing, the operand is assumed to be on top of the stack.
There are many variations on these addressing modes, including:

Indirect indexed.
Base/offset.
Self-relative.
Auto increment-decrement.

We won't cover these in detail.

Let's look at an example of the principal addressing modes.

5.4.2 Addressing Mode


For the instruction shown, what value is loaded into
the accumulator for each addressing mode?


5.4.2 Addressing Mode


These are the values loaded into the accumulator
for each addressing mode.
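The principal addressing modes can be compared side by side with a small lookup function. The memory contents and register value below are hypothetical, chosen only for illustration; they are not the values from the slide's figure, and the function name load is likewise made up.

```python
def load(mode, operand, memory, index_register):
    """Return the value a Load would place in the accumulator."""
    if mode == "immediate":
        return operand                           # the operand is the data
    if mode == "direct":
        return memory[operand]                   # the operand is the address of the data
    if mode == "indirect":
        return memory[memory[operand]]           # the operand is the address of the address
    if mode == "indexed":
        return memory[operand + index_register]  # effective address = operand + index
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical memory image (address -> contents) and index register R1.
memory = {0x008: 0x010, 0x00C: 0x055, 0x010: 0x099}
R1 = 0x004

for mode in ("immediate", "direct", "indirect", "indexed"):
    print(f"{mode:9s} -> {load(mode, 0x008, memory, R1):#05x}")
```

The same instruction, Load 008, yields four different values here (0x008, 0x010, 0x099, 0x055) depending only on how the operand field is interpreted.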


Chapter 5 Conclusion
ISAs are distinguished according to their bits per
instruction, number of operands per instruction, operand
location and types and sizes of operands.
Instruction format: an opcode with zero or several
operands.
Endianness is another major architectural consideration.
The CPU can store data internally using:
1. A stack architecture
2. An accumulator architecture
3. A general purpose register architecture.


Chapter 5 Conclusion
Instructions can be fixed length or variable length.
Typically vary in the number of operands.
Size of opcodes can vary too.

The addressing modes of an ISA are another important factor. We looked at:
Immediate
Direct
Register
Register Indirect
Indirect
Indexed
Based
Stack

End of Chapter 5


5.5 Instruction Pipelining



Some CPUs divide the fetch-decode-execute cycle
into smaller steps.
These smaller steps can often be executed in parallel
to increase throughput.
Such parallel execution is called instruction
pipelining.
Instruction pipelining provides for instruction level parallelism (ILP).

5.5 Instruction Pipelining


Suppose a fetch-decode-execute cycle were broken
into the following smaller steps:
1. Fetch instruction.
2. Decode opcode.
3. Calculate effective address of operands.
4. Fetch operands.
5. Execute instruction.
6. Store result.

Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.


5.5 Instruction Pipelining


Theoretical max speedup by an example: a program with
100 instructions. How many clock cycles are needed to
finish this program?
Case 1: no pipelining.

Case 2: with pipelining.
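One way to work the two cases is sketched below. It assumes the six-stage pipeline described earlier, one clock cycle per stage, and no stalls; those assumptions are what make this the theoretical maximum rather than a realistic figure. The function names are illustrative.

```python
def cycles_without_pipeline(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_with_pipeline(num_instructions: int, num_stages: int) -> int:
    """The first instruction fills the pipeline; each later one finishes one cycle apart."""
    return num_stages + (num_instructions - 1)


n, k = 100, 6  # 100 instructions, six stages
case1 = cycles_without_pipeline(n, k)
case2 = cycles_with_pipeline(n, k)
print(case1)                    # 600
print(case2)                    # 105
print(f"{case1 / case2:.2f}")   # 5.71
```

As the instruction count grows, the ratio n*k / (k + n - 1) approaches k, so the speedup of an ideal pipeline is bounded by its number of stages.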


5.5 Instruction Pipelining


First, we have to assume that the architecture supports fetching instructions and data in parallel.
In the previous pipeline example, S1 (instruction fetch) and S4 (operand fetch) both occur in the same clock cycle.

Second, we assume that the pipeline can be kept filled at all times. This is not always the case.
Pipeline hazards arise that cause pipeline conflicts and stalls.
Refer to the laundry example. What if only one power outlet is available?

5.5 Instruction Pipelining


An instruction pipeline may stall, or be flushed, for any of the following reasons:
Resource conflicts.
Data dependencies.
Conditional branching.

Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated.