
The architecture of a computer is a logical description of its components and its
basic operations.
In pure assembly language, one statement corresponds to one basic operation of
the processor. A programmer writing in assembly language is therefore asking
directly for the basic operations of the processor.
Most processors endlessly repeat three basic steps; each machine cycle results in
the execution of one machine instruction. A modern processor performs millions of
machine cycles per second.

o Fetch the Instruction. The instruction is fetched from memory. The
program counter (PC) contains the address of the instruction in memory.
o Decode & Increment the Program Counter. The program counter
now points to the next instruction.
o Execute the Instruction. The operation asked for by the current
machine instruction is performed.
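The cycle above can be sketched as a minimal interpreter loop. This is a toy machine, not real MIPS; the two-field instruction format and opcode names are invented for illustration:

```python
# Toy fetch-decode-execute loop (invented instruction set, not real MIPS).
memory = [
    ("LOADI", 5),   # put the constant 5 in the accumulator
    ("ADDI", 3),    # add the constant 3
    ("HALT", 0),
]

pc = 0           # program counter: address of the next instruction
acc = 0          # a single accumulator register
running = True

while running:
    op, arg = memory[pc]   # 1. fetch the instruction the PC points at
    pc += 1                # 2. increment the PC to the next instruction
    if op == "LOADI":      # 3. execute the decoded operation
        acc = arg
    elif op == "ADDI":
        acc += arg
    elif op == "HALT":
        running = False

print(acc)  # -> 8
```

Note that the PC is bumped before the execute step, which is why branch instructions (not modeled here) compute their target relative to the *next* instruction.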

The most common use of assembly language is in programming embedded
systems. Examples are aviation electronics, communication satellites, DVD players,
robots, automobile electronics, cell phones, and game consoles.
Keeping both data and instructions in main memory is one of the characteristics of a
Von Neumann machine, the name for the basic design of most modern computers.
Digital systems are built so that the "on" "off" (binary) value is only tested at certain
times, giving the wire (or transistor, or...) a chance to change its state between
these times. This is why computer systems have a "clock" to keep all these times
synchronized. So faster clocks mean wires can be tested more times per second,
and the whole system runs faster.

Processor chips (and the computers that contain them) are often described in
terms of their clock speed. Clock speed is measured in Hertz, where one Hertz is
one clock tick per second.
MHz means megahertz, a million clock ticks/sec.; GHz means gigahertz, a
billion clock ticks/sec.

All computers consist of components (processor, memory, controllers, video)
connected together with a bus.

Each byte of main storage has an address. Most modern processors use 32-bit
addresses, so there are 2^32 possible addresses. Think of main storage as if
it were an array:
byte[0x00000000 ... 0xFFFFFFFF] mainStorage;
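A quick check of what 32-bit addressing buys:

```python
# 32-bit addresses give 2**32 distinct byte addresses = 4 GiB of
# addressable memory, matching the 0x00000000..0xFFFFFFFF range above.
print(2**32)            # 4294967296 addresses
print(2**32 // 2**30)   # 4 GiB
```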

Computer systems also have cache memory. Cache memory is very fast RAM that
is inside (or close to) the processor. It duplicates sections of main storage that are
heavily used by the currently running programs. Access to cache memory is much
faster than to normal main memory.

MIPS Architecture
A register is a part of the processor that holds a bit pattern. Processors have many
registers.
MIPS processors have 32 general purpose registers, each holding 32 bits. They
also have registers other than general purpose ones. MIPS instructions
are 4 bytes = 1 word = 32 bits.
The processor chip contains registers, which are electronic components
that can store bit patterns. The processor interacts with memory by
moving bit patterns between memory and its registers.
Load: bits starting at an address in memory are copied into a register
inside the processor.
Store: bits are copied from a processor register to memory at a
designated address.
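Load and store can be pictured as copies between a register file and a memory array. A sketch with invented register names, not real MIPS:

```python
# Memory as a byte-addressable array, registers as a small dict
# (illustrative model only; "r1"/"r2" are made-up names).
memory = bytearray(256)
registers = {"r1": 0, "r2": 0}

# Store: copy a register's 32-bit value into memory at a designated address.
registers["r1"] = 0x12345678
memory[100:104] = registers["r1"].to_bytes(4, "little")

# Load: copy the 32 bits starting at that address back into a register.
registers["r2"] = int.from_bytes(memory[100:104], "little")

print(hex(registers["r2"]))  # -> 0x12345678
```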

Text Segment: This holds the machine language of the user
program (the text).
Data Segment: This holds the data that the program operates
on.
Stack Segment: At the top of user address space is the stack.
With high-level languages, local variables and parameters are
pushed onto and popped off the stack.

MOVE as OR with Zero

or $d, $s1, $zero
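This works because OR-ing any bit pattern with zero leaves it unchanged, so MIPS needs no separate move instruction:

```python
# OR with zero is the identity, which is why a register-to-register move
# can be assembled as `or $d, $s1, $zero`.
for x in (0, 1, 0xDEADBEEF, 2**32 - 1):
    assert x | 0 == x
print("or-with-zero preserves every pattern")
```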

Design Principles: Simplicity favours regularity; Smaller is faster; Make the
common case fast.

Instruction Types
R-Format

add, and, or, sll, sub

e.g. add $s0, $s1, $s2

Instruction | Op (6 bits) | Rs (5 bits) | Rt (5 bits) | Rd (5 bits) | Shamt (5 bits) | Funct (6 bits)
add         | 0           | $s1         | $s2         | $s0         | 0              | 32
add         | 0           | 17          | 18          | 16          | 0              | 32
add         | 000000      | 10001       | 10010       | 10000       | 00000          | 100000

I-Format

addi, bne, sw, lw, slti

e.g. lw $t0, 32($s3)

Instruction | Op (6 bits) | Rs (5 bits) | Rt (5 bits) | Immediate (16 bits)
lw          | 35          | $s3         | $t0         | 32 (Offset)
lw          | 35          | 19          | 8           | 32 (Offset)
lw          | 100011      | 10011       | 01000       | 0000000000100000

Inst | Op | Rs   | Rt         | Address
lw   | 35 | Base | Dest reg   | Offset
sw   | 43 | Base | Source reg | Offset
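The field values in the tables can be checked by packing them into 32-bit words. A quick sketch; field widths follow the R- and I-format layouts above:

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack I-format fields: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $s0, $s1, $s2  ->  op=0, rs=17, rt=18, rd=16, shamt=0, funct=32
print(hex(encode_r(0, 17, 18, 16, 0, 32)))   # 0x2328020

# lw $t0, 32($s3)    ->  op=35, rs=19, rt=8, imm=32
print(hex(encode_i(35, 19, 8, 32)))          # 0x8e680020
```

The second result, 0x8E680020, is exactly the binary row of the lw table read as one 32-bit word.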

Chapter 4 Review
The performance of a computer is determined by three factors.
1. Instruction Count
2. Clock Cycle Time
3. Clock Cycles Per Instruction
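These three factors multiply together: CPU time = instruction count × CPI × clock cycle time. A quick check with made-up numbers:

```python
# CPU time = instruction count x cycles per instruction x clock cycle time,
# where cycle time = 1 / clock rate. All figures below are illustrative.
instruction_count = 2_000_000
cpi = 2.0                        # average clock cycles per instruction
clock_rate_hz = 2_000_000_000    # 2 GHz

cpu_time = instruction_count * cpi / clock_rate_hz
print(cpu_time)  # -> 0.002 seconds
```

Halving any one factor (fewer instructions, lower CPI, or a faster clock) halves the CPU time, which is why all three matter.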
I. All instructions start by using the program counter to supply the instruction
address to the instruction memory. The program counter is just a register that
contains the address of the next instruction to execute.
II. After the instruction is fetched, the register operands used by an instruction are
specified by fields of that instruction.
III. Once the values of the registers have been fetched, they can be operated on by
the ALU to compute a memory address for a load or store, to compute an
arithmetic result, or to do a compare for branch instructions.
IV. If the instruction is arithmetic, the ALU result must be written to a register. If the
operation is a load or store, the ALU result is used as an address to store a value
from the registers or load a value from memory into the registers.
V. Branches require the use of the ALU output to determine the next instruction
address.
The simplicity and regularity of the MIPS instruction set simplifies the
implementation by making the execution of the three instruction classes similar. We
only need a few extra hardware components to support R-type, I-type and J-type
instructions.

For example, all instruction types except jump use the ALU after reading the
registers. Memory-reference instructions like Load instructions use the ALU to
compute an address calculation, and arithmetic-logical instructions use the
ALU for performing calculations on two registers.

A load instruction will need to access memory to read data and write that into a
register.

A store instruction will need to access memory to write data into memory from a
register.
The bits of the control lines select which multiplexor inputs are used; they guide
the flow of data through the datapath depending on the operation type.
The datapath contains elements used to operate on or hold data within a
processor. Datapath elements include the instruction memory, data memory, the
register file, the ALU, and adders.
Load and Store ALU OP: add
Branches ALU OP: subtract
R-Type ALU OP: depends on funct number.
MemRead asserted for load instructions, tells memory to do a read.
MemWrite asserted for store instructions, tells memory to do a write.

Fetching Instructions
We must start by fetching the instruction from memory. To
prepare for executing the next instruction, we must also
increment the program counter so that it points at the
next instruction, 4 bytes later.
The program counter is a 32-bit register that is written at
the end of every clock cycle and thus does not need a
write control signal.

Register File
R-Type instructions take in two
registers and the ALU performs an
operation on them, and writes back
the result to the write register.
The register number inputs are 5
bits wide to represent any of the
32 MIPS registers.
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit
signal if the result is zero. The 4-bit control signal of the ALU specifies which
operation will be performed on the inputs. For example, a beq instruction performs
subtraction.
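The Zero output is what makes beq work: subtracting the two operands and testing whether the result is zero is the same as testing equality. A tiny model of that behavior (a sketch, not the real ALU control encoding):

```python
def alu(a, b, op):
    """Tiny ALU model: returns (result, zero_flag) for 32-bit operands."""
    if op == "add":
        result = (a + b) & 0xFFFFFFFF
    elif op == "sub":
        result = (a - b) & 0xFFFFFFFF
    else:
        raise ValueError(op)
    return result, result == 0

# beq $s0, $s1, label: the branch is taken iff the subtraction is zero.
_, taken = alu(7, 7, "sub")
print(taken)   # -> True  (operands equal, branch taken)
_, taken = alu(7, 9, "sub")
print(taken)   # -> False (operands differ, branch not taken)
```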

Data Memory
Load and store instructions
read register operands and
calculate addresses using a
16-bit offset, which must be
sign-extended before being
sent to the ALU.
Load: Read memory and
update register.
Store: Write register value to memory.
Branch instructions read register operands and calculate the branch
target address; the branch is taken if the ALU result is zero.

R-Type Instructions

Load Instruction

Branch On Equal Instruction

Jump Instruction

So why is a single-cycle implementation not used today?

It is inefficient: the clock cycle must have the same length for every
instruction, so the longest possible path in the processor determines
the clock cycle. It is not feasible to vary the period for different
instructions, so a single-cycle implementation violates the design
principle of making the common case fast. We will see another datapath
implementation technique called pipelining, which has much higher
throughput; it accomplishes this by executing multiple instructions
simultaneously.

What if multiple instructions could be running at different stages in the
datapath?

Pipelining
A pipeline is an implementation that increases the number of instructions we can get
done in a certain amount of time. Parallelism improves performance, and we want to make the
common case fast.
The MIPS pipeline goes through five stages; the datapath cannot do more than one of the same
stage in one clock cycle, so it is one step per stage.
IF  - Instruction Fetch From Memory
ID  - Instruction Decode & Register Read
EX  - Execute operation or calculate address
MEM - Access memory
WB  - Write result back to register

Assume the time for each stage is the same across instructions, to make our
calculations easier. However, some stages do run faster than others; for
example, load word takes the longest.

If all stages are balanced (if they all take the same time), then the speedup is:

Speedup = Time between instructions (non-pipelined) / Time between instructions (pipelined)
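With balanced stages the ratio reduces to the number of stages in the ideal case. For example, with illustrative timings of 800 ps per non-pipelined instruction and five balanced 160 ps stages:

```python
# Ideal pipelining speedup with balanced stages (illustrative numbers).
time_nonpipelined_ps = 800   # one whole instruction finishes every 800 ps
time_pipelined_ps = 160      # one instruction completes every stage time

speedup = time_nonpipelined_ps / time_pipelined_ps
print(speedup)  # -> 5.0, the ideal 5-stage case
```

Real pipelines fall short of this ideal because stages are not perfectly balanced and hazards introduce stalls.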

The MIPS instruction set architecture was designed for pipelining.


1. All instructions are 32 bits.
a. It's easier to fetch and decode in one cycle.
2. Few and regular instruction formats.
a. The source register is located in the same place for all
instructions.
3. Load and store addressing
a. We calculate the address in the ALU, and access memory in the
Data Memory.
4. Alignment of memory operands
a. Memory access takes only one cycle.
A hazard is a situation that prevents starting the next instruction in the next cycle
of a pipelined datapath.

Structural Hazards - a required resource is busy.
Data Hazards - need to wait for a previous instruction to complete its data
read/write.
Control Hazards - deciding on a control action depends on a previous
instruction.

When we perform a stall, we are running a NoOp operation.


The name forwarding comes from the idea that the result is passed forward from
an earlier instruction to a later instruction. Bypassing comes from passing the
result around the register file to the desired unit.

Simple Data Hazards


An instruction depends on completion of data access by a previous instruction.

add $s0, $t0, $t1


sub $t2, $s0, $t3
Without forwarding:

With forwarding:
Use the result when it is computed; don't wait for it to be stored
into a register.
o Requires extra connections in the datapath.
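The forwarding decision can be stated as a check on register numbers between pipeline stages. A sketch of the standard condition, with simplified stage records and only one source operand checked:

```python
# Forward when the instruction in the EX/MEM register writes the register
# that the instruction now entering EX wants to read (simplified sketch).
def needs_forwarding(ex_mem_reg_write, ex_mem_rd, id_ex_rs):
    # $zero (register 0) is never forwarded: it always reads as 0.
    return ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs

# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3:
# add writes $s0 (register 16), sub reads $s0 -> forward the ALU result.
print(needs_forwarding(True, 16, 16))   # -> True
print(needs_forwarding(True, 16, 8))    # -> False (different register)
```

A full implementation also checks the second source operand and the MEM/WB stage; this sketch shows only the core comparison.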

Load-Use Data Hazard


lw $s0, 20($t1)
sub $t2, $s0, $t3
Can't always avoid stalls by forwarding; if a value is not computed
when needed, then it can't be forwarded.

Without the stall, the path from memory access stage output to execution stage
input would be going backward in time, which is impossible.
For any load-use hazard without forwarding, you would have to stall two times.
The compiler will try using instruction scheduling to avoid stalls. The code will be
reordered to avoid use of load result in the next instruction.
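The reordering idea can be sketched as a pass over a tiny instruction list: find a load whose result is used by the very next instruction, and slot an independent later instruction between them. The mini-IR of (op, dest, sources) tuples is invented for illustration:

```python
# Sketch of compiler instruction scheduling for load-use hazards
# (instructions as (op, dest, sources) tuples; invented mini-IR).
prog = [
    ("lw",  "$s0", ["$t1"]),          # load into $s0
    ("sub", "$t2", ["$s0", "$t3"]),   # uses $s0 immediately: load-use hazard
    ("add", "$t4", ["$t5", "$t6"]),   # independent of $s0
]

def schedule(prog):
    prog = list(prog)
    for i in range(len(prog) - 2):
        op, dest, _ = prog[i]
        if op == "lw" and dest in prog[i + 1][2]:
            # Find a later instruction that neither reads the loaded
            # register nor writes a register the hazard instruction reads.
            for j in range(i + 2, len(prog)):
                _, dest2, srcs2 = prog[j]
                if dest not in srcs2 and dest2 not in prog[i + 1][2]:
                    prog.insert(i + 1, prog.pop(j))  # slot it after the load
                    break
    return prog

print([ins[0] for ins in schedule(prog)])  # -> ['lw', 'add', 'sub']
```

After scheduling, the sub no longer follows the lw directly, so forwarding alone covers the dependence and no stall cycle is needed.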

Although we could try to rely on compilers to remove all such hazards, the results
would not be satisfactory.
These dependences happen just too often and the delay is just too long to expect
the compiler to rescue us from this dilemma.

Structural Hazards
This occurs when a planned instruction cannot execute in the proper clock cycle
because the hardware does not support the combination of instructions that are set
to execute.

By making instruction memory separate from data memory, we can avoid
having a structural hazard, so multiple instructions can be in flight at
once. An instruction can then be fetched at the same time we use data
memory to do load and store instructions.

Control Hazards
Occurs from the need to make a decision based on the results of one instruction
while others are executing.
Solution #1:
Stall: wait until the branch outcome is known before fetching the next
instruction. This conservative option certainly works, but it is slow.
Solution #2:
Predict the outcome of the branch; only stall if the prediction is wrong.
Static Branch Prediction
Based on typical branch behavior.

Predict backward branches taken.


Predict forward branches not taken.

Dynamic Branch Prediction


Hardware measures actual branch behavior (e.g. record recent history of each
branch)

Assume future behavior will continue the trend.


When wrong, stall while re-fetching, and update history.
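A common way to record recent branch history is a 2-bit saturating counter per branch. The sketch below is illustrative of the scheme, not tied to any particular processor:

```python
# 2-bit saturating counter: states 0,1 predict not-taken; 2,3 predict taken.
# A single misprediction nudges the state; it takes two in a row to flip
# the prediction, so one-off anomalies don't change the trend.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start at "weakly taken" (arbitrary choice)

    def predict(self):
        return self.counter >= 2  # True = predict taken

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True]   # actual branch behavior
correct = 0
for actual in outcomes:
    correct += (p.predict() == actual)
    p.update(actual)
print(correct)  # -> 3 of 4 correct: only the single not-taken outcome misses
```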

Pipeline Summary
Pipelining improves performance by increasing instruction throughput. This is
done by running multiple instructions in parallel; the latency of each
individual instruction is unchanged.
Pipelining is subject to hazards, the three hazards are structural, data, and control.
Instruction set design affects the complexity of pipeline implementations.

Pipeline Registers - We keep a block of storage between each stage in the
pipeline.
