Risc

CSCI 4717/5717 Computer Architecture
Topic: RISC Processors
Reading: Stallings, Chapter 13

Major Advances
A number of advances have occurred since the von Neumann architecture was proposed:
• Family concept – separating architecture of machine from implementation
• Microprogrammed unit
– Microcode allows simple programs to be executed from firmware as the action for an instruction
– Eases the task of designing and implementing the control unit
Major Advances (continued)
• Solid-state RAM
• Microprocessors
• Cache memory – speeds up memory hierarchy
• Pipelining – reduces percentage of idle components
• Multiple processors – speed through parallelism

Semantic Gap
• Difference between operations performed in HLL and those provided by architecture
• Example: case/switch on VAX in hardware
• Problems
– inefficient execution of code
– excessive machine program code size
– increased complexity of compilers
• Predominant operations
– Movement of data
– Conditional statements
Operations
• Dynamic occurrence – relative number of times each type of HLL statement is executed when the compiled program runs
• Static occurrence – number of times each type of statement appears in the program text (of little use here, since it says nothing about how often the statement executes)
• Machine-Instruction Weighted – relative amount of machine code executed as a result of this instruction (based on dynamic occurrence)
• Memory Reference Weighted – relative number of memory references executed as a result of this instruction (based on dynamic occurrence)
• Procedure call is the most time-consuming operation
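The weighting described above can be sketched in a few lines of Python. This is only an illustration: the statement categories, the dynamic counts, and the per-statement costs are all invented for the example and are not the figures from the textbook tables. It shows how a statement type that is executed relatively rarely (here, the procedure call) can still dominate once each execution is weighted by the machine instructions or memory references it causes.

```python
def relative_frequencies(counts):
    """Scale raw counts so that they sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def weighted_frequencies(dynamic_counts, cost_per_statement):
    """Weight each statement's dynamic count by its per-execution cost."""
    weighted = {
        name: dynamic_counts[name] * cost_per_statement[name]
        for name in dynamic_counts
    }
    return relative_frequencies(weighted)


# Hypothetical data, invented for illustration only.
# How many times each HLL statement type was executed in one run:
DYNAMIC_COUNTS = {"assign": 45, "if": 30, "loop": 5, "call": 15, "other": 5}
# Assumed machine instructions generated per execution of each statement:
MACHINE_INSTRUCTIONS = {"assign": 2, "if": 3, "loop": 10, "call": 12, "other": 2}
# Assumed memory references per execution of each statement:
MEMORY_REFERENCES = {"assign": 1, "if": 1, "loop": 5, "call": 15, "other": 1}

dynamic = relative_frequencies(DYNAMIC_COUNTS)
by_instructions = weighted_frequencies(DYNAMIC_COUNTS, MACHINE_INSTRUCTIONS)
by_memory_refs = weighted_frequencies(DYNAMIC_COUNTS, MEMORY_REFERENCES)

# Columns: dynamic, machine-instruction weighted, memory-reference weighted.
for name in DYNAMIC_COUNTS:
    print(f"{name:8s} {dynamic[name]:6.1%} "
          f"{by_instructions[name]:6.1%} {by_memory_refs[name]:6.1%}")
```

With these made-up numbers the assignment statement is the most frequent by dynamic occurrence, while the call is the largest entry in both weighted columns.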
Operands
• Integer constants
• Scalars (80% of scalars were local to procedure)
• Array/structure
• Lunde, A. "Empirical Evaluation of Some Features of Instruction Set Processor Architectures." Communications of the ACM, March 1977.
– Each instruction references 0.5 operands in memory
– Each instruction references 1.4 registers
– These numbers depend highly on architecture (e.g., number of registers, etc.)

Operands (continued)
                    Pascal   C      Average
Integer constant    16%      23%    20%
Scalar variable     58%      53%    55%
Array/structure     26%      24%    25%
Procedure calls
(Table 13.4; table not reproduced)

Results of Research
This research suggests:
• Trying to close the semantic gap (CISC) is not necessarily the answer to optimizing processor design
• A set of general techniques or architectural characteristics can be developed to improve performance.
Reduced Instruction Set Computer (RISC)
Characteristics of a RISC architecture:
• Large number of general-purpose registers and/or use of compiler designed to optimize use of registers – Saves operand referencing
• Limited/simple instruction set – Will become clearer later
• Optimization of pipeline due to better instruction design – Due to high proportion of conditional branch and procedure call instructions

Increasing Register Availability
There are two basic methods for improving register use
– Software – relies on compiler to maximize register usage
– Hardware – simply create more registers
Register Windows
• The hardware solution to making more registers available for a process is to increase the number of registers
• Large number of registers should decrease number of memory accesses
• Allocate registers first to local variables
• A procedure call will force registers to be saved into fast memory
• As shown in Table 13.4 (slide 9), only a small number of parameters and local variables are typically required

Register Windows (continued)
Solution – Create multiple sets of registers, each assigned to a different procedure
– Saves having to store/retrieve register values from memory
– Allow adjacent procedures to overlap allowing for parameter passing
[Figure: two overlapping register windows joined by call/return. Each window consists of parameter registers, local registers, and temporary registers.]
Register Windows (continued)
• This implies no movement of data to pass parameters.
• Begin to see why compiler writers would make better processor architects
• To make number of registers appear unbounded, architecture should allow for older activations to be stored in memory
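The idea of spilling older activations to memory can be sketched with a small counter model. This is a simplified illustration, not a description of any particular processor: the class name and its fields are invented here, the register contents themselves are not modelled, and it relies on the rule stated in these slides that an N-window register file holds at most N-1 procedure activations.

```python
class RegisterWindowFile:
    """Counts window saves/restores for a sequence of calls and returns."""

    def __init__(self, n_windows):
        if n_windows < 2:
            raise ValueError("need at least two windows")
        self.capacity = n_windows - 1  # N windows hold N-1 activations
        self.resident = 1              # activations held in registers (outermost procedure)
        self.spilled = 0               # older activations saved in memory
        self.saves = 0
        self.restores = 0

    def call(self):
        if self.resident == self.capacity:
            # Overflow: the oldest resident window is saved to memory
            # and its registers are reused for the new activation.
            self.spilled += 1
            self.saves += 1
        else:
            self.resident += 1

    def ret(self):
        if self.resident > 1:
            self.resident -= 1
        elif self.spilled > 0:
            # Underflow: the caller's window was saved earlier,
            # so it has to be restored from memory.
            self.spilled -= 1
            self.restores += 1
        else:
            raise RuntimeError("return from the outermost procedure")


def run_trace(n_windows, trace):
    """trace is a string of 'C' (call) and 'R' (return) events."""
    windows = RegisterWindowFile(n_windows)
    for event in trace:
        if event == "C":
            windows.call()
        else:
            windows.ret()
    return windows.saves, windows.restores


# Five nested calls followed by five returns, with only 4 windows:
print(run_trace(4, "CCCCC" + "RRRRR"))  # (3, 3)
# The same trace with 8 windows never leaves the register file:
print(run_trace(8, "CCCCC" + "RRRRR"))  # (0, 0)
```

Saves and restores only happen when the call depth swings by more than the register file can hold, which is why a modest number of windows can be enough for programs whose call depth stays within a narrow range.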
Register Windows (continued)
• Saves occur by interrupt, saving only parameter registers and local registers.
• Temporary registers are associated with parameter registers of next call
• N-window register file can only hold N-1 procedure activations
• Research showed that with N = 8, a save or restore is needed on only 1% of the calls and returns.

Register Windows – Global Variables
• Question: Where do we put global variables?
• Could set global variables in memory
• For often accessed global variables, however, this is inefficient
• Solution: Create an additional set of registers for global variables. (Fixed number and available to all procedures)
Problems with Register Windows
• Increased hardware burden
• Compiler needs to determine which variables get the nice, high-speed registers and which go to memory

Register Windows versus Cache
• It could be said that register windows are similar to a high-speed memory or cache for procedure data
• This is not necessarily a valid comparison
Register Windows versus Cache (continued)
There are some areas where caches are more efficient
– They contain data that is definitely used
– Register file may not be fully used by procedure
– Savings in other areas such as code accesses are possible with cache whereas register file only works with local variables
Register Windows versus Cache (continued)
• There are, however, some areas where the register windows are a better choice
– Register file more closely mimics software which typically operates within a narrow range of procedure calls whereas caches may thrash under certain circumstances
– Register file wins the speed war when it comes to decoding logic
• Solution – use register file and instructions-only cache

Compiler-based register optimisation
• Assume a reduced number of available registers
• HLLs do not use explicit references to registers
• Solution
– Assign symbolic or virtual register designations to each declared variable
– Map limited registers to symbolic registers
– Symbolic registers whose usage does not overlap share the same register
– Load-and-store operations for quantities that overflow number of available registers
• Goal is to decide which quantities are to be assigned registers at any given point in program – Graph coloring
Graph Coloring
• Technique borrowed from discipline of topology
• Create graph – Register Interference Graph
– Each node is a symbolic register
– Two symbolic registers that are used during the same program fragment are joined by an edge to depict interference
– Two linked nodes must have different "colors"
– Goal is to avoid "number of colors" exceeding number of available registers
– Symbolic registers that go past number of actual registers must be stored in memory
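The coloring step can be sketched as follows. This is a simple greedy heuristic chosen for brevity, not necessarily the algorithm a production compiler or the textbook uses, and the interference edges are a made-up example. Colors are register numbers; a symbolic register that cannot be given a color different from all of its neighbors is marked as spilled to memory.

```python
def build_interference_graph(edges):
    """Turn a list of (a, b) interference pairs into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def color_registers(graph, num_registers):
    """Greedy coloring: returns (assignment, spilled).

    assignment maps a symbolic register to an actual register number;
    spilled lists the symbolic registers left in memory.
    """
    assignment = {}
    spilled = []
    # Visit the most constrained nodes first (ties broken by name).
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[nb] for nb in graph[node] if nb in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Hypothetical interference edges between symbolic registers A..F.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
GRAPH = build_interference_graph(EDGES)

print(color_registers(GRAPH, 3))  # three registers: nothing is spilled
print(color_registers(GRAPH, 2))  # two registers: C and F are spilled
```

Because the heuristic is greedy, it may spill more than strictly necessary on some graphs; it is meant only to show how interference edges translate into register assignments and memory spills.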
CISC versus RISC
• Complex instructions are possibly more difficult to directly associate with an HLL instruction – many compilers may just take the simpler, more reliable way out
• Optimization more difficult with complex instructions
• Compilers tend to favor more general, simpler commands, so savings in terms of speed may not be realized either

CISC versus RISC (continued)
CISC programs may take less memory
• Not necessarily an advantage with cheap memory
• Is an advantage due to fewer page faults
• May only be shorter in assembly language view, not necessarily from the point of view of the number of bits
Additional Design Distinctions
• Further characteristics of RISC
– One instruction per cycle
– Register-to-register operations
– Simple addressing modes
– Simple instruction formats
• There is no clear-cut design for one or the other
• Many processors contain characteristics of both RISC and CISC

RISC – One Instruction per Cycle
• Cycle = machine cycle
• Fetch two operands from registers – very simple addressing mode
• Perform an ALU operation
• Store the result in a register
• Microcode should not be necessary at all – hardwired code
• Format of instruction is fixed and simple to decode
• Burden is placed on compiler rather than processor
RISC – Register-to-Register Operations
• Only LOAD and STORE operations should access memory
• ADD Example:
– RISC – ADD and ADD with carry
– VAX – 25 different ADD instructions

Simple addressing modes
• Register
• Displacement
• PC-relative
• No indirect addressing – requires two memory accesses
• No more than one memory addressed operand per instruction
• Unaligned addressing not allowed
• Simplifies control unit
Simple instruction formats
• Instruction length is fixed – typically 4 bytes
• One or a few formats are used
• Instruction decoding and register operand decoding can occur at the same time
• Simplifies control unit

Characteristics of Some Processors
(table not reproduced)
RISC Pipelining
• Pipelining structure is simplified greatly thus making delay between stages much less apparent and simplifying logic of the stages
• ALU operations
– I: instruction fetch
– E: execute (register-to-register)
• Load and store operations
– I: instruction fetch
– E: execute (calculates memory address)
– D: Memory (register-to-memory or memory-to-register operations)

Comparing the Effects of Pipelining
Sequential execution – obviously inefficient
Comparing the Effects of Pipelining (continued)
• Two-way pipelined timing – I and E stages of two different instructions can be performed simultaneously
• Yields up to twice the execution rate of sequential
• Problems
– Causes wait state with accesses to memory
– Branch disrupts flow (NOOP instruction can be inserted by assembler or compiler)

Comparing the Effects of Pipelining (continued)
Permitting two memory accesses at one time allows for fully pipelined operation (dual-port RAM)
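The "up to twice the execution rate" bound can be checked with an idealized cycle count. The sketch below assumes that every instruction passes through the same number of stages, that each stage takes one time unit, and that there are no wait states or branches; real instruction mixes (with the memory and branch problems listed above) fall short of this bound. The function names are made up for this example.

```python
def sequential_cycles(n_instructions, n_stages):
    """Each instruction finishes all its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipe, then one completes per time unit."""
    return n_stages + (n_instructions - 1)


def ideal_speedup(n_instructions, n_stages):
    return (sequential_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))


for stages in (2, 3, 4):
    print(stages, round(ideal_speedup(100, stages), 2))
```

For a long instruction stream the ratio approaches the number of stages, so two stages give a factor approaching 2 and four stages a factor approaching 4, but it never reaches the bound because of the time needed to fill the pipe.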
Comparing the Effects of Pipelining (continued)
• Since E is usually longer, break E into two parts
– E1 – register file read
– E2 – ALU operation and register write
• Because of RISC design, this is not as difficult to do and up to four instructions can be under way at one time (potential speedup of 4)

Delayed Branch
• Traditional pipelining disposes of instruction loaded in pipe after branch
• Delayed branching executes instruction loaded in pipe after branch
• NOOP can be used if instruction cannot be found to execute after JUMP. This makes it so no special circuitry is needed to clear the pipe.
• It is left up to the compiler to rearrange instructions or add NOOPs
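The compiler's choice between rearranging instructions and adding a NOOP can be sketched as below. This is a deliberately minimal illustration: the instruction representation (opcode, destination register, source operands), the opcode names, and the function name are all assumptions made for this example, and it only considers the single instruction immediately before the branch. It also assumes the instruction in the delay slot always executes, whether or not the branch is taken.

```python
BRANCH_OPCODES = {"JUMP", "BEQ"}   # hypothetical opcode names
NOOP = ("NOOP", None, ())


def fill_delay_slot(block):
    """block: non-empty list of (opcode, dest, sources) ending in a branch.

    Returns a new list in which the slot after the branch holds either
    the instruction that used to precede the branch or a NOOP.
    """
    *body, branch = block
    if body:
        candidate = body[-1]
        opcode, dest, _ = candidate
        branch_sources = branch[2]
        # Safe to move only if the candidate is not itself a branch and
        # does not write a register that the branch condition reads.
        if opcode not in BRANCH_OPCODES and dest not in branch_sources:
            return body[:-1] + [branch, candidate]
    return body + [branch, NOOP]


# Unconditional jump: the ADD can be moved into the delay slot.
block = [("LOAD", "rA", ("X",)), ("ADD", "rA", ("rA",)), ("JUMP", None, ())]
print([op for op, _, _ in fill_delay_slot(block)])   # ['LOAD', 'JUMP', 'ADD']

# Conditional branch that tests r1: the SUB writes r1, so a NOOP is needed.
block = [("SUB", "r1", ("r1", "r2")), ("BEQ", None, ("r1",))]
print([op for op, _, _ in fill_delay_slot(block)])   # ['SUB', 'BEQ', 'NOOP']
```

A real compiler would search further back in the block and check dependences among all the instructions it moves past; the point here is only that a useful instruction replaces the NOOP when the branch does not depend on it.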
Delayed Branch (continued)
Problem 13.5 from Textbook
S := 0;
for K := 1 to 100 do S := S - K;
-- translates to --
     LD  R1, 0          ;keep value of S in R1
     LD  R2, 1          ;keep value of K in R2
LP   SUB R1, R1, R2     ;S := S - K
     BEQ R2, 100, EXIT  ;done if K = 100
     ADD R2, R2, 1      ;else increment K
     JMP LP             ;back to start of loop

Delayed Load
• Similar to delayed branch in that an instruction that doesn't use register being loaded can execute during the D phase of a load instruction
• During a load, processor "locks" register being loaded and continues execution until an instruction requiring the locked register is reached
• Left up to the compiler to rearrange instructions
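As a sanity check on the Problem 13.5 translation, the listing can be stepped through in Python. This sketch mirrors the six instructions one for one and counts how many are executed; it ignores pipeline effects entirely (no delay slots or NOOPs are modelled), and the function name and `limit` parameter are made up for this example.

```python
def run_loop(limit=100):
    """Step through the Problem 13.5 listing one instruction at a time."""
    r1 = 0            # LD  R1, 0          ; S
    r2 = 1            # LD  R2, 1          ; K
    executed = 2      # the two LD instructions
    while True:
        r1 = r1 - r2  # LP: SUB R1, R1, R2 ; S := S - K
        executed += 1
        executed += 1 # BEQ R2, 100, EXIT  ; executed on every pass
        if r2 == limit:
            break     # branch taken: leave the loop
        r2 = r2 + 1   # ADD R2, R2, 1
        executed += 1
        executed += 1 # JMP LP
    return r1, executed


s, count = run_loop()
print(s, count)  # -5050 400
```

The first 99 passes execute four instructions each and the last pass executes two, which together with the two initial loads gives 400 instructions. Two of the four instructions in the loop body are branches, which is what makes the loop a natural exercise for delayed branching.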
