CSCI 4717/5717
Computer Architecture
A number of advances have occurred since the von Neumann architecture was proposed:
CSCI 4717 – Computer Architecture RISC Processors – Page 3 CSCI 4717 – Computer Architecture RISC Processors – Page 4
Operands
• Integer constants
• Scalars (80% of scalars were local to procedure)
• Array/structure
• Lunde, A. "Empirical Evaluation of Some Features of Instruction Set Processor Architectures." Communications of the ACM, March 1977.
– Each instruction references 0.5 operands in memory
– Each instruction references 1.4 registers
– These numbers depend highly on architecture (e.g., number of registers, etc.)

Operands (continued)
                   Pascal   C     Average
Integer constant   16%      23%   20%
Scalar variable    58%      53%   55%
Array/structure    26%      24%   25%
Reduced Instruction Set Computer (RISC)
Characteristics of a RISC architecture:
• Large number of general-purpose registers and/or use of compiler designed to optimize use of registers – Saves operand referencing
• Limited/simple instruction set – Will become clearer later
• Optimization of pipeline due to better instruction design – Due to high proportion of conditional branch and procedure call instructions

Increasing Register Availability
There are two basic methods for improving register use
– Software – relies on compiler to maximize register usage
– Hardware – simply create more registers
Register Windows
• The hardware solution to making more registers available for a process is to increase the number of registers
• Large number of registers should decrease number of memory accesses
• Allocate registers first to local variables
• A procedural call will force registers to be saved into fast memory
• As shown in Table 13.4 (slide 9), only a small number of parameters and local variables are typically required

Register Windows (continued)
Solution – Create multiple sets of registers, each assigned to a different procedure
– Saves having to store/retrieve register values from memory
– Allow adjacent procedures to overlap allowing for parameter passing
[Figure: two register windows, each with parameter registers, local registers, and temporary registers, linked by call/return]
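
The overlap between adjacent windows can be sketched in a few lines of Python. This is only a toy model: the class name `RegisterWindows`, the window sizes, and the fixed (non-circular) register file are assumptions made for illustration, not details taken from the slides. The point it shows is that the caller's temporary registers and the callee's parameter registers are the same physical registers, so a call passes arguments without touching memory.

```python
class RegisterWindows:
    """Toy model of overlapping register windows (all sizes are assumed)."""

    def __init__(self, n_windows=4, n_param=2, n_local=3):
        self.n_windows = n_windows
        self.n_param = n_param      # parameter registers per window
        self.n_local = n_local      # local registers per window
        # Each window adds n_param + n_local new physical registers; its
        # temporary registers are the next window's parameter registers.
        self.stride = n_param + n_local
        self.regs = [0] * (n_windows * self.stride + n_param)
        self.cwp = 0                # current window pointer

    def _index(self, kind, i):
        offsets = {"param": 0, "local": self.n_param, "temp": self.stride}
        limit = self.n_local if kind == "local" else self.n_param
        if not 0 <= i < limit:
            raise IndexError(f"{kind} register {i} out of range")
        return self.cwp * self.stride + offsets[kind] + i

    def read(self, kind, i):
        return self.regs[self._index(kind, i)]

    def write(self, kind, i, value):
        self.regs[self._index(kind, i)] = value

    def call(self):
        # Real hardware would save a window to memory here instead of failing.
        if self.cwp + 1 >= self.n_windows:
            raise OverflowError("window overflow")
        self.cwp += 1

    def ret(self):
        if self.cwp == 0:
            raise RuntimeError("return without matching call")
        self.cwp -= 1


rw = RegisterWindows()
rw.write("temp", 0, 42)          # caller puts an argument in a temporary register
rw.call()                        # switch to the callee's window
arg = rw.read("param", 0)        # callee sees the same register as a parameter
rw.write("param", 1, arg + 1)    # result handed back through the shared registers
rw.ret()
result = rw.read("temp", 1)      # caller reads the result from its temporaries
```

Only the window pointer changes on call and return; no register values are copied.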
Problems with Register Windows
• Increased hardware burden
• Compiler needs to determine which variables get the nice, high-speed registers and which go to memory

Register Windows versus Cache
• It could be said that register windows are similar to a high-speed memory or cache for procedure data
• This is not necessarily a valid comparison
Graph Coloring
• Technique borrowed from discipline of topology
• Create graph – Register Interference Graph
– Each node is a symbolic register
– Two symbolic registers that are used during the same program fragment are joined by an edge to depict interference
– Two symbolic nodes that are linked must have different "colors"
– Goal is to avoid "number of colors" exceeding number of
available registers
– Symbolic registers that go past number of actual
registers must be stored in memory
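
The bullets above can be sketched as a greedy coloring of a register interference graph. The function name `color_registers`, the example graph, and the highest-degree-first ordering are assumptions for illustration; production compilers use more refined heuristics than this.

```python
def color_registers(interference, n_registers):
    """Greedy coloring of a register interference graph.

    interference maps each symbolic register to the set of symbolic
    registers it interferes with. Returns (assignment, spilled), where
    assignment maps symbolic registers to real register numbers and
    spilled lists the symbolic registers that must live in memory.
    """
    assignment = {}
    spilled = []
    # Assumed heuristic: handle the most-constrained nodes first.
    order = sorted(interference, key=lambda n: (-len(interference[n]), n))
    for node in order:
        # Colors already taken by neighbors that hold a real register.
        used = {assignment[m] for m in interference[node] if m in assignment}
        free = [r for r in range(n_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # no color left: keep this one in memory
    return assignment, spilled


# Hypothetical interference graph over five symbolic registers.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
assignment3, spilled3 = color_registers(graph, 3)
assignment2, spilled2 = color_registers(graph, 2)
```

With three real registers every symbolic register in this graph gets a color; with only two, some symbolic registers are spilled to memory.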
RISC – Register-to-Register Operations
• Only LOAD and STORE operations should access memory
• ADD Example:
– RISC – ADD and ADD with carry
– VAX – 25 different ADD instructions

Simple addressing modes
• Register
• Displacement
• PC-relative
• No indirect addressing – requires two memory accesses
• No more than one memory addressed operand per instruction
• Unaligned addressing not allowed
• Simplifies control unit
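
As a rough illustration of why these modes keep the control unit simple, each memory address needs only one addition. The sketch below makes several assumptions: a 4-byte word, an offset measured from the current PC, and the function name `effective_address`. Register mode is left out because the operand is already in a register and no address is formed.

```python
WORD_SIZE = 4  # assumed word size in bytes


def effective_address(mode, regs, pc, base=None, offset=0):
    """Compute a memory address for the two memory-referencing modes."""
    if mode == "displacement":
        address = regs[base] + offset     # base register plus constant
    elif mode == "pc_relative":
        address = pc + offset             # program counter plus constant
    else:
        raise ValueError(f"unsupported addressing mode: {mode}")
    if address % WORD_SIZE != 0:
        # Mirrors the rule that unaligned addressing is not allowed.
        raise ValueError(f"unaligned address: {address:#x}")
    return address


regs = {"R1": 0x1000}
load_addr = effective_address("displacement", regs, pc=0x400, base="R1", offset=8)
branch_target = effective_address("pc_relative", regs, pc=0x400, offset=-16)
```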
Comparing the Effects of Pipelining (continued)
• Two-way pipelined timing – I and E stages of two different instructions can be performed simultaneously
• Yields up to twice the execution rate of sequential
• Problems
– Causes wait state with accesses to memory
– Branch disrupts flow (NOOP instruction can be inserted by assembler or compiler)

Comparing the Effects of Pipelining (continued)
Permitting two memory accesses at one time allows for fully pipelined operation (dual-port RAM)
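
The "up to twice" figure for two-way pipelining can be checked with a small cycle count. This is a simplified model under stated assumptions: the I and E stages each take one cycle, and wait states and branch NOOPs are lumped into a single `stalls` count. The function names are made up for the example.

```python
def sequential_cycles(n_instructions):
    """Each instruction runs its I stage and then its E stage, one at a time."""
    return 2 * n_instructions


def two_way_pipelined_cycles(n_instructions, stalls=0):
    """The I stage of one instruction overlaps the E stage of the previous one.

    stalls counts extra cycles lost to memory wait states or branch NOOPs.
    """
    if n_instructions == 0:
        return 0
    return n_instructions + 1 + stalls


n = 100
ideal_speedup = sequential_cycles(n) / two_way_pipelined_cycles(n)
stalled_speedup = sequential_cycles(n) / two_way_pipelined_cycles(n, stalls=20)
```

In this model the speedup approaches 2 as the instruction count grows but never reaches it, and every stall cycle pulls it further below that limit.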
Delayed Branch (continued)
Problem 13.5 from Textbook
S := 0;
for K := 1 to 100 do S := S - K;

-- translates to --

    LD  R1, 0           ;keep value of S in R1
    LD  R2, 1           ;keep value of K in R2
LP  SUB R1, R1, R2      ;S := S - K
    BEQ R2, 100, EXIT   ;done if K = 100
    ADD R2, R2, 1       ;else increment K
    JMP LP              ;back to start of loop

Delayed Load
• Similar to delayed branch in that an instruction that doesn't use the register being loaded can execute during the D phase of a load instruction
• During a load, processor “locks” the register being loaded and continues execution until an instruction requiring the locked register is reached
• Left up to the compiler to rearrange instructions
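
The Problem 13.5 translation can be checked by stepping through the same instruction sequence in Python. This is a sketch, not a processor model: the variables `r1` and `r2` stand in for R1 and R2, the function name `run_loop` is made up for the example, and pipeline timing is ignored.

```python
def run_loop(limit=100):
    """Mirror the LP loop from Problem 13.5 one instruction at a time."""
    r1 = 0                  # LD  R1, 0         ; S
    r2 = 1                  # LD  R2, 1         ; K
    iterations = 0
    while True:
        r1 = r1 - r2        # LP  SUB R1, R1, R2
        iterations += 1
        if r2 == limit:     # BEQ R2, 100, EXIT
            break
        r2 = r2 + 1         # ADD R2, R2, 1
        # JMP LP is the jump back to the top of the while loop
    return r1, iterations


s, iterations = run_loop()
```

The loop body runs 100 times and leaves S equal to the negated sum of 1 through 100, which matches the Pascal source.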