
ECE 486/586
Computer Architecture
Lecture #5
Spring 2015
Portland State University

Lecture Topics

Quantitative Principles of Computer Design
Fallacies and Pitfalls
Instruction Set Principles
  Introduction
  Classifying Instruction Set Architectures

Reference:
Chapter 1: Sections 1.9, 1.11
Appendix A: Sections A.1, A.2

Key Principles of Computer Architecture

Take Advantage of Parallelism
Principle of Locality
Focus on the Common Case
Amdahl's Law
Processor Performance Equation (restated below, after the parallelism examples)

Principle #1: Exploit Parallelism

System Level
  Multiple processors
  Multiple disks
  Multiple memory channels
  Pipelined buses

Processor Level
  Pipelined instruction execution
  Multiple functional units

Logic Level
  Carry lookahead adders
  Multi-banked caches
  Multi-ported register files
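For reference, the processor performance equation named in the list above, in its standard Chapter 1 form (added here because the slides only name it):

  CPU time = Instruction count × CPI × Clock cycle time
           = (Instruction count × CPI) / Clock rate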

Principle #2: Exploit Locality

Temporal locality
  Recently accessed items are likely to be accessed again in the near future
  Code: loops and function calls
  Data: repeated access to the same variable, e.g., a loop counter

Spatial locality
  Items whose addresses are near one another tend to be referenced close together in time
  Code: sequential instruction execution; most loop branches are taken, so use branch prediction (fetch the branch target instead of the next sequential instruction)
  Data: array elements, fields in a data structure

Principle #3: Focus on the Common Case

Implication of Amdahl's Law:
  Speeding up 90% of the execution by only 10% is as good as speeding up 10% of the execution by 10x

Example:
  The number of add/subtract instructions in a typical program is substantially higher than the number of divide instructions
  Focus more on building fast adders than on fast dividers
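A quick check of the 90%/10% claim using Amdahl's Law, Speedup_overall = 1 / ((1 - f) + f/s), with total execution time normalized to 1:

  Speed up 90% of the time by 1.1x: new time = 0.10 + 0.90/1.1 ≈ 0.918, overall speedup ≈ 1.09x
  Speed up 10% of the time by 10x:  new time = 0.90 + 0.10/10 = 0.910, overall speedup ≈ 1.10x

Both choices buy roughly the same 9-10% overall improvement, which is why effort should go to the common case.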

Fallacies and Pitfalls

Fallacy
  A falsehood often widely believed to be true

Pitfall
  An easily made mistake
  Generalizations of principles that are true in a limited context

Fallacy
  The relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite

Problems with the above argument:
  The processors may have the same clock rate but differ considerably in their pipelines and cache subsystems: same clock rate but different CPIs
  A processor may be tuned to one particular benchmark suite while performing poorly on other benchmarks

Fallacy
  Benchmarks remain valid indefinitely

Why not?
  Vulnerability to benchmark engineering: once a benchmark becomes popular, there is tremendous pressure to improve performance by bending the rules for running the benchmark
  Kernels which spend the majority of their time in a very small section of code are particularly vulnerable
  Example: the matrix300 kernel

Fallacy
  Peak performance tracks observed performance

Problems with the above argument:
  Peak performance is only useful as an upper bound on the performance that a system can deliver
  Typical performance can vary 10x or more from peak performance
  The difference between typical and peak performance can vary greatly from program to program

Fallacy
  Multiprocessors are a silver bullet

Why not?
  The switch to multiprocessors happened because of the ILP wall and the power wall, not because parallel programming became dramatically simpler
  In the multi-core era, improving performance is now the burden of programmers
  Programmers must make their programs more and more parallel, an uphill task

Fallacy
  Synthetic benchmarks predict performance for real programs

Why not?
  Synthetic benchmarks may not take into account the effects of real-world systems (loading, context switching)
  A system may not fare as well in practice as it does on the benchmark
  Synthetic benchmarks may under-reward performance-enhancing optimizations
  Example: Whetstone loops with few iterations; a system which optimizes loop branch prediction won't fare as well on the benchmark as it does in practice

Fallacy
  MIPS (Millions of Instructions Per Second) is an accurate measure for comparing performance among computers

  MIPS = Instruction count / (Execution time × 10^6)
       = Clock rate / (CPI × 10^6)

Problems:
  What's an instruction? It depends on the ISA: one instruction on one ISA may do as much work as ten instructions on another ISA
  MIPS can vary among programs on the same computer
  MIPS can vary inversely with performance
    Example: hardware floating-point instructions vs. software routines; the hardware version is faster but executes fewer instructions, so it can earn a lower MIPS rating
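A hypothetical illustration of that inversion (the numbers below are invented for this example, not taken from the slides). Assume a 500 MHz machine running the same floating-point program two ways:

  Software FP routines:    120M instructions, CPI = 1.2 → time = 120M × 1.2 / 500 MHz = 0.288 s → MIPS = 500/1.2 ≈ 417
  Hardware FP instructions: 20M instructions, CPI = 3.0 → time = 20M × 3.0 / 500 MHz = 0.120 s → MIPS = 500/3.0 ≈ 167

The hardware-FP run is 2.4x faster yet has the lower MIPS rating, so MIPS points in the wrong direction.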

Pitfall
  Comparing hand-coded assembly and compiler-generated, high-level language performance

Potential issues:
  Hand-coded assembly requires specialized programmers; it is less likely to be used except in embedded systems
  Unless the compiler can perform the same optimizations as an assembly language programmer, the performance of the compiler-generated code will not match the hand-coded program

Pitfall
Falling prey to Amdahl's Law
Don't forget to assess the potential usage/impact of a feature before embarking on the long journey to implement it

Pitfall
A single point of failure
Dependability is no stronger than the weakest link in the chain
Make every component redundant so that no single component
failure could bring down the whole system

Instruction Set Principles


Reading:
Hennessy and Patterson, Appendix A
RISC paper (Patterson & Sequin): posted on course website

Instruction Set Architecture


Instruction Set Architecture (ISA)
Traditional meaning of computer architecture
What is visible to the programmer/compiler writer
Independent of organization and implementation
E.g., ISA doesn't include caches and pipelines

Instructions, Operands, Addressing Modes

Instruction Set Architecture

Compiler
  Input: high-level language
  Output: assembly language for the target ISA
  Global, local optimizations
  Register allocation

Assembler
  Input: assembly language
  Output: machine code (object file)

Linker
  Inputs: object files, library files
  Output: executable program

Loader
  Reads executable from disk
  Passes command line arguments
  Optionally fixes absolute addresses

ISA Classification

Running example: C = A + B, where A, B, and C are memory locations and R1, R2, and R3 are registers (see the sketch after this list)

ISA Examples

Stack
  HP calculator
  Pentium FP (x87 co-processor): 8 registers organized as a stack

Accumulator
  PDP-8
  8051 microcontroller

Load/Store (Register/Register)
  RISC: MIPS, Alpha, ARM, PowerPC, SPARC
  Itanium

Register/Memory
  IA-32 (Intel x86), Motorola 68000, IBM 360
  PDP-11
  VAX (really Memory/Memory)
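One way the C = A + B example plays out in each class (a sketch with generic mnemonics, modeled on the code sequences in H&P Appendix A; the exact opcodes and operand order on the original slide may differ):

  Stack:             PUSH A ; PUSH B ; ADD ; POP C
  Accumulator:       LOAD A ; ADD B ; STORE C
  Register/Memory:   LOAD R1, A ; ADD R3, R1, B ; STORE R3, C
  Load/Store:        LOAD R1, A ; LOAD R2, B ; ADD R3, R1, R2 ; STORE R3, C

The load/store version touches memory only through explicit loads and stores, which is the defining property of the RISC ISAs listed above.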
