
CSE 420: Computer Architecture
Reiley Jeyapaul
Reading References:
1) Computer Architecture: A Quantitative Approach, Hennessy & Patterson (5th Edition): Ch 2, Appendix B
2) Computer Organization and Design, Patterson & Hennessy (3rd Edition): Ch 7


Announcements

Exam 2 Solutions posted on BB

Project 3 is posted on BB

Deadline : 10/24/2014 11:59 PM


Project work involves:
a significant number (30+) of simulation runs on the matmul code and benchmarks
C++ coding on the gem5 simulator code
Will have to rebuild the gem5 simulator

Exam 3 : Oct 28, 2014 (in class)

Open book, Open Notes, NO Laptops, NO Internet


Allowed Cheatsheet count : 4
Will be quizzed on Module 3 course content
Course slides available on blackboard
The two text books specified in class
Web page: aviral.lab.asu.edu


Where are we and where to?
Assembly Language
Simple Processor Design
Pipelined Processor

Bypasses
Branch Prediction

Memory Hierarchy


1/28/15


CPU-Memory Bottleneck
[Figure: CPU connected to Memory]

Performance of high-speed computers is usually limited by memory bandwidth & latency
Latency (time for a single access)
  Memory access time >> Processor cycle time
Bandwidth (number of accesses per unit time)
  if fraction m of instructions access memory, there are 1+m memory references / instruction (one instruction fetch plus m data accesses)
  CPI = 1 requires 1+m memory refs / cycle (assuming MIPS RISC ISA)
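The bandwidth requirement above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names and the 30% load/store fraction are illustrative assumptions, not values from the course.

```python
def memory_refs_per_instruction(m: float) -> float:
    """One instruction fetch plus a data access for a fraction m of instructions."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be a fraction between 0 and 1")
    return 1.0 + m


def required_refs_per_cycle(m: float, cpi: float = 1.0) -> float:
    """Memory references the memory system must sustain per cycle at a given CPI."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return memory_refs_per_instruction(m) / cpi


# Hypothetical workload: 30% of instructions are loads or stores.
refs = required_refs_per_cycle(0.3)
print(f"{refs:.2f} memory references per cycle at CPI = 1")
```

A higher CPI relaxes the requirement, which is one way to see why a slow memory system and a low CPI cannot coexist.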

Core Memory

Frederick Viehe claimed the first patent on magnetic core memory in 1947
An Wang and Way-Dong Woo at Harvard also did substantial work, but Harvard was not very interested in technology patents.
Jay Forrester's group at MIT became aware of this work and developed it for a real-time flight simulator


Core Memory

Core relies on the hysteresis of the magnetic material used to make the rings.
Wires that pass through the cores create magnetic fields. Only a magnetic field greater than a certain intensity ("select") can cause the core to change its magnetic polarity.
To select a memory location, one of the X and one of the Y lines are driven with half the current ("half-select") required to cause this change.
  Only the combined magnetic field generated where the X and Y lines cross is sufficient to change the state
By driving the current through the wires in a particular direction, the resulting induced field forces the selected core's magnetic flux to circulate in one direction or the other (clockwise or counterclockwise).
  One direction is a stored 1, while the other is a stored 0.
http://www.magnet.fsu.edu/education/tutorials/java/magneticcorememory/index.html
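The half-select scheme described above can be sketched as a toy simulation. The normalized field values and the `CorePlane` class are assumptions made for illustration; real cores depend on analog hysteresis curves that this model ignores.

```python
FULL_SELECT = 1.0   # assumed normalized field needed to flip a core
HALF_SELECT = 0.5   # field contributed by each driven X or Y line


class CorePlane:
    """Toy grid of cores; each core stores one bit as a flux direction."""

    def __init__(self, rows: int, cols: int):
        self.rows, self.cols = rows, cols
        self.bits = [[0] * cols for _ in range(rows)]

    def write(self, x: int, y: int, value: int) -> None:
        """Drive X line x and Y line y with half-select current each."""
        for r in range(self.rows):
            for c in range(self.cols):
                field = (HALF_SELECT if r == x else 0.0) + \
                        (HALF_SELECT if c == y else 0.0)
                # Only the core where both lines cross reaches full select.
                if field >= FULL_SELECT:
                    self.bits[r][c] = 1 if value else 0


plane = CorePlane(4, 4)
plane.write(1, 2, 1)
for row in plane.bits:
    print(row)
```

Cores that share only the X line or only the Y line see half the field and keep their state, which is why a single pair of drive lines can address one core in the grid.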

Core Memory
Robust, non-volatile storage
Used on space shuttle computers until recently
Cores threaded onto wires by hand (25 billion a year at peak production)
Core access time ~ 1 µs
[Figure: DEC PDP-8/E Board, 4K words x 12 bits (1968)]


Semiconductor Memory, DRAM

Semiconductor memory began to be competitive in early 1970s
  Intel formed to exploit market for semiconductor memory
First commercial DRAM was Intel 1103
  1Kbit of storage on single chip
  charge on a capacitor used to hold value
Semiconductor memory quickly replaced core in 70s

One Transistor Dynamic RAM


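[Figure: 1-T DRAM cell. The word line controls an access transistor that connects the bit line to a storage capacitor (FET gate, trench, or stack), whose other plate is held at VREF. Cross-section labels: TiN top electrode (VREF), Ta2O5 dielectric, W bottom electrode, poly word line, access transistor.]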


DRAM Architecture
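[Figure: DRAM array. An (N+M)-bit address is split between a row address decoder, which drives the word lines (Row 1 to Row 2^N), and a column decoder with sense amplifiers, which selects among the bit lines (Col. 1 to Col. 2^M) to produce the data. Each memory cell stores one bit.]

Bits stored in 2-dimensional arrays on chip
Modern chips have around 4 logical banks on each chip
  each logical bank physically implemented as many smaller arrays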
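To make the N+M address split concrete, here is a minimal sketch. It assumes the upper N bits select the row and the lower M bits select the column; the slides do not state the bit ordering, and `split_dram_address` is a hypothetical helper.

```python
def split_dram_address(addr: int, n_row_bits: int, m_col_bits: int) -> tuple[int, int]:
    """Split an (N+M)-bit cell address into (row, column) indices.

    Assumption: the high N bits are the row, the low M bits the column.
    """
    if not 0 <= addr < (1 << (n_row_bits + m_col_bits)):
        raise ValueError("address does not fit in N+M bits")
    row = addr >> m_col_bits
    col = addr & ((1 << m_col_bits) - 1)
    return row, col


# Hypothetical array with 2^4 rows and 2^4 columns.
row, col = split_dram_address(0b1011_0110, n_row_bits=4, m_col_bits=4)
print(row, col)
```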


DRAM Operation
Three steps in read/write access to a given bank
Row access (RAS)
  decode row address, enable addressed row (often multiple Kb in row)
  bitlines share charge with storage cell
  small change in voltage detected by sense amplifiers which latch whole row of bits
  sense amplifiers drive bitlines full rail to recharge storage cells
Column access (CAS)
  decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
  on read, send latched bits out to chip pins
  on write, change sense amplifier latches which then charge storage cells to required value
  can perform multiple column accesses on same row without another row access (burst mode)
Precharge
  charges bit lines to known value, required before next row access
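The row access, column access, and precharge sequence can be sketched as a small behavioral model. The `DramBank` class and its method names are illustrative assumptions; the model tracks only which row is latched and ignores timing, charge sharing, and refresh.

```python
class DramBank:
    """Toy model of one bank: a cell array plus a row of sense-amplifier latches."""

    def __init__(self, rows: int, cols: int):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None      # index of the row held in the latches
        self.row_buffer = None    # latched copy of that row

    def activate(self, row: int) -> None:
        """Row access (RAS): latch the whole addressed row."""
        if self.open_row is not None:
            raise RuntimeError("precharge required before the next row access")
        self.row_buffer = list(self.cells[row])
        self.open_row = row

    def _require_open_row(self) -> None:
        if self.open_row is None:
            raise RuntimeError("no row is open; issue a row access first")

    def read(self, col: int) -> int:
        """Column access (CAS) for a read: send the latched bit out."""
        self._require_open_row()
        return self.row_buffer[col]

    def write(self, col: int, value: int) -> None:
        """Column access (CAS) for a write: change the latch, which recharges the cell."""
        self._require_open_row()
        self.row_buffer[col] = value
        self.cells[self.open_row][col] = value

    def precharge(self) -> None:
        """Close the open row so another row can be accessed."""
        self.open_row = None
        self.row_buffer = None


bank = DramBank(rows=4, cols=8)
bank.activate(2)
bank.write(3, 1)
burst = [bank.read(c) for c in range(8)]   # several column accesses, one row access
bank.precharge()
print(burst)
```

The burst of reads reuses the latched row, which is the behavior the slide calls burst mode.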


DRAM Features

Structural simplicity
  Very high density
Volatile
  Loses data when power supply is removed
  Requires continuous recharging
Recharge Logic (now mostly on-chip)
  Refresh every row in a tight loop once every 64 ms
  Periodically issue refresh cmd for each row
    in a system with 2^13 = 8192 rows, refresh the next row after 64 ms / 8192 = 7.8 µs
Refresh rate is very sensitive to temperature
  Data will be saved for weeks, if chip kept in liquid nitrogen
  Recovery has been demonstrated even after a few minutes of power loss.
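The per-row refresh interval quoted above follows from dividing the retention window by the row count. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def refresh_interval_us(retention_ms: float, rows: int) -> float:
    """Time between successive row refreshes when refreshes are spread evenly."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return retention_ms * 1000.0 / rows


# 64 ms retention window, 2^13 rows, as in the slide.
print(refresh_interval_us(64, 2**13))   # 7.8125 (microseconds)
```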

DRAM Packaging

DIMM (Dual Inline Memory Module) contains multiple chips with clock/control/address signals connected in parallel (sometimes need buffers to drive signals to all chips)
Data pins work together to return wide word (e.g., 64-bit data bus using 16x4-bit parts)
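The "16x4-bit parts" example is a simple division of bus width by chip width. The sketch below assumes the chips' data pins are placed side by side with no extra chips for error correction; `chips_for_bus` is a hypothetical helper.

```python
def chips_for_bus(bus_width_bits: int, chip_width_bits: int) -> int:
    """Number of chips whose data pins together fill the data bus."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a multiple of the chip width")
    return bus_width_bits // chip_width_bits


for chip_width in (4, 8, 16):
    print(f"x{chip_width} parts: {chips_for_bus(64, chip_width)} chips for a 64-bit bus")
```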

[Figure: DIMM photographs labeled DDR, DDR 2, and DDR 4]

Double-Data Rate (DDR2) DRAM


[Figure: DDR2 timing diagram. 200MHz clock; command sequence Row, Column, Precharge, Row; data transferred on both clock edges for a 400Mb/s data rate. Source: Micron, 256Mb DDR2 SDRAM datasheet]
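The 400Mb/s figure is the 200MHz clock multiplied by two transfers per cycle, and it applies per data pin. The sketch below also multiplies by a bus width to get a peak transfer rate; the function names are assumptions, and the 64-bit width is borrowed from the DIMM example rather than from the datasheet figure.

```python
def data_rate_mbps_per_pin(clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Per-pin data rate in Mb/s; double data rate moves 2 bits per pin per clock."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mbytes_per_s(clock_mhz: float, bus_width_bits: int,
                                transfers_per_clock: int = 2) -> float:
    """Peak transfer rate in MB/s across the whole data bus."""
    return data_rate_mbps_per_pin(clock_mhz, transfers_per_clock) * bus_width_bits / 8


print(data_rate_mbps_per_pin(200))            # 400
print(peak_bandwidth_mbytes_per_s(200, 64))   # 3200.0
```

These are peak numbers: row accesses and precharges leave gaps on the data bus, so sustained bandwidth is lower.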

DRAM Generations

Synchronous DRAM
  One word per clock cycle
  Bus speed = memory clock speed
Double Data Rate SDRAM (DDR1)
  Two words per clock cycle
  Bus speed = memory clock speed
DDR2 SDRAM
  Two words per clock cycle
  Bus speed = 2 * memory clock speed
DDR3 SDRAM
  Two words per clock cycle
  Bus speed = 4 * memory clock speed
  chip capacities of 512 megabits to 8 gigabits => module capacity up to 16 GB

DDR4 SDRAM
  Two words per clock cycle
  Bus speed = 4 * memory clock speed (data rate = 8 * memory clock speed)



Static RAM

Each bit in an SRAM is stored on four transistors that form two cross-coupled inverters.
  This storage cell has two stable states which are used to denote 0 and 1.
Two additional access transistors serve to control the access to a storage cell during read and write operations.
6T, 8T, and 10T SRAMs
Fast, no refresh
Standby: When the word line is not asserted, M5 and M6 disconnect the cell from the bitlines
Writing:
  Apply value to be written to bitlines
  E.g. 0 to BL and 1 to ~BL
  Set the two NOT gates
Reading:
  Assert both bitlines
  Assert WL (open M5 and M6)
  One bitline will discharge
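The standby, write, and read behavior listed above can be sketched at the logic level. This is a behavioral toy, not a circuit model: `SramCell` and its signal names are assumptions, and transistor sizing, which decides whether a real write can overpower the inverters, is ignored.

```python
class SramCell:
    """Toy 6T cell: two cross-coupled inverters (q, q_bar) behind access transistors."""

    def __init__(self, value: int = 0):
        self.q = value
        self.q_bar = 1 - value

    def write(self, bl: int, bl_bar: int, wl: bool) -> None:
        """Apply complementary values to the bitlines; they take effect only if WL is asserted."""
        if bl == bl_bar:
            raise ValueError("BL and ~BL must carry complementary values")
        if not wl:
            return                      # standby: cell is disconnected
        self.q, self.q_bar = bl, bl_bar

    def read(self, wl: bool) -> tuple[int, int]:
        """Start with both bitlines high; with WL asserted, one of them discharges."""
        bl, bl_bar = 1, 1
        if wl:
            if self.q == 0:
                bl = 0
            else:
                bl_bar = 0
        return bl, bl_bar


cell = SramCell()
cell.write(bl=1, bl_bar=0, wl=True)     # store a 1
print(cell.read(wl=True))               # BL stays high, ~BL discharges
cell.write(bl=0, bl_bar=1, wl=False)    # ignored: word line not asserted
print(cell.read(wl=True))
```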


Flip Flops

Fastest form of memory
Store data using only combinational logic components (gates connected with feedback)
SR, JK, T, and D flip-flops


Flash Memory

Flash is becoming attractive as secondary storage
  Non-volatile
  Robust - no mechanical parts
  Extremely good power and performance characteristics
  Moving towards replacing hard disks
Challenges:
  Unpredictable and long write times
  Lifetime: can write to a location only 10^6 times
http://www.howstuffworks.com/flash-memory.htm


Cold boot Attack

Steal memory from computer A, and insert it into computer B.
Boot a light-weight OS on machine B, e.g. through a USB flash drive.
Dump the contents of pre-boot memory to a file.
Memory analysis can be done on the dump to find sensitive keys
Cold boot attack has been demonstrated to be effective against full disk encryption
Time window for attack can be extended by cooling the memory modules


Size-Speed-Cost Tradeoff

[Figure: memory hierarchy. CPU regs <-> cache <-> cache <-> Memory <-> disk (virtual memory), with transfer-size labels 16 B, 8 B, 4 KB, and 4 B on the links; levels become larger, slower, cheaper from left to right]

              register    L1-cache    L2-cache       memory      disk memory
              reference   reference   reference      reference   reference
size:         608 B       128kB       512kB -- 4MB   128 MB      27GB
speed:        1.4 ns      4.2 ns      16.8 ns        112 ns      9 ms
$/Mbyte:                              $90/MB         $2-6/MB     $0.01/MB
block size:   4 B                     16 B           4-8 KB

(Numbers are for a 21264 at 700MHz)

The Hitchhiker's Guide to the Galaxy

"The chances of finding out what's really going on in the universe are so remote, the only thing to do is hang the sense of it and keep yourself occupied. Look at me, I design fjords. I'd far rather be happy than right any day."