
Memory Design I

Kia Bazargan
University of Minnesota
Dept. of ECE
kia@umn.edu

Slides adapted from Prof. Chris H. Kim

EE 5324 - VLSI Design II - © Kia Bazargan

Registers
• Used for storing data
• Structure
– N-bit wide
– Parallel/serial read/write
– Clocked
– Static/dynamic implementation
• Register files
– Multiple read/write ports possible
– Example: 32-bit wide by 16-word deep register file with dual-port parallel read and single-port parallel write [©Hauck]
[Figure: register file of 16 words × 32 bits]

Implementing Registers Using Logic Gates
• Flip-flops
– Simple SR latch (inputs active low, as the table shows):

  S R | Q  Q'
  1 1 | Q  Q'  (unchanged)
  1 0 | 0  1
  0 1 | 1  0
  0 0 | x  x

– Flip-flops
o JK, D, T
o Clocked S, R
o Master-slave (edge-triggered)

Implementing Registers in CMOS
• Direct gate implementation too costly
– A master-slave JK flip-flop uses 38 CMOS transistors
• Directly implement in transistors
– Example: clocked SR FF [Rab96] p.342
[Figure: clocked SR flip-flop with inputs S, R and clock φ]
Note: carefully size the S, R and φ transistors so that we can write.
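The SR latch truth table on the logic-gates slide can be reproduced with a small gate-level model. This is a minimal behavioral sketch, not a transistor-level one. It assumes a cross-coupled NAND implementation (the 0/1 pattern of the table implies active-low S and R), evaluates Q before Q' on each pass, and the names `nand` and `sr_latch` are hypothetical.

```python
# Behavioral model of an SR latch built from two cross-coupled NAND gates.
# Assumption: inputs are active low, which is what the truth table implies
# (S=0 sets Q=1, R=0 resets Q=0, S=R=1 holds, S=R=0 gives the x x case).


def nand(a: int, b: int) -> int:
    """2-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def sr_latch(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """Apply S and R to a latch currently holding (q, q_bar).

    The two gates are evaluated in turn (Q first, then Q') until the
    outputs stop changing, which takes at most a few passes.
    """
    for _ in range(4):
        q_next = nand(s, q_bar)        # Q  = NAND(S, Q')
        q_bar_next = nand(r, q_next)   # Q' = NAND(R, Q)
        if (q_next, q_bar_next) == (q, q_bar):
            break  # stable: nothing changed on this pass
        q, q_bar = q_next, q_bar_next
    return q, q_bar


if __name__ == "__main__":
    for s, r in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        print(s, r, sr_latch(s, r, q=1, q_bar=0))
```

For S = R = 0 the model settles with Q = Q' = 1, which is why that row of the table is marked x x: the outputs are no longer complementary.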
Implementing Registers in CMOS (cont.)
• Another example: D latch (register)
– Uses transmission gate
– When "WR" is asserted, the "write" operation takes place
– Stack D latch structures to get an n-bit register
[Figure: transmission-gate D latch with input D, output Q, and write enable WR and its complement]

Shift Registers: Idea
• Shift registers are used for iteratively shifting data
– Used in pipelining, bit-by-bit processing, etc.
[Figure: chain of latches D1 → D2 → D3, all clocked by φ]
• Problem? When the clock goes high, the data will traverse the whole shift-register chain in one clock cycle!
Solution: use non-overlapping clocks φ1 and φ2. φ1 is used by odd gates, φ2 by even gates (use transmission gates after D1', D2', D3').
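The race on the shift-register slide, and why two non-overlapping clocks fix it, can be illustrated with a behavioral model. This is a sketch under stated assumptions: every stage is an ideal level-sensitive latch that copies its input while its clock is high, gate delays are ignored, and the names `clock_phase`, `single_phase` and `two_phase` are hypothetical.

```python
# Behavioral model of a latch-based shift register (ideal latches, no delays).
# Assumption: a latch whose clock is high is transparent and copies its input;
# a latch whose clock is low holds its value.


def clock_phase(latches: list[int], d_in: int, enabled: set[int]) -> list[int]:
    """One clock phase: latches whose index is in `enabled` are transparent.

    Latches are evaluated from the input side, so data ripples through any
    run of consecutive transparent latches within the same phase.
    """
    out = latches[:]
    prev = d_in
    for i in range(len(out)):
        if i in enabled:
            out[i] = prev      # transparent: follow the input
        prev = out[i]          # this latch drives the next one
    return out


def single_phase(n: int, d_in: int) -> list[int]:
    """All n latches share one clock phi: one high phase floods the chain."""
    return clock_phase([0] * n, d_in, set(range(n)))


def two_phase(bits: list[int], n_stages: int) -> list[int]:
    """Shift `bits` into n_stages stages, each a phi1 latch then a phi2 latch.

    Returns the output of every stage (the phi2 latches) after the last cycle.
    """
    latches = [0] * (2 * n_stages)
    phi1 = set(range(0, 2 * n_stages, 2))  # odd latches: 1st, 3rd, 5th, ...
    phi2 = set(range(1, 2 * n_stages, 2))  # even latches: 2nd, 4th, 6th, ...
    for b in bits:
        latches = clock_phase(latches, b, phi1)  # phi1 high, phi2 low
        latches = clock_phase(latches, b, phi2)  # phi2 high, phi1 low
    return latches[1::2]


if __name__ == "__main__":
    print(single_phase(3, 1))
    print(two_phase([1, 0, 1], 3))
```

With a single clock, one high phase copies the input into every latch of the chain. With φ1 and φ2 never high at the same time, a transparent latch is always fed by an opaque one, so each full cycle moves the data exactly one stage.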

Memory Architecture: the Big Picture
• Address: which one of the M words to access
• Data: the N bits of the word are read/written
[Figure: M words (Word 0 … Word M-1) of N bits each, built from storage cells. A decoder takes the k = log2(M) address bits A0 … Ak-1 and drives the word select lines S0 … SM-1.] [©Prentice Hall]

Memory Access Timing: the Big Picture
• Timing:
– Send address on the address lines, wait for the word line to become stable
– Read/write data on the data lines
[Figure: timing diagram. Read cycle: address → read access → data valid. Write cycle: address → write access → data written.]
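To make the relation between the k = log2(M) address bits and the M word select lines concrete, here is a functional sketch of the decoder in which each select line S_i is the AND of k address literals (A_j or its complement). The function names and the LSB-first ordering (A0 first) are assumptions; this flat form is only a functional model and says nothing about how the decoder circuit is actually built.

```python
# Functional model of the address decoder: k address bits -> M = 2**k
# one-hot word select lines. Assumption: A0 is the least significant bit.


def address_width(m_words: int) -> int:
    """k = log2(M), rounded up when M is not a power of two."""
    return max(1, (m_words - 1).bit_length())


def to_bits(address: int, k: int) -> list[int]:
    """Address as a list [A0, A1, ..., A(k-1)]."""
    return [(address >> j) & 1 for j in range(k)]


def decode(a: list[int]) -> list[int]:
    """Return [S0, ..., S(2**k - 1)]; S_i is the AND of k address literals."""
    k = len(a)
    selects = []
    for i in range(2 ** k):
        s = 1
        for j in range(k):
            # use A_j where bit j of i is 1, and its complement where it is 0
            literal = a[j] if (i >> j) & 1 else 1 - a[j]
            s &= literal
        selects.append(s)
    return selects


if __name__ == "__main__":
    k = address_width(16)              # 16 words need 4 address bits
    s = decode(to_bits(9, k))
    print(k, s.index(1), sum(s))       # exactly one select line is high
```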
Memory Cell Array Interface: Example
• Memory parameters:
– 16-bit wide
– 1024-word deep
• Accessing word 9
– Address = 0000001001₂
[Figure: the 10-bit address A9 … A0 = 0000001001 drives the decoder, which asserts S9 (Word 9) among S0 … S1023; the 16-bit word passes through the sense amps/drivers.]

Memory Cell Array Layout
• Memory performance (speed)
– Storage cell speed (read, write)
– Data bus capacitance
– Periphery: address decoders, sense amplifiers, buffers
• Memory area
– Cell array layout
• How to lay out the cell array?
– Linear is bad:
o Long data busses → large capacitance
o A lot of cells connected to data bus
o Decoder will have a lot of logic levels
[Figure: linear array, Word 0 … Word M-1 with select lines S0 … SM-1 from a single decoder and N-bit sense amps/drivers]

Memory Cell Array Layout (cont.)
• Group the M words into M/L rows, each containing L words
• Benefits?
[Figure: a row decoder driven by the upper address bits A(log2 L) … Ak-1 (k − log2 L bits) selects one of M/L rows (select lines S0..L-1, SL..2L-1, …, SM-L..M-1). Row 0 holds Word 0 … Word L-1, row 1 holds Word L … Word 2L-1, and the last row holds Word M-L … Word M-1, each word N bits. Every N-bit column has its own sense amp/driver, and a column decoder + MUX driven by the lower bits A0 … A(log2 L − 1) picks one N-bit word.]

Memory Cell Array Access Example
• word = 16 bits wide (N), row = 8 words (L), address = 10 bits (k)
• Accessing word 9 = 0000001001₂
[Figure: M/L = 1024/8 = 128 rows. The row address A9 … A3 = 0000001 selects row S8..15 (Word 8 … Word 15); the column address A2 A1 A0 = 001 makes the column decoder + MUX select Word 9.]
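The access example (N = 16, L = 8, k = 10) can be checked by splitting the address into its row and column fields. This is a minimal sketch assuming L is a power of two and that the low-order log2(L) bits drive the column decoder, as in the example; `split_address` is a hypothetical helper name.

```python
# Sketch of the row/column address split for an M-word memory arranged as
# M/L rows of L words. Assumption: L is a power of two and the low-order
# log2(L) bits select the column, as in the access example.


def split_address(address: int, k: int, words_per_row: int) -> tuple[int, int, int, int]:
    """Return (row, column, row_bits, column_bits) for a k-bit address."""
    col_bits = words_per_row.bit_length() - 1          # log2(L)
    if words_per_row != 1 << col_bits:
        raise ValueError("words_per_row (L) must be a power of two")
    if not 0 <= address < (1 << k):
        raise ValueError("address does not fit in k bits")
    row = address >> col_bits                          # A(k-1) ... A(log2 L)
    column = address & (words_per_row - 1)             # A(log2 L - 1) ... A0
    return row, column, k - col_bits, col_bits


if __name__ == "__main__":
    k, L = 10, 8                                       # 1024 words, 8 per row
    row, col, row_bits, col_bits = split_address(0b0000001001, k, L)
    print("rows:", (1 << k) // L)                      # M/L
    print("row", row, "column", col, "->", row * L + col)
```

Word 9 lands in row 1 (7 row bits) at column 1 (3 column bits), and the row decoder now has 128 outputs instead of 1024.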
Hierarchical Memory Structure
• Taking the idea one step further
– Shorter wires within each block
– Enable only one block (block address decoder) → power savings
[Figure: the block address selects one block through its Blk EN signal; each block has its own row/column address decoding and sense amps/drivers; the blocks connect through a global bus to global drivers/sense amplifiers]

Decreasing Word Line Delay
• Word line delay comes into play!
– We used to have long busses; making a 2D array → shorter busses
– But, longer word lines!
• How to decrease the delay on the word lines?
– Break the word line by inserting buffers
– Place the decoder in the middle
[Figure: (a) drive the polysilicon word line from both sides; (b) use a metal bypass, i.e. a metal word line strapped to the polysilicon word line] [Rab96] p. 558 [©Prentice Hall]

Decreasing Word Line Delay (cont.)
• Place the decoder in the middle
• Add buffers to outputs of decoder
[Figure: decoder placed between two memory cell arrays, driven by the address lines] [©Hauck]

Array-Structured Memory Architecture
[Figure: array-structured memory architecture]
Semiconductor Memory Classification
• Read-Write Memory
– Random Access: SRAM, DRAM
– Non-Random Access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
• Read-Only Memory: Mask-Programmed, Programmable (PROM)

Read-Write Memories (RWM)
• Basic storage elements of semiconductor memory
[Figure: SRAM cell and DRAM cell]
• SRAM: cell has gain, 6T, FAST, LOW POWER, logic compatible, differential
• DRAM: cell has no gain, 1T, refresh, slow, DRAM process, single ended, DENSE

Memory Scaling Trend
[Figure: memory scaling trend (Itoh, IBM R&D, 2003)]
• High density is the primary design goal for memories
• Low voltage operation is essential for low power

Memory Scaling Trend
• Long retention time → low Ioff
– High Vt is required
• Fast access time → high Ion
– High Vgs − Vt is required
• Vdd cannot be scaled down aggressively for low power consumption
[Figure: (Itoh, IBM R&D, 2003)]
Why SRAMs are Important
[Figure: die photos at 0.18 μm, 0.13 μm and 0.09 μm showing the cache and core logic regions]
• Memories have better power efficiency compared to logic
• ~9.9B out of 10B transistors will be used for SRAMs
• Company with better SRAM design will dominate

Why SRAMs are Important
[Figure: normalized ION vs. normalized IOFF for NMOS and PMOS devices at 150 nm, 110 °C; about 2X spread in ION and 100X spread in IOFF]
$\sigma_{V_t} \propto \frac{1}{\sqrt{\text{Area}}} = \frac{1}{\sqrt{WL}}$ (Taur, Ning)
• Area is the number one concern → minimum sized devices
• Smaller devices have larger variation
• Delay variation, stability and leakage are problems
• Central limit theorem doesn't hold (σ/μ)

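Positive Feedback: Bi-Stability
[Figure: two cross-coupled inverters with Vo1 = Vi2 and Vo2 = Vi1. Plotting both transfer curves on the same axes gives two stable operating points A and B and one metastable point C.]
• Gain should be larger than 1 in the transition region

Meta-Stability
[Figure: starting from C, a small perturbation δ is amplified around the loop and the cell moves to A or B]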
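The bi-stability argument (loop gain above 1 in the transition region) can be illustrated numerically by sending a voltage around the two-inverter loop. This sketch assumes a hypothetical tanh-shaped inverter transfer curve and a normalized VDD of 1 V; it is not a device model. The names `inverter` and `settle` are illustrative.

```python
import math

VDD = 1.0  # normalized supply (assumption)


def inverter(v_in: float, gain: float) -> float:
    """Hypothetical smooth inverter VTC; its slope at VDD/2 is -gain*VDD/2."""
    return VDD / 2 * (1 - math.tanh(gain * (v_in - VDD / 2)))


def settle(v0: float, gain: float = 10.0, trips: int = 50) -> float:
    """Send a voltage around the two-inverter loop `trips` times."""
    v = v0
    for _ in range(trips):
        v = inverter(inverter(v, gain), gain)  # Vi1 -> Vo1 = Vi2 -> Vo2 = Vi1
    return v


if __name__ == "__main__":
    # Loop gain at C is (gain*VDD/2)**2: 25 for gain=10, 0.25 for gain=1.
    for v0 in (0.49, 0.5, 0.51):
        print(v0, round(settle(v0), 4), round(settle(v0, gain=1.0), 4))
```

With gain = 10 (loop gain 25 at C), a start 10 mV either side of VDD/2 is driven to a rail (point A or B), while exactly VDD/2 stays put: that is the metastable point C. With gain = 1 the loop gain is below 1 and every start collapses to VDD/2, so the pair cannot store a bit.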
SRAM Memory Cell
[Figure: 6T SRAM cell with wordline WL and bit lines BL, BLB]
• NMOS access transistors
• Read and write use the same port: need sufficient margins
• One wordline to access the cell
• Two bit lines (BL, BLB) to carry the data
• Almost minimum size transistors for small cell area

SRAM Read Operation
[Figure: 6T cell during read, WL = '1' and both bit lines precharged to '1']
• Both bit lines are precharged to Vdd
• Wordline is fired for one of the cells on the bit line
• Cell pulls down either BL or BLB
• Sense amp regenerates the differential signal
• Data should not flip after read access
• Driver TR must be stronger than access TR

SRAM Read Operation
[Figure: the cell current Icell discharges the bitline capacitance Cbitline until BL and BLB differ by ΔVbitline] (Murmann class notes)
$\text{bitline delay} = \frac{C_{bitline} \, \Delta V_{bitline}}{I_{cell}}$, with $\Delta V_{bitline} = 50\,\text{mV}$
• For high density, a large number of cells share the bitline and wordline
– Subarray organization for 32Kb: 128 WLs, 256 BLs

SRAM Read Operation
[Figure: BL and BLB feed the sense amplifier SA, which produces "out"]
• Cbitline is large due to the large number of cells attached
• Icell is small due to high-density cells
• ΔVbitline has to be minimized for high speed
– < 100 mV bitline voltage difference generated by the SRAM cell
– Let the sense amplifier finish the job
– Increased noise sensitivity, circuit complexity
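The bitline delay formula is simple enough to evaluate directly. The capacitance and cell current below are hypothetical values chosen only to show the scaling with ΔVbitline; `bitline_delay` is an illustrative name.

```python
# Bitline delay estimate: time for the cell current to develop the sensing
# swing on the bitline capacitance, t = C_bitline * dV_bitline / I_cell.
# All numbers in the example are hypothetical, chosen only for illustration.


def bitline_delay(c_bitline: float, dv_bitline: float, i_cell: float) -> float:
    """Seconds needed to discharge c_bitline (F) by dv_bitline (V) at i_cell (A)."""
    if i_cell <= 0:
        raise ValueError("cell current must be positive")
    return c_bitline * dv_bitline / i_cell


if __name__ == "__main__":
    c_bl = 100e-15   # 100 fF bitline (assumed)
    i_cell = 20e-6   # 20 uA read current (assumed)
    for dv in (50e-3, 100e-3, 1.0):  # sensed swing vs. a full 1 V swing
        t = bitline_delay(c_bl, dv, i_cell)
        print(f"dV = {dv * 1e3:6.0f} mV -> {t * 1e12:7.1f} ps")
```

With these assumed values a 50 mV swing takes 250 ps, twenty times less than a full 1 V swing, which is the reason for keeping ΔVbitline small and letting the sense amplifier finish the job.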
SRAM Read Operation: Precharge
• Option (a)
– Similar to dynamic logic precharge
– Balance transistor to equalize bitline voltages
– Short wordline pulse required to limit bitline swing
• Option (b)
– Pseudo-NMOS type circuit
– Bitline voltage clamped during read
• Option (c)
– NMOS pullup instead of PMOS
– Precharge levels are limited to Vdd − Vt
– Can't operate at low Vdd

SRAM Read Operation: Precharge
[Figure: precharge circuits for options (a) to (c); bitline precharge levels Vdd and Vdd − Vtn]

SRAM Cell Read Margin
[Figure: accessed cell (WL = Vdd, bit lines at Vdd); the node storing 0 rises from 0 to Vx]
• When cell is not accessed (WL = 0)
– Data is safely kept inside the cell
– High noise margin
• When cell is accessed (WL = Vdd)
– Access transistor acts as a noise source
– Data '0' is pulled up to Vx
– Cell data can flip if Vx rises above Vtn

Static Noise Margin
[Figure: butterfly curves, V(QB) vs. V(Q) from 0 to 1 V; top plot: good SNM, bottom plot: bad SNM] (E. Seevinck, 1987, JSSC)
• Destructive read problem
• The size of the largest square enclosed in the butterfly curves = read static noise margin
CMOS SRAM Analysis (Read)
[Figure: 6T cell during read with Q = 0 and Q̄ = 1; driver M1, PMOS M4, access transistors M5 and M6; both bit lines at VDD with capacitance Cbit] (J. Rabaey)
$\text{Cell beta ratio} = \frac{(W/L)_{drv}}{(W/L)_{access}}$

Techniques to Improve Read Margin
• Increasing the size of the driver NMOS improves read margin
• But remember, area is the number one constraint in memory design
• Increasing cell size is not a good trade-off
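The link between the cell beta ratio and the read bump Vx can be sketched with a long-channel square-law model: during a read the access NMOS is saturated and the driver NMOS is in the linear region, and equating their currents gives Vx in closed form. This is a simplified sketch, assuming no velocity saturation, no body effect and equal NMOS mobility; the supply and threshold values and the names `read_bump` and `min_beta_ratio` are hypothetical.

```python
import math

# Read-disturb estimate for the 6T cell under a long-channel square-law model
# (assumptions: no velocity saturation, no body effect, equal NMOS mobility).
# During a read the access NMOS is saturated and the driver NMOS is linear:
#   k_acc/2 * (VDD - dV - Vtn)^2 = k_drv * ((VDD - Vtn)*dV - dV^2/2)
# With CR = k_drv/k_acc = (W/L)drv/(W/L)access (the cell beta ratio), the
# root that keeps the access transistor on is
#   dV = (VDD - Vtn) * (1 - sqrt(CR / (1 + CR)))


def read_bump(vdd: float, vtn: float, beta_ratio: float) -> float:
    """Voltage Vx that the node storing '0' rises to while WL = VDD."""
    u = vdd - vtn
    return u * (1 - math.sqrt(beta_ratio / (1 + beta_ratio)))


def min_beta_ratio(vdd: float, vtn: float) -> float:
    """Smallest beta ratio that keeps Vx at or below Vtn in this model."""
    a = 1 - vtn / (vdd - vtn)
    if a <= 0:
        return 0.0  # Vx can never exceed Vtn for these voltages
    return a * a / (1 - a * a)


if __name__ == "__main__":
    vdd, vtn = 1.0, 0.3  # assumed supply and threshold
    for cr in (0.5, 1.0, 2.0, 4.0):
        print(f"beta ratio {cr:3.1f}: Vx = {read_bump(vdd, vtn, cr):.3f} V")
    print(f"minimum beta ratio: {min_beta_ratio(vdd, vtn):.2f}")
```

Raising the beta ratio lowers Vx, which is the trend the read-margin slides rely on. The absolute numbers depend heavily on the device model, so treat them as a trend rather than a sizing rule.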

Techniques to Improve Read Margin
• High Vt transistors
– Internal node on the low side needs to rise to Vt or more to flip the cell
– Virtually never happens when Vt is larger than half Vdd
– Cell is extremely stable at the ultra-low power design point
– Beta ratio constraint is relaxed → smaller driver and larger access TR can be used for faster read and write
[Figure: SNM with low Vt vs. SNM with high Vt]

Techniques to Improve Read Margin
• Boosted cell supply Vdd + Δ
– Supply voltage of the SRAM cell is higher than the outside Vdd
– Makes the driver stronger than the access TR, suppressing the rise on the low side
– Effectively improves the beta ratio
– Driver NMOS can be downsized, decreasing cell size
[Figure: cell with internal supply Vdd + Δ; the low side stays near 0]
SRAM Write Operation
[Figure: 6T cell during write with WL = '1', BL = '1', BLB = '0']
• Launch the write data on BL and BLB
• Word line signal is fired
• Low bit line value flips cell data
• Access TR must be stronger than PMOS load

CMOS SRAM Analysis (Write)
[Figure: 6T cell during write with Q = 0 and Q̄ = 1; transistors M1, M4, M5, M6; BL = 1, BLB = 0]
$PR = \frac{(W/L)_4}{(W/L)_6}$

SRAM Cell Write Margin
[Figure: during a write one storage node falls from Vdd to 0 while the other rises from 0 to Vdd] (J. Rabaey)
$\text{Pull-up ratio} = \frac{(W/L)_{pmos}}{(W/L)_{access}}$
• Access transistor must be stronger than the PMOS to pull the storage node below the trip point (typical pull-up ratio ~ 1.5)
• To avoid a cell size increase, the correct pull-up ratio is achieved by controlling Vtn and Vtp

Techniques to Improve Write Margin
• Sizing: access TR vs. PMOS in latch
• Higher WL voltage for access TR
• Virtual VDD
[Figure: the three techniques: sizing, higher WL voltage, virtual VDD]
6T-SRAM Layout Until 90nm
[Figure: cell layout with BL, BLB, VDD, GND and WL]

6T-SRAM Layout From 65nm
[Figure: cell layout]
• Compact cell
• Bitlines: M2
• Wordline: strapped in M3

6T versus 4T SRAM
• 6T SRAM Cell
– Supply current is limited to the leakage current of transistors in the stable state
• 4T SRAM Cell
– High degree of compactness
– High power consumption

RAM Variations
• Many variations to the basic 6T SRAM cell
• More functionality, smaller cells
– Dual read or single write cell
– True multi-ported cell
– Content addressable memory (CAM)
– 4T memory cell
– 3T memory cell
– 2T memory cell
– 1T DRAM cell
Dual Read or Single Write Cell
[Figure: cell with two wordlines WL0, WL1 and bit lines BL, BLB]
• Two wordlines, one for each access transistor
• Small increase in cell size
• Can either
– read two different cells in one cycle
– or write to one cell

Multi Port Cell
[Figure: cell with wordlines WL0, WL1 and bit line pairs BL0/BLB0, BL1/BLB1]
• Each port has a separate address
• Memory access bandwidth is doubled (ideally)
• "Write through": data written can be read by another port in the very same cycle

Multi-Port RAM Cells Array
• 8 words deep, 2-bit wide words, dual-port memory
• To read from word j and write "d1d0" to word k simultaneously:
– Set SAj = 1, and all other SA's = 0
– Set SBk = 1, and all other SB's = 0
– Sense the values on bus_A0 and bus_A1
– Write d1d0 to bus_B0 and bus_B1
[Figure: array with port-A select lines SA0 … SA7, port-B select lines SB0 … SB7, and bit line pairs bus_A0, bus_A1, bus_B0, bus_B1 (true and complement)]

Content Addressable Memory (CAM)
• Instead of an address, provide data → find a match
– Applications: cache, particle-physics colliders
• Needs an "Encoder":
– Inverse function of a decoder
– Take a one-hot collection of signals and encode them
[Figure: 2^n rows × m bits content addressable memory cell array feeding an encoder with n outputs] [©Hauck]
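The read-from-word-j, write-to-word-k procedure on the multi-port array slide can be expressed as a behavioral sketch. The class and method names are hypothetical, data is given as the list [d1, d0], and the write is applied before port A is sensed so that reading and writing the same word in one cycle shows the "write through" behavior.

```python
# Behavioral sketch of the dual-port array: port A reads word j while port B
# writes d1d0 into word k in the same cycle. Assumption (hypothetical class
# and method names): the write is applied before the read is sensed, which
# models the "write through" behavior when j == k.


class DualPortArray:
    def __init__(self, words: int = 8, width: int = 2) -> None:
        self.cells = [[0] * width for _ in range(words)]

    def cycle(self, j: int, k: int, data: list[int]) -> list[int]:
        n = len(self.cells)
        sa = [1 if i == j else 0 for i in range(n)]  # SAj = 1, other SA's = 0
        sb = [1 if i == k else 0 for i in range(n)]  # SBk = 1, other SB's = 0
        if len(data) != len(self.cells[0]) or sum(sa) != 1 or sum(sb) != 1:
            raise ValueError("bad word index or data width")
        # Port B: drive data on bus_B and write it into the selected word.
        for i, sel in enumerate(sb):
            if sel:
                self.cells[i] = list(data)
        # Port A: sense the selected word on bus_A.
        return list(self.cells[sa.index(1)])


if __name__ == "__main__":
    mem = DualPortArray()
    print(mem.cycle(0, 3, [1, 0]))  # read word 0, write 10 into word 3
    print(mem.cycle(3, 5, [1, 1]))  # word 3 now reads back 10
    print(mem.cycle(5, 5, [0, 1]))  # same word on both ports: write through
```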
Content Addressable Memory Cell
• Read and write like a normal 6T memory cell
• Match signal is precharged to 1, pulled to 0 if no match
– Send data on bit' and data' on bit for matching
– Match remains 1 iff all bits in the word match
[Figure: two CAM cells in a row sharing the row select and match lines, each with bit and bit' lines] [©Hauck]

Encoders
[Figure: content addressable memory cell array whose match lines feed an encoder]
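A CAM search followed by the encoder can be sketched functionally: each row's match line starts precharged at 1 and is pulled to 0 by any mismatching cell, and the encoder turns the one-hot match lines back into a row index. This sketch assumes exactly one row matches (multiple matches would need extra logic, such as a priority encoder, which is omitted) and returns the encoder output LSB first; the function names are hypothetical.

```python
# Functional sketch of a CAM search followed by the encoder.
# Assumptions: rows are lists of 0/1 bits, at most one row is expected to
# match, and the encoder output is returned LSB first.


def cam_match(rows: list[list[int]], key: list[int]) -> list[int]:
    """Return one match line per row: 1 iff every stored bit equals the key."""
    lines = []
    for word in rows:
        if len(word) != len(key):
            raise ValueError("key width must equal word width")
        match = 1                      # match line precharged to 1
        for stored, search in zip(word, key):
            if stored != search:
                match = 0              # a mismatching cell pulls it to 0
        lines.append(match)
    return lines


def encode(one_hot: list[int], n: int) -> list[int]:
    """Inverse of the decoder: output bit j is the OR of the lines whose
    index has bit j set."""
    return [
        max((line for i, line in enumerate(one_hot) if (i >> j) & 1), default=0)
        for j in range(n)
    ]


if __name__ == "__main__":
    rows = [[0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0]]  # 2^n = 4 rows, m = 3
    m = cam_match(rows, [1, 1, 0])
    print(m, encode(m, 2))
```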

Smaller RAM Cells
[Figure: cell with separate read and write wordlines and storage node X]
• Internal nodes don't go to VDD
• Need 2 wordlines, read WL and write WL
• Cell won't work at low Vdd
• Can have 1 or 2 bitlines
• High value stored is degraded
• Not very small, since it has more wires
• Effective strength of NMOS driver is reduced
• Refresh needed

1-T DRAM Cell
[Figure: access transistor M1, gated by WL, connects bit line BL (capacitance CBL) to storage capacitor CS at node X. Waveforms for Write 1 and Read 1: X reaches VDD − VT; BL sits at VDD/2 before sensing.]
Write: CS is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between the bit line and the storage capacitance:
$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \, \frac{C_S}{C_S + C_{BL}}$
Voltage swing is small; typically around 250 mV.
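The charge-redistribution formula can be evaluated for both stored values. The supply, threshold and capacitances below are hypothetical, chosen so that a stored '1' gives a swing near the ~250 mV quoted on the slide; the stored '1' level is taken as VDD − VT to reflect the threshold lost when writing a '1' without a boosted word line. `read_swing` is an illustrative name.

```python
# Charge sharing on a 1-T DRAM read:
#   dV = V_BL - V_PRE = (V_BIT - V_PRE) * C_S / (C_S + C_BL)
# Example values below are hypothetical; V_BIT for a stored '1' is taken as
# VDD - VT to reflect the threshold drop when writing a '1' without a boosted WL.


def read_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bitline voltage change after the word line connects C_S to the bitline."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    vdd, vt = 2.5, 0.5          # assumed supply and access-transistor threshold
    c_s, c_bl = 30e-15, 60e-15  # assumed storage and bitline capacitance
    v_pre = vdd / 2             # bitline precharged to VDD/2
    print(f"read '1': {read_swing(vdd - vt, v_pre, c_s, c_bl) * 1e3:+.0f} mV")
    print(f"read '0': {read_swing(0.0, v_pre, c_s, c_bl) * 1e3:+.0f} mV")
```

With these assumed values a stored '1' moves the bitline by about +250 mV and a stored '0' by about −417 mV; the asymmetry comes from the threshold voltage lost when the '1' was written.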
DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
• DRAM memory cells are single ended in contrast to SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amp Operation
[Figure: bit line voltage VBL vs. time t. VBL starts at VPRE; when the word line is activated it moves by ΔV toward V(1) or V(0); when the sense amp is activated it is driven to the full V(1) or V(0) level.]

Dynamic RAM
[Figure: 1-T DRAM cell with access transistor M1, word line and storage capacitor]

1-Transistor Cell: Layout
[Figure: cross-section (metal word line, SiO2, field oxide, n+ regions, inversion layer induced by plate bias, polysilicon plate) and layout (metal word line, diffused bit line, polysilicon gate, polysilicon plate, capacitor)]
• Uses Polysilicon-Diffusion Capacitance
• Expensive in Area
[©Prentice Hall]
Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan
Advanced 1-T DRAM Cells
[Figure: trench cell (word line, transfer gate, isolation, capacitor insulator, refilling poly, storage electrode, cell plate Si, Si substrate) and stacked-capacitor cell (word line, cell plate, capacitor dielectric layer, insulating layer, storage node poly, 2nd field oxide)]

Good References on RAM
• K. Itoh, VLSI Memory Chip Design, Springer-Verlag New York, LLC
• Y. Nakagome, M. Horiguchi, T. Kawahara, and K. Itoh, "Review and future prospects of low-voltage RAM circuits," IBM J. R&D, Vol. 47, No. 5/6, 2003
• R. W. Mann, W. W. Abadeer, M. J. Breitwisch, O. Bula, et al., "Ultralow-power SRAM technology," IBM J. R&D, Vol. 47, No. 5/6, 2003
