CHAPTER 8

Array Subsystems

VLSI Design

Chih-Cheng Hsieh

Outline

1. SRAM
2. DRAM
3. Read-Only Memory (ROM)
4. Serial Access Memory
5. Content-Addressable Memory
6. Programmable Logic Array

Memory Arrays

Memory Arrays
    Random Access Memory
        Read/Write Memory (RAM) (Volatile)
            Static RAM (SRAM)
            Dynamic RAM (DRAM)
        Read Only Memory (ROM) (Nonvolatile)
            Mask ROM
            Programmable ROM (PROM)
            Erasable Programmable ROM (EPROM)
            Electrically Erasable Programmable ROM (EEPROM)
            Flash ROM
    Serial Access Memory
        Shift Registers
            Serial In Parallel Out (SIPO)
            Parallel In Serial Out (PISO)
        Queues
            First In First Out (FIFO)
            Last In First Out (LIFO)
    Content Addressable Memory (CAM)

Array Architecture

2^n words of 2^m bits each
    If n >> m, fold by 2^k into fewer rows of more columns

[Figure: array block diagram showing the row decoder (n-k address bits) driving the wordlines, bitline conditioning, the bitlines, the memory cells (2^(n-k) rows x 2^(m+k) columns), the column circuitry, and the column decoder (k address bits); the output is 2^m bits.]

Good regularity: easy to design
    Very high density if good cells are used
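The folding rule on this slide can be checked with a few lines of Python. This is only a sketch: the function name `fold_array` and its return layout are made up for illustration, and the sample numbers come from the 2 kword x 16 example used later in the column multiplexing slide.

```python
def fold_array(n, m, k):
    """Fold 2^n words of 2^m bits by 2^k.

    Returns (rows, columns, mux_ways), where mux_ways is the number of
    words that share one physical row.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    rows = 2 ** (n - k)       # fewer rows
    columns = 2 ** (m + k)    # more columns
    return rows, columns, 2 ** k


# 2 kword x 16 bits: n = 11, m = 4, folded by 2^3
print(fold_array(n=11, m=4, k=3))
```

The total number of cells, rows x columns = 2^(n+m), is unchanged by folding; only the aspect ratio changes.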

SRAM Architecture

SRAM Cell
Decoders
Column Circuitry
Multiple Ports


12T SRAM Cell

Basic building block: SRAM Cell
    Holds one bit of information, like a latch
    Must be read and written

12-transistor (12T) SRAM cell
    Use a simple latch connected to bitline
    46 x 75 λ unit cell

[Figure: 12T SRAM cell schematic with signals bit, write, write_b, read, read_b.]

6T SRAM Cell

Cell size accounts for most of array size
    Reduce cell size at expense of complexity

6T SRAM Cell
    Used in most commercial chips
    Data stored in cross-coupled inverters

Read:
    Precharge bit, bit_b
    Raise wordline

Write:
    Drive data onto bit, bit_b
    Raise wordline

[Figure: 6T SRAM cell schematic with bitlines bit and bit_b and wordline word.]

SRAM Read

Precharge both bitlines high
Then turn on wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1
    bit discharges, bit_b stays high
    But A bumps up slightly

Read stability
    A must not flip
    N1 >> N2 (N1 is stronger than N2)

[Figure: 6T cell schematic (P1, P2, N1 through N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and read waveforms of A_b, bit_b, bit, word, and A; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 600.]

SRAM Write

Drive one bitline high, the other low
Then turn on wordline
Bitlines overpower cell with new value
Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
    Force A_b low, then A rises high

Writability
    Must overpower feedback inverter
    N4 >> P2

[Figure: 6T cell schematic (P1, P2, N1 through N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and write waveforms of A_b, A, bit_b, and word; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 700.]

SRAM Column Example

[Figure: two SRAM column schematics, one labeled Read and one labeled Write. Each shows bitline conditioning, an SRAM cell on wordline word_q1, and "More Cells" along bitlines bit_v1f and bit_b_v1f. The read column produces out_v1r and out_b_v1r; the write column uses write_q1 and data_s1.]

SRAM Sizing

High bitlines must not overpower inverters during reads
But low bitlines must write new value into cell

[Figure: 6T cell (nodes A and A_b, bitlines bit and bit_b, wordline word) annotated with relative transistor strengths weak, med, and strong.]

SRAM Layout

Cell size is critical: 26 x 45 λ (even smaller in industry)
    Tile cells sharing VDD, GND, bitline contacts

[Figure: SRAM cell layout showing GND, BIT, BIT_B, VDD, WORD, and the cell boundary.]

Decoders

n:2^n decoder consists of 2^n n-input AND gates
    One needed for each row of memory
    Build AND from NAND or NOR gates

[Figure: 2:4 decoder drawn in static CMOS and in pseudo-nMOS, with inputs A1 and A0 and outputs word0 through word3.]
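A behavioral sketch of the n:2^n decoder described above, with one n-input AND per wordline. The function name `decode` and the LSB-first bit order are assumptions made for this illustration, not something the slides specify.

```python
def decode(address_bits):
    """One-hot decode of address_bits, given LSB first as [A0, A1, ...].

    Each wordline is the AND of n literals: the true input where the row
    index has a 1, the complemented input where it has a 0.
    """
    n = len(address_bits)
    wordlines = []
    for row in range(2 ** n):
        literals = [
            a if (row >> i) & 1 else 1 - a
            for i, a in enumerate(address_bits)
        ]
        wordlines.append(int(all(literals)))
    return wordlines


# A1 A0 = 10 selects word2
print(decode([0, 1]))
```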

Decoder Layout

Decoders must be pitch-matched to SRAM cell
    Requires very skinny gates

[Figure: decoder layout with true and complement address lines for A3, A2, A1, A0, VDD and GND rails, a NAND gate, and a buffer inverter driving word.]

Large Decoders

For n > 4, NAND gates become slow
    Break large gates into multiple smaller gates

[Figure: 4:16 decoder built from multiple smaller gates, with inputs A3 through A0 and outputs word0 through word15.]

Predecoding

Many of these gates are redundant
    Factor out common gates into predecoder
    Saves area
    Same path effort

[Figure: 4:16 decoder with predecoders on A3 through A0 producing 1-of-4 hot predecoded lines that drive word0 through word15.]
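The predecoding idea can be sketched behaviorally. The code assumes a 4-bit address split into two 2-bit groups, (A3, A2) and (A1, A0), each predecoded to a 1-of-4 hot group; the names `one_hot` and `predecoded_wordlines` are hypothetical.

```python
def one_hot(value, width):
    """Return a 2^width-entry one-hot list with a 1 at index `value`."""
    return [int(i == value) for i in range(2 ** width)]


def predecoded_wordlines(address):
    """4:16 decode using two 2:4 predecoders and 2-input ANDs."""
    low = one_hot(address & 0b11, 2)          # predecoded from A1, A0
    high = one_hot((address >> 2) & 0b11, 2)  # predecoded from A3, A2
    # Each wordline ANDs one line from each predecoded group.
    return [high[w >> 2] & low[w & 0b11] for w in range(16)]


print(predecoded_wordlines(0b1001))  # only word9 is high
```

The eight predecoded lines are shared by all sixteen final gates, which is where the area saving comes from.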

Column Circuitry

Some circuitry is required for each column


Bitline conditioning
Sense amplifiers
Column multiplexing


Bitline Conditioning

Precharge bitlines high before reads
Equalize bitlines to minimize voltage difference when using sense amplifiers

[Figure: bitline conditioning circuits on bit and bit_b.]

Sense Amplifiers

Bitlines have many cells attached
    Ex: 32-kbit SRAM has 256 rows x 128 cols
    256 cells on each bitline

$t_{pd} \propto (C/I)\,\Delta V$
    Even with shared diffusion contacts, 128C of diffusion capacitance (big C)
    Discharged slowly through small transistors (small I)

Sense amplifiers are triggered on small voltage swing (reduce $\Delta V$)
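To see why a smaller swing helps, the proportionality t_pd ∝ (C/I) ΔV can be evaluated for two swings. All numbers below are assumptions chosen for illustration (1 fF per unit of diffusion capacitance C, 50 µA of cell current, 1.0 V versus 0.1 V swing); none come from the slides.

```python
def bitline_delay(c_farads, i_amps, dv_volts):
    """First-order delay estimate t = (C / I) * dV."""
    return c_farads / i_amps * dv_volts


C_BITLINE = 128 * 1e-15  # 128C with an assumed C of 1 fF
I_CELL = 50e-6           # assumed cell read current

full_swing = bitline_delay(C_BITLINE, I_CELL, 1.0)  # wait for a full swing
sensed = bitline_delay(C_BITLINE, I_CELL, 0.1)      # sense amp fires at 0.1 V

print(f"full swing: {full_swing * 1e9:.2f} ns, sensed: {sensed * 1e9:.3f} ns")
```

With C and I fixed, the delay scales directly with the swing, so sensing at one tenth of the swing takes one tenth of the time in this model.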

Differential Pair Amp

Differential pair requires no clock
But always dissipates static power

[Figure: differential pair sense amplifier with transistors P1, P2, N1, N2, N3, inputs bit and bit_b, and outputs sense and sense_b.]

Clocked Sense Amp

Clocked sense amp saves power
Requires sense_clk after enough bitline swing
Isolation transistors cut off large bitline capacitance

[Figure: clocked sense amplifier with isolation transistors on bit and bit_b, regenerative feedback, sense_clk, and outputs sense and sense_b.]

Twisted Bitlines

Sense amplifiers also amplify noise
    Coupling noise is severe in modern processes
    Try to couple equally onto bit and bit_b
    Done by twisting bitlines

[Figure: twisted bitline pairs b0 b0_b, b1 b1_b, b2 b2_b, b3 b3_b.]

Column Multiplexing

Recall that array may be folded for good aspect ratio
Ex: 2 kword x 16 folded into 256 rows x 128 columns
    Must select 16 output bits from the 128 columns
    Requires 16 8:1 column multiplexers

Tree Decoder Mux

Column mux can use pass transistors
    Use nMOS only, precharge outputs

One design is to use k series transistors for 2^k:1 mux
    No external decoder logic needed

[Figure: 8:1 tree mux; bitlines B0 through B7 pass through series nMOS transistors gated by A0, A1, A2 and their complements to output Y, which goes to sense amps and write circuits.]
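The tree mux selects one of 2^k bitlines with k levels, one address bit per level. The sketch below models only the selection logic, not the pass-transistor electrical behavior; the name `tree_mux` and the LSB-first bit order are assumptions.

```python
def tree_mux(bitlines, address_bits):
    """Select one of 2^k bitlines using k levels, address_bits LSB first.

    Each level keeps one bitline out of every adjacent pair, which
    mirrors one series pass transistor per address bit.
    """
    level = list(bitlines)
    if len(level) != 2 ** len(address_bits):
        raise ValueError("need 2^k bitlines for k address bits")
    for a in address_bits:
        level = [level[i + a] for i in range(0, len(level), 2)]
    return level[0]


columns = ["B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7"]
print(tree_mux(columns, [1, 0, 1]))  # A0=1, A1=0, A2=1 selects B5
```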

Single Pass-Gate Mux

Or eliminate series transistors with separate decoder

[Figure: pass-gate mux on B0 through B3 with a separate decoder driven by A1 and A0.]

Ex: 2-way Muxed SRAM

[Figure: 2-way column-multiplexed SRAM; two columns of cells ("More Cells") on wordline word_q1, selected by A0 and its complement, with write0_q1, write1_q1, and data_v1.]

Multiple Ports

We have considered single-ported SRAM
    One read or one write on each cycle

Multiported SRAMs are needed for register files

Examples:
    Multicycle MIPS must read two sources or write a result on some cycles
    Pipelined MIPS must read two sources and write a third result each cycle
    Superscalar MIPS must read and write many sources and results each cycle

Dual-Ported SRAM

Simple dual-ported SRAM
    Two independent single-ended reads
    Or one differential write

Do two reads and one write by time multiplexing
    Read during ph1, write during ph2

[Figure: dual-ported SRAM cell with bitlines bit and bit_b and wordlines wordA and wordB.]

Multi-Ported SRAM

Adding more access transistors hurts read stability
    Multiported SRAM isolates reads from state node
    Single-ended design minimizes number of bitlines

[Figure: multi-ported SRAM cell with bitlines bA through bG and wordlines wordA through wordG, divided between write circuits and read circuits.]

SRAM Scaling



DRAM

$\Delta V = \frac{V_{DD}}{2}\,\frac{C_{cell}}{C_{cell} + C_{bit}}$
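A worked example of the DRAM read swing, assuming the usual charge-sharing relation ΔV = (VDD/2) · Ccell / (Ccell + Cbit) with the bitline precharged to VDD/2. The capacitance and supply values are illustrative assumptions, picked so that Cbit is ten times Ccell.

```python
def bitline_swing(vdd, c_cell, c_bit):
    """Charge-sharing swing on a bitline precharged to VDD/2."""
    return (vdd / 2) * c_cell / (c_cell + c_bit)


VDD = 1.8        # assumed supply voltage, volts
C_CELL = 30e-15  # assumed cell capacitance, farads
C_BIT = 300e-15  # assumed bitline capacitance, 10x the cell

dv = bitline_swing(VDD, C_CELL, C_BIT)
print(f"swing = {dv * 1e3:.1f} mV")
```

A swing this small relative to VDD is why DRAM reads depend on sense amplifiers.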

Subarray Architecture


Bitline Architectures

Bitline capacitance in a subarray is an order of magnitude higher than that in the cell
    Small ΔV, so a sense amplifier is needed

Open bitlines: use another subarray as reference
    Higher density
    Noise that affects one array more than the other appears as differential noise

Folded bitlines: take the neighbor cell in the same subarray as reference
    Noise appears as common mode
    Larger layout area

Bitline Architectures

[Figure: open bitlines and folded bitlines.]


Read-Only Memories

Read-Only Memories are nonvolatile
    Retain their contents when power is removed

Mask-programmed ROMs use one transistor per bit
    Presence or absence determines 1 or 0

ROM Example

4-word x 6-bit ROM
    Represented with dot diagram
    Dots indicate 1s in ROM

Word 0: 010101
Word 1: 011001
Word 2: 100101
Word 3: 101010

[Figure: dot diagram of the ROM array; a 2:4 decoder on A1 A0 drives the wordlines, weak pseudo-nMOS pullups load the bitlines, and the outputs are Y5 through Y0.]

Looks like 6 4-input pseudo-nMOS NORs
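The 4-word x 6-bit ROM above behaves like a small lookup table. In this sketch the word contents are taken from the slide; reading the leftmost bit of each word as Y5 is an assumption, and `read_rom` is a hypothetical helper name.

```python
# Word contents from the slide, indexed by address A1 A0.
ROM = ["010101", "011001", "100101", "101010"]


def read_rom(a1, a0):
    """Return the addressed word as a dict of outputs Y5..Y0."""
    word = ROM[(a1 << 1) | a0]
    # Assumption: the leftmost character is Y5, the rightmost is Y0.
    return {f"Y{5 - i}": int(bit) for i, bit in enumerate(word)}


print(read_rom(1, 0))  # word 2
```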

ROM Array Layout

Unit cell is 12 x 8 λ (about 1/10 size of SRAM)

Row Decoders

ROM row decoders must pitch-match with ROM
    Only a single track per word!

Complete ROM Layout


PROMs and EPROMs

Programmable ROMs
    Build array with transistors at every site
    Burn out fuses to disable unwanted transistors

Electrically Programmable ROMs
    Use floating gate to turn off unwanted transistors
    EPROM, EEPROM, Flash

[Figure: floating-gate transistor cross-section showing source, gate, drain, polysilicon, floating gate, thin gate oxide (SiO2), n+ regions, and p-type bulk Si.]

Building Logic with ROMs

Use ROM as lookup table containing truth table
    n inputs, k outputs requires 2^n words x k bits
    Changing function is easy: reprogram ROM

Finite State Machine
    n inputs, k outputs, s bits of state
    Build with 2^(n+s) x (k+s) bit ROM and (k+s) bit reg

[Figure: ROM as a lookup table (n inputs into a decoder with 2^n wordlines, ROM array with k outputs) and a ROM-based finite state machine (n inputs, k outputs, s state bits fed back).]

Example: RoboAnt

Let's build an Ant
    Sensors: Antennae (L, R), 1 when in contact
    Actuators: Legs
        Forward step F
        Ten degree turns TL, TR

Goal: make our ant smart enough to get out of a maze
Strategy: keep right antenna on wall
(RoboAnt adapted from MIT 6.004 2002 OpenCourseWare by Ward and Terman)

Lost in space

Action: go forward until we hit something


Initial state


Bonk!!!

Action: turn left (rotate counterclockwise)
    Until we don't touch anymore

A little to the right

Action: step forward and turn right a little


Looking for wall


Then a little to the right

Action: step and turn left a little, until not touching

Whoops, a corner!

Action: step and turn right until hitting next wall

Simplification

Merge equivalent states where possible


State Transition Table

State   S1:0   L   R   S1:0'   TR   TL   F
Lost    00     0   0   00      0    0    1
        00     1   X   01      0    0    1
        00     0   1   01      0    0    1
RCCW    01     1   X   01      0    1    0
        01     0   1   01      0    1    0
        01     0   0   10      0    1    0
Wall1   10     X   0   10      1    0    1
        10     X   1   11      1    0    1
Wall2   11     1   X   01      0    1    1
        11     0   0   10      0    1    1
        11     0   1   11      0    1    1

ROM Implementation

16-word x 5 bit ROM

[Figure: ROM-based RoboAnt controller. The address S1 S0 L R (current state S1:0 plus inputs L, R) drives a 4:16 decoder with rows 0000 through 1111; each ROM word holds S1' S0' TR' TL' F', giving the next state S'1:0 and the outputs TL, TR, F.]
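One way to see how the 16-word x 5-bit ROM is filled is to expand the don't-cares in the RoboAnt state transition table into all 16 addresses. This is a sketch under stated assumptions: the address order S1 S0 L R and word order S1' S0' TR TL F follow the figure labels, and the names `TABLE`, `build_rom`, and `step` are invented for the example.

```python
# (S1:0, L, R, next S1:0, TR, TL, F), copied from the state transition table
TABLE = [
    ("00", "0", "0", "00", 0, 0, 1),
    ("00", "1", "X", "01", 0, 0, 1),
    ("00", "0", "1", "01", 0, 0, 1),
    ("01", "1", "X", "01", 0, 1, 0),
    ("01", "0", "1", "01", 0, 1, 0),
    ("01", "0", "0", "10", 0, 1, 0),
    ("10", "X", "0", "10", 1, 0, 1),
    ("10", "X", "1", "11", 1, 0, 1),
    ("11", "1", "X", "01", 0, 1, 1),
    ("11", "0", "0", "10", 0, 1, 1),
    ("11", "0", "1", "11", 0, 1, 1),
]


def build_rom():
    """Expand don't-cares (X) so every 4-bit address has a 5-bit word."""
    rom = {}
    for state, l, r, nxt, tr, tl, f in TABLE:
        for lv in ("01" if l == "X" else l):
            for rv in ("01" if r == "X" else r):
                rom[state + lv + rv] = f"{nxt}{tr}{tl}{f}"
    return rom


ROM = build_rom()


def step(state, l, r):
    """Look up the next state and outputs for one clock cycle."""
    word = ROM[f"{state}{l}{r}"]
    outputs = {"TR": int(word[2]), "TL": int(word[3]), "F": int(word[4])}
    return word[:2], outputs


print(len(ROM), step("00", 1, 0))
```

In hardware the state register would feed `state` back on every clock; here `step` is simply called repeatedly with the returned state.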



Serial Access Memories

Serial access memories do not use an address


Shift Registers
Tapped Delay Lines
Serial In Parallel Out (SIPO)
Parallel In Serial Out (PISO)
Queues (FIFO, LIFO)


Shift Register

Shift registers store and delay data
    Simple design: cascade of registers
    Watch your hold times!

[Figure: cascade of registers clocked by clk between Din and Dout.]

Denser Shift Registers

Flip-flops aren't very area-efficient
For large shift registers, keep data in SRAM instead
Move read/write pointers to RAM rather than data
    Initialize read address to first entry, write to last
    Increment address on each cycle

[Figure: dual-ported SRAM between Din and Dout; two counters clocked by clk generate readaddr (reset to 00...00) and writeaddr (reset to 11...11).]
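The pointer-based shift register can be modeled in a few lines. This is a behavioral sketch, not the slide's circuit: the class name is invented, memory is assumed to start at zero, and the read is assumed to happen before the write within a cycle.

```python
class SramShiftRegister:
    """Shift register built from a dual-ported memory and two counters."""

    def __init__(self, entries):
        self.mem = [0] * entries
        self.read_addr = 0             # counter reset to 00...00
        self.write_addr = entries - 1  # counter reset to 11...11

    def clock(self, din):
        """One cycle: read the oldest entry, write din, advance both."""
        dout = self.mem[self.read_addr]
        self.mem[self.write_addr] = din
        n = len(self.mem)
        self.read_addr = (self.read_addr + 1) % n
        self.write_addr = (self.write_addr + 1) % n
        return dout


sr = SramShiftRegister(4)
print([sr.clock(d) for d in [1, 2, 3, 4, 5, 6]])
```

With these assumptions a value written in one cycle is read back `entries - 1` cycles later; the data never moves, only the two addresses do.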

Tapped Delay Line

A tapped delay line is a shift register with a programmable number of stages
    Set number of stages with delay controls to mux
    Ex: 0 to 63 stages of delay

[Figure: shift-register blocks SR32, SR16, SR8, SR4, SR2, SR1 in series from Din to Dout, clocked by clk, each followed by a mux controlled by one of delay5 through delay0.]
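A behavioral sketch of the tapped delay line, assuming six binary-weighted blocks (SR32 down to SR1) where control bit delay5 selects or bypasses SR32 and delay0 selects or bypasses SR1. The pairing of control bits to blocks and the class name are assumptions for illustration.

```python
from collections import deque

STAGES = [32, 16, 8, 4, 2, 1]  # SR32 ... SR1, paired with delay5 ... delay0


class TappedDelayLine:
    def __init__(self):
        # Each block is a fixed-length shift register initialized to 0.
        self.blocks = [deque([0] * n, maxlen=n) for n in STAGES]

    def clock(self, din, delay):
        """Shift every block once; `delay` (0..63) sets the mux selects."""
        x = din
        for bit, block in zip(range(5, -1, -1), self.blocks):
            oldest = block[0]       # value that entered n cycles ago
            block.append(x)         # block always shifts in its input
            if (delay >> bit) & 1:  # mux: take block output or bypass
                x = oldest
        return x


tdl = TappedDelayLine()
print([tdl.clock(d, 5) for d in range(1, 11)])
```

Because the blocks are binary weighted, the 6-bit control value equals the total delay in cycles.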

Serial In Parallel Out

1-bit shift register reads in serial data
    After N steps, presents N-bit parallel output

[Figure: shift register clocked by clk with serial input Sin and parallel outputs P0 through P3.]

Parallel In Serial Out

Load all N bits in parallel when shift = 0
    Then shift one bit out per cycle

[Figure: shift register with parallel inputs P0 through P3, shift/load control, clk, and serial output Sout.]
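The SIPO and PISO registers are mirror images, which a short model makes visible. Bit ordering is an assumption here: serial data enters at P0 and shifts toward the last stage, and Sout is taken from the last stage. The function names are hypothetical.

```python
def sipo(serial_bits):
    """Shift N serial bits in; return the N-bit parallel word [P0, P1, ...]."""
    reg = [0] * len(serial_bits)
    for bit in serial_bits:
        reg = [bit] + reg[:-1]  # one shift per clock, new bit enters at P0
    return reg


def piso(parallel_bits):
    """Load N bits in parallel, then shift one bit out per cycle."""
    reg = list(parallel_bits)   # load phase (shift = 0)
    out = []
    for _ in range(len(reg)):
        out.append(reg[-1])     # Sout comes from the last stage
        reg = [0] + reg[:-1]
    return out


word = sipo([1, 0, 1, 1])
print(word, piso(word))
```

Under these ordering assumptions, feeding the SIPO output into the PISO returns the original serial stream.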

Queues

Queues allow data to be read and written at different rates.
    Read and write each use their own clock, data
    Queue indicates whether it is full or empty
    Build with SRAM and read/write counters (pointers)

[Figure: queue block with WriteClk, WriteData, and FULL on the write side and ReadClk, ReadData, and EMPTY on the read side.]

FIFO, LIFO Queues

First In First Out (FIFO)
    Initialize read and write pointers to first element
    Queue is EMPTY
    On write, increment write pointer
    If write almost catches read, Queue is FULL
    On read, increment read pointer

Last In First Out (LIFO)
    Also called a stack
    Use a single stack pointer for read and write
    On write, pointer is incremented; on read, pointer is decremented
    Reach last element = FULL, reach first element = EMPTY
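The FIFO pointer rules above can be sketched as a small class. This is a single-clock behavioral model (the slides describe separate read and write clocks, which are not modeled here), and the class and method names are made up. "Write almost catches read" is interpreted as the write pointer being one slot behind the read pointer, so one entry is always left unused.

```python
class Fifo:
    def __init__(self, entries):
        self.mem = [None] * entries
        self.read_ptr = 0   # both pointers start at the first element
        self.write_ptr = 0

    @property
    def empty(self):
        return self.read_ptr == self.write_ptr

    @property
    def full(self):
        # Write pointer has almost caught up with the read pointer.
        return (self.write_ptr + 1) % len(self.mem) == self.read_ptr

    def write(self, data):
        if self.full:
            raise OverflowError("FIFO is FULL")
        self.mem[self.write_ptr] = data
        self.write_ptr = (self.write_ptr + 1) % len(self.mem)

    def read(self):
        if self.empty:
            raise IndexError("FIFO is EMPTY")
        data = self.mem[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.mem)
        return data


q = Fifo(4)
for value in (10, 20, 30):
    q.write(value)
print(q.full, q.read(), q.read(), q.empty)
```

Leaving one slot unused is a common way to tell FULL from EMPTY when only the two pointers are compared.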


CAMs

Extension of ordinary memory (e.g. SRAM)
    Read and write memory as usual
    Also match to see which words contain a key

[Figure: CAM block with adr, data/key, read, and write inputs and a match output.]

10T CAM Cell

Add four match transistors to 6T SRAM
    56 x 43 λ unit cell

[Figure: 10T CAM cell with bit, bit_b, word, internal nodes cell and cell_b, and the match line.]

CAM Cell Operation

Read and write like ordinary SRAM
For matching:
    Leave wordline low
    Precharge matchlines
    Place key on bitlines
    Matchlines evaluate

Miss line
    Pseudo-nMOS NOR of match lines
    Goes high if no words match

[Figure: CAM array with a row decoder (address), column circuitry (data, read/write), CAM cells, matchlines match0 through match3 with clk, and a weak pullup on the miss line.]
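The match operation can be modeled at the word level. This sketch ignores precharge and evaluate timing and only captures the logical result: one matchline per stored word, and a miss signal that is the NOR of all matchlines. The class name `Cam` and the string representation of words are assumptions.

```python
class Cam:
    def __init__(self, words):
        self.words = list(words)  # one stored word per row

    def write(self, address, data):
        """Ordinary SRAM-style write to one row."""
        self.words[address] = data

    def match(self, key):
        """Return (matchlines, miss) for the key placed on the bitlines."""
        matchlines = [int(word == key) for word in self.words]
        miss = int(not any(matchlines))  # NOR of the matchlines
        return matchlines, miss


cam = Cam(["1010", "0111", "1010", "0000"])
print(cam.match("1010"))
print(cam.match("1111"))
```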


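PLAs

A Programmable Logic Array performs any function in sum-of-products form.
    Literals: inputs & complements
    Products / Minterms: AND of literals
    Outputs: OR of Minterms

Example: Full Adder

$s = a\bar{b}\bar{c} + \bar{a}b\bar{c} + \bar{a}\bar{b}c + abc$
$c_{out} = ab + bc + ac$

[Figure: PLA with inputs a, b, c feeding the AND plane, which generates the product terms (minterms) $bc$, $ac$, $ab$, $a\bar{b}\bar{c}$, $\bar{a}b\bar{c}$, $\bar{a}\bar{b}c$, $abc$; the OR plane produces the outputs s and cout.]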
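The full-adder PLA can be checked by modeling the two planes directly. The sketch assumes the standard sum-of-products form of a full adder (s is the OR of the four odd-parity minterms, cout = ab + bc + ac); the function name is hypothetical.

```python
from itertools import product


def full_adder_pla(a, b, c):
    """Evaluate the full adder as an AND plane followed by an OR plane."""
    na, nb, nc = 1 - a, 1 - b, 1 - c  # complemented literals

    # AND plane: one product term per row
    ab, bc, ac = a & b, b & c, a & c
    m1 = a & nb & nc
    m2 = na & b & nc
    m3 = na & nb & c
    m4 = a & b & c

    # OR plane: each output ORs the product terms it needs
    s = m1 | m2 | m3 | m4
    cout = ab | bc | ac
    return s, cout


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, full_adder_pla(a, b, c))
```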

NOR-NOR PLAs

ANDs and ORs are not very efficient in CMOS
    Dynamic or Pseudo-nMOS NORs are very efficient
    Use De Morgan's Law to convert to all NORs

[Figure: the full-adder PLA (inputs a, b, c; outputs s, cout) drawn with AND and OR planes, then redrawn with both planes built from NORs.]

PLA Schematic & Layout

[Figure: schematic and layout of the full-adder PLA, showing the AND plane, the OR plane, the product terms, and the outputs s and cout.]

PLAs vs. ROMs

The OR plane of the PLA is like the ROM array
The AND plane of the PLA is like the ROM decoder
PLAs are more flexible than ROMs
    No need to have 2^n rows for n inputs
    Only generate the minterms that are needed
    Take advantage of logic simplification

Example: RoboAnt PLA

Convert state transition table to logic equations

S1:0   L   R   S1:0'   TR   TL   F
00     0   0   00      0    0    1
00     1   X   01      0    0    1
00     0   1   01      0    0    1
01     1   X   01      0    1    0
01     0   1   01      0    1    0
01     0   0   10      0    1    0
10     X   0   10      1    0    1
10     X   1   11      1    0    1
11     1   X   01      0    1    1
11     0   0   10      0    1    1
11     0   1   11      0    1    1

$TR = S_1\bar{S_0}$
$TL = S_0$
$F = S_1 + \bar{S_0}$

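RoboAnt Dot Diagram

$S_1' = S_1\bar{S_0} + \bar{L}S_1 + \bar{L}\bar{R}S_0$
$S_0' = R + L\bar{S_1} + LS_0$
$TR = S_1\bar{S_0}$
$TL = S_0$
$F = S_1 + \bar{S_0}$

[Figure: PLA dot diagram. The AND plane generates the product terms $S_1\bar{S_0}$, $\bar{L}S_1$, $\bar{L}\bar{R}S_0$, $R$, $L\bar{S_1}$, $LS_0$, $S_0$, $S_1$, $\bar{S_0}$; the OR plane produces the outputs S1', S0', TR, TL, F.]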
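The RoboAnt logic equations can be checked against rows of the state transition table. The equations used here are a reconstruction with the complement bars restored (S1' = S1·~S0 + ~L·S1 + ~L·~R·S0, S0' = R + L·~S1 + L·S0, TR = S1·~S0, TL = S0, F = S1 + ~S0); treat them as an assumption, and the function name as hypothetical.

```python
def roboant_pla(s1, s0, l, r):
    """Return (S1', S0', TR, TL, F) from the sum-of-products equations."""

    def inv(x):
        return 1 - x

    s1_next = (s1 & inv(s0)) | (inv(l) & s1) | (inv(l) & inv(r) & s0)
    s0_next = r | (l & inv(s1)) | (l & s0)
    tr = s1 & inv(s0)
    tl = s0
    f = s1 | inv(s0)
    return s1_next, s0_next, tr, tl, f


# State 01 (RCCW) with L = 0, R = 0 should move to state 10 with TL = 1.
print(roboant_pla(0, 1, 0, 0))
```

Nine distinct product terms cover all five outputs, compared with sixteen rows in the ROM version.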

PLAs vs. ROMs

[Figure: the RoboAnt controller implemented two ways. ROM: the address S1 S0 L R drives a 4:16 decoder with rows 0000 through 1111, and each word holds S1' S0' TR' TL' F'. PLA: an AND plane and an OR plane implement the equations below, with outputs S1', S0', TR, TL, F.]

$S_1' = S_1\bar{S_0} + \bar{L}S_1 + \bar{L}\bar{R}S_0$, $S_0' = R + L\bar{S_1} + LS_0$
$TR = S_1\bar{S_0}$, $TL = S_0$, $F = S_1 + \bar{S_0}$

Reliability and Yield

Semiconductor memories trade off noise margin for density and performance
    Thus, they are highly sensitive to noise (cross talk, supply noise)

High density and large die size cause yield problems

$\text{Yield} = 100 \times \frac{\text{number of good chips per wafer}}{\text{number of chips per wafer}}$
$Y = \left[\frac{1 - e^{-AD}}{AD}\right]^2$

Increase yield using error correction and redundancy
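A small numeric sketch of the yield model Y = [(1 - e^(-AD)) / (AD)]^2, reading A as die area and D as defect density. The sample areas and defect density below are illustrative assumptions, not data from the slides.

```python
import math


def yield_fraction(area_cm2, defects_per_cm2):
    """Yield as a fraction between 0 and 1 for die area A and defect density D."""
    ad = area_cm2 * defects_per_cm2
    if ad == 0:
        return 1.0  # limit of the expression as A*D approaches 0
    return ((1 - math.exp(-ad)) / ad) ** 2


# Assumed defect density of 1 defect per square centimeter
for area in (0.25, 0.5, 1.0, 2.0):
    print(f"A = {area:4.2f} cm^2  ->  Y = {100 * yield_fraction(area, 1.0):5.1f} %")
```

Yield falls as the product A·D grows, which is the motivation for the redundancy and error correction mentioned on the slide.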

Alpha Particles

[Figure: α-particle track through a memory cell cross-section labeled WL, VDD, BL, n+, and SiO2.]

1 Particle ~ 1 Million Carriers

Redundancy in the Memory Structure

[Figure: memory array with a redundant row and redundant columns, a fuse bank, and row and column address inputs.]

Redundancy and Error Correction
