04 - Chapter 4 - The Hardware Side - Part 4 - Memory

Embedded Systems: A
Contemporary Design Tool

James K. Peckol, Univ. of
Washington
ISBN: 978-0-471-72180-2
Chapter 4 Memories and the
Memory Subsystem
E. Sisinni Digital Systems for Signal Processing
Classifying memory
The term memory is generic; there are many different kinds of
memory, each with its strengths and weaknesses.
RAM - Random Access Memory. As the name suggests, any
location in memory is visible for immediate access rather than
having to sequence through predecessor locations. The times for a
read operation and a write operation are comparable. It may be
organized as bits, bytes, or words.
ROM - Read Only Memory. During normal operation, ROM can only
be read. Like RAM, any location in memory is visible for immediate
access rather than having to sequence through predecessor
locations. The read operation is orders of magnitude faster than a
write operation. Like the RAM, the ROM may be organized as bits,
bytes, or words.
Classifying memory - RAM
DRAM - Dynamic RAM. A simple memory cell design with bit
storage implemented using a stored charge mechanism. The stored
charge can leak away if it is not repeatedly restored. These devices
are used for larger memory systems. I/O is asynchronous with
respect to any external system clocks.
SRAM - Static RAM. A more complex memory cell design with bit
storage implemented using a latch-type mechanism. The stored
data does not have to be refreshed. I/O is asynchronous with
respect to any external system clocks.
SDRAM - Synchronous DRAM. SDRAM synchronizes all
addresses, data, and control signals to the system clock and
allows much higher data transfer rates than asynchronous transfers.
Classifying memory - ROM
PROM - Programmable ROM. A PROM is typically programmed
using a purpose-designed device. The memory can be
programmed only one time.
EPROM - Erasable PROM. Like PROM, a programming device is needed. Erasure,
so that it can be reprogrammed, is done by placing the device under
ultraviolet light for a specified time interval.
EEPROM - Electrically Erasable PROM. Erasure is done electrically
via the programming device.
FLASH - A kind of EEPROM. It can be reprogrammed in situ.
A general memory interface
Memory = Array
For each index that is accessed, the corresponding stored value
appears on the output (Read access).
Conversely, if one provides an index and an input value, the data
will be stored at the corresponding indexed location (Write access).
The physical model requires a bit more work.
A memory interface generally requires three categories of signals:
address, data, and control.
Address signals are inputs to the memory, data can be either an
input or an output, and the control signals are generally inputs.
All of the different memory types require both address and data
signals. They differ in the number and the nature of the necessary
control signals.
Memories: terminology
Access Time: The time to access a word in memory

Cycle time: the time interval from the start of one read or write
operation until the start of the next
Memories: terminology
Bandwidth - a measure of the word
transmission rate to and from memory via the
memory bus (see memory timings).
Latency - the amount of time required to
access the first of a sequence of words.
Block Size - A block is a logical view placed
on a collection of words in memory.
Block Access Time - gives a measure of the
time to access an entire block from the start
of a read.
Page - A page is a logical view placed on
larger collections of words in memory.
Pages are generally comprised of blocks; the
size of a page can be given in words or in
blocks.
Memory architecture
Independent of
memory type, the
typical memory chip
appears as:
The vertical and
horizontal dimensions
are usually very
similar, for an aspect
ratio of unity.
Multiple words are
stored in each row
and selected
simultaneously
A column decoder is
added to select the
desired word from a
row.
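The row and column selection described above can be sketched in a few lines. This is only an illustration: the array size (32 rows of 32 words, i.e. 1 K words with unity aspect ratio) and the name split_row_col are assumptions, not values from the slides.

```python
ROW_BITS = 5   # assumption: 32 rows
COL_BITS = 5   # assumption: 32 words stored in each row


def split_row_col(addr):
    """Split a word address into (row, column) for a square memory array."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS                 # upper bits drive the row decoder
    col = addr & ((1 << COL_BITS) - 1)     # lower bits drive the column decoder
    return row, col
```

The row decoder selects all words of one row at once; the column bits then pick the single word that is actually wanted.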
Memory architecture
Larger memories start to suffer excess delay along bit and word
lines. A third dimension is added to the address space to solve this
problem:
ROM overview
Transistors are used to connect bits inside a memory word to ground
(a floating connection is read as a logical 1).
ROM Read operation
A value is read from a ROM by asserting one of the row lines.
Non-volatile Read-Write Memories
Virtually identical in structure to ROMs.
Selective enabling/disabling of transistors is accomplished through
modifications to the threshold voltage, by means of a
floating gate.
Applying a high voltage (15 to 20 V) between source and gate-drain
creates a high electric field and causes avalanche injection to occur.
Hot electrons traverse the first oxide and get trapped on the floating gate,
leaving it negatively charged.
This increases the threshold voltage to ~7 V. Applying 5 V to the gate
does not permit the device to turn on.
SRAM - Overview
A high-level interface to the SRAM is very similar to that for the
ROM. There are six transistors per cell (two in each of the buffers
and the two pull-up transistors); two access transistors enable the
cell for read and write.
SRAM Read & Write
Typical timing for a read and a write operation is:
OE_ determines direction: Hi = Write, Lo = Read.
Writes are dangerous! Be careful!
Double signaling for a write: OE_ Hi and RW_ Lo.
[Figure: SRAM write and read timing diagrams. Signals: CS_, OE_, RW_, address, and data (D). Write timing: write address, data in, write setup time, write hold time. Read timing: read address, data out (high Z otherwise), read access time.]
DRAM - Overview
In the DRAM there is only one transistor per cell!
The read operation destroys the stored information! The sensed and amplified value
is placed back onto the bit line (a restore or rewrite operation).
Write operation: the capacitor is charged if a logical 1 is to be stored and
discharged if a logical 0 is to be stored.
DRAM Read & Write
CAS, or Column Address Strobe, is a clock used in dynamic memories
to control the input of column addresses to the memory.
RAS, or Row Address Strobe, is a clock used in dynamic memories
to control the input of row addresses to the memory.
The memory map
The memory map lists the
addresses in memory allocated to
each portion of the application.
Usually ROM holds words that are
not expected to change at
runtime. RAM is the space
available to hold data, among
other things.
If the design is using memory
mapped I/O, then not all of physical
memory will be available for
data or code.
Virtual memory makes it possible
for the required code and data
space to exceed total available
primary memory.
The memory map
We need glue-logic for address decoding:
0x0000-0x0FFF   4K RAM0
0x1000-0x1FFF   4K RAM1
0x2000-0x2FFF   4K RAM2
0x3000-0xFFFF   Vacant
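What the address-decoding glue logic does for this map can be modelled in software. The sketch below uses the address ranges listed above; the names MEMORY_MAP and decode are illustrative, and real glue logic would be combinational hardware looking at the upper address bits (A15-A12 here) rather than a loop.

```python
# Address ranges taken from the memory map above: (first, last, device).
MEMORY_MAP = [
    (0x0000, 0x0FFF, "RAM0"),
    (0x1000, 0x1FFF, "RAM1"),
    (0x2000, 0x2FFF, "RAM2"),
]


def decode(addr):
    """Return (device, offset) for a 16-bit address, or ("Vacant", None)."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address does not fit on a 16-bit bus")
    for first, last, device in MEMORY_MAP:
        if first <= addr <= last:
            return device, addr - first    # offset = low 12 bits inside the chip
    return "Vacant", None
```

For example, decode(0x1234) selects RAM1 at offset 0x234, while any address from 0x3000 upward selects nothing.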
Full I/O decoding
Full I/O decoding involves checking every single line (i.e. all bits) of
the address bus (and possibly the I/O R/W signal) to determine if a
device is selected or not. With full I/O decoding, each hardware
register is mapped to a unique I/O port address.
Full address decoding is very efficient in its use of the available I/O
address space (one I/O address for one hardware register), but is
often impractical because of the excessive hardware
needed to implement it.
Full I/O decoding
[Figure: full I/O decoding example; only the device size labels 2K, 2K, and 8K survive.]
Partial I/O decoding
Partial I/O decoding checks only a few lines (i.e. bits) of the
address bus (and possibly the I/O R/W signal) to determine if a
device is selected or not.
There are caveats to such simple decoding:
Ghost addresses
Since not all the address bus lines are decoded, a device can respond
to several different I/O addresses but, more importantly, several devices
can respond to the same ghost address (which may lead to a bus conflict,
see below).
Bus conflict
This is a short circuit between two,
or more, devices trying to drive the
DATA bus at the same time.
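Ghost addresses are easy to see in a small model. The sketch below assumes a hypothetical device on a 16-bit address bus whose select logic looks only at A15-A14 and whose four registers are chosen by A1-A0; lines A13-A2 are not decoded. None of these choices come from the slides.

```python
def selected(addr):
    """Partial decoding: the device is selected when A15-A14 == 0b10."""
    return (addr >> 14) & 0b11 == 0b10


def register_index(addr):
    """A1-A0 pick one of the device's four registers."""
    return addr & 0b11


def ghost_addresses(reg, limit=None):
    """Addresses (in increasing order) that all reach register reg."""
    found = []
    for addr in range(0x10000):
        if selected(addr) and register_index(addr) == reg:
            found.append(addr)
            if limit is not None and len(found) == limit:
                break
    return found
```

With twelve undecoded lines, each register answers at 2^12 = 4096 different addresses. A second device decoded just as loosely on an overlapping range would answer some of the same addresses, which is the bus conflict described above.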
Partial I/O decoding
An example: SRAM design
A system specification requires an SRAM system that can store up
to 4 K 16-bit words. However, the largest memory device available is
1 K (1024) by 8-bit words, so the design will require eight of the
smaller memory devices: two sets of four.
In the worst case, to support 4 K 16-bit words, 12 address lines and
16 data lines are required. If sufficient lines are available on the uP,
the design is straightforward.
Let's assume that such is not the case and that only 8 address
lines and 8 data lines are available. Under such a restriction, two
address transfers and two data transfers will be necessary to
complete a single transaction.
Ten address bits are needed to identify each cell in the device. Next,
one must be able to identify which of the 1 K blocks to read from or
write to. Two additional address bits enable such a selection to be
made. These four combinations can be used to activate the chip select
(CS) control and the output enable (OE).
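The device count and the address split in this example can be checked with a few lines of arithmetic. The constant and function names below are illustrative; the numbers are the ones in the specification above.

```python
WORDS_REQUIRED = 4 * 1024    # 4 K words
WORD_WIDTH = 16              # bits per word
DEVICE_WORDS = 1024          # each device is 1 K x 8
DEVICE_WIDTH = 8

devices_per_word = WORD_WIDTH // DEVICE_WIDTH     # chips side by side for 16 bits
blocks = WORDS_REQUIRED // DEVICE_WORDS           # 1 K blocks stacked in depth
total_devices = devices_per_word * blocks


def split_sram_address(addr):
    """Split a 12-bit word address into (block select, cell address)."""
    if not 0 <= addr < WORDS_REQUIRED:
        raise ValueError("address out of range")
    block = (addr >> 10) & 0b11    # A11-A10: which 1 K block (drives CS/OE)
    cell = addr & 0x3FF            # A9-A0: cell inside each device
    return block, cell
```

Two devices per word times four blocks gives the eight devices (two sets of four) mentioned above.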
An example: SRAM design

[Figure: SRAM design block diagram. Surviving labels: Data Bus D7-D0, Address Bus D7-D0, R/W, GPIOs.]
An example: WRITE operation
Since the uP only supports eight address lines, the full address is
built up in two transfers on the address bus.
Each address and data byte is stored in a register.
Each address/data transfer is accompanied by a strobe signal. After
the data has been stored in the data latches, the write command is
issued.
An example: READ operation
To execute a read operation, the desired memory address is
selected, as was done during the write operation. The proper chip
select signal, combined with the state of the read line, begins the
read process on the selected memory block.
For a read operation, one must disable the outputs of the data
latches and enable the memory output drivers.
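The two-transfer write and read sequences can be mimicked with a toy model. This is a behavioural sketch only: the class LatchedSram, its latch layout, and the helpers write_word and read_word are hypothetical, and timing, strobes, and output-enable control are reduced to plain method calls.

```python
class LatchedSram:
    """Toy model of a 4 K x 16 SRAM reached through 8-bit address and data paths."""

    def __init__(self):
        self.cells = [0] * 4096        # 4 K words of 16 bits
        self.addr_latch = [0, 0]       # [low byte, high byte]
        self.data_latch = [0, 0]       # [low byte, high byte]

    def strobe_address(self, which, byte):
        """Latch one address byte; which is 0 (low) or 1 (high)."""
        self.addr_latch[which] = byte & 0xFF

    def strobe_data(self, which, byte):
        """Latch one data byte; which is 0 (low) or 1 (high)."""
        self.data_latch[which] = byte & 0xFF

    def _address(self):
        # Only 12 of the 16 latched bits are used (A11-A0).
        return ((self.addr_latch[1] << 8) | self.addr_latch[0]) & 0x0FFF

    def write(self):
        """Write command: store the latched data at the latched address."""
        self.cells[self._address()] = (self.data_latch[1] << 8) | self.data_latch[0]

    def read(self):
        """Read command: return (low byte, high byte) of the addressed word."""
        word = self.cells[self._address()]
        return word & 0xFF, word >> 8


def write_word(mem, addr, value):
    mem.strobe_address(0, addr & 0xFF)    # first address transfer
    mem.strobe_address(1, addr >> 8)      # second address transfer
    mem.strobe_data(0, value & 0xFF)      # first data transfer
    mem.strobe_data(1, value >> 8)        # second data transfer
    mem.write()                           # issued only after all latches are loaded


def read_word(mem, addr):
    mem.strobe_address(0, addr & 0xFF)
    mem.strobe_address(1, addr >> 8)
    low, high = mem.read()                # two data transfers back to the uP
    return (high << 8) | low
```

The point of the sketch is the ordering: every address and data byte is latched first, and the write command comes last.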
An example: the whole picture
A multiplexed implementation is the more common architecture:
sharing one set of bus lines between the two functions (address
and data).
Under such circumstances, the address and data registers are
necessary for temporary storage.
Accessing the I/O
External devices are accessed by means of registers.
External devices are almost always connected not directly to the
system bus but to an INTERFACE.
Registers in the interface allow for a wide range of possibilities for
the designer to determine how it is to interface to the bus.
An interface typically consists of three registers:
Control Register - its setting determines whether the interface
is to send or receive.
Data Register - holds the data element to be transmitted or
a data element received.
Status Register - used to obtain information about the status
(diagnostic) of the I/O device.
[Figure: I/O interface containing the control register and the data and status registers, connected to the input/output device.]
Memory mapped I/O
I/O Devices and memory share the same address space.

Each I/O Device (Interface) is assigned a unique set of addresses.
When the processor places a particular address on the address lines, the
device recognizing this address responds to the commands on the control
lines.
The processor requests either a read or a write operation, and the
requested data is transferred over the data lines.
Any machine instruction that can access memory can be used to transfer
data to/from I/O devices.
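A small bus model shows why ordinary load and store operations are enough for memory-mapped I/O. The address assignments (RAM at 0x0000-0x7FFF, a device data register at 0xFF00, a status register at 0xFF01) and the meaning of the status bit are assumptions made up for this sketch.

```python
RAM_SIZE = 0x8000      # assumption: RAM occupies 0x0000-0x7FFF
DEV_DATA = 0xFF00      # hypothetical device data register
DEV_STATUS = 0xFF01    # hypothetical device status register


class Bus:
    """One address space shared by RAM and an I/O interface."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.sent = []         # bytes handed to the device
        self.status = 0x01     # assumed meaning: bit 0 set = device ready

    def write(self, addr, value):
        if addr < RAM_SIZE:
            self.ram[addr] = value & 0xFF
        elif addr == DEV_DATA:
            self.sent.append(value & 0xFF)   # same kind of store, reaches the device
        # any other address is vacant and the write is ignored

    def read(self, addr):
        if addr < RAM_SIZE:
            return self.ram[addr]
        if addr == DEV_STATUS:
            return self.status
        return 0xFF            # assumption: vacant addresses read as 0xFF
```

The caller uses the same read and write calls for RAM and for the device; only the address decides which one responds.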
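[Figure: memory-mapped I/O. Memory and the I/O interface (address decoder, control circuits, data and status registers) share the address, data, and control lines and a single address space from 0000 to FFFF; the interface connects to the peripheral (input/output device).]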
Memory mapped Input
Memory mapped Output
Port Mapped I/O
Memory and I/O:
Occupy different address spaces
Are accessed by distinct instructions (memory vs I/O instructions)
I/O instructions
move data between a specified I/O address (port) and a CPU register
(e.g., the accumulator):
IN port inputs data from a device
OUT port outputs data to a device
Typically, access to memory and I/O uses the same address
bus and data bus.
A dedicated control bus signal differentiates a memory cycle
from an I/O cycle.
[Figure: separate memory and I/O address spaces, each running from 0000 to FFFF.]
Memory subsystem architecture
Memory hierarchy: the metric is based on speed and storage
capacity (and cost!).
At the top is the slowest, largest, and least
expensive level: secondary memory.
In the middle is main or primary memory.
At the bottom are the smallest, fastest memories,
called cache memory.
CPU registers are sometimes included in the ranking
as higher speed memory than cache.
Caching
Cache is a small, fast memory that temporarily holds copies of blocks of
data and program instructions from the main memory.
A Harvard architecture will internally support both an icache
(instruction cache) and a dcache (data cache).
Locality
Program execution generally occurs either sequentially or in small
loops with a small number of instructions. With respect to the
entire program, actual execution takes place within a small window
that moves forward through the program: sequential locality of reference.
Spatial locality suggests that a future access of a resource, a memory
address in this case, is going to be physically near one previously accessed.
Temporal locality suggests that a future access of a resource, again a
memory address, is going to be temporally near one recently accessed.
Cache systems
Cache memory is organized into several levels.

The application program begins executing and encounters a need
for a piece of data or an instruction. To locate that item, first the
cache is checked.
If the item is found, there is a cache hit.
If the item is not found, there has been a cache miss and the item
must be obtained from somewhere else.
How do we know when something is not in the cache?
Where do we go to find something if it is not in the cache?
What if it's not there?
How do we know if there is room left in the cache?
How do we know if information in the cache was modified?
How do we select the block to replace?
Cache systems - Direct mapping
The main memory page size is set equal to the cache size;
therefore, each page will contain a corresponding number of blocks.
[Figure: direct mapping. Each main memory page holds blocks at block address 0, block address 1, and so on; a block maps to the cache block with the same block address.]
Cache systems - Direct mapping - An example
The specifications:
The cache and main memory will store 32-bit words.
The cache size will be 64 K words (256 KB -> 18-bit byte addresses).
The cache will be organized as 128 blocks of 0.5 K words.
The cache will implement a direct mapped replacement algorithm.
Memory addresses will be 32 bits.
Main memory size will be 128 M words.
Main memory will be organized as 2 K pages (128M/64K);
page size = cache size -> each page will hold 128 blocks.
Cache systems - Direct mapping - An example
Address Interpretation in the Cache Context:
Each data or instruction word is 32 bits (4B) long; bits A1 and A0
identify a byte within a word.
Each block contains 512 words. Address bits A10-A2 identify a word
in a block.
The block address within the cache is identified by address bits
A17-A11. These bits are called the index into cache and also
correspond to the block's address within a main memory page.
Bits A31-A18 identify which main memory page the block came
from. This value is called the tag. These values will be stored in a
data structure called a tag table and are used when testing to see if
the needed word is in a cache.
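The address interpretation can be written as a few shifts and masks. The byte (A1-A0), word (A10-A2), and index (A17-A11) fields follow the text; the tag is taken here as the bits above the index, A31-A18, which is what the 18 bits of the three lower fields leave over. The function name cache_fields is illustrative.

```python
def cache_fields(addr):
    """Split a 32-bit byte address into (tag, index, word, byte)."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    byte = addr & 0x3                 # A1-A0: byte within a 32-bit word
    word = (addr >> 2) & 0x1FF        # A10-A2: one of 512 words in a block
    index = (addr >> 11) & 0x7F       # A17-A11: one of 128 cache blocks
    tag = (addr >> 18) & 0x3FFF       # bits above the index: main memory page
    return tag, index, word, byte
```

Two addresses with the same index but different tags compete for the same cache block, which is the defining property of direct mapping.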
Use for search in the cache!
The TAG table
The tag table contains one record for each block in the cache (i.e.
for the current design, 128 entries). Typical information contained in
each record includes:
TAG: A subset of bits from the main memory address identifying the
page (in main memory) where the block originated.
VALID BIT: A flag indicating whether the corresponding block contains
valid data (i.e. data that has actually been loaded). If the valid bit is TRUE, the block must
be checked for changes before overwriting it.
DIRTY BIT: A flag indicating whether the corresponding block contains
data that has been modified. Cache and main memory must be
coherent. The write through approach propagates any data change
immediately to main memory; the delayed write approach assumes that
if a piece of data changed once, it may change again in the near future.
Thus, time can be saved by not performing (potentially) multiple write
operations to the same data.
TIME: when the block was brought into the cache or when it was last
accessed
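A tag table with valid and dirty bits can be sketched as follows for the 128-block direct-mapped design. The class and field names are illustrative, the TIME field is omitted because direct mapping does not need it, and the model only counts write-backs (the delayed write approach) instead of moving data.

```python
from dataclasses import dataclass

NUM_BLOCKS = 128


@dataclass
class TagEntry:
    tag: int = 0
    valid: bool = False
    dirty: bool = False


class DirectMappedCache:
    def __init__(self):
        self.table = [TagEntry() for _ in range(NUM_BLOCKS)]
        self.write_backs = 0

    def access(self, addr, write=False):
        """Return True on a hit; on a miss the block is (notionally) loaded."""
        index = (addr >> 11) & 0x7F      # A17-A11
        tag = addr >> 18                 # bits above the index
        entry = self.table[index]
        hit = entry.valid and entry.tag == tag
        if not hit:
            if entry.valid and entry.dirty:
                self.write_backs += 1    # modified block must reach main memory first
            entry.tag, entry.valid, entry.dirty = tag, True, False
        if write:
            entry.dirty = True           # delayed write: only mark the block
        return hit
```

A write-through design would drop the dirty bit and update main memory on every write instead.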
Cache systems - Associative mapping
A new block can be placed anywhere in the cache. An associative
search is then executed to locate it. Such an algorithm searches by
content rather than by address.
To find a word in the cache, the tag and block portions of the
memory address specify the target for the associative search.
[Figure: associative mapping. Main memory blocks (block address 0, block address 1, ...) may occupy any cache block; the tag and block address are used for the search in the cache.]
Cache systems - Associative mapping
With the associative mapping algorithm, time is added
as one of the components of the tag table record.
Two commonly used algorithms applying temporal locality are
Least Recently Used (LRU, a FIFO-like policy) and
Most Recently Used (MRU, a LIFO-like policy).
A third algorithm selects and removes a block at random.
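A Least Recently Used policy can be sketched with an ordered dictionary standing in for the TIME field of the tag table: the order of the entries records how recently each block was touched. The class name LruCache and the block numbers are illustrative.

```python
from collections import OrderedDict


class LruCache:
    """Fully associative cache model that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()    # least recently used block comes first

    def access(self, block):
        """Touch a block; return the evicted block number, or None."""
        if block in self.blocks:
            self.blocks.move_to_end(block)    # hit: now the most recently used
            return None
        evicted = None
        if len(self.blocks) >= self.capacity:
            evicted, _ = self.blocks.popitem(last=False)   # drop the oldest entry
        self.blocks[block] = True
        return evicted
```

Popping from the other end (popitem(last=True)) would turn the same structure into a Most Recently Used policy.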
Static vs Dynamic memory allocation
Static memory allocation: The required memory space for a declared
variable is allocated at compilation time. The program knows the
actual data location.
Dynamic memory allocation: Memory is assigned during run time. The
program knows only a pointer to the actual data location. Memory
requests are satisfied by allocating portions from a large pool of
memory called the heap. At any given time, some parts of the heap
are in use, while some are "free" (unused) and thus available for
future allocations. Several issues complicate implementation, such
as internal and external fragmentation.
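External fragmentation can be shown with a minimal first-fit allocator. First fit is only one possible strategy, chosen here for brevity; the free-list representation (a list of (start, length) holes) and the function name are assumptions of this sketch.

```python
def first_fit(free_list, size):
    """Allocate size units from free_list, a list of (start, length) holes.

    Returns the start address, or None if no single hole is large enough.
    """
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)                          # hole used up entirely
            else:
                free_list[i] = (start + size, length - size)   # shrink the hole
            return start
    return None
```

With holes [(0, 4), (10, 4)] there are 8 free units in total, yet a request for 6 units fails because no single hole is big enough: that is external fragmentation.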
Dynamic memory allocation
Dynamic means we allocate memory at runtime.
How to manage main memory to accommodate:
programs larger than main memory
multiple processes in main memory (process = program + data + stack = an
instance or invocation of a program)
multiple programs in main memory
Overlays: one of two or more pieces of code (or data) that can be loaded to
a pre-determined memory region on demand at runtime. Initially, each
overlay is stored in ROM/Flash, just like ordinary code/data. During runtime,
an overlay can be copied to a known address in RAM and executed there
when required. This can later be replaced by another overlay when
required.
Swapping: the system remains resident in memory and further assumes
that only a single user program is resident in memory at a time. One
program but many processes; save the context and swap among them!
Multiprogramming: permits one to run multiple programs in the same
memory space; we need an OS or at least a dispatcher
Process vs Program
A process is a program in execution; it is often called a job or task.
A program is static, just a bunch of bytes.
There is no one-to-one mapping between processes and programs:
there can be multiple processes of the same program
one process can invoke multiple programs
A process consists of (at least):
an address space
the code and the data for the running program
an execution stack and stack pointer (SP)
(traces state of procedure calls made)
the program counter (PC), indicating the next instruction
general-purpose processor registers and their values
a set of OS resources (open files, network connections, sound
channels, ...)
Testing memories - Data lines
Assumption: the design of the chips is correct and they contain no
internal manufacturing defects (ignore soft errors).
To test for the stuck-at-1 condition, a pattern of all 0's is written to a
memory address, followed by a read operation from the same
address. For a stuck-at-0 condition, the converse applies: a pattern of
all 1's is written and read back. If the same
data is read as was written, then a stuck-at fault does not exist on
any of the data lines.
A bridge fault connects two (or more) data lines; the actual voltage
level depends on the relative strengths of the driving signals. The
assumption here is that each of those signals (D0 and D1) will share
a common value.
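The stuck-at test on the data lines can be tried against a simulated memory. FaultyMemory is a hypothetical stand-in for real hardware that forces chosen data bits to 1 or 0 on every read; check_data_lines applies the all-0's and all-1's patterns described above. An 8-bit data bus is assumed.

```python
class FaultyMemory:
    """Memory model with optional stuck-at faults on the data lines."""

    def __init__(self, stuck_at_1=0, stuck_at_0=0, width=8):
        self.cells = {}
        self.or_mask = stuck_at_1                               # bits forced to 1
        self.and_mask = ~stuck_at_0 & ((1 << width) - 1)        # bits forced to 0

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return (self.cells.get(addr, 0) | self.or_mask) & self.and_mask


def check_data_lines(mem, addr=0, width=8):
    """Return (stuck_at_1_mask, stuck_at_0_mask); both 0 means no fault found."""
    all_ones = (1 << width) - 1
    mem.write(addr, 0)
    stuck_1 = mem.read(addr)                  # any 1 read back was not written
    mem.write(addr, all_ones)
    stuck_0 = ~mem.read(addr) & all_ones      # any 0 read back was not written
    return stuck_1, stuck_0
```

The returned masks identify which data lines are affected, not just whether a fault exists.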
Testing memories - Address lines
In the presence of a stuck-at address line fault, two different memory
addresses are mapped to the same location. An address bit, A0, is
selected as the bit under test. Next, a data pattern, e.g. ...0000, is
chosen and that data is written to memory address ...xxx0. A
different data pattern, say ...1111, is then selected and written to
memory address ...xxx1.
The contents of the two locations are then read. If there is a stuck-at
fault on A0, both addresses will be mapped to the same location and
the same data will be read from the two different addresses.
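The address-line test can be exercised the same way. AddressFaultMemory is a hypothetical model in which one address line may be stuck at 0 or 1; address_line_stuck writes two different patterns to the two addresses that differ only in the bit under test and compares what is read back.

```python
class AddressFaultMemory:
    """Memory model whose address line stuck_bit may be stuck at 0 or 1."""

    def __init__(self, stuck_bit=None, stuck_value=0):
        self.cells = {}
        self.stuck_bit = stuck_bit
        self.stuck_value = stuck_value

    def _effective(self, addr):
        """Address actually seen by the array after the fault is applied."""
        if self.stuck_bit is None:
            return addr
        mask = 1 << self.stuck_bit
        return (addr | mask) if self.stuck_value else (addr & ~mask)

    def write(self, addr, value):
        self.cells[self._effective(addr)] = value

    def read(self, addr):
        return self.cells.get(self._effective(addr), 0)


def address_line_stuck(mem, bit, base=0):
    """Return True if address line bit appears to be stuck."""
    mask = 1 << bit
    addr0, addr1 = base & ~mask, base | mask
    mem.write(addr0, 0x00)     # pattern ...0000 at address ...xxx0
    mem.write(addr1, 0xFF)     # pattern ...1111 at address ...xxx1
    return mem.read(addr0) == mem.read(addr1)
```

Repeating the call for each address bit in turn covers the whole address bus.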
Testing memories - Address lines
Assume that a test for a bridge between address bits A0 and A1 is
conducted. Any address of the form ...xxxx01 is selected.
Next, a background data pattern (e.g. ...1111) is written to the two
possible aliased addresses: ...xxxx00 and ...xxxx11. A different
data pattern (...0000, for example) is then written to the test
address.
Finally, all three addresses are read. The test may have any of four
possible outcomes: no bridge fault, the logical 1 in the test address
dominates, the logical 0 dominates, or neither dominates (in such
a case all three patterns are affected).
Testing memories - ROM
The ROM stores a particular set of data. If the data are incorrect, the
device is considered to have a failure. Thus, the testing strategy
must address the stuck-at and bridging faults as well as ensuring
that the correct data has been stored.
An effective method for testing ROM memories that can address all
of these issues is based on the CRC or cyclic redundancy check
and is known as signature analysis.
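The idea behind a CRC-based ROM check can be sketched in software. This uses the CRC-32 from Python's zlib module as a stand-in; hardware signature analysis typically uses its own polynomial and a shift register, so the polynomial choice and the function names here are assumptions.

```python
import zlib


def rom_signature(image: bytes) -> int:
    """Compute a CRC-32 signature over the whole ROM image."""
    return zlib.crc32(image) & 0xFFFFFFFF


def rom_ok(image: bytes, expected_signature: int) -> bool:
    """Compare the computed signature with the one recorded for a known-good ROM."""
    return rom_signature(image) == expected_signature
```

The expected signature is computed once from the known-good image and stored; at test time the ROM contents are read back and compared against it, so a single flipped bit changes the signature.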
