
Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 1: Introduction

Outline
Embedded systems overview
What are they?

Design challenge: optimizing design metrics


Technologies
Processor technologies
IC technologies
Design technologies

Embedded Systems Design: A Unified


Hardware/Software Introduction, (c) 2000 Vahid/Givargis

Embedded systems overview


Computing systems are everywhere
Most of us think of desktop computers

PCs
Laptops
Mainframes
Servers

But there's another type of computing system
Far more common...


Embedded systems overview


Embedded computing systems
Computing systems embedded within
electronic devices
Hard to define. Nearly any computing
system other than a desktop computer
Billions of units produced yearly, versus
millions of desktop units
Perhaps 50 per household and per
automobile


Computers are in here...


and here...

and even here...

Lots more of these, though they cost a lot less each.

A short list of embedded systems


Anti-lock brakes
Auto-focus cameras
Automatic teller machines
Automatic toll systems
Automatic transmission
Avionic systems
Battery chargers
Camcorders
Cell phones
Cell-phone base stations
Cordless phones
Cruise control
Curbside check-in systems
Digital cameras
Disk drives
Electronic card readers
Electronic instruments
Electronic toys/games
Factory control
Fax machines
Fingerprint identifiers
Home security systems
Life-support systems
Medical testing systems

Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCRs, DVD players
Video game consoles
Video phones
Washers and dryers

And the list goes on and on



Some common characteristics of embedded systems
Single-functioned
Executes a single program, repeatedly

Tightly-constrained
Low cost, low power, small, fast, etc.

Reactive and real-time


Continually reacts to changes in the system's environment
Must compute certain results in real-time without delay


An embedded system example -- a digital camera

[Figure: digital camera chip block diagram: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, display controller, ISA bus interface, UART, LCD controller]

Single-functioned -- always a digital camera


Tightly-constrained -- Low cost, low power, small, fast
Reactive and real-time -- only to a small extent


Design challenge: optimizing design metrics


Obvious design goal:
Construct an implementation with desired functionality

Key design challenge:


Simultaneously optimize numerous design metrics

Design metric
A measurable feature of a system's implementation
Optimizing design metrics is a key challenge


Design challenge: optimizing design metrics


Common metrics
Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost

NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system

Size: the physical space required by the system


Performance: the execution time or throughput of the system
Power: the amount of power consumed by the system
Flexibility: the ability to change the functionality of the system without
incurring heavy NRE cost


Design challenge: optimizing design metrics


Common metrics (continued)
Time-to-prototype: the time needed to build a working version of the
system

Time-to-market: the time required to develop a system to the point that it can be released and sold to customers

Maintainability: the ability to modify the system after its initial release
Correctness, safety, many more


Design metric competition -- improving one may worsen others
Expertise with both software and hardware is needed to optimize design metrics
Not just a hardware or software expert, as is common
A designer must be comfortable with various technologies in order to choose the best for a given application and constraints

[Figure: competing design metrics (power, performance, size, NRE cost) shown beside the digital camera chip block diagram, whose blocks may be implemented in hardware or software]


Time-to-market: a demanding design metric

Time required to develop a product to the point it can be sold to customers
Market window
Period during which the product would have highest sales
Average time-to-market constraint is about 8 months
Delays can be costly

[Figure: revenues ($) versus time (months)]

Losses due to delayed market entry


Simplified revenue model
Product life = 2W, peak at W
Time of market entry defines a triangle, representing market penetration
Triangle area equals revenue

Loss
The difference between the on-time and delayed triangle areas

[Figure: revenues ($) versus time, with market rise up to peak revenue at W and market fall until 2W; an on-time entry triangle and a smaller delayed entry triangle starting at D, with a lower peak revenue from delayed entry]

Losses due to delayed market entry (cont.)


Area = $\frac{1}{2} \cdot \text{base} \cdot \text{height}$
On-time = $\frac{1}{2} \cdot 2W \cdot W$
Delayed = $\frac{1}{2} \cdot (W - D + W) \cdot (W - D)$

Percentage revenue loss = $\frac{D(3W - D)}{2W^2} \cdot 100\%$

Try some examples
Lifetime 2W = 52 wks, delay D = 4 wks
$\frac{4 \cdot (3 \cdot 26 - 4)}{2 \cdot 26^2} = 22\%$
Lifetime 2W = 52 wks, delay D = 10 wks
$\frac{10 \cdot (3 \cdot 26 - 10)}{2 \cdot 26^2} = 50\%$
Delays are costly!
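
The triangle model above can be checked numerically. The following is a minimal sketch in C; the function name `revenue_loss_percent` and its parameter names are illustrative choices, not part of the slides, and the code assumes 0 <= D <= W.

```c
#include <assert.h>

/* Percentage revenue loss under the simplified triangle revenue model.
 * w: half the product lifetime (the market peaks at w and ends at 2w)
 * d: delay of market entry, in the same time unit as w
 * Assumes 0 <= d <= w, the range the triangle model covers. */
double revenue_loss_percent(double w, double d)
{
    assert(w > 0.0 && d >= 0.0 && d <= w);
    double on_time = 0.5 * (2.0 * w) * w;          /* area of the on-time triangle */
    double delayed = 0.5 * (w - d + w) * (w - d);  /* area of the delayed triangle */
    return (on_time - delayed) / on_time * 100.0;
}
```

Calling `revenue_loss_percent(26.0, 4.0)` gives a loss just under 22%, and `revenue_loss_percent(26.0, 10.0)` a loss just over 50%, in line with the two examples.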


NRE and unit cost metrics


Costs:
Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost
NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of
designing the system
total cost = NRE cost + unit cost * # of units
per-product cost
= total cost / # of units
= (NRE cost / # of units) + unit cost

Example
NRE=$2000, unit=$100
For 10 units
total cost = $2000 + 10*$100 = $3000
per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an
additional $200 per unit
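
The two cost formulas translate directly into code. This is a small sketch in C; the function names are illustrative and not taken from the slides.

```c
#include <assert.h>

/* Total cost of producing `units` copies: NRE is paid once,
 * the unit cost is paid for every copy. */
double total_cost(double nre, double unit_cost, long units)
{
    assert(units > 0);
    return nre + unit_cost * (double)units;
}

/* Per-product cost: the total cost spread evenly over all units,
 * which equals (NRE cost / # of units) + unit cost. */
double per_product_cost(double nre, double unit_cost, long units)
{
    return total_cost(nre, unit_cost, units) / (double)units;
}
```

With NRE = $2000, unit cost = $100, and 10 units, `total_cost` returns 3000 and `per_product_cost` returns 300, matching the example.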

NRE and unit cost metrics


Compare technologies by costs -- best depends on quantity
Technology A: NRE=$2,000, unit=$100
Technology B: NRE=$30,000, unit=$30
Technology C: NRE=$100,000, unit=$2
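
To see how the best technology depends on quantity, the sketch below picks the technology with the lowest total cost at a given volume. The NRE and unit figures are the ones listed above; the struct, table, and function names are hypothetical, and cost is the only metric considered.

```c
#include <assert.h>
#include <stddef.h>

struct technology {
    const char *name;
    double nre;        /* one-time design cost, in dollars */
    double unit_cost;  /* manufacturing cost per copy, in dollars */
};

/* Figures for technologies A, B, and C as listed on the slide. */
static const struct technology TECHS[] = {
    { "A",   2000.0, 100.0 },
    { "B",  30000.0,  30.0 },
    { "C", 100000.0,   2.0 },
};

/* Returns the technology with the lowest total cost at the given volume.
 * Ties go to the earlier entry in the table. */
const struct technology *cheapest_at(long units)
{
    assert(units > 0);
    const struct technology *best = &TECHS[0];
    for (size_t i = 1; i < sizeof TECHS / sizeof TECHS[0]; i++) {
        double cost = TECHS[i].nre + TECHS[i].unit_cost * (double)units;
        double best_cost = best->nre + best->unit_cost * (double)units;
        if (cost < best_cost)
            best = &TECHS[i];
    }
    return best;
}
```

Under these figures the total costs of A and B are equal at 400 units and those of B and C at 2,500 units, so A is cheapest for small volumes, B in between, and C for large volumes.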
[Figure: two plots against number of units (volume, 0 to 2400) for technologies A, B, and C: total cost ($0 to $200,000) and per-product cost ($0 to $200)]

But, must also consider time-to-market



The performance design metric


Widely-used measure of a system, and widely abused
Clock frequency and instructions per second are not good measures
Digital camera example: a user cares about how fast it processes images, not
clock speed or instructions per second

Latency (response time)


Time between task start and end
e.g., Cameras A and B process images in 0.25 seconds

Throughput
Tasks per second, e.g. Camera A processes 4 images per second
Throughput can be more than latency seems to imply due to concurrency, e.g.
Camera B may process 8 images per second (by capturing a new image while
previous image is being stored).

Speedup of B over A = B's performance / A's performance


Throughput speedup = 8/4 = 2
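
The relationship between latency, concurrency, and throughput can be sketched in C as follows. The `in_flight` parameter (how many images are being handled at once) is an assumption of this sketch: Camera A is modelled with one image in flight and Camera B with two, which reproduces the 4 and 8 images per second quoted above.

```c
#include <assert.h>

/* Throughput in tasks per second when `in_flight` tasks overlap and each
 * task takes `latency_s` seconds from start to end. The in_flight count is
 * a modelling assumption, not a figure taken from the slide. */
double throughput(double latency_s, int in_flight)
{
    assert(latency_s > 0.0 && in_flight >= 1);
    return (double)in_flight / latency_s;
}

/* Speedup of B over A = B's performance / A's performance. */
double speedup(double perf_b, double perf_a)
{
    assert(perf_a > 0.0);
    return perf_b / perf_a;
}
```

Both cameras have the same 0.25 second latency, yet `speedup(throughput(0.25, 2), throughput(0.25, 1))` evaluates to 2, the throughput speedup computed above.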

Three key embedded system technologies


Technology
A manner of accomplishing a task, especially using technical
processes, methods, or knowledge

Three key technologies for embedded systems


Processor technology
IC technology
Design technology


Processor technology
The architecture of the computation engine used to implement a
system's desired functionality
Processor does not have to be programmable
"Processor" is not the same as "general-purpose processor"
[Figure: three processor architectures, each with a controller and a datapath, implementing the same summation loop (total = 0; for i = 1 to ...).
General-purpose: control logic and state register, IR, PC; register file and general ALU; program memory holding assembly code; data memory.
Application-specific: control logic and state register, IR, PC; registers and custom ALU; program memory holding assembly code; data memory.
Single-purpose (hardware): control logic, state register; index, total, adder; data memory.]

Processor technology
Processors vary in their customization for the problem at hand

Desired functionality:
total = 0
for i = 1 to N loop
total += M[i]
end loop

[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]
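
For reference, the summation loop used as the running example can be written in C as below. This is only an illustration of the desired functionality as software for a general-purpose processor; the function name `sum_array` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the n elements of m. The slide indexes M from 1 to N; C arrays are
 * 0-based, so this loop runs i = 0 .. n-1 over the same N values. */
long sum_array(const int *m, size_t n)
{
    assert(m != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += m[i];
    return total;
}
```

A single-purpose processor would implement the same loop directly in hardware, with `total` and the index held in dedicated registers rather than in a register file.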

General-purpose processors
Programmable device used in a variety of
applications
Also known as microprocessor

Features
Program memory
General datapath with large register file and
general ALU

User benefits
Low time-to-market and NRE costs
High flexibility

Pentium is the most well-known, but there are hundreds of others

[Figure: general-purpose processor: controller (control logic and state register, IR, PC) and datapath (register file, general ALU), with program memory holding the assembly code for the summation loop, and data memory]

Single-purpose processors
Digital circuit designed to execute exactly
one program
a.k.a. coprocessor, accelerator or peripheral

Features
Contains only the components needed to
execute a single program
No program memory

[Figure: single-purpose processor: controller (control logic, state register) and datapath (index, total), with data memory]

Benefits
Fast
Low power
Small size

Application-specific processors
Programmable processor optimized for a
particular class of applications having
common characteristics
Compromise between general-purpose and
single-purpose processors

Features
Program memory
Optimized datapath
Special functional units

[Figure: application-specific processor: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), with program memory holding the assembly code for the summation loop, and data memory]

Benefits
Some flexibility, good performance, size and
power

IC technology
The manner in which a digital (gate-level)
implementation is mapped onto an IC
IC: Integrated circuit, or chip
IC technologies differ in their customization to a design
ICs consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer and
when

[Figure: IC package containing an IC, with a transistor cross-section showing source, gate, oxide, channel, and drain on a silicon substrate]

IC technology
Three types of IC technologies
Full-custom/VLSI
Semi-custom ASIC (gate array and standard cell)
PLD (Programmable Logic Device)


Full-custom/VLSI
All layers are optimized for an embedded system's
particular digital implementation
Placing transistors
Sizing transistors
Routing wires

Benefits
Excellent performance, small size, low power

Drawbacks
High NRE cost (e.g., $300k), long time-to-market

Semi-custom
Lower layers are fully or partially built
Designers are left with routing of wires and maybe placing
some blocks

Benefits
Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)

Drawbacks
Still require weeks to months to develop


PLD (Programmable Logic Device)


All layers already exist
Designers can purchase an IC
Connections on the IC are either created or destroyed to
implement desired functionality
Field-Programmable Gate Array (FPGA) is very popular

Benefits
Low NRE costs, almost instant IC availability

Drawbacks
Bigger, expensive (perhaps $30 per unit), power hungry,
slower

Moore's law
The most important trend in embedded systems
Predicted in 1965 by Intel co-founder Gordon Moore
IC transistor capacity has doubled roughly every 18 months
for the past several decades
[Figure: logic transistors per chip (in millions), from 0.001 to 10,000, plotted over time. Note: logarithmic scale]

Moore's law
Wow
This growth rate is hard to imagine; most people underestimate it
How many ancestors do you have from 20 generations ago?
i.e., roughly how many people alive in the 1500s did it take to make
you?
$2^{20}$ = more than 1 million people

(This underestimation is the key to pyramid schemes!)


Graphical illustration of Moore's law

[Figure: timeline from 1981 to 2002 in three-year steps. Leading edge chip in 1981: 10,000 transistors. Leading edge chip in 2002: 150,000,000 transistors]

Something that doubles frequently grows more quickly than most people realize!
A 2002 chip can hold about 15,000 1981 chips inside itself
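
The doubling arithmetic can be reproduced with a short C sketch. It assumes one doubling every 18 months, as stated earlier, and counts only whole doublings; the function name `projected_transistors` is illustrative.

```c
#include <assert.h>

/* Projects transistor capacity assuming one doubling every 18 months,
 * counting only whole doublings. The assert only guards the shift width;
 * the caller must still choose `start` small enough not to overflow. */
unsigned long long projected_transistors(unsigned long long start, unsigned months)
{
    unsigned doublings = months / 18;
    assert(doublings < 64);
    return start << doublings;
}
```

From 1981 to 2002 is 252 months, or 14 doublings, so `projected_transistors(10000, 252)` returns 163,840,000, the same order as the 150,000,000 transistors quoted for 2002. The growth factor, 2^14 = 16,384, is close to the "about 15,000" ratio above.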

Design Technology
The manner in which we convert our concept of
desired system functionality into an implementation
Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.
Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.

Level                      Compilation/Synthesis   Libraries/IP    Test/Verification
System specification       System synthesis        Hw/Sw/OS        Model simulat./checkers
Behavioral specification   Behavior synthesis      Cores           Hw-Sw cosimulators
RT specification           RT synthesis            RT components   HDL simulators
Logic specification        Logic synthesis         Gates/Cells     Gate simulators
To final implementation


Design productivity: exponential increase

[Figure: productivity in (K) transistors per staff-month, on a logarithmic scale from 0.01 to 100,000, for the years 1983 through 2009]

Exponential increase over the past few decades



The co-design ladder


In the past:
Hardware and software design technologies were very different
Recent maturation of synthesis enables a unified view of hardware and software
Hardware/software codesign

[Figure: the co-design ladder. Both sides start from sequential program code (e.g., C, VHDL) and end at an implementation.
Software side: compilers (1960's, 1970's) to assembly instructions; assemblers, linkers (1950's, 1960's) to machine instructions; microprocessor plus program bits: "software".
Hardware side: behavioral synthesis (1990's) to register transfers; RT synthesis (1980's, 1990's) to logic equations / FSM's; logic synthesis (1970's, 1980's) to logic gates; VLSI, ASIC, or PLD implementation: "hardware".]

The choice of hardware versus software for a particular function is simply a tradeoff among various
design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no
fundamental difference between what hardware or software can implement.

Independence of processor and IC technologies
Basic tradeoff
General vs. custom
With respect to processor technology or IC technology
The two technologies are independent

General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
Customized, providing improved: power efficiency, performance, size, cost (high volume)

[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) and IC technologies (PLD, semi-custom, full-custom), each ordered from general to customized]


Design productivity gap


While designer productivity has grown at an impressive rate
over the past decades, the rate of improvement has not kept
pace with chip capacity

[Figure: IC capacity (logic transistors per chip, in millions) and productivity ((K) transistors per staff-month) plotted together on logarithmic scales, with a gap between the two curves]

Design productivity gap


1981 leading edge chip required 100 designer months
10,000 transistors / 100 transistors/month

2002 leading edge chip requires 30,000 designer months
150,000,000 transistors / 5000 transistors/month

Designer cost increase from $1M to $300M
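
The designer-month figures follow from a single division, sketched here in C. The function names are illustrative, and the rate of $10,000 per designer-month is an inference: it is the rate at which 100 and 30,000 designer months correspond to $1M and $300M.

```c
#include <assert.h>

/* Designer-months needed, given chip size and per-designer productivity. */
long designer_months(long transistors, long transistors_per_month)
{
    assert(transistors_per_month > 0);
    return transistors / transistors_per_month;
}

/* Design cost in dollars. cost_per_month is an assumed rate; the slide's
 * $1M and $300M figures are consistent with $10,000 per designer-month. */
long long design_cost(long months, long long cost_per_month)
{
    return (long long)months * cost_per_month;
}
```

`designer_months(10000, 100)` gives 100 and `designer_months(150000000, 5000)` gives 30,000, so capacity grew 15,000-fold while productivity per designer grew only 50-fold.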

[Figure: IC capacity (logic transistors per chip, in millions) and productivity ((K) transistors per staff-month) plotted together on logarithmic scales, with a gap between the two curves]

The mythical man-month


The situation is even worse than the productivity gap indicates

In theory, adding designers to team reduces project completion time


In reality, productivity per designer decreases due to complexities of team management
and communication
In the software community, known as the mythical man-month (Brooks 1975)
At some point, can actually lengthen project completion time! (Too many cooks)

Example: 1M transistors; 1 designer = 5000 trans/month
Each additional designer reduces every designer's productivity by 100 trans/month
So 2 designers produce 4900 trans/month each
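
The example can be turned into a small model, sketched in C below. It assumes the penalty applies uniformly, so that with n designers each produces base - penalty * (n - 1) transistors per month; the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Months to finish a design of `transistors` with `designers` people, where
 * a lone designer produces `base` transistors/month and each additional team
 * member lowers everyone's individual rate by `penalty` transistors/month. */
double months_to_complete(long transistors, int designers, long base, long penalty)
{
    assert(designers >= 1);
    long individual = base - penalty * (designers - 1);
    assert(individual > 0);   /* beyond this point the model predicts no progress */
    long team = individual * designers;
    return (double)transistors / (double)team;
}
```

Under this model, with 1M transistors, a base of 5000, and a penalty of 100, a team of 10 needs about 24 months, a team of 25 about 15 months, and a team of 40 about 23 months: past roughly 25 designers, adding people lengthens the project.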

[Figure: team and individual productivity (transistors/month, up to 60,000) and months until completion, plotted against number of designers (0 to 40); labelled completion times range from 15 to 43 months]

Summary
Embedded systems are everywhere
Key challenge: optimization of design metrics
Design metrics compete with one another

A unified view of hardware and software is necessary to improve productivity
Three key technologies
Processor: general-purpose, application-specific, single-purpose
IC: Full-custom, semi-custom, PLD
Design: Compilation/synthesis, libraries/IP, test/verification


Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 10: IC Technology

Outline
Anatomy of integrated circuits
Full-Custom (VLSI) IC Technology
Semi-Custom (ASIC) IC Technology
Programmable Logic Device (PLD) IC Technology


CMOS transistor
Source, Drain
Diffusion area where electrons can flow
Can be connected to metal contacts (vias)

Gate
Polysilicon area where control voltage is applied

Oxide
SiO2 insulator so the gate voltage can't leak


End of Moore's Law?


Every dimension of the MOSFET has to scale
(PMOS) Gate oxide has to scale down to
Increase gate capacitance
Reduce leakage current from S to D
Pinch off current from source to drain

Current gate oxide thickness is about 2.5-3nm

That's about 25 atoms!

[Figure: IC package containing an IC, with a transistor cross-section showing source, gate, oxide, channel, and drain on a silicon substrate]

20 GHz+
FinFET has been manufactured to 18 nm
Still acts as a very good transistor

Simulations show that it can be scaled to 10 nm
Quantum effects start to kick in
Reduce mobility by ~10%

Ballistic transport becomes significant
Increases current by ~20%


NAND

Metal layers for routing (~10)


PMOS transistors don't like 0
NMOS transistors don't like 1
A stick diagram forms the basis for mask sets


Silicon manufacturing steps


Tape out
Send design to manufacturing

Spin
One time through the manufacturing process

Photolithography
Drawing patterns by using photoresist to form barriers for deposition


Full Custom
Very Large Scale Integration (VLSI)
Placement
Place and orient transistors

Routing
Connect transistors

Sizing
Make fat, fast wires or thin, slow wires
May also need to size buffers

Design Rules
Simple rules for correct circuit function
Metal/metal spacing, min poly width

Full Custom
Best size, power, performance
Hand design
Horrible time-to-market/flexibility/NRE cost
Reserve for the most important units in a processor
ALU, Instruction fetch

Physical design tools


Less optimal, but faster


Semi-Custom
Gate Array

Array of prefabricated gates


place and route
Higher density, faster time-to-market
Does not integrate as well with full-custom

Standard Cell

A library of pre-designed cells


Place and route
Lower density, higher complexity
Integrates well with full-custom


Semi-Custom
Most popular design style
Jack of all trades
Good
Power, time-to-market, performance,
NRE cost, per-unit cost, area

Master of none
Integrate with full custom for
critical regions of design


Programmable Logic Device
Programmable Logic Array, Programmable Array Logic, Field Programmable Gate Array

All layers already exist
Designers can purchase an IC
Connections on the IC are either created or destroyed to implement desired functionality

Benefits
Very low NRE costs
Great time to market

Drawbacks
High unit cost, bad for large volume
Power
Except special PLA
Slower

800-6400 usable gates
5-15 ns delay, up to 125 MHz (2004)
Price of a few dollars

Xilinx FPGA


Configurable Logic Block (CLB)


I/O Block


Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 8: State Machine and Concurrent Process Model

Outline
Models vs. Languages
State Machine Model
FSM/FSMD
HCFSM and Statecharts Language
Program-State Machine (PSM) Model

Concurrent Process Model


Communication
Synchronization
Implementation

Dataflow Model
Real-Time Systems

Introduction
Describing an embedded system's processing behavior
Can be extremely difficult
Complexity increasing with increasing IC capacity
Past: washing machines, small games, etc.
Hundreds of lines of code
Today: TV set-top boxes, cell phones, etc.
Hundreds of thousands of lines of code

Desired behavior often not fully understood in beginning


Many implementation bugs due to description mistakes/omissions

English (or other natural language) common starting point


Precise description difficult to impossible
Example: Motor Vehicle Code -- thousands of pages long...


An example of trying to be precise in English


California Vehicle Code
Right-of-way of crosswalks
21950. (a) The driver of a vehicle shall yield the right-of-way to a pedestrian crossing
the roadway within any marked crosswalk or within any unmarked crosswalk at an
intersection, except as otherwise provided in this chapter.
(b) The provisions of this section shall not relieve a pedestrian from the duty of using
due care for his or her safety. No pedestrian shall suddenly leave a curb or other place
of safety and walk or run into the path of a vehicle which is so close as to constitute
an immediate hazard. No pedestrian shall unnecessarily stop or delay traffic while in a
marked or unmarked crosswalk.
(c) The provisions of subdivision (b) shall not relieve a driver of a vehicle from the
duty of exercising due care for the safety of any pedestrian within any marked
crosswalk or within any unmarked crosswalk at an intersection.

All that just for crossing the street (and there's much more)!


Models and languages


How can we (precisely) capture behavior?
We may think of languages (C, C++), but computation model is the key

Common computation models:


Sequential program model
Statements, rules for composing statements, semantics for executing them

Communicating process model


Multiple sequential programs running concurrently

State machine model


For control dominated systems, monitors control inputs, sets control outputs

Dataflow model
For data dominated systems, transforms input data streams into output streams

Object-oriented model
For breaking complex software into simpler, well-defined pieces

Models vs. languages


[Figure: models versus languages. Models: poetry, recipe, story; state machine, sequential program, dataflow. Languages: English, Spanish, Japanese; C++, Java. Recipes vs. English; sequential programs vs. C]

Computation models describe system behavior
Conceptual notion, e.g., recipe, sequential program

Languages capture models
Concrete form, e.g., English, C

Variety of languages can capture one model
E.g., sequential program model: C, C++, Java

One language can capture variety of models
E.g., C++: sequential program model, object-oriented model, state machine model

Certain languages better at capturing certain computation models


Text versus Graphics


Models versus languages not to be confused with text versus graphics
Text and graphics are just two types of languages
Text: letters, numbers
Graphics: circles, arrows (plus some letters, numbers)

[Figure: the statements X = 1; Y = X + 1; shown once in textual form and once in graphical form]


Introductory example: An elevator controller


Partial English description
Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Don't change directions unless there are no higher requests when moving up or no lower requests when moving down.

Simple elevator controller
Request Resolver resolves various floor requests into single requested floor
Unit Control moves elevator to this requested floor

Try capturing in C...

[Figure: system interface. Unit Control: signals up, down, open, floor. Request Resolver: produces req from b1, b2, ..., bN (buttons inside elevator) and up1, up2, dn2, up3, dn3, ..., dnN (up/down buttons on each floor)]

Elevator controller using a sequential program model
Sequential program model
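Inputs: int floor; bit b1..bN; up1..upN-1; dn2..dnN;
Outputs: bit up, down, open;
Global variables: int req;

void UnitControl()
{
    up = down = 0; open = 1;        /* start stopped, with the door open */
    while (1) {
        while (req == floor);       /* wait until another floor is requested */
        open = 0;                   /* close the door before moving */
        if (req > floor) { up = 1; }
        else { down = 1; }
        while (req != floor);       /* keep moving until the requested floor is reached */
        up = down = 0;
        open = 1;
        delay(10);                  /* keep the door open for at least 10 seconds */
    }
}

void RequestResolver()
{
    while (1)
        ...
        req = ...
        ...
}

void main()
{
    Call concurrently:
        UnitControl() and
        RequestResolver()
}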

You might have come up with something having even more if statements.

Finite-state machine (FSM) model


Trying to capture this behavior as a sequential program is a bit
awkward
Instead, we might consider an FSM model, describing the system
as:
Possible states
E.g., Idle, GoingUp, GoingDn, DoorOpen

Possible transitions from one state to another based on input


E.g., req > floor

Actions that occur in each state


E.g., In the GoingUp state, u,d,o,t = 1,0,0,0 (up = 1, down, open, and
timer_start = 0)

Try it...


Finite-state machine (FSM) model


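UnitControl process using a state machine

[Figure: state diagram of UnitControl]
States and outputs
Idle: u,d,o,t = 0,0,1,0
GoingUp: u,d,o,t = 1,0,0,0
GoingDn: u,d,o,t = 0,1,0,0
DoorOpen: u,d,o,t = 0,0,1,1

Transitions
Idle to GoingUp: req > floor
Idle to GoingDn: req < floor
Idle to Idle: req == floor
GoingUp to GoingUp: req > floor
GoingUp to DoorOpen: !(req > floor)
GoingDn to GoingDn: req < floor
GoingDn to DoorOpen: !(req<floor)
DoorOpen to DoorOpen: timer < 10
DoorOpen to Idle: !(timer < 10)

u is up, d is down, o is open
t is timer_start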
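
One possible way to capture this state machine in C is sketched below. The source and target state of each transition are read from the state diagram, and the type and function names (`enum state`, `unit_control_outputs`, `unit_control_next`) are hypothetical; each call to `unit_control_next` stands for one clock edge.

```c
#include <assert.h>

enum state { IDLE, GOING_UP, GOING_DN, DOOR_OPEN };

struct outputs { int up, down, open, timer_start; };

/* Outputs depend only on the current state: u,d,o,t for each state. */
struct outputs unit_control_outputs(enum state s)
{
    switch (s) {
    case GOING_UP:  return (struct outputs){ 1, 0, 0, 0 };
    case GOING_DN:  return (struct outputs){ 0, 1, 0, 0 };
    case DOOR_OPEN: return (struct outputs){ 0, 0, 1, 1 };
    case IDLE:
    default:        return (struct outputs){ 0, 0, 1, 0 };
    }
}

/* Next-state function: exactly one condition holds in every state. */
enum state unit_control_next(enum state s, int req, int floor, int timer)
{
    switch (s) {
    case IDLE:
        if (req > floor) return GOING_UP;
        if (req < floor) return GOING_DN;
        return IDLE;                                /* req == floor */
    case GOING_UP:
        return (req > floor) ? GOING_UP : DOOR_OPEN;
    case GOING_DN:
        return (req < floor) ? GOING_DN : DOOR_OPEN;
    case DOOR_OPEN:
        return (timer < 10) ? DOOR_OPEN : IDLE;
    }
    assert(0 && "unknown state");
    return IDLE;
}
```

Compared with the sequential program, the states and the conditions for leaving them are explicit here, so it is easier to check that the door output is 0 whenever up or down is 1.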


Formal definition
An FSM is a 6-tuple $\langle S, I, O, F, H, s_0 \rangle$

$S$ is a set of all states $\{s_0, s_1, \ldots, s_l\}$
$I$ is a set of inputs $\{i_0, i_1, \ldots, i_m\}$
$O$ is a set of outputs $\{o_0, o_1, \ldots, o_n\}$
$F$ is a next-state function ($S \times I \to S$)
$H$ is an output function ($S \to O$)
$s_0$ is an initial state

Moore-type
Associates outputs with states (as given above, $H$ maps $S \to O$)

Mealy-type
Associates outputs with transitions ($H$ maps $S \times I \to O$)

Shorthand notations to simplify descriptions


Implicitly assign 0 to all unassigned outputs in a state
Implicitly AND every transition condition with clock edge (FSM is synchronous)
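
As a minimal sketch of the 6-tuple, the next-state function F and the Moore-type output function H can be stored as lookup tables in C. The two-state machine below (its states, inputs, and outputs) is invented for illustration and does not come from the slides.

```c
#include <stddef.h>

/* Hypothetical example: a two-state Moore machine that toggles on input 1. */
enum { S_OFF = 0, S_ON = 1, NUM_STATES = 2 };        /* S */
enum { I_HOLD = 0, I_TOGGLE = 1, NUM_INPUTS = 2 };   /* I */

/* F: S x I -> S, stored as a table indexed by [state][input]. */
static const int F[NUM_STATES][NUM_INPUTS] = {
    /* S_OFF */ { S_OFF, S_ON  },
    /* S_ON  */ { S_ON,  S_OFF },
};

/* H: S -> O (Moore type: the output depends on the state only). */
static const int H[NUM_STATES] = { 0, 1 };

/* Run the machine from initial state s0 over a sequence of inputs and
   return the output of the state reached at the end. */
int fsm_run(int s0, const int *inputs, size_t n) {
    int s = s0;
    for (size_t i = 0; i < n; i++) {
        s = F[s][inputs[i]];
    }
    return H[s];
}
```

A Mealy-type variant would index the output table by both state and input, mirroring H: S × I → O.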

Finite-state machine with datapath model (FSMD)

FSMD extends FSM: complex data types and variables for storing data
FSMs use only Boolean data types and operations, no variables

We described UnitControl as an FSMD

FSMD: 7-tuple <S, I, O, V, F, H, s0>
S is a set of states {s0, s1, …, sl}
I is a set of inputs {i0, i1, …, im}
O is a set of outputs {o0, o1, …, on}
V is a set of variables {v0, v1, …, vn}
F is a next-state function (S × I × V → S)
H is an action function (S → O + V)
s0 is an initial state

(Figure: UnitControl state diagram, same as in the FSM model section above.)


I, O, V may represent complex data types (e.g., integers, floating point)
F,H may include arithmetic operations
H is an action function, not just an output function
Describes variable updates as well as outputs

Complete system state now consists of current state, si, and values of all variables


Describing a system as a state machine


1. List all possible states
2. Declare all variables (none in this example)
3. For each state, list possible transitions, with conditions, to other states
4. For each state and/or transition, list associated actions
5. For each state, ensure exclusive and complete exiting transition conditions
   No two exiting conditions can be true at same time
      Otherwise nondeterministic state machine
   One condition must be true at any given time
   Reducing explicit transitions should be avoided when first learning


(Figure: UnitControl state diagram, same as in the FSM model section above.)

State machine vs. sequential program model


Different thought process used with each model
State machine:
Encourages designer to think of all possible states and transitions among states
based on all possible input conditions

Sequential program model:


Designed to transform data through series of instructions that may be iterated and
conditionally executed

State machine description excels in many cases


More natural means of computing in those cases
Not due to graphical representation (state diagram)
Would still have same benefits if textual language used (e.g., state table)
Besides, sequential program model could use graphical representation (e.g., flowchart)


Try Capturing Other Behaviors with an FSM


E.g., Answering machine blinking light when there are
messages
E.g., A simple telephone answering machine that
answers after 4 rings when activated
E.g., A simple crosswalk traffic control light
Others


Capturing state machines in sequential programming language

Despite benefits of state machine model, most popular development tools use
sequential programming language
C, C++, Java, Ada, VHDL, Verilog, etc.
Development tools are complex and expensive, therefore not easy to adapt or replace
Must protect investment

Two approaches to capturing state machine model with sequential programming language
Front-end tool approach
Additional tool installed to support state machine language
Graphical and/or textual state machine languages
May support graphical simulation
Automatically generate code in sequential programming language that is input to main development tool

Drawback: must support additional tool (licensing costs, upgrades, training, etc.)

Language subset approach


Most common approach...


Language subset approach

Follow rules (template) for capturing state machine constructs in equivalent sequential language constructs
Used with software (e.g., C) and hardware languages (e.g., VHDL)
Capturing UnitControl state machine in C
Enumerate all states (#define)
Declare state variable initialized to initial state (IDLE)
Single switch statement branches to current state's case
Each case has actions
up, down, open, timer_start
Each case checks transition conditions to determine next state
if (...) {state = ...;}


#define IDLE     0
#define GOINGUP  1
#define GOINGDN  2
#define DOOROPEN 3
/* req, floor, timer are inputs; up, down, open, timer_start are outputs
   (all assumed to be declared elsewhere) */
void UnitControl() {
   int state = IDLE;
   while (1) {
      switch (state) {
         case IDLE: up=0; down=0; open=1; timer_start=0;
            if (req == floor) {state = IDLE;}
            if (req > floor)  {state = GOINGUP;}
            if (req < floor)  {state = GOINGDN;}
            break;
         case GOINGUP: up=1; down=0; open=0; timer_start=0;
            if (req > floor)    {state = GOINGUP;}
            if (!(req > floor)) {state = DOOROPEN;}
            break;
         case GOINGDN: up=0; down=1; open=0; timer_start=0;
            if (req < floor)    {state = GOINGDN;}
            if (!(req < floor)) {state = DOOROPEN;}
            break;
         case DOOROPEN: up=0; down=0; open=1; timer_start=1;
            if (timer < 10)    {state = DOOROPEN;}
            if (!(timer < 10)) {state = IDLE;}
            break;
      }
   }
}

UnitControl state machine in sequential programming language
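
One way to exercise this state machine in isolation is to perform a single state's actions and one transition per call, rather than looping forever. The sketch below assumes inputs and outputs are passed explicitly instead of through globals. The names unit_control_step and uc_outputs are illustrative and not from the slides.

```c
/* States of the UnitControl machine, as in the slides. */
enum uc_state { IDLE, GOINGUP, GOINGDN, DOOROPEN };

/* Outputs of one step; this struct is an assumption made for the sketch. */
struct uc_outputs { int up, down, open, timer_start; };

/* Perform the actions of the current state and return the next state.
   req, floor, and timer are the inputs sampled for this step. */
enum uc_state unit_control_step(enum uc_state state, int req, int floor,
                                int timer, struct uc_outputs *out) {
    switch (state) {
    case IDLE:
        *out = (struct uc_outputs){0, 0, 1, 0};
        if (req > floor) return GOINGUP;
        if (req < floor) return GOINGDN;
        return IDLE;
    case GOINGUP:
        *out = (struct uc_outputs){1, 0, 0, 0};
        return (req > floor) ? GOINGUP : DOOROPEN;
    case GOINGDN:
        *out = (struct uc_outputs){0, 1, 0, 0};
        return (req < floor) ? GOINGDN : DOOROPEN;
    case DOOROPEN:
        *out = (struct uc_outputs){0, 0, 1, 1};
        return (timer < 10) ? DOOROPEN : IDLE;
    }
    return IDLE; /* not reached for valid states */
}
```

A caller would invoke unit_control_step once per clock tick, feeding the returned state back in on the next call.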


General template
#define S0 0
#define S1 1
...
#define SN N
void StateMachine() {
   int state = S0; // or whatever is the initial state.
   while (1) {
      switch (state) {
         case S0:
            // Insert S0's actions here & insert transitions Ti leaving S0:
            if( T0's condition is true ) {state = T0's next state; /*actions*/ }
            if( T1's condition is true ) {state = T1's next state; /*actions*/ }
            ...
            if( Tm's condition is true ) {state = Tm's next state; /*actions*/ }
            break;
         case S1:
            // Insert S1's actions here
            // Insert transitions Ti leaving S1
            break;
         ...
         case SN:
            // Insert SN's actions here
            // Insert transitions Ti leaving SN
            break;
      }
   }
}


HCFSM and the Statecharts language

Hierarchical/concurrent state machine model (HCFSM)
Extension to state machine model to support hierarchy and concurrency
States can be decomposed into another state machine
With hierarchy has identical functionality as Without hierarchy, but has one less transition (z)
Known as OR-decomposition
States can execute concurrently
Known as AND-decomposition

Statecharts
Graphical language to capture HCFSM
timeout: transition with time limit as condition
history: remember last substate OR-decomposed state A was in before transitioning to another state B
Return to saved substate of A when returning from B instead of initial state

(Figures: "Without hierarchy" and "With hierarchy" state diagrams over states A1, A2, and B; "Concurrency" diagram in which state B contains concurrent substates C (C1, C2) and D (D1, D2).)


UnitControl with FireMode


FireMode
When fire is true, move elevator to 1st floor and open door
w/o hierarchy: Getting messy!
w/ hierarchy: Simple!

(Figure: UnitControl without hierarchy. Each of Idle, GoingUp, GoingDn, and DoorOpen has its own fire transition into the fire-handling states.)
(Figure: UnitControl with hierarchy. NormalMode contains Idle, GoingUp, GoingDn, and DoorOpen, with DoorOpen leaving on timeout(10). A single fire transition leads from NormalMode to FireMode, and !fire leads back.)
(Figure: FireMode. FireGoingDn has u,d,o = 0,1,0 and remains while floor > 1; when floor == 1 it moves to FireDrOpen, which has u,d,o = 0,0,1 and remains while fire.)
(Figure: With concurrent RequestResolver. ElevatorController contains UnitControl and RequestResolver.)


Program-state machine model (PSM): HCFSM plus sequential program model

Program-state's actions can be FSM or sequential program
Designer can choose most appropriate
Stricter hierarchy than HCFSM used in Statecharts
transition between sibling states only, single entry
Program-state may "complete"
Reaches end of sequential program code, OR
FSM transition to special complete substate
PSM has 2 types of transitions
Transition-immediately (TI): taken regardless of source program-state
Transition-on-completion (TOC): taken only if condition is true AND source program-state is complete
SpecCharts: extension of VHDL to capture PSM model
SpecC: extension of C to capture PSM model

(Figure: ElevatorController as a PSM, containing UnitControl and RequestResolver. fire leads from NormalMode to FireMode; !fire leads back.)
ElevatorController
   int req;
   UnitControl
      NormalMode
         up = down = 0; open = 1;
         while (1) {
            while (req == floor);
            open = 0;
            if (req > floor) { up = 1;}
            else {down = 1;}
            while (req != floor);
            up = down = 0;   /* stop before opening the door */
            open = 1;
            delay(10);
         }
      FireMode
         up = 0; down = 1; open = 0;
         while (floor > 1);
         up = 0; down = 0; open = 1;
   RequestResolver
      ...
      req = ...
      ...

NormalMode and FireMode described as sequential programs
Black square originating within FireMode indicates !fire is a TOC transition
Transition from FireMode to NormalMode only after FireMode completed


Role of appropriate model and language

Finding appropriate model to capture embedded system is an important step


Model shapes the way we think of the system
Originally thought of sequence of actions, wrote sequential program

First wait for requested floor to differ from target floor


Then, we close the door
Then, we move up or down to the desired floor
Then, we open the door
Then, we repeat this sequence

To create state machine, we thought in terms of states and transitions among states
When system must react to changing inputs, state machine might be best model
HCFSM described FireMode easily, clearly

Language should capture model easily


Ideally should have features that directly capture constructs of model
FireMode would be very complex in sequential program
Checks inserted throughout code

Other factors may force choice of different model


Structured techniques can be used instead
E.g., Template for state machine capture in sequential program language


Concurrent process model

ConcurrentProcessExample() {
   x = ReadX()
   y = ReadY()
   Call concurrently:
      PrintHelloWorld(x) and
      PrintHowAreYou(y)
}
PrintHelloWorld(x) {
   while( 1 ) {
      print "Hello world."
      delay(x);
   }
}
PrintHowAreYou(y) {
   while( 1 ) {
      print "How are you?"
      delay(y);
   }
}

Describes functionality of system in terms of two or more concurrently executing subtasks
Many systems easier to describe with concurrent process model because inherently multitasking
E.g., simple example:
Read two numbers X and Y
Display "Hello world." every X seconds
Display "How are you?" every Y seconds
More effort would be required with sequential program or state machine model

(Figure: Simple concurrent process example)
(Figure: Subroutine execution over time. ReadX and ReadY run first; PrintHelloWorld and PrintHowAreYou then execute concurrently.)



Enter X: 1
Enter Y: 2
Hello world.  (Time = 1 s)
Hello world.  (Time = 2 s)
How are you?  (Time = 2 s)
Hello world.  (Time = 3 s)
How are you?  (Time = 4 s)
Hello world.  (Time = 4 s)
...

Sample input and output


Dataflow model

Derivative of concurrent process model
Nodes represent transformations
May execute concurrently
Edges represent flow of tokens (data) from one node to another
May or may not have token at any given time
When all of node's input edges have at least one token, node may fire
When node fires, it consumes input tokens, processes transformation, and generates output token
Nodes may fire simultaneously
Several commercial tools support graphical languages for capture of dataflow model
Can automatically translate to concurrent process model for implementation
Each node becomes a process

(Figure: Nodes with arithmetic transformations, computing Z = (A + B) * (C - D). Inputs A, B and C, D feed two nodes that produce t1 and t2, which feed the node producing Z.)
(Figure: Nodes with more complex transformations. modulate and convolve nodes produce t1 and t2, which feed a transform node.)
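
The firing rule can be sketched in C for the Z = (A + B) * (C - D) graph. The sketch assumes each edge holds at most one token, which is a simplification. The names edge, try_fire, and dataflow_z are illustrative and not from the slides.

```c
#include <stdbool.h>

/* An edge holding at most one token (a simplifying assumption). */
struct edge { bool has_token; int value; };

typedef int (*transform_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Fire a two-input node if both input edges carry a token and the output
   edge is free: consume the inputs, apply the transformation, and
   produce one output token. Returns true if the node fired. */
bool try_fire(struct edge *in0, struct edge *in1, struct edge *out,
              transform_fn f) {
    if (!in0->has_token || !in1->has_token || out->has_token) return false;
    out->value = f(in0->value, in1->value);
    out->has_token = true;
    in0->has_token = false;
    in1->has_token = false;
    return true;
}

/* Evaluate Z = (A + B) * (C - D) by offering every node a chance to fire
   until no node can fire any more. */
int dataflow_z(int a, int b, int c, int d) {
    struct edge A = {true, a}, B = {true, b}, C = {true, c}, D = {true, d};
    struct edge t1 = {false, 0}, t2 = {false, 0}, Z = {false, 0};
    bool fired;
    do {
        fired = false;
        fired |= try_fire(&t1, &t2, &Z, mul);  /* waits until t1 and t2 exist */
        fired |= try_fire(&A, &B, &t1, add);
        fired |= try_fire(&C, &D, &t2, sub);
    } while (fired);
    return Z.value;
}
```

The multiply node is deliberately listed first to show that the order in which nodes are offered a chance to fire does not matter; only token availability does.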

Synchronous dataflow

With digital signal-processors (DSPs), data flows at fixed rate
Multiple tokens consumed and produced per firing
Synchronous dataflow model takes advantage of this
Each edge labeled with number of tokens consumed/produced each firing
Can statically schedule nodes, so can easily use sequential program model
Don't need real-time operating system and its overhead
How would you map this model to a sequential programming language? Try it...
Algorithms developed for scheduling nodes into "single-appearance" schedules
Only one statement needed to call each node's associated procedure
Allows procedure inlining without code explosion, thus reducing overhead even more

(Figure: Synchronous dataflow. The modulate, convolve, and transform graph with each edge labeled by its token rate: mA, mB, mC, mD, mt1, ct2, tt1, tt2, tZ.)
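
One possible answer to the "Try it" question is sketched below for a hypothetical two-node graph, not the modulate/convolve/transform graph of the figure. The rates (producer emits 2 tokens per firing, consumer takes 1) and all names are assumptions made for illustration. Because the rates are fixed, the firing order can be worked out in advance and written as ordinary loops.

```c
#define RATE_OUT 2   /* tokens the producer emits per firing (assumed) */
#define RATE_IN  1   /* tokens the consumer takes per firing (assumed) */

static int fifo[RATE_OUT];   /* edge buffer sized for one schedule period */
static int fill = 0;         /* number of tokens currently on the edge */
static int next_sample = 0;  /* stand-in for incoming data */
static long total = 0;       /* stand-in for the consumer's result */

/* Producer node: emits RATE_OUT tokens per firing. */
static void produce_node(void) {
    for (int k = 0; k < RATE_OUT; k++) fifo[fill++] = next_sample++;
}

/* Consumer node: takes one token per firing, oldest first. */
static void consume_node(void) {
    total += fifo[0];
    for (int k = 1; k < fill; k++) fifo[k - 1] = fifo[k];
    fill--;
}

/* One period of the static schedule: producer once, consumer twice.
   Each node's procedure is called from exactly one statement, so this is
   a single-appearance schedule; no run-time scheduler is involved. */
void run_periods(int periods) {
    for (int p = 0; p < periods; p++) {
        produce_node();
        for (int k = 0; k < RATE_OUT / RATE_IN; k++) consume_node();
    }
}

long sdf_total(void) { return total; }
int  sdf_fill(void)  { return fill; }
```

The edge buffer is empty again at the end of every period, which is what makes the schedule repeatable with a fixed, known buffer size.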


Concurrent processes and real-time systems


Concurrent processes
Consider two examples having separate tasks running independently but sharing data
Difficult to write system using sequential program model
Concurrent process model easier
Separate sequential programs (processes) for each task
Programs communicate with each other

Heartbeat Monitoring System
(Figure: inputs are buttons B[1..4] and the heart-beat pulse.)

Task 1:
   Read pulse
   If pulse < Lo then
      Activate Siren
   If pulse > Hi then
      Activate Siren
   Sleep 1 second
   Repeat

Task 2:
   If B1/B2 pressed then
      Lo = Lo +/- 1
   If B3/B4 pressed then
      Hi = Hi +/- 1
   Sleep 500 ms
   Repeat

Set-top Box
(Figure: input signal in; video and audio out.)

Task 1:
   Read Signal
   Separate Audio/Video
   Send Audio to Task 2
   Send Video to Task 3
   Repeat

Task 2:
   Wait on Task 1
   Decode/output Audio
   Repeat

Task 3:
   Wait on Task 1
   Decode/output Video
   Repeat


Process
A sequential program, typically an infinite loop
Executes concurrently with other processes
We are about to enter the world of concurrent programming

Basic operations on processes
Create and terminate
Create is like a procedure call but caller doesn't wait
Created process can itself create new processes
Terminate kills a process, destroying all data
In HelloWorld/HowAreYou example, we only created processes
Suspend and resume
Suspend puts a process on hold, saving state for later execution
Resume starts the process again where it left off
Join
A process suspends until a particular child process finishes execution
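
As a minimal sketch of create and join, the example below uses POSIX threads as a stand-in for the processes discussed here. That substitution is an assumption of the sketch; the function names sum_to_n and create_and_join are illustrative.

```c
#include <pthread.h>

/* Work done by each created thread: replace *slot (n) with 1 + 2 + ... + n. */
static void *sum_to_n(void *arg) {
    long *slot = arg;
    long n = *slot, s = 0;
    for (long k = 1; k <= n; k++) s += k;
    *slot = s;
    return NULL;
}

/* Create two threads, then join both. Returns the combined result,
   or -1 if a thread could not be created. */
long create_and_join(long n1, long n2) {
    pthread_t t1, t2;
    long a = n1, b = n2;
    if (pthread_create(&t1, NULL, sum_to_n, &a) != 0) return -1;
    if (pthread_create(&t2, NULL, sum_to_n, &b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    /* The creator did not wait at create time; it waits here instead. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a + b;
}
```

Reading a and b is safe only after the joins, because until then the created threads may still be writing them.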

Communication among processes


Processes need to communicate data and signals to solve their computation problem
Processes that don't communicate are just independent programs solving separate problems
Basic example: producer/consumer
Process A produces data items, Process B consumes them
E.g., A decodes video packets, B displays decoded packets on a screen

(Figure: encoded video packets enter processA; decoded video packets pass from processA to processB; processB sends them to the display.)
void processA() {
   // Decode packet
   // Communicate packet to B
}
void processB() {
   // Get packet from A
   // Display packet
}

How do we achieve this communication?


Two basic methods
Shared memory
Message passing

Shared Memory
Processes read and write shared variables
No time overhead, easy to implement
But hard to use; mistakes are common

Example: Producer/consumer with a mistake
Share buffer[N], count
count = # of valid data items in buffer
processA produces data items and stores in buffer
If buffer is full, must wait
processB consumes data items from buffer
If buffer is empty, must wait
Error when both processes try to update count concurrently (lines 10 and 19) and the following execution sequence occurs. Say "count" is 3.
A loads count (count = 3) from memory into register R1 (R1 = 3)
A increments R1 (R1 = 4)
B loads count (count = 3) from memory into register R2 (R2 = 3)
B decrements R2 (R2 = 2)
A stores R1 back to count in memory (count = 4)
B stores R2 back to count in memory (count = 2)
count now has incorrect value of 2

01: data_type buffer[N];
02: int count = 0;
03: void processA() {
04:    int i = 0;
05:    while( 1 ) {
06:       produce(&data);
07:       while( count == N );/*loop*/
08:       buffer[i] = data;
09:       i = (i + 1) % N;
10:       count = count + 1;
11:    }
12: }
13: void processB() {
14:    int i = 0;
15:    while( 1 ) {
16:       while( count == 0 );/*loop*/
17:       data = buffer[i];
18:       i = (i + 1) % N;
19:       count = count - 1;
20:       consume(&data);
21:    }
22: }
23: void main() {
24:    create_process(processA);
25:    create_process(processB);
26: }
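
The interleaving described above can be replayed step by step. The sketch below is a single-threaded simulation with explicit "register" variables, not real concurrency; the function name is illustrative.

```c
/* Replay the problematic interleaving with explicit "registers".
   One increment and one decrement should leave count unchanged,
   but A's update is lost. */
int replay_lost_update(int count) {
    int r1, r2;
    r1 = count;      /* A loads count                */
    r1 = r1 + 1;     /* A increments R1              */
    r2 = count;      /* B loads the same old value   */
    r2 = r2 - 1;     /* B decrements R2              */
    count = r1;      /* A stores R1 back to count    */
    count = r2;      /* B stores R2, overwriting A   */
    return count;
}
```

Starting from 3, the function returns 2, whereas executing the two updates one after the other in either order would give 3.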


Message Passing
Message passing
Data explicitly sent from one process to another
Sending process performs special operation, send
Receiving process must perform special operation, receive, to receive the data
Both operations must explicitly specify which process it is sending to or receiving from
Receive is blocking, send may or may not be blocking

void processA() {
   while( 1 ) {
      produce(&data);
      send(B, &data);
      /* region 1 */
      receive(B, &data);
      consume(&data);
   }
}
void processB() {
   while( 1 ) {
      receive(A, &data);
      transform(&data);
      send(A, &data);
      /* region 2 */
   }
}

Safer model, but less flexible
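
As a rough sketch of blocking send and receive, a POSIX pipe can serve as the channel between two processes. The wrapper names (channel, send_int, receive_int) are hypothetical. Unlike the send(B, &data) form above, they name a channel rather than a destination process.

```c
#include <unistd.h>

/* A channel is a POSIX pipe: fd[0] is the read end, fd[1] the write end. */
struct channel { int fd[2]; };

/* Returns 0 on success, -1 on failure (as pipe() does). */
int channel_open(struct channel *ch) { return pipe(ch->fd); }

/* send: copy one int into the channel. Returns 0 on success. */
int send_int(struct channel *ch, int value) {
    ssize_t n = write(ch->fd[1], &value, sizeof value);
    return n == (ssize_t)sizeof value ? 0 : -1;
}

/* receive: block until data is available, then copy one int out. */
int receive_int(struct channel *ch, int *value) {
    ssize_t n = read(ch->fd[0], value, sizeof *value);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

void channel_close(struct channel *ch) {
    close(ch->fd[0]);
    close(ch->fd[1]);
}
```

Here receive blocks while the pipe is empty, and send blocks only when the pipe's internal buffer is full, which matches the "send may or may not be blocking" case.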


Back to Shared Memory: Mutual Exclusion


Certain sections of code should not be performed concurrently
Critical section
Possibly noncontiguous section of code where simultaneous updates, by multiple processes to a shared memory location, can occur
When a process enters the critical section, all other processes must be locked out until it leaves the critical section
Mutex
A shared object used for locking and unlocking segment of shared data
Disallows read/write access to memory it guards
Multiple processes can perform lock operation simultaneously, but only one process will acquire lock
All other processes trying to obtain lock will be put in blocked state until unlock operation performed by acquiring process when it exits critical section
These processes will then be placed in runnable state and will compete for lock again
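
A minimal sketch of a mutex guarding a shared counter is shown below. It uses POSIX threads in place of the abstract mutex object, which is an assumption of the sketch; the function names and iteration count are illustrative.

```c
#include <pthread.h>

static int count = 0;
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;

#define ITERATIONS 100000

/* Repeatedly increment the shared counter inside the critical section. */
static void *incrementer(void *arg) {
    (void)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        pthread_mutex_lock(&count_mutex);    /* enter critical section */
        count = count + 1;
        pthread_mutex_unlock(&count_mutex);  /* leave critical section */
    }
    return NULL;
}

/* Repeatedly decrement the shared counter inside the critical section. */
static void *decrementer(void *arg) {
    (void)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        pthread_mutex_lock(&count_mutex);
        count = count - 1;
        pthread_mutex_unlock(&count_mutex);
    }
    return NULL;
}

/* Start from count = 3, run both threads to completion, return count.
   Returns -1 if a thread could not be created. */
int run_guarded_updates(void) {
    pthread_t a, b;
    count = 3;
    if (pthread_create(&a, NULL, incrementer, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, decrementer, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return count;
}
```

With equal numbers of increments and decrements, the result is the starting value 3 regardless of how the two threads interleave.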


Correct Shared Memory Solution to the Consumer-Producer Problem

The primitive mutex is used to ensure critical sections are executed in mutual exclusion of each other
Following the same execution sequence as before:
A/B execute lock operation on count_mutex
Either A or B will acquire lock
Say B acquires it
A will be put in blocked state
B loads count (count = 3) from memory into register R2 (R2 = 3)
B decrements R2 (R2 = 2)
B stores R2 back to count in memory (count = 2)
B executes unlock operation
A is placed in runnable state again
A loads count (count = 2) from memory into register R1 (R1 = 2)
A increments R1 (R1 = 3)
A stores R1 back to count in memory (count = 3)
Count now has correct value of 3


data_type buffer[N];
int count = 0;
mutex count_mutex;
void processA() {
   int i = 0;
   while( 1 ) {
      produce(&data);
      while( count == N );/*loop*/
      buffer[i] = data;
      i = (i + 1) % N;
      count_mutex.lock();
      count = count + 1;
      count_mutex.unlock();
   }
}
void processB() {
   int i = 0;
   while( 1 ) {
      while( count == 0 );/*loop*/
      data = buffer[i];
      i = (i + 1) % N;
      count_mutex.lock();
      count = count - 1;
      count_mutex.unlock();
      consume(&data);
   }
}
void main() {
   create_process(processA);
   create_process(processB);
}


Process Communication
Try modeling "req" value of our elevator controller
Using shared memory
Using shared memory and mutexes
Using message passing

(Figure: system interface of the elevator controller, with UnitControl and RequestResolver, as shown earlier.)


A Common Problem in Concurrent Programming: Deadlock

Deadlock: A condition where 2 or more processes are blocked waiting for the other to unlock critical sections of code
Both processes are then in blocked state
Cannot execute unlock operation so will wait forever
Example code has 2 different critical sections of code that can be accessed simultaneously
2 locks needed (mutex1, mutex2)
Following execution sequence produces deadlock
A executes lock operation on mutex1 (and acquires it)
B executes lock operation on mutex2 (and acquires it)
A/B both execute in critical sections 1 and 2, respectively
A executes lock operation on mutex2
A blocked until B unlocks mutex2
B executes lock operation on mutex1
B blocked until A unlocks mutex1
DEADLOCK!
One deadlock elimination protocol requires locking of numbered mutexes in increasing order and two-phase locking (2PL)

mutex mutex1, mutex2;

void processA() {
   while( 1 ) {
      mutex1.lock();
      /* critical section 1 */
      mutex2.lock();
      /* critical section 2 */
      mutex2.unlock();
      /* critical section 1 */
      mutex1.unlock();
   }
}
void processB() {
   while( 1 ) {
      mutex2.lock();
      /* critical section 2 */
      mutex1.lock();
      /* critical section 1 */
      mutex1.unlock();
      /* critical section 2 */
      mutex2.unlock();
   }
}

Acquire locks in 1st phase only, release locks in 2nd phase
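
The ordered-locking protocol can be sketched with POSIX threads, used here as a stand-in for the abstract mutex object. The shared variables, round count, and function names are illustrative assumptions. Both workers lock mutex1 before mutex2, so the circular wait from the example cannot form.

```c
#include <pthread.h>

static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
static int shared1 = 0, shared2 = 0;

#define ROUNDS 50000

/* Every worker locks in increasing order (mutex1, then mutex2) and
   starts releasing only after all locks are held (two phases). */
static void *worker(void *arg) {
    int delta = *(int *)arg;
    for (int k = 0; k < ROUNDS; k++) {
        pthread_mutex_lock(&mutex1);     /* phase 1: acquire */
        pthread_mutex_lock(&mutex2);
        shared1 += delta;
        shared2 -= delta;
        pthread_mutex_unlock(&mutex2);   /* phase 2: release */
        pthread_mutex_unlock(&mutex1);
    }
    return NULL;
}

/* Run two workers with opposite deltas. Returns 1 if both finished and
   the shared values cancelled out, 0 if not, -1 on creation failure. */
int run_ordered_locking(void) {
    pthread_t a, b;
    int up = 1, dn = -1;
    shared1 = shared2 = 0;
    if (pthread_create(&a, NULL, worker, &up) != 0) return -1;
    if (pthread_create(&b, NULL, worker, &dn) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared1 == 0 && shared2 == 0;
}
```

If one worker instead took mutex2 first, as processB does in the example, the same program could hang in exactly the way the execution sequence describes.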


Synchronization among processes


Sometimes concurrently running processes must synchronize their execution
When a process must wait for:
another process to compute some value
reach a known point in their execution
signal some condition

Recall producer-consumer problem


processA must wait if buffer is full
processB must wait if buffer is empty
This is called busy-waiting
Process executing loops instead of being blocked
CPU time wasted

More efficient methods


Join operation, and blocking send and receive discussed earlier
Both block the process so it doesn't waste CPU time

Condition variables and monitors



Condition variables
Condition variable is an object that has 2 operations, signal and wait
When process performs a wait on a condition variable, the process is blocked
until another process performs a signal on the same condition variable
How is this done?
Process A acquires lock on a mutex
Process A performs wait, passing this mutex
Causes mutex to be unlocked

Process B can now acquire lock on same mutex


Process B enters critical section
Computes some value and/or makes condition true

Process B performs signal when condition true


Causes process A to implicitly reacquire mutex lock
Process A becomes runnable


Condition variable example: consumer-producer

Consumer-producer using condition variables

2 condition variables
buffer_empty
Signals at least 1 free location available in buffer
buffer_full
Signals at least 1 valid data item in buffer

processA:

produces data item


acquires lock (cs_mutex) for critical section
checks value of count
if count = N, buffer is full
performs wait operation on buffer_empty
this releases the lock on cs_mutex allowing
processB to enter critical section, consume data
item and free location in buffer
processB then performs signal
if count < N, buffer is not full
processA inserts data into buffer
increments count
signals processB making it runnable if it has
performed a wait operation on buffer_full

data_type buffer[N];
int count = 0;
mutex cs_mutex;
condition buffer_empty, buffer_full;

void processA() {
   int i = 0;
   while( 1 ) {
      produce(&data);
      cs_mutex.lock();
      if( count == N ) buffer_empty.wait(cs_mutex);
      buffer[i] = data;
      i = (i + 1) % N;
      count = count + 1;
      cs_mutex.unlock();
      buffer_full.signal();
   }
}

void processB() {
   int i = 0;
   while( 1 ) {
      cs_mutex.lock();
      if( count == 0 ) buffer_full.wait(cs_mutex);
      data = buffer[i];
      i = (i + 1) % N;
      count = count - 1;
      cs_mutex.unlock();
      buffer_empty.signal();
      consume(&data);
   }
}

void main() {
   create_process(processA); create_process(processB);
}
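
For comparison, here is a hedged sketch of the same scheme using POSIX threads, which is one concrete realization and not the abstract mutex/condition objects of the listing. The buffer size, item count, and function names are assumptions. POSIX allows a wait to return spuriously, so the sketch re-checks the condition in a while loop where the listing uses if.

```c
#include <pthread.h>

#define N 4          /* buffer capacity (assumed) */
#define ITEMS 1000   /* items to transfer (assumed) */

static int buffer[N];
static int count = 0, head = 0, tail = 0;
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buffer_empty = PTHREAD_COND_INITIALIZER; /* a slot is free   */
static pthread_cond_t buffer_full  = PTHREAD_COND_INITIALIZER; /* an item is ready */

static void *producer(void *arg) {
    (void)arg;
    for (int k = 1; k <= ITEMS; k++) {
        pthread_mutex_lock(&cs_mutex);
        while (count == N)   /* re-check after every wakeup */
            pthread_cond_wait(&buffer_empty, &cs_mutex);
        buffer[tail] = k;
        tail = (tail + 1) % N;
        count++;
        pthread_mutex_unlock(&cs_mutex);
        pthread_cond_signal(&buffer_full);
    }
    return NULL;
}

static void *consumer(void *arg) {
    long *sum = arg;
    for (int k = 0; k < ITEMS; k++) {
        pthread_mutex_lock(&cs_mutex);
        while (count == 0)
            pthread_cond_wait(&buffer_full, &cs_mutex);
        *sum += buffer[head];
        head = (head + 1) % N;
        count--;
        pthread_mutex_unlock(&cs_mutex);
        pthread_cond_signal(&buffer_empty);
    }
    return NULL;
}

/* Transfer the values 1..ITEMS through the buffer and return their sum
   as seen by the consumer, or -1 if a thread could not be created. */
long run_producer_consumer(void) {
    pthread_t a, b;
    long sum = 0;
    count = head = tail = 0;
    if (pthread_create(&a, NULL, producer, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, consumer, &sum) != 0) return -1;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return sum;
}
```

Neither thread spins while waiting: a blocked thread consumes no CPU time until the other side signals.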


Monitors

Collection of data and methods or subroutines that operate on data similar to an object-oriented paradigm
Monitor guarantees only 1 process can execute inside monitor at a time
(a) Process X executes while Process Y has to wait
(b) Process X performs wait on a condition
Process Y allowed to enter and execute
(c) Process Y signals condition Process X waiting on
Process Y blocked
Process X allowed to continue executing
(d) Process X finishes executing in monitor or waits on a condition again
Process Y made runnable again

(Figure: four panels, (a) through (d), each showing a monitor containing DATA and CODE, with Process X and Process Y either executing inside the monitor or in its waiting area.)


Monitor example: consumer-producer

Single monitor encapsulates both processes along with buffer and count
One process will be allowed to begin executing first
If processB allowed to execute first
Will execute until it finds count = 0
Will perform wait on buffer_full condition variable
processA now allowed to enter monitor and execute
processA produces data item
finds count < N so writes to buffer and increments count
processA performs signal on buffer_full condition variable
processA blocked
processB reenters monitor and continues execution, consumes data, etc.

Monitor {
   data_type buffer[N];
   int count = 0;
   condition buffer_full, buffer_empty;

   void processA() {
      int i = 0;
      while( 1 ) {
         produce(&data);
         if( count == N ) buffer_empty.wait();
         buffer[i] = data;
         i = (i + 1) % N;
         count = count + 1;
         buffer_full.signal();
      }
   }

   void processB() {
      int i = 0;
      while( 1 ) {
         if( count == 0 ) buffer_full.wait();
         data = buffer[i];
         i = (i + 1) % N;
         count = count - 1;
         buffer_empty.signal();
         consume(&data);
         buffer_full.signal();
      }
   }
} /* end monitor */

void main() {
   create_process(processA); create_process(processB);
}


Implementation

Mapping of system's functionality onto hardware processors:
captured using computational model(s)
written in some language(s)
Implementation choice independent from language(s) choice
Implementation choice based on power, size, performance, timing and cost requirements
Final implementation tested for feasibility
Also serves as blueprint/prototype for mass manufacturing of final product


(Figure: three layers. Computational models: state machine, sequential program, dataflow, concurrent processes. Languages: Pascal, C/C++, Java, VHDL. Implementations: A, B, C.)
The choice of computational model(s) is based on whether it allows the designer to describe the system.
The choice of language(s) is based on whether it captures the computational model(s) used by the designer.
The choice of implementation is based on whether it meets power, size, performance and cost requirements.


Concurrent process model: implementation

Can use single and/or general-purpose processors
(a) Multiple processors, each executing one process
True multitasking (parallel processing)
General-purpose processors
Use programming language like C and compile to instructions of processor
Expensive and in most cases not necessary
Custom single-purpose processors
More common
(b) One general-purpose processor running all processes
Most processes don't use 100% of processor time
Can share processor time and still achieve necessary execution rates
(c) Combination of (a) and (b)
Multiple processes run on one general-purpose processor while one or more processes run on own single-purpose processor

(Figure: (a) Process1 through Process4, each on its own processor (A through D), connected by a communication bus; (b) Process1 through Process4 all on one general-purpose processor; (c) some processes on their own processor and the rest on a general-purpose processor, connected by a communication bus.)


Implementation:
multiple processes sharing single processor

Can manually rewrite processes as a single sequential program

Ok for simple examples, but extremely difficult for complex examples


Automated techniques have evolved but not common
E.g., simple Hello World concurrent program from before would look like:
I = 1; T = 0;
while (1) {
   Delay(I); T = T + 1;
   if T modulo X is 0 then call PrintHelloWorld
   if T modulo Y is 0 then call PrintHowAreYou
}
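
The hand-merged loop can be written in C as a bounded simulation. This is a sketch: instead of sleeping for one second per iteration it only advances the tick counter, and the function name and the out parameter are illustrative.

```c
#include <stdio.h>

/* Simulate `ticks` one-second steps of the merged program and write the
   messages to `out`. Returns the number of messages printed.
   x and y must be positive. */
int merged_schedule(int x, int y, int ticks, FILE *out) {
    int printed = 0;
    for (int t = 1; t <= ticks; t++) {
        /* a real program would delay for one second here */
        if (t % x == 0) {
            fprintf(out, "Hello world. (Time = %d s)\n", t);
            printed++;
        }
        if (t % y == 0) {
            fprintf(out, "How are you? (Time = %d s)\n", t);
            printed++;
        }
    }
    return printed;
}
```

Calling merged_schedule(1, 2, 4, stdout) prints six messages in the first four seconds, the same number as in the sample output for X = 1 and Y = 2.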

Can use multitasking operating system

Much more common


Operating system schedules processes, allocates storage, and interfaces to peripherals, etc.
Real-time operating system (RTOS) can guarantee execution rate constraints are met
Describe concurrent processes with languages having built-in processes (Java, Ada, etc.) or a sequential
programming language with library support for concurrent processes (C, C++, etc. using POSIX threads
for example)

Can convert processes to sequential program with process scheduling right in code

Less overhead (no operating system)


More complex/harder to maintain


Processes vs. threads


Different meanings in operating system terminology
Regular processes
Heavyweight process
Own virtual address space (stack, data, code)
System resources (e.g., open files)

Threads

Lightweight process
Subprocess within process
Only program counter, stack, and registers
Shares address space, system resources with other threads
Allows quicker communication between threads

Small compared to heavyweight processes


Can be created quickly
Low cost switching between threads


Implementation:
suspending, resuming, and joining
Multiple processes mapped to single-purpose processors
Built into processor's implementation
Could be extra input signal that is asserted when process suspended
Additional logic needed for determining process completion
Extra output signals indicating process done

Multiple processes mapped to single general-purpose processor


Built into programming language or special multitasking library like POSIX
Language or library may rely on operating system to handle


Implementation: process scheduling


Must meet timing requirements when multiple concurrent processes
implemented on single general-purpose processor
Not true multitasking

Scheduler
Special process that decides when and for how long each process is executed
Implemented as preemptive or nonpreemptive scheduler
Preemptive
Determines how long a process executes before preempting to allow another process
to execute
Time quantum: predetermined amount of execution time preemptive scheduler allows each
process (may be tens to hundreds of milliseconds long)

Determines which process will be next to run

Nonpreemptive
Only determines which process is next after current process finishes execution
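A preemptive scheduler with a fixed time quantum can be sketched as a small simulation. This is only a sketch under assumptions: all processes are runnable at time 0, no new processes arrive, and the function name round_robin and its parameters are hypothetical.

```c
/* Run n processes round-robin with a fixed time quantum.
   remaining[i] is the execution time process i still needs;
   finish[i] receives the time at which process i completes. */
void round_robin(unsigned remaining[], unsigned finish[], int n, unsigned quantum)
{
    unsigned now = 0;
    int left = n;

    if (quantum == 0)
        return;                          /* no progress possible */
    for (int i = 0; i < n; i++)
        if (remaining[i] == 0) { finish[i] = 0; left--; }

    while (left > 0) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;                /* already finished */
            /* run for one quantum or until done, whichever is shorter */
            unsigned slice = remaining[i] < quantum ? remaining[i] : quantum;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) { finish[i] = now; left--; }
        }
    }
}
```

With no new arrivals, scanning the array cyclically gives the same order as moving a preempted process to the back of a FIFO.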

Scheduling: priority
Process with highest priority always selected first by scheduler
Typically determined statically during creation and dynamically during
execution

FIFO
Runnable processes added to end of FIFO as created or become runnable
Front process removed from FIFO when time quantum of current process is up
or process is blocked

Priority queue
Runnable processes again added as created or become runnable
Process with highest priority chosen when new process needed
If multiple processes with same highest priority value then selects from them
using first-come first-served
Called priority scheduling when nonpreemptive
Called round-robin when preemptive
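The priority-queue selection rule above (highest priority first, first-come first-served among equals) might look like the following. A sketch only: the struct proc layout and the convention that a larger number means higher priority are assumptions made for illustration.

```c
struct proc {
    int priority;       /* larger value = higher priority (assumption) */
    unsigned arrival;   /* order in which the process became runnable */
    int runnable;       /* nonzero if ready to run */
};

/* Return the index of the runnable process with the highest priority,
   breaking ties first-come first-served; -1 if none is runnable. */
int pick_next(const struct proc p[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!p[i].runnable)
            continue;
        if (best < 0 ||
            p[i].priority > p[best].priority ||
            (p[i].priority == p[best].priority &&
             p[i].arrival < p[best].arrival))
            best = i;
    }
    return best;
}
```

A nonpreemptive scheduler would call pick_next only when the current process finishes or blocks; a preemptive one would also call it when the time quantum expires.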


Priority assignment

Period of process
Repeating time interval the process must complete one execution within
E.g., period = 100 ms
Process must execute once every 100 ms
Usually determined by the description of the system
E.g., refresh rate of display is 27 times/sec
Period = 37 ms

Execution deadline
Amount of time process must be completed by after it has started
E.g., execution time = 5 ms, deadline = 20 ms, period = 100 ms
Process must complete execution within 20 ms after it has begun regardless of its period
Process begins at start of period, runs for 4 ms then is preempted
Process suspended for 14 ms, then runs for the remaining 1 ms
Completed within 4 + 14 + 1 = 19 ms which meets deadline of 20 ms
Without deadline process could be suspended for much longer

Rate monotonic scheduling
Processes with shorter periods have higher priority
Typically used when execution deadline = period

Rate monotonic
Process   Period    Priority
A         25 ms     5
B         50 ms     3
C         12 ms     6
D         100 ms    1
E         40 ms     4
F         75 ms     2

Deadline monotonic scheduling
Processes with shorter deadlines have higher priority
Typically used when execution deadline < period

Deadline monotonic
Process   Deadline  Priority
G         17 ms     5
H         50 ms     2
I         32 ms     3
J         10 ms     6
K         140 ms    1
L         32 ms     4
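Both assignment rules rank processes by a time value, so one routine can illustrate them. A minimal sketch: the function name monotonic_priorities is hypothetical, priorities run from 1 (lowest) to n (highest) as in the tables on this slide, and the tie-break (the later process in the array gets the higher priority) is an assumption chosen to reproduce the priorities listed for processes I and L.

```c
/* Assign monotonic priorities: the shortest time gets the highest number.
   t_ms[] holds periods (rate monotonic) or deadlines (deadline monotonic).
   priority[i] ranges from 1 (lowest) to n (highest). */
void monotonic_priorities(const unsigned t_ms[], int priority[], int n)
{
    for (int i = 0; i < n; i++) {
        int rank = 1;
        for (int j = 0; j < n; j++) {
            /* count processes ranked below i: strictly longer time, or
               equal time and earlier position in the array (assumption) */
            if (t_ms[j] > t_ms[i] || (t_ms[j] == t_ms[i] && j < i))
                rank++;
        }
        priority[i] = rank;
    }
}
```

Passing the periods of processes A to F, or the deadlines of G to L, yields the priority columns shown in the two tables.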


Real-time systems
Systems composed of 2 or more cooperating, concurrent processes with
stringent execution time constraints
E.g., set-top boxes have separate processes that read or decode video and/or
sound concurrently and must decode 20 frames/sec for output to appear
continuous
Other examples with stringent time constraints are:

digital cell phones


navigation and process control systems
assembly line monitoring systems
multimedia and networking systems
etc.

Communication and synchronization between processes for these systems is critical
Therefore, concurrent process model best suited for describing these systems

Real-time operating systems (RTOS)

Provide mechanisms, primitives, and guidelines for building real-time embedded systems
Windows CE

Built specifically for embedded systems and appliance market


Scalable real-time 32-bit platform
Supports Windows API
Perfect for systems designed to interface with Internet
Preemptive priority scheduling with 256 priority levels per process
Kernel is 400 Kbytes

QNX

Real-time microkernel surrounded by optional processes (resource managers) that provide POSIX and
UNIX compatibility

Microkernels typically support only the most basic services


Optional resource managers allow scalability from small ROM-based systems to huge multiprocessor systems
connected by various networking and communication technologies

Preemptive process scheduling using FIFO, round-robin, adaptive, or priority-driven scheduling


32 priority levels per process
Microkernel < 10 Kbytes and complies with POSIX real-time standard


Summary
Computation models are distinct from languages
Sequential program model is popular
Most common languages like C support it directly

State machine models good for control


Extensions like HCFSM provide additional power
PSM combines state machines and sequential programs

Concurrent process model for multi-task systems


Communication and synchronization methods exist
Scheduling is critical

Dataflow model good for signal processing



Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 2: Custom single-purpose processors

Outline

Introduction
Combinational logic
Sequential logic
Custom single-purpose processor design
RT-level custom single-purpose processor design


Introduction
Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Single-purpose: one particular computation task
Custom single-purpose: non-standard task

A custom single-purpose processor may be
Fast, small, low power
But, high NRE, longer time-to-market, less flexible

[Figure: digital camera chip block diagram: lens, CCD, CCD preprocessor, A2D, D2A, JPEG codec, Pixel coprocessor, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]


CMOS transistor on silicon

Transistor
The basic electrical component in digital systems
Acts as an on/off switch
Voltage at "gate" controls whether current flows from source to drain
Don't confuse this "gate" with a logic gate

[Figure: IC package and IC; cross-section of an nMOS transistor on a silicon substrate showing source, gate, oxide, channel, and drain; the transistor conducts if gate at 1]

CMOS transistor implementations

Complementary Metal Oxide Semiconductor
We refer to logic levels
Typically 0 : 0V, 1 : 5V or less
Two basic CMOS types
nMOS conducts if gate at 1
pMOS conducts if gate at 0
Hence "complementary"
Basic gates
Inverter, NAND, NOR

[Figure: nMOS and pMOS transistor symbols; CMOS transistor circuits for the inverter (F = x'), NAND gate (F = (xy)'), and NOR gate (F = (x+y)')]


Basic logic gates

Gate       Equation       x=0   x=1
Driver     F = x          0     1
Inverter   F = x'         1     0

Gate       Equation       xy=00  xy=01  xy=10  xy=11
AND        F = x y        0      0      0      1
OR         F = x + y      0      1      1      1
XOR        F = x ⊕ y      0      1      1      0
NAND       F = (x y)'     1      1      1      0
NOR        F = (x + y)'   1      0      0      0
XNOR       F = (x ⊕ y)'   1      0      0      1

Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs     Outputs
a  b  c    y  z
0  0  0    0  0
0  0  1    0  1
0  1  0    0  1
0  1  1    1  0
1  0  0    1  0
1  0  1    1  1
1  1  0    1  1
1  1  1    1  1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
y     bc=00  01  11  10
a=0      0    0   1   0
a=1      1    1   1   1
y = a + bc

z     bc=00  01  11  10
a=0      0    1   0   1
a=1      0    1   1   1
z = ab + b'c + bc'

E) Logic Gates
[Figure: gate-level (random logic) implementation with inputs a, b, c and outputs y, z]
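A quick way to check a minimization is to compare it against the problem statement on every input combination. The sketch below does that for the minimized equations y = a + bc and z = ab + b'c + bc'; the function names are illustrative, and inputs are assumed to be 0 or 1.

```c
/* Minimized equations: y = a + bc, z = ab + b'c + bc'. */
int y_min(int a, int b, int c) { return a | (b & c); }
int z_min(int a, int b, int c) { return (a & b) | (!b & c) | (b & !c); }

/* Problem statement evaluated directly, for comparison. */
int y_spec(int a, int b, int c) { return a || (b && c); }
int z_spec(int a, int b, int c) { return (b != c) || (a && b && c); }

/* Returns 1 if the minimized equations agree with the specification
   on all eight input combinations, 0 otherwise. */
int equations_match(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int c = 0; c <= 1; c++)
                if (y_min(a, b, c) != y_spec(a, b, c) ||
                    z_min(a, b, c) != z_spec(a, b, c))
                    return 0;
    return 1;
}
```

Exhaustive checking is practical here because three inputs give only eight rows; CAD tools automate the same comparison for larger designs.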


Combinational components

Multiplexor (n-bit, m x 1)
Inputs I(m-1) ... I1 I0 (n bits each), select S0 ... S(log m); output O (n bits)
O =
I0 if S=0..00
I1 if S=0..01
...
I(m-1) if S=1..11

Decoder (log n x n)
Inputs I(log n -1) ... I0; outputs O(n-1) ... O1 O0
O0 =1 if I=0..00
O1 =1 if I=0..01
...
O(n-1) =1 if I=1..11
With enable input e: all O's are 0 if e=0

Adder (n-bit)
Inputs A, B (n bits each); outputs carry, sum
sum = A+B (first n bits)
carry = (n+1)'th bit of A+B
With carry-in input Ci: sum = A + B + Ci

Comparator (n-bit)
Inputs A, B (n bits each); outputs less, equal, greater
less = 1 if A<B
equal =1 if A=B
greater=1 if A>B

ALU (n bit, m function)
Inputs A, B (n bits each), select S0 ... S(log m); output O (n bits)
O = A op B
op determined by S.
May have status outputs carry, zero, etc.

Sequential components

(storage) Register (n-bit)
Inputs I (n bits), load, clear; output Q (n bits)
Q =
0 if clear=1,
I if load=1 and clock=1,
Q(previous) otherwise.

Shift register (n-bit)
Inputs I, shift; output Q
Q = lsb
- Content shifted
- I stored in msb

Counter (n-bit)
Inputs clear, count; output Q (n bits)
Q =
0 if clear=1,
Q(prev)+1 if count=1 and clock=1.


Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles

B) State Diagram
[Figure: four states 0, 1, 2, 3 with outputs x=0, x=0, x=0, x=1; each state loops on a=0 and advances to the next state on a=1, with state 3 returning to state 0]

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; state register holding Q1 Q0, loaded from I1 I0]

D) State Table (Moore-type)
Inputs       Outputs
Q1 Q0 a      I1 I0 x
0  0  0      0  0  0
0  0  1      0  1  0
0  1  0      0  1  0
0  1  1      1  0  0
1  0  0      1  0  0
1  0  1      1  1  0
1  1  0      1  1  1
1  1  1      0  0  1

Given this implementation model
Sequential logic design quickly reduces to combinational logic design


Sequential logic design (cont.)

E) Minimized Output Equations
[Figure: Karnaugh maps for I1, I0, and x, with columns Q1Q0 = 00, 01, 11, 10 and rows a]
I1 = Q1'Q0a + Q1a' + Q1Q0'
I0 = Q0a' + Q0'a
x = Q1Q0

F) Combinational Logic
[Figure: gate-level (random logic) implementation with inputs a, Q1, Q0 and outputs x, I1, I0]
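The clock divider can be simulated directly from its next-state equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a and output x = Q1Q0 (primes restored from the state table). A minimal sketch: the name divider_step is hypothetical, and input a is treated as an enable, so the state advances only when a = 1.

```c
/* One clock edge of the clock divider. *q holds the state register
   (bit 1 = Q1, bit 0 = Q0). Returns the Moore output x of the state
   that was current before the edge. */
int divider_step(unsigned *q, int a)
{
    unsigned q1 = (*q >> 1) & 1u;
    unsigned q0 = *q & 1u;
    unsigned ua = a ? 1u : 0u;

    int x = (int)(q1 & q0);                        /* x  = Q1Q0 */
    unsigned i1 = ((q1 ^ 1u) & q0 & ua)            /* I1 = Q1'Q0a */
                | (q1 & (ua ^ 1u))                 /*    + Q1a'   */
                | (q1 & (q0 ^ 1u));                /*    + Q1Q0'  */
    unsigned i0 = (q0 & (ua ^ 1u))                 /* I0 = Q0a'   */
                | ((q0 ^ 1u) & ua);                /*    + Q0'a   */

    *q = (i1 << 1) | i0;                           /* load state register */
    return x;
}
```

Stepping from state 0 with a held at 1 visits states 0, 1, 2, 3 and repeats, so x is 1 during one cycle out of every four.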


Custom single-purpose processor basic model

[Figure: controller and datapath. Signals: external control inputs, external control outputs, datapath control inputs, datapath control outputs, external data inputs, external data outputs]

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units]

Example: greatest common divisor

First create algorithm
Convert algorithm to "complex" state machine
Known as FSMD: finite-state machine with datapath
Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) alg. specification
0: int x, y;
1: while (1) {
2:   while (!go_i) ;
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
[Figure: states 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J; transitions labeled 1 and !1, !go_i and !(!go_i), x!=y and !(x!=y), x<y and !(x<y)]
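To make the FSMD idea concrete, the GCD algorithm can be written as an explicit state machine in software. This is a sketch, not the slides' hardware: state numbers follow the statement labels of the algorithm specification, go_i is assumed already asserted, the join states are folded into their successors, and the zero-input guard is an addition that the slides do not include.

```c
/* Subtraction-based GCD written as an explicit state machine. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i)
{
    unsigned x = 0, y = 0, d_o = 0;
    int state = 3;                      /* go_i assumed already asserted */

    if (x_i == 0 || y_i == 0)
        return 0;                       /* guard: the loop would not end */

    for (;;) {
        switch (state) {
        case 3: x = x_i;    state = 4; break;
        case 4: y = y_i;    state = 5; break;
        case 5: state = (x != y) ? 6 : 9; break;   /* while (x != y) */
        case 6: state = (x < y)  ? 7 : 8; break;   /* if (x < y)     */
        case 7: y = y - x;  state = 5; break;
        case 8: x = x - y;  state = 5; break;
        case 9: d_o = x;    return d_o;
        default: return 0;              /* unreachable */
        }
    }
}
```

Each case corresponds to one state of the diagram: states 3, 4, 7, 8, and 9 carry datapath actions, while states 5 and 6 only choose the next state from a datapath condition.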


State diagram templates

Assignment statement
a = b
next statement
[Figure: state "a = b" followed by the state for the next statement]

Loop statement
while (cond) {
  loop-body-statements
}
next statement
[Figure: condition state C; on cond go to the loop-body statements, which return to C through join state J; on !cond go to the next statement]

Branch statement
if (c1)
  c1 stmts
else if c2
  c2 stmts
else
  other stmts
next statement
[Figure: condition state C; on c1 go to c1 stmts, on !c1*c2 go to c2 stmts, on !c1*!c2 go to others; all branches meet at join state J, followed by the next statement]

Creating the datapath

Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
Based on reads and writes
Use multiplexors for multiple sources
Create unique identifier for each datapath component control input and output

[Figure: GCD state diagram beside the datapath. Datapath: n-bit 2x1 multiplexors (x_sel, y_sel) fed by x_i and y_i; registers 0: x (x_ld) and 0: y (y_ld); comparators 5: x!=y (x_neq_y) and 6: x<y (x_lt_y); subtractors 8: x-y and 7: y-x; register 9: d (d_ld) driving d_o]


Creating the controller's FSM

Same structure as FSMD
Replace complex actions/conditions with datapath configurations

Controller states (encoding, state, datapath configuration)
0000  1:
0001  2:
0010  2-J:
0011  3:    x_sel = 0, x_ld = 1
0100  4:    y_sel = 0, y_ld = 1
0101  5:
0110  6:
0111  7:    y_sel = 1, y_ld = 1
1000  8:    x_sel = 1, x_ld = 1
1001  6-J:
1010  5-J:
1011  9:    d_ld = 1
1100  1-J:

Conditions on transitions: go_i and !go_i at state 2, x_neq_y and !x_neq_y at state 5, x_lt_y and !x_lt_y at state 6

[Figure: original FSMD, controller FSM, and the datapath from the previous slide]

Splitting into a controller and datapath

[Figure: (a) Controller: the controller FSM with state encodings 0000 to 1100, and the controller implementation model with combinational logic (inputs go_i, x_neq_y, x_lt_y, Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld, I3 I2 I1 I0) and a state register. (b) Datapath: multiplexors, registers x, y, and d, comparators, and subtractors, with inputs x_i, y_i and output d_o]


Controller state table for the GCD example

Inputs: Q3, Q2, Q1, Q0, x_neq_y, x_lt_y, go_i
Outputs: I3, I2, I1, I0, x_sel, y_sel, x_ld, y_ld, d_ld
[Table rows not recoverable from the extracted text]


Completing the GCD custom single-purpose processor design

We finished the datapath
We have a state table for the next state and control logic
All that's left is combinational logic design
This is not an optimized design, but we see the basic steps

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units]


RT-level custom single-purpose processor design

We often start with a state machine
Rather than algorithm
Cycle timing often too central to functionality

Example
Bus bridge that converts 4-bit bus to 8-bit bus
Start with FSMD
Known as register-transfer (RT) level
Exercise: complete the design

Problem Specification
[Figure: Sender connected to Bridge by rdy_in, clock, and data_in(4); Bridge connected to Receiver by rdy_out and data_out(8)]
Bridge: a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD
Inputs
rdy_in: bit; data_in: bit[4];
Outputs
rdy_out: bit; data_out: bit[8]
Variables
data_lo, data_hi: bit[4];

States
WaitFirst4: stay while rdy_in=0; go to RecFirst4Start when rdy_in=1
RecFirst4Start: data_lo=data_in; go to RecFirst4End
RecFirst4End: stay while rdy_in=1; go to WaitSecond4 when rdy_in=0
WaitSecond4: stay while rdy_in=0; go to RecSecond4Start when rdy_in=1
RecSecond4Start: data_hi=data_in; go to RecSecond4End
RecSecond4End: stay while rdy_in=1; go to Send8Start when rdy_in=0
Send8Start: data_out=data_hi & data_lo, rdy_out=1; go to Send8End
Send8End: rdy_out=0; go to WaitFirst4
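The bridge FSMD can be exercised in software before any hardware is designed. The sketch below is one reading of the state diagram, not the authors' implementation: it assumes each state's action is performed in the cycle the state is active, that "data_hi & data_lo" means concatenation, and that the sender holds data_in valid for the cycle after rdy_in rises. The type and function names are hypothetical.

```c
#include <stdint.h>

enum bridge_state {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
};

struct bridge {
    enum bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    int rdy_out;
};

/* Advance the bridge by one clock cycle: perform the action of the
   current state, then take the transition selected by rdy_in. */
void bridge_clock(struct bridge *b, int rdy_in, uint8_t data_in)
{
    switch (b->state) {
    case WAIT_FIRST4:
        if (rdy_in) b->state = REC_FIRST4_START;
        break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0x0F;
        b->state = REC_FIRST4_END;
        break;
    case REC_FIRST4_END:
        if (!rdy_in) b->state = WAIT_SECOND4;
        break;
    case WAIT_SECOND4:
        if (rdy_in) b->state = REC_SECOND4_START;
        break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0x0F;
        b->state = REC_SECOND4_END;
        break;
    case REC_SECOND4_END:
        if (!rdy_in) b->state = SEND8_START;
        break;
    case SEND8_START:
        /* concatenate the two nibbles, high nibble first */
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1;
        b->state = SEND8_END;
        break;
    case SEND8_END:
        b->rdy_out = 0;
        b->state = WAIT_FIRST4;
        break;
    }
}
```

Sending 0x5 and then 0xA, each with a rdy_in pulse, leaves 0xA5 on data_out with rdy_out raised for one cycle.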

RT-level custom single-purpose processor design (cont)

(a) Controller
[Figure: Bridge controller FSM with the same states and rdy_in transitions as the FSMD; actions become control signals: RecFirst4Start: data_lo_ld=1; RecSecond4Start: data_hi_ld=1; Send8Start: data_out_ld=1, rdy_out=1; Send8End: rdy_out=0. Input rdy_in, output rdy_out]

(b) Datapath
[Figure: registers data_hi and data_lo loaded from data_in(4) under data_hi_ld and data_lo_ld; register data_out loaded under data_out_ld; clk to all registers]


Optimizing custom single-purpose processors


Optimization is the task of making design metric
values the best possible
Optimization opportunities

original program
FSMD
datapath
FSM


Optimizing the original program


Analyze program attributes and look for areas of
possible improvement

number of computations
size of variable
time and space complexity
operations used
multiplication and division very expensive


Optimizing the original program (cont)

original program
0: int x, y;
1: while (1) {
2:   while (!go_i) ;
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

replace the subtraction operation(s) with modulo operation in order to speed up program

optimized program
0:  int x, y, r;
1:  while (1) {
2:    while (!go_i) ;
      // x must be the larger number
3:    if (x_i >= y_i) {
4:      x = x_i;
5:      y = y_i;
      }
6:    else {
7:      x = y_i;
8:      y = x_i;
      }
9:    while (y != 0) {
10:     r = x % y;
11:     x = y;
12:     y = r;
      }
13:   d_o = x;
    }

GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
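The iteration counts quoted for GCD(42, 8) can be reproduced by counting how many (x, y) pairs each loop condition examines. A minimal sketch with hypothetical function names; the zero-input guard in the subtraction version is an addition, since that loop does not terminate when one input is 0.

```c
/* Number of (x, y) pairs examined by the subtraction-based loop,
   counting the initial pair. Returns 0 for a zero input (guard). */
unsigned gcd_sub_steps(unsigned x, unsigned y)
{
    unsigned steps = 1;
    if (x == 0 || y == 0)
        return 0;
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        steps++;
    }
    return steps;
}

/* Number of (x, y) pairs examined by the modulo-based loop,
   counting the initial pair. */
unsigned gcd_mod_steps(unsigned x_i, unsigned y_i)
{
    unsigned x, y, r, steps = 1;
    if (x_i >= y_i) { x = x_i; y = y_i; }   /* x must be the larger number */
    else            { x = y_i; y = x_i; }
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        steps++;
    }
    return steps;
}
```

The trade-off is that the modulo version needs far fewer passes but requires a modulo unit, and the earlier slide notes that multiplication and division are very expensive in hardware.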


Optimizing the FSMD


Areas of possible improvements
merge states
states with constants on transitions can be eliminated, transition
taken is already known
states with independent operations can be merged

separate states
states which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size

scheduling


Optimizing the FSMD (cont.)

eliminate state 1: transitions have constant values
merge state 2 and state 2J: no loop operation in between them
merge state 3 and state 4: assignment operations are independent of one another
merge state 5 and state 6: transitions from state 6 can be done in state 5
eliminate state 5J and 6J: transitions from each state can be done from state 7 and state 8, respectively
eliminate state 1-J: transition from state 1-J can be done directly from state 9

[Figure: original FSMD (int x, y; states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J) beside the optimized FSMD (int x, y; state 2 with go_i and !go_i; state 3: x = x_i, y = y_i; state 5 with x<y to state 7: y = y - x and x>y to state 8: x = x - y; state 9: d_o = x)]


Optimizing the datapath


Sharing of functional units
one-to-one mapping, as done previously, is not necessary
if same operation occurs in different states, they can share a
single functional unit

Multi-functional units
ALUs support a variety of operations, so one ALU can be shared
among operations occurring in different states


Optimizing the FSM


State encoding
task of assigning a unique bit pattern to each state in an FSM
size of state register and combinational logic vary
can be treated as an ordering problem

State minimization
task of merging equivalent states into a single state
two states are equivalent if, for all possible input combinations, they
generate the same outputs and transition to the same next state


Summary
Custom single-purpose processors

Straightforward design techniques


Can be built to execute algorithms
Typically start with FSMD
CAD tools can be of great assistance


Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 3 Instruction-Set Processors: Software

Introduction
Instruction-Set Processor
Processor designed for a variety of computation tasks
General-Purpose Processor (GPP)
Application-Specific Processor (ASIP): optimized for a specific subset of tasks

Low unit cost because NRE is spread over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone

Carefully designed since higher NRE is acceptable


Can yield good performance, size and power

System implementations designed with low NRE cost, short time-to-market/prototype, high flexibility
User just writes software; no processor design

Terms microprocessor, microcontroller or micro adopted when they were finally implemented on one or few chips


Basic Architecture

Control unit and datapath
Note similarity to single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is programmed into the memory

[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, control/status, registers), connected to I/O and memory]

Datapath Operations

Load
Read memory location into register
ALU operation
Input certain registers through ALU, store back in register
Store
Write register to memory location

[Figure: processor block diagram; the value 10 is read from memory into a register, passed through the ALU (+1) to give 11, and written back to memory]

Control Unit

Control unit: configures the datapath operations
Sequence of desired operations ("instructions") stored in memory: "program"
Instruction cycle broken into several sub-operations, each one clock cycle, e.g.:
Fetch: Get next instruction into IR
Decode: Determine what the instruction means
Fetch operands: Move data from memory to datapath register
Execute: Move data through the ALU
Store results: Write data from register to memory

Program in memory
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1

[Figure: processor block diagram with PC, IR, registers R0 and R1, I/O, and memory; memory location 500 holds 10]

Control Unit Sub-Operations

Fetch
Get next instruction into IR
PC: program counter, always points to next instruction
IR: holds the fetched instruction

[Figure: PC = 100; IR = load R0, M[500]; memory holds the program at 100 to 102 and the value 10 at location 500]


Control Unit Sub-Operations

Decode
Determine what the instruction means

[Figure: PC = 100; IR = load R0, M[500]; memory holds the program at 100 to 102 and the value 10 at location 500]

Control Unit Sub-Operations

Fetch operands
Move data from memory to datapath register

[Figure: PC = 100; IR = load R0, M[500]; the value 10 moves from memory location 500 into R0]


Control Unit Sub-Operations

Execute
Move data through the ALU
This particular instruction does nothing during this sub-operation

[Figure: PC = 100; IR = load R0, M[500]; R0 = 10]

Control Unit Sub-Operations

Store results
Write data from register to memory
This particular instruction does nothing during this sub-operation

[Figure: PC = 100; IR = load R0, M[500]; R0 = 10]


Instruction Cycles

PC=100: Fetch, Decode, Fetch ops, Exec., Store results (one clock cycle each)

[Figure: after the first instruction cycle, IR = load R0, M[500] and R0 = 10]

Instruction Cycles

PC=100: Fetch, Decode, Fetch ops, Exec., Store results
PC=101: Fetch, Decode, Fetch ops, Exec., Store results

[Figure: after the second instruction cycle, IR = inc R1, R0; R0 = 10 passes through the ALU (+1) and R1 = 11]


Instruction Cycles

PC=100: Fetch, Decode, Fetch ops, Exec., Store results
PC=101: Fetch, Decode, Fetch ops, Exec., Store results
PC=102: Fetch, Decode, Fetch ops, Exec., Store results

[Figure: after the third instruction cycle, IR = store M[501], R1; R0 = 10, R1 = 11; memory location 500 holds 10 and location 501 holds 11]
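The three instruction cycles above can be traced with a toy simulator. This is a sketch under assumptions: instructions are held as C structs rather than encoded words, the five sub-operations are collapsed into one switch, and all type and function names are hypothetical.

```c
enum op { OP_LOAD, OP_INC, OP_STORE };

struct instr {
    enum op op;
    int reg;        /* destination (or source, for store) register */
    int src_reg;    /* source register for inc */
    int addr;       /* memory address for load/store */
};

/* Run the three-instruction program from the slides.
   mem must have at least 502 entries. */
void run_example(int mem[])
{
    const struct instr program[3] = {
        { OP_LOAD,  0, 0, 500 },   /* 100: load  R0, M[500] */
        { OP_INC,   1, 0, 0   },   /* 101: inc   R1, R0     */
        { OP_STORE, 1, 0, 501 },   /* 102: store M[501], R1 */
    };
    int R[2] = {0, 0};

    for (int pc = 0; pc < 3; pc++) {
        struct instr ir = program[pc];          /* fetch into IR */
        switch (ir.op) {                        /* decode */
        case OP_LOAD:                           /* fetch operands */
            R[ir.reg] = mem[ir.addr];
            break;
        case OP_INC:                            /* execute in the ALU */
            R[ir.reg] = R[ir.src_reg] + 1;
            break;
        case OP_STORE:                          /* store results */
            mem[ir.addr] = R[ir.reg];
            break;
        }
    }
}
```

With mem[500] set to 10 beforehand, the run leaves 11 in mem[501], matching the final memory contents on the slide.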

Architectural Considerations

N-bit processor
N-bit ALU, registers, buses, memory data interface
Embedded: 8-bit, 16-bit, 32-bit common
Desktop/servers: 32-bit, even 64
PC size determines address space

[Figure: processor block diagram with control unit, datapath, I/O, and memory]


Architectural Considerations

Clock frequency
Inverse of clock period
Clock period must be longer than longest register-to-register delay in entire processor
Memory access is often the longest

[Figure: processor block diagram with control unit, datapath, I/O, and memory]


Pipelining: Increasing Instruction Throughput

[Figure: non-pipelined vs. pipelined dish cleaning (wash, dry) over time; pipelined instruction execution over time with stages Fetch-instr., Decode, Fetch ops., Execute, Store res.]
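The throughput gain from pipelining can be estimated with simple arithmetic. The sketch below assumes an idealized pipeline (one clock cycle per stage, no stalls or hazards); the function names are hypothetical.

```c
/* Cycles to run n instructions when each passes through all stages
   before the next one starts. */
unsigned long nonpipelined_cycles(unsigned long n, unsigned long stages)
{
    return n * stages;
}

/* Cycles to run n instructions in an ideal pipeline: the first
   instruction takes `stages` cycles, and each later instruction
   completes one cycle after the previous one. */
unsigned long pipelined_cycles(unsigned long n, unsigned long stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}
```

For the five-stage instruction cycle on the slides, eight instructions take 40 cycles without pipelining and 12 cycles with an ideal pipeline.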


Superscalar and VLIW Architectures


Performance can be improved by:
Faster clock (but there's a limit)
Pipelining: slice up instruction into stages, overlap stages
Multiple ALUs to support more than one instruction stream
Superscalar
Scalar: non-vector operations
Fetches instructions in batches, executes as many as possible
May require extensive hardware to detect independent instructions
VLIW: each word in memory has multiple independent instructions
Relies on the compiler to detect and schedule instructions
Currently growing in popularity


Two Memory Architectures

Princeton (Von Neumann)
Fewer memory wires
Harvard
Simultaneous program and data memory access
(microcontrollers)

[Figure: Harvard: processor with separate program memory and data memory; Princeton: processor with a single memory (program and data)]


Cache Memory

Memory access may be slow
Cache is small but fast memory close to processor
Holds copy of part of memory
Hits and misses

[Figure: processor with program cache and data cache (fast/expensive technology, usually on the same chip) in front of memory (slower/cheaper technology, usually on a different chip)]


Programmer's View
Programmer doesn't need detailed understanding of architecture
Instead, needs to know what instructions can be executed

Two levels of instructions:


Assembly level
Structured languages (C, C++, Java, etc.)

Most development today done using structured languages


But, some assembly level programming may still be necessary
Drivers: portion of program that communicates with and/or controls
(drives) another device
Often have detailed timing considerations, extensive bit manipulation
Assembly level may be best for these


Assembly-Level Instructions

Instruction 1    opcode    operand1    operand2
Instruction 2    opcode    operand1    operand2
Instruction 3    opcode    operand1    operand2
Instruction 4    opcode    operand1    operand2
...

Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1

A Simple (Trivial) Instruction Set

Assembly instruct.    First byte    Second byte    Operation
                      (opcode, operand)  (operand)
MOV Rn, direct        0000  Rn      direct         Rn = M(direct)
MOV direct, Rn        0001  Rn      direct         M(direct) = Rn
MOV @Rn, Rm           0010  Rn      Rm             M(Rn) = Rm
MOV Rn, #immed.       0011  Rn      immediate      Rn = immediate
ADD Rn, Rm            0100  Rn      Rm             Rn = Rn + Rm
SUB Rn, Rm            0101  Rn      Rm             Rn = Rn - Rm
JZ Rn, relative       0110  Rn      relative       PC = PC + relative (only if Rn is 0)


Addressing Modes

Addressing mode      Operand field       Register-file contents    Memory contents
Immediate            Data
Register-direct      Register address    Data
Register indirect    Register address    Memory address            Data
Direct               Memory address                                Data
Indirect             Memory address                                Memory address, then Data
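The five addressing modes differ only in how many lookups separate the operand field from the data. A minimal sketch, assuming registers and memory are modeled as int arrays and all indices are in range; the enum and function names are hypothetical.

```c
enum addr_mode {
    MODE_IMMEDIATE,
    MODE_REGISTER_DIRECT,
    MODE_REGISTER_INDIRECT,
    MODE_DIRECT,
    MODE_INDIRECT
};

/* Resolve an operand field to its data under the given addressing mode. */
int fetch_operand(enum addr_mode mode, int field,
                  const int regs[], const int mem[])
{
    switch (mode) {
    case MODE_IMMEDIATE:         return field;             /* field is the data */
    case MODE_REGISTER_DIRECT:   return regs[field];       /* register holds data */
    case MODE_REGISTER_INDIRECT: return mem[regs[field]];  /* register holds address */
    case MODE_DIRECT:            return mem[field];        /* field is the address */
    case MODE_INDIRECT:          return mem[mem[field]];   /* memory holds address */
    }
    return 0;   /* unreachable for valid modes */
}
```

Each extra level of indirection costs one more register or memory access, which is why immediate and register-direct operands are the cheapest to fetch.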


Sample Programs

C program
int total = 0;
for (int i=10; i!=0; i--)
  total += i;
// next instructions...

Equivalent assembly program
0       MOV R0, #0;     // total = 0
1       MOV R1, #10;    // i = 10
2       MOV R2, #1;     // constant 1
3       MOV R3, #0;     // constant 0
Loop:   JZ R1, Next;    // Done if i=0
5       ADD R0, R1;     // total += i
6       SUB R1, R2;     // i--
7       JZ R3, Loop;    // Jump always
Next:   // next instructions...

Try some others
Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait
until M[254] is 0, set M[255] to 0 (assume those locations are ports).
(Harder) Count the occurrences of zero in an array stored in memory
locations 100 through 199.

Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM (expandable)

Registers: How many are there?


Only a direct concern for assembly-level programmers

I/O
How to communicate with external signals?

Interrupts


Microprocessor Architecture Overview


If you are using a particular microprocessor, now is a
good time to review its architecture


Example: parallel port driver


LPT Connection Pin   I/O Direction   Register Address
1                    Output          0th bit of register #2
2-9                  Output          0th-7th bit of register #0
10,11,12,13,15       Input           6,7,5,4,3th bit of register #1
14,16,17             Output          1,2,3th bit of register #2

[Figure: PC parallel port with a switch connected to Pin 13 and an LED connected to Pin 2.]

Using assembly language programming we can configure a PC parallel port to perform digital I/O (8255A peripheral I/F controller chip)
Write and read to three special registers to accomplish this
Table provides list of parallel port connector pins and corresponding register locations
Example: parallel port driver that monitors the input switch and turns the LED on/off accordingly


Parallel Port Example


x86 assembly language

; This program consists of a sub-routine that reads
; the state of the input pin, determining the on/off state
; of our switch and asserts the output pin, turning the LED
; on/off accordingly

CheckPort proc
    push ax             ; save the content
    push dx             ; save the content
    mov  dx, 3BCh + 1   ; base + 1 for register #1
    in   al, dx         ; read register #1
    and  al, 10h        ; mask out all but bit # 4
    cmp  al, 0          ; is it 0?
    jne  SwitchOn       ; if not, we need to turn the LED on

SwitchOff:
    mov  dx, 3BCh + 0   ; base + 0 for register #0
    in   al, dx         ; read the current state of the port
    and  al, 0FEh       ; clear first bit (masking)
    out  dx, al         ; write it out to the port
    jmp  Done           ; we are done

SwitchOn:
    mov  dx, 3BCh + 0   ; base + 0 for register #0
    in   al, dx         ; read the current state of the port
    or   al, 01h        ; set first bit (masking)
    out  dx, al         ; write it out to the port

Done:
    pop  dx             ; restore the content
    pop  ax             ; restore the content
    ret                 ; return to the caller
CheckPort endp

C driver (compiled as C++, hence the extern "C" linkage)

extern "C" void CheckPort(void);   // defined in assembly

int main(void) {
    while( 1 ) {
        CheckPort();
    }
}


Operating System
Optional software layer providing low-level services to a program (application).
File management, disk access
Keyboard/display interfacing
Scheduling multiple programs for execution
Or even just multiple threads from one program
Program makes system calls to the OS

    DB file_name "out.txt"    -- store file name

    MOV R0, 1324              -- system call "open" id
    MOV R1, file_name         -- address of file-name
    INT 34                    -- cause a system call
    JZ  R0, L1                -- if zero -> error

    . . . read the file
    JMP L2                    -- bypass error cond.
L1:
    . . . handle the error
L2:

Development Environment
Development processor
The processor on which we write and debug our programs
Usually a PC

Target processor
The processor that the program will run on in our embedded
system
Often different from the development processor


Software Development Process


Compilers
Cross compiler
Runs on one processor, but generates code for another
Assemblers
Linkers
Debuggers
Profilers

[Figure: Implementation phase: C files pass through the compiler and assembly files through the assembler to produce binary files; the linker combines the binary files with a library into an executable file. Verification phase: the executable file is checked with a debugger and a profiler.]

Running a Program
If development processor is different than target, how
can we run our compiled code? Two options:
Download to target processor
Simulate

Simulation
One method: Hardware description language
But slow, not always available

Another method: Instruction set simulator (ISS)


Runs on development processor, but executes instructions of target
processor

Instruction Set Simulator For A Simple Processor

#include <stdio.h>

typedef struct {
    unsigned char first_byte, second_byte;
} instruction;

instruction program[1024];   // instruction memory
unsigned char memory[256];   // data memory

// dump the data memory, 16 bytes per line
void print_memory_contents(void) {
    for (int i = 0; i < 256; i++)
        printf("%02x%c", memory[i], (i % 16 == 15) ? '\n' : ' ');
}

// returns 0 on success, -1 on an illegal opcode
int run_program(int num_bytes) {
    int pc = -1;
    unsigned char reg[16] = {0}, fb, sb;
    while( ++pc < (num_bytes / 2) ) {
        fb = program[pc].first_byte;
        sb = program[pc].second_byte;
        switch( fb >> 4 ) {
            case 0: reg[fb & 0x0f] = memory[sb]; break;
            case 1: memory[sb] = reg[fb & 0x0f]; break;
            case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
            case 3: reg[fb & 0x0f] = sb; break;
            case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
            case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
            // JZ: branch only if Rn is 0; offset is signed so loops can jump back
            case 6: if (reg[fb & 0x0f] == 0) pc += (signed char)sb; break;
            default: return -1;
        }
    }
    return 0;
}

int main(int argc, char *argv[]) {
    FILE* ifs;
    if( argc != 2 || (ifs = fopen(argv[1], "rb")) == NULL ) {
        return -1;
    }
    // read raw bytes so that the return value of fread is a byte count
    int num_bytes = (int)fread(program, 1, sizeof(program), ifs);
    fclose(ifs);
    if (run_program(num_bytes) == 0) {
        print_memory_contents();
        return(0);
    }
    else return(-1);
}

Testing and Debugging


ISS
Gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...
But, doesn't interact with real environment

Download to board
Use device programmer
Runs in real environment, but not controllable

Compromise: emulator
Runs in real environment, at speed or near
Supports some controllability from the PC

[Figure: (a) implementation phase followed by verification phase, both on the development processor using a debugger/ISS; (b) implementation phase on the development processor, verification phase using external tools: emulator and programmer.]

Application-Specific Instruction-Set
Processors (ASIPs)
GPPs
Sometimes too general to be effective in demanding
application
e.g., video processing requires huge video buffers and operations
on large arrays of data, inefficient on a GPP

But single-purpose processor has high NRE, not programmable

ASIPs targeted to a particular domain
Contain architectural features specific to that domain
e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
Still programmable

A Common ASIP: Microcontroller


For embedded control applications
Reading sensors, setting actuators
Mostly dealing with events (bits): data is present, but not in huge
amounts
e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave oven

Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space

On-chip program and data memory
Direct programmer access to many of the chip's pins
Specialized instructions for bit-manipulation and other low-level operations

Another Common ASIP: Digital Signal Processors (DSP)
For signal processing applications
Large amounts of digitized data, often streaming
Data transformations must be applied fast
e.g., cell-phone voice filter, digital TV, music synthesizer

DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.

Trend: Even More Customized ASIPs


In the past, microprocessors were acquired as chips
Today, we increasingly acquire a processor as Intellectual
Property (IP)
e.g., synthesizable VHDL model

Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
Can have significant performance, power and size impacts
Problem: need compiler/debugger for customized ASIP
Remember, most development uses structured languages
One solution: automatic compiler/debugger generation
e.g., www.tensilica.com

Another solution: retargetable compilers
e.g., www.improvsys.com (customized VLIW architectures)

Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.

Speed: how to evaluate a processor's speed?
Clock speed: but instructions per cycle may differ
Instructions per second: but work per instr. may differ
Dhrystone: Synthetic benchmark (1984). Standard code (mostly string handling; no
floating point operations). Dhrystones/sec.
MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780).
A.k.a. Dhrystone MIPS. Commonly used today.
So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
SPEC: set of more realistic benchmarks, but oriented to desktops
EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
Suites of benchmarks: automotive, consumer electronics, networking, office
automation, telecommunications
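The Dhrystone MIPS conversion above is plain arithmetic; a minimal C sketch follows. The function names are hypothetical; only the 1757 Dhrystones/s per MIPS factor comes from the slide.

```c
/* 1 Dhrystone MIPS = 1757 Dhrystones per second (VAX 11/780 baseline). */
#define DHRYSTONES_PER_MIPS 1757L

/* Convert a Dhrystone MIPS rating to Dhrystones per second. */
static long mips_to_dhrystones_per_sec(long mips) {
    return mips * DHRYSTONES_PER_MIPS;
}

/* Convert a measured Dhrystones-per-second score back to (whole) MIPS. */
static long dhrystones_per_sec_to_mips(long dhrystones_per_sec) {
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS;
}
```

With these helpers, `mips_to_dhrystones_per_sec(750)` reproduces the 1,317,750 Dhrystones per second quoted above.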


Instruction-Set Processors
Processor          Clock speed  Periph.                                      Bus Width  MIPS    Power   Trans.  Price
General Purpose Processors
Intel PIII         1GHz         2x16 K L1, 256K L2, MMX                      32         ~900    97W     ~7M     $900
IBM PowerPC 750X   550 MHz      2x32 K L1, 256K L2                           32/64      ~1300   5W      ~7M     $900
MIPS R5000         250 MHz      2x32 K 2 way set assoc.                      32/64      NA      NA      3.6M    NA
StrongARM SA-110   233 MHz      None                                         32         268     1W      2.1M    NA
Microcontroller
Intel 8051         12 MHz       4K ROM, 128 RAM, 32 I/O, Timer, UART         8          ~1      ~0.2W   ~10K    $7
Motorola 68HC811   3 MHz        4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI     8          ~.5     ~0.1W   ~10K    $5
Digital Signal Processors
TI C5416           160 MHz      128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC   16/32      ~600    NA      NA      $34
Lucent DSP32C      80 MHz       16K Inst., 2K Data, Serial Ports, DMA        32         40      NA      NA      $75

Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Designing an Instruction-Set Processor


Not something an embedded system designer normally would do
But instructive to see how simply we can build one top down
Remember that real processors aren't usually built this way
Much more optimized, much more bottom-up design

FSMD

Declarations:
bit PC[16], IR[16];            // Program Counter, Instruction Reg.
bit M[64k][16], RF[16][16];    // Memory, Register File

Aliases:
Op  IR[15..12]     dir IR[7..0]
Rn  IR[11..8]      imm IR[7..0]
Rm  IR[7..4]       rel IR[7..0]

States:
State    Condition    Operation                   Next state
Reset                 PC=0;                       Fetch
Fetch                 IR=M[PC]; PC=PC+1           Decode
Decode                (branch on Op)              one of the states below
Mov1     Op = 0000    RF[Rn] = M[dir]             to Fetch
Mov2     Op = 0001    M[dir] = RF[Rn]             to Fetch
Mov3     Op = 0010    M[@Rn] = RF[Rm]             to Fetch
Mov4     Op = 0011    RF[Rn] = imm                to Fetch
Add      Op = 0100    RF[Rn] = RF[Rn]+RF[Rm]      to Fetch
Sub      Op = 0101    RF[Rn] = RF[Rn]-RF[Rm]      to Fetch
Jz       Op = 0110    PC=(RF[Rn]=0) ? rel : PC    to Fetch

Architecture of a Simple Microprocessor


Storage devices for each declared variable
register file holds each of the variables

Functional units to carry out the FSMD operations
One ALU carries out every required operation

Connections added among the components' ports corresponding to the operations required by the FSM
Unique identifiers created for every control signal

[Figure: control unit (controller with next-state and control logic and state register; 16-bit PC; IR) and datapath (2x1 mux, 16-entry register file RF, ALU), connected to memory through a 4x1 mux. Control signals: PCld, PCinc, PCclr, IRld, RFs, RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e, ALUs, ALUz, Ms, Mre, Mwe. The controller drives all input control signals and receives all output control signals.]

A Simple Microprocessor
Reset

PC=0;

PCclr=1;

Fetch

IR=M[PC];
PC=PC+1

MS=10;
Irld=1;
Mre=1;
PCinc=1;

Decode

from states
below
RF[Rn] = M[dir]
to Fetch

RFwa=Rn; RFwe=1; RFs=01;


Ms=01; Mre=1;

Mov2

M[dir] = RF[Rn]
to Fetch

RFr1a=Rn; RFr1e=1;
Ms=01; Mwe=1;

Mov3

M[@Rn] = RF[Rm]
to Fetch

RFr1a=Rn; RFr1e=1; Alus=11;


RFr2a=Rm;RFr2e=1;
Ms=11; Mwe=1;

Mov4

RF[Rn]= imm
to Fetch

Mov1
Op = 0000
0001
0010
0011
0100
0101
0110

Add
Sub
Jz
FSMD

RFwa=Rn; RFwe=1; RFs=10;

RF[Rn] =RF[Rn]+RF[Rm] RFwa=Rn; RFwe=1; RFs=00;


RFr1a=Rn; RFr1e=1;
to Fetch
RFr2a=Rm; RFr2e=1; ALUs=00
RF[Rn] = RF[Rn]-RF[rm] RFwa=Rn; RFwe=1; RFs=00;
RFr1a=Rn; RFr1e=1;
to Fetch
RFr2a=Rm; RFr2e=1; ALUs=01
PCld= ALUz;
PC=(RF[Rn]=0) ?rel :PC
RFrla=Rn;
to Fetch
RFrle=1;

FSM operations that replace the FSMD operations after a datapath is created

You just built a simple microprocessor!

Chapter Summary
Instruction-Set processors
Good performance, low NRE, flexible

Controller, datapath, and memory


Structured languages prevail
But some assembly level programming still necessary

Many tools available


Including instruction-set simulators, and in-circuit emulators

ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs

Choosing among processors is an important step


Designing an instruction-set processor is conceptually the same
as designing a single-purpose processor

Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 4: Standard Single Purpose Processors: Peripherals

Introduction
Single-purpose processors
Performs specific computation task
Custom single-purpose processors
Designed by us for a unique task

Standard single-purpose processors

Off-the-shelf -- pre-designed for a common task


a.k.a. peripherals
serial transmission
analog/digital conversions


Timers, counters, watchdog timers


Timer: measures time intervals
To generate timed output events
e.g., hold traffic light green for 10 s
To measure input events
e.g., measure a car's speed

Based on counting clock pulses
E.g., let Clk period be 10 ns
And we count 20,000 Clk pulses
Then 200 microseconds have passed
16-bit counter would count up to 65,535*10 ns = 655.35 microsec., resolution = 10 ns
Top: indicates top count reached, wrap-around

[Figure: Basic timer: a 16-bit up counter driven by Clk, with a Reset input and outputs Cnt (16 bits) and Top.]
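The timer arithmetic above can be checked with a small C sketch. The function names are hypothetical, and times are kept in integer nanoseconds to avoid floating-point rounding.

```c
/* Time elapsed after counting `pulses` clock pulses of `period_ns` each. */
static unsigned long elapsed_ns(unsigned long pulses, unsigned long period_ns) {
    return pulses * period_ns;
}

/* Longest interval an n-bit up counter can measure before it wraps:
   (2^bits - 1) clock periods. Assumes bits is smaller than the width of
   unsigned long. */
static unsigned long range_ns(unsigned bits, unsigned long period_ns) {
    return ((1UL << bits) - 1UL) * period_ns;
}
```

With a 10 ns clock, `elapsed_ns(20000, 10)` gives 200,000 ns (200 microseconds) and `range_ns(16, 10)` gives 655,350 ns (655.35 microseconds), matching the figures on the slide.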

Counters

Counter: like a timer, but counts pulses on a general input signal rather than clock
e.g., count cars passing over a sensor
Can often configure device as either a timer or counter

[Figure: Timer/counter: a 2x1 mux, controlled by Mode, selects Clk or Cnt_in as the input of a 16-bit up counter with a Reset input and outputs Cnt (16 bits) and Top.]

Other timer structures


Interval timer
Indicates when desired time interval has passed
We set terminal count to desired interval
Number of clock cycles = Desired time interval / Clock period

Cascaded counters

Prescaler
Divides clock
Increases range, decreases resolution

[Figure: Timer with a terminal count: the Cnt output of a 16-bit up counter is compared (=) with the terminal count to produce Top. 16/32-bit timer: two cascaded 16-bit up counters (Cnt1/Top1 and Cnt2/Top2). Timer with prescaler: Clk passes through a prescaler, controlled by Mode, before the 16-bit up counter.]
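As a rough sketch of the two calculations above (terminal count for an interval timer, and the effect of a prescaler on range and resolution), assuming hypothetical helper names and integer nanoseconds:

```c
/* Number of clock cycles = desired time interval / clock period. */
static unsigned long terminal_count(unsigned long interval_ns,
                                    unsigned long period_ns) {
    return interval_ns / period_ns;
}

/* A prescaler that divides the clock by `divide` makes each count
   `divide` times longer, so resolution gets coarser ... */
static unsigned long prescaled_resolution_ns(unsigned long period_ns,
                                             unsigned long divide) {
    return period_ns * divide;
}

/* ... while the range of an n-bit counter grows by the same factor. */
static unsigned long prescaled_range_ns(unsigned bits, unsigned long period_ns,
                                        unsigned long divide) {
    return ((1UL << bits) - 1UL) * prescaled_resolution_ns(period_ns, divide);
}
```

For instance, a 200 microsecond interval on a 10 ns clock needs a terminal count of 20,000; the divide-by-8 prescaler used in the test values is only an illustrative choice.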

Example: Reaction Timer


[Figure: reaction timer with an indicator light, a reaction button, and an LCD showing "time: 100 ms".]

Measure time between turning light on and user pushing button
16-bit timer, clk period is 83.33 ns (12 MHz), counter increments every 6 cycles
Resolution = 6*83.33 ns = 0.5 microsec.
Range = 65535*0.5 microseconds = 32.77 milliseconds
Want program to count millisec., so initialize counter to 65535 - 1000/0.5 = 63535

/* main.c */
#define MS_INIT 63535
void main(void){
    int count_milliseconds = 0;

    configure timer mode
    set Cnt to MS_INIT

    wait a random amount of time
    turn on indicator light
    start timer
    while (user has not pushed reaction button){
        if(Top) {
            stop timer
            set Cnt to MS_INIT
            start timer
            reset Top
            count_milliseconds++;
        }
    }
    turn light off
    printf("time: %i ms", count_milliseconds);
}
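The MS_INIT value used in the reaction timer can be derived as follows. This is a sketch with a hypothetical function name; it assumes the interval is a whole number of timer ticks and fits in the 16-bit counter.

```c
/* Initial value for a 16-bit up counter so that it reaches its top count
   (65535) after `interval_ns`, given one count every `tick_ns`. */
static unsigned reload_value(unsigned long interval_ns, unsigned long tick_ns) {
    unsigned long ticks = interval_ns / tick_ns;   /* counts needed */
    return (unsigned)(65535UL - ticks);
}
```

With a 0.5 microsecond tick (500 ns) and a 1 millisecond interval (1,000,000 ns), 2000 counts are needed, so the counter starts at 65535 - 2000 = 63535.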

Watchdog timer
Must reset timer every X time unit, else timer generates a signal
Common use: detect failure, self-reset
Another use: timeouts
e.g., ATM machine
16-bit timer, 2 millisec. resolution
timereg value = 2*(2^16-1)-X = 131070-X
For 2 min. timeout, X = 120,000 millisec.; so timereg = 11070

[Figure: osc (12 MHz) -> prescaler (/12) -> clk (1 MHz) -> scalereg (12 bits), whose overflow at 1/(2 ms) clocks timereg (16 bits), whose overflow at 1/(131070 ms) goes to system reset or interrupt; checkreg controls loading of the registers.]

/* main.c */
main(){
    wait until card inserted
    call watchdog_reset_routine
    while(transaction in progress){
        if(button pressed){
            perform corresponding action
            call watchdog_reset_routine
        }
        /* if watchdog_reset_routine not called every
           < 2 minutes, interrupt_service_routine is
           called */
    }
}

watchdog_reset_routine(){
    /* checkreg is set so we can load value into
       timereg. Zero is loaded into scalereg and
       11070 is loaded into timereg */
    checkreg = 1
    scalereg = 0
    timereg = 11070
}

void interrupt_service_routine(){
    eject card
    reset screen
}
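The timereg value follows directly from the formula on the slide, timereg = 2*(2^16 - 1) - X with X in milliseconds. A minimal sketch, with a hypothetical function name and simply mirroring the slide's formula rather than any particular device:

```c
/* timereg load value for a watchdog timeout of `timeout_ms` milliseconds,
   using the slide's formula: 2 * (2^16 - 1) - X. */
static long watchdog_timereg(long timeout_ms) {
    return 2L * 65535L - timeout_ms;
}
```

For the 2 minute ATM timeout, X = 120,000 ms and the function returns 131070 - 120000 = 11070, the value loaded in watchdog_reset_routine.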

Serial Transmission Using UARTs


UART: Universal Asynchronous Receiver Transmitter
Takes parallel data and transmits serially
Receives serial data and converts to parallel

Parity: extra bit for simple error checking
Start bit, stop bit
Baud rate
signal changes per second
bit rate usually higher

[Figure: an embedded device's sending UART takes the parallel data 10011011 and transmits it serially (start bit, data, end bit) to a receiving UART, which converts it back to the parallel data 10011011.]
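A small C sketch of the parity and framing ideas above. The slide does not specify the frame layout, so the choices here are assumptions: even parity, a low start bit, data sent least-significant bit first, and a high stop bit. The function names are hypothetical.

```c
#include <stdint.h>

/* Even parity: the parity bit is 1 when the data byte contains an odd
   number of 1s, so that data plus parity has an even number of 1s. */
static unsigned even_parity_bit(uint8_t data) {
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1u;
    return ones & 1u;
}

/* Build an 11-bit frame, transmitted starting from bit 0:
   bit 0 = start (0), bits 1-8 = data (LSB first),
   bit 9 = parity, bit 10 = stop (1). */
static uint16_t uart_frame(uint8_t data) {
    return (uint16_t)((1u << 10) | (even_parity_bit(data) << 9)
                      | ((unsigned)data << 1));
}
```

For the byte 10011011 from the figure, five bits are set, so the even-parity bit is 1.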

Pulse width modulator


Generates pulses with specific high/low times
Duty cycle: % time high
Square wave: 50% duty cycle

Common use: control average voltage to electric device
Simpler than DC-DC converter or digital-analog converter
DC motor speed, dimmer lights

Another use: encode commands, receiver uses timer to decode

[Figure: pwm_o and clk waveforms for three duty cycles. 25% duty cycle: average pwm_o is 1.25V. 50% duty cycle: average pwm_o is 2.5V. 75% duty cycle: average pwm_o is 3.75V.]
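The average voltages in the figure are the high-level voltage scaled by the duty cycle. A minimal sketch, assuming a 5 V high level (implied by the figure's numbers) and using integer millivolts; the function name is hypothetical.

```c
/* Average PWM output in millivolts for a given high level and duty cycle. */
static unsigned average_mv(unsigned high_mv, unsigned duty_percent) {
    return (unsigned)((unsigned long)high_mv * duty_percent / 100UL);
}
```

With a 5000 mV high level, duty cycles of 25%, 50% and 75% give 1250 mV, 2500 mV and 3750 mV.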

Controlling a DC motor with a PWM


Internal Structure of PWM
[Figure: clk -> clk_div (controls how fast the counter increments) -> counter (0 - 254) -> 8-bit comparator against cycle_high -> pwm_o.]
counter < cycle_high, pwm_o = 1
counter >= cycle_high, pwm_o = 0

Relationship between applied voltage and speed of the DC Motor
Input Voltage   % of Maximum Voltage Applied   RPM of DC Motor
2.5             50                             1840
3.75            75                             6900
5.0             100                            9200

void main(void){
    /* controls period */
    PWMP = 0xff;
    /* controls duty cycle */
    PWM1 = 0x7f;

    while(1){};
}

The PWM alone cannot drive the DC motor; a possible way to implement a driver uses an MJE3055T NPN transistor.
[Figure: driver circuit: the signal from the processor switches an MJE3055T NPN transistor that drives the DC motor from a 5V supply.]
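The comparator rule in the PWM's internal structure (pwm_o = 1 while counter < cycle_high) can be simulated in a few lines of C. The function names are hypothetical; the 255-tick period corresponds to the counter range 0 to 254 given on the slide.

```c
/* Output of the 8-bit comparator for one counter value. */
static unsigned pwm_output(unsigned counter, unsigned cycle_high) {
    return counter < cycle_high ? 1u : 0u;
}

/* Count how many ticks of one period the output is high. */
static unsigned high_ticks(unsigned period_ticks, unsigned cycle_high) {
    unsigned high = 0;
    for (unsigned counter = 0; counter < period_ticks; counter++)
        high += pwm_output(counter, cycle_high);
    return high;
}
```

With cycle_high = 0x7f (the PWM1 value in the example program), the output is high for 127 of the 255 ticks, roughly a 50% duty cycle.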

LCD controller
[Figure: microcontroller connected to the LCD controller over a communications bus: E, R/W, RS, and 8 data lines DB7-DB0.]

void WriteChar(char c){
    RS = 1;           /* indicate data being sent */
    DATA_BUS = c;     /* send data to LCD */
    EnableLCD(45);    /* toggle the LCD with appropriate delay */
}

CODES
I/D = 1 cursor moves left        DL = 1 8-bit
I/D = 0 cursor moves right       DL = 0 4-bit
S = 1 with display shift         N = 1 2 rows
S/C = 1 display shift            N = 0 1 row
S/C = 0 cursor movement          F = 1 5x10 dots
R/L = 1 shift to right           F = 0 5x7 dots
R/L = 0 shift to left

Commands (columns: RS, R/W, DB7-DB0, Description)
Clears all display, return cursor home
Returns cursor home
Sets cursor move direction and/or specifies not to shift display (I/D)
ON/OFF of all display (D), cursor ON/OFF (C), and blink position (B)
Move cursor and shifts display (S/C, R/L)
Sets interface data length, number of display lines, and character font (DL)
WRITE DATA: Writes Data

Keypad controller
[Figure: keypad controller connected to a keypad with lines N1-N4 and M1-M4 (N=4, M=4); outputs k_pressed and a 4-bit key_code.]

Stepper motor controller


Stepper motor: rotates fixed number of degrees when given a "step" signal
In contrast, DC motor just rotates when power applied, coasts to stop otherwise
Specification: degrees/step or #steps/revol. (e.g., 1.8 degrees or 200 steps)

Rotation achieved by applying specific voltage sequence to 4 coils (1 or 2 coils driven during each step)

Controller greatly simplifies this

[Figure: coil energizing sequence table (steps 1-5 for coils A, B, A', B') and pin diagram of the MC3479P stepper motor driver (Vd, Vm, GND, Bias/Set, Clk, O|C, Full/Half Step, CW/CCW, Phase A, coil outputs A, A', B, B'), with motor wires Red, White, Yellow, Black.]

Stepper motor with controller (driver)


[Figure: 8051 pins P1.0 and P1.1 drive the CW/CCW (pin 10) and CLK (pin 7) inputs of the MC3479P stepper motor driver; driver outputs on pins 2, 3, 14 and 15 go through buffers to the stepper motor.]

/* main.c */

sbit clk=P1^1;
sbit cw=P1^0;

void delay(void){
    int i, j;
    for (i=0; i<1000; i++)
        for (j=0; j<50; j++)
            i = i + 0;
}

void main(void){
    /* turn the motor forward */
    cw=0;       /* set direction */
    clk=0;      /* pulse clock */
    delay();
    clk=1;

    /* turn the motor backwards */
    cw=1;       /* set direction */
    clk=0;      /* pulse clock */
    delay();
    clk=1;
}

The output pins on the stepper motor driver do not provide enough current to drive the stepper motor. To amplify the current, a buffer is needed. In one possible implementation of the buffers, Q1 is an MJE3055T NPN transistor and Q2 is an MJE2955T PNP transistor.
[Figure: buffer circuit with Q1, Q2, two 1K resistors and supply +V, between driver output A and the stepper motor.]

Stepper motor without controller (driver)


[Figure: 8051 pins P2.0-P2.3 drive the stepper motor coils through buffers; P2.4 is tied to GND/+V to select the direction.]

A possible way to implement the buffers: the 8051 alone cannot drive the stepper motor, so several transistors were added to increase the current going to the stepper motor. Q1 and Q3 are MJE3055T NPN transistors and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.
[Figure: buffer circuit with Q1, Q2, Q3, two 1K resistors, one 330 ohm resistor and supply +V, between A and B.]

/*main.c*/
sbit notA=P2^0;
sbit isA=P2^1;
sbit notB=P2^2;
sbit isB=P2^3;
sbit dir=P2^4;

/* coil pattern per step, in the order isA, isB, notA, notB;
   declared at file scope so that move() can use it */
int lookup[20] = {
    1, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 1,
    1, 0, 0, 1,
    1, 1, 0, 0 };

void delay(){
    int a, b;
    for(a=0; a<5000; a++)
        for(b=0; b<10000; b++)
            a=a+0;
}

void move(int dir, int steps) {
    int y, z;
    /* clockwise movement */
    if(dir == 1){
        for(y=0; y<=steps; y++){
            for(z=0; z<=19; z+=4){
                isA=lookup[z];
                isB=lookup[z+1];
                notA=lookup[z+2];
                notB=lookup[z+3];
                delay();
            }
        }
    }
    /* counter clockwise movement */
    if(dir==0){
        for(y=0; y<=steps; y++){
            for(z=19; z>=0; z-=4){
                isA=lookup[z];
                isB=lookup[z-1];
                notA=lookup[z-2];
                notB=lookup[z-3];
                delay();
            }
        }
    }
}

void main(){
    while(1){
        /* move forward, 15 degrees (2 steps) */
        move(1, 2);
        /* move backwards, 7.5 degrees (1 step) */
        move(0, 1);
    }
}

Analog-to-digital converters

[Figure: three panels. Proportionality: analog voltages from 0V to Vmax = 7.5V in 0.5V steps map to the 4-bit encodings 0000 to 1111. Analog to digital: the analog input (V) sampled at times t1-t4 gives the digital output 0100, 1000, 0110, 0101. Digital to analog: the digital input 0100, 1000, 0110, 0101 at times t1-t4 produces the analog output (V).]

ADC using successive approximation


Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts. Then trace the successive-approximation approach to find the correct encoding.

Va / Vmax = d / (2^n - 1)
5/15 = d/(2^8 - 1)
d = 85
encoding: 01010101

Successive-approximation method (each step halves the range; the bit is 1 when the input is at or above the midpoint, 0 otherwise)
1/2(Vmax - Vmin) = 7.5 volts      Vmax = 7.5 volts.     bit 0
1/2(7.5 + 0) = 3.75 volts         Vmin = 3.75 volts.    bit 1
1/2(7.5 + 3.75) = 5.63 volts      Vmax = 5.63 volts.    bit 0
1/2(5.63 + 3.75) = 4.69 volts     Vmin = 4.69 volts.    bit 1
1/2(5.63 + 4.69) = 5.16 volts     Vmax = 5.16 volts.    bit 0
1/2(5.16 + 4.69) = 4.93 volts     Vmin = 4.93 volts.    bit 1
1/2(5.16 + 4.93) = 5.05 volts     Vmax = 5.05 volts.    bit 0
1/2(5.05 + 4.93) = 4.99 volts                           bit 1
Result: 01010101
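The successive-approximation trace can be reproduced with a short C sketch. Function names are hypothetical; voltages are in integer millivolts, so the intermediate midpoints are truncated rather than rounded as on the slide, but the resulting bits are the same for this example.

```c
#include <stdint.h>

/* Direct proportional encoding: d = Va * (2^n - 1) / Vmax. */
static unsigned proportional_encode(unsigned long vin_mv, unsigned long vmax_mv,
                                    unsigned bits) {
    return (unsigned)(vin_mv * ((1UL << bits) - 1UL) / vmax_mv);
}

/* 8-bit successive approximation: binary search on the voltage range,
   producing one bit per step, most significant bit first. */
static uint8_t sar_encode(unsigned long vin_mv, unsigned long vmax_mv) {
    unsigned long lo = 0, hi = vmax_mv;
    uint8_t code = 0;
    for (int bit = 0; bit < 8; bit++) {
        unsigned long mid = (lo + hi) / 2;
        code = (uint8_t)(code << 1);
        if (vin_mv >= mid) {   /* input in upper half: bit is 1, raise Vmin */
            code |= 1u;
            lo = mid;
        } else {               /* input in lower half: bit is 0, lower Vmax */
            hi = mid;
        }
    }
    return code;
}
```

For a 5 V input on a 0 to 15 V range, both functions yield 85 (binary 01010101), as in the worked example.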

Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 5: Memory

Outline

Memory Write Ability and Storage Permanence


Common Memory Types
Composing Memory
Memory Hierarchy and Cache
Advanced RAM


Introduction
Embedded system's functionality aspects
Processing
processors
transformation of data

Storage
memory
retention of data

Communication
buses
transfer of data


Memory: basic concepts


Stores large number of bits
m x n: m words of n bits each
k = Log2(m) address input signals
or m = 2^k words
e.g., 4,096 x 8 memory:
32,768 bits
12 address input signals
8 input/output data signals

Memory access
r/w: selects read or write
enable: read or write only when asserted
multiport: multiple accesses to different locations simultaneously

[Figure: m x n memory (m words, n bits per word). Memory external view: 2^k x n read and write memory with inputs r/w, enable, address lines A0 ... Ak-1, and data lines Qn-1 ... Q0.]
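The sizing rules above (k = log2(m) address lines, m x n total bits) can be sketched as follows; the function names are hypothetical.

```c
/* Number of address input signals needed for `words` words:
   the smallest k such that 2^k >= words. */
static unsigned address_lines(unsigned long words) {
    unsigned k = 0;
    while ((1UL << k) < words)
        k++;
    return k;
}

/* Total storage of an m x n memory, in bits. */
static unsigned long total_bits(unsigned long words, unsigned bits_per_word) {
    return words * bits_per_word;
}
```

For the 4,096 x 8 memory of the example, this gives 12 address lines and 32,768 bits.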

Write ability/ storage permanence

Traditional ROM/RAM distinctions
ROM: read only, bits stored without power
RAM: read and write, lose stored bits without power

Traditional distinctions blurred
Advanced ROMs can be written to
e.g., EEPROM
Advanced RAMs can hold bits without power
e.g., NVRAM

Write ability
Manner and speed a memory can be written

Storage permanence
ability of memory to hold stored bits after they are written

[Figure: Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Storage permanence axis: near zero; battery life (10 years); tens of years; life of product. Write ability axis: during fabrication only; external programmer, one time only; external programmer, 1,000s of cycles; external programmer OR in-system, 1,000s of cycles; external programmer OR in-system, block-oriented writes, 1,000s of cycles; in-system, fast writes, unlimited cycles. Memories plotted: mask-programmed ROM, OTP ROM, EPROM, EEPROM, FLASH, NVRAM, SRAM/DRAM, and the ideal memory; regions marked nonvolatile and in-system programmable.]

Write ability

Ranges of write ability


High end
processor writes to memory simply and quickly
e.g., RAM

Middle range
processor writes to memory, but slower
e.g., FLASH, EEPROM

Lower range
special equipment, programmer, must be used to write to memory
e.g., EPROM, OTP ROM

Low end
bits stored only during fabrication
e.g., Mask-programmed ROM

In-system programmable memory


Can be written to by a processor in the embedded system using the
memory
Memories in high end and middle range of write ability


Storage permanence

Range of storage permanence


High end
essentially never loses bits
e.g., mask-programmed ROM

Middle range
holds bits days, months, or years after memory's power source turned off
e.g., NVRAM

Lower range
holds bits as long as power supplied to memory
e.g., SRAM

Low end
begins to lose bits almost immediately after written
e.g., DRAM

Nonvolatile memory
Holds bits after power is no longer supplied
High end and middle range of storage permanence


ROM: Read-Only Memory

Nonvolatile memory
Can be read from but not written to, by a processor in an embedded system
Traditionally written to, "programmed", before inserting to embedded system
Uses
Store software program for general-purpose processor
program instructions can be one or more ROM words
Store constant data needed by system
Implement combinational circuit

[Figure: External view: 2^k x n ROM with enable input, address inputs A0 ... Ak-1, and data outputs Qn-1 ... Q0.]

Example: 8 x 4 ROM

Horizontal lines = words
Vertical lines = data
Lines connected only at circles
Decoder sets word 2's line to 1 if address input is 010
Data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line
Word 2 is not connected with data lines Q2 and Q0
Output is 1010

[Figure: Internal view of the 8 x 4 ROM: a 3x8 decoder with enable and address inputs A0, A1, A2 drives word lines (word 0, word 1, word 2, ...); programmable connections join word lines to the data lines, which are wired-OR to the outputs Q3 Q2 Q1 Q0.]

Implementing combinational function


Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM

Truth table
Inputs (address)   Outputs
a  b  c            y  z
0  0  0            0  0
0  0  1            0  1
0  1  0            0  1
0  1  1            1  0
1  0  0            1  0
1  0  1            1  1
1  1  0            1  1
1  1  1            1  1

[Figure: 8 x 2 ROM with enable and address inputs c, b, a; word 0 through word 7 hold the output columns y and z of the truth table.]
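The idea of implementing a combinational function with a ROM amounts to a table lookup, which can be sketched in C. The array contents are taken from the truth table above; the names `rom`, `rom_read`, `y_of` and `z_of` are hypothetical.

```c
#include <stdint.h>

/* 8 x 2 ROM: the address is (a, b, c); each word stores y in bit 1 and
   z in bit 0, row by row from the truth table. */
static const uint8_t rom[8] = { 0x0, 0x1, 0x1, 0x2, 0x2, 0x3, 0x3, 0x3 };

/* Read the word addressed by inputs a (MSB), b, c (LSB). */
static uint8_t rom_read(unsigned a, unsigned b, unsigned c) {
    return rom[((a & 1u) << 2) | ((b & 1u) << 1) | (c & 1u)];
}

static unsigned y_of(unsigned a, unsigned b, unsigned c) {
    return (rom_read(a, b, c) >> 1) & 1u;
}

static unsigned z_of(unsigned a, unsigned b, unsigned c) {
    return rom_read(a, b, c) & 1u;
}
```

No logic gates are designed here: changing the function only means changing the stored words, which is the appeal of the ROM approach.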

Mask-programmed ROM
Connections programmed at fabrication
set of masks

Lowest write ability


only once

Highest storage permanence


bits never change unless damaged

Typically used for final design of high-volume systems


spread out NRE cost for a low unit cost


OTP ROM: One-time programmable ROM


Connections programmed after manufacture by user

user provides file of desired contents of ROM


file input to machine called ROM programmer
each programmable connection is a fuse
ROM programmer blows fuses where connections should not exist

Very low write ability


typically written only once and requires ROM programmer device

Very high storage permanence


bits don't change unless reconnected to programmer and more fuses blown

Commonly used in final products


cheaper, harder to inadvertently modify

EPROM: Erasable programmable ROM

Programmable component is a MOS transistor

Transistor has floating gate surrounded by an insulator


(a) Negative charges form a channel between source and drain
storing a logic 1
(b) Large positive voltage at gate causes negative charges to
move out of channel and get trapped in floating gate storing a
logic 0
(c) (Erase) Shining UV rays on surface of floating-gate causes
negative charges to return to channel from floating gate restoring
the logic 1
(d) An EPROM package showing quartz window through which
UV light can pass

[Figure: (a) floating-gate transistor with source, drain, and floating gate, 0V at the gate; (b) +15V applied at the gate; (c) erase under UV light, 5-30 min; (d) EPROM package with quartz window]

Better write ability
can be erased and reprogrammed thousands of times

Reduced storage permanence
program lasts about 10 years but is susceptible to radiation and electric noise

Typically used during design development


EEPROM: Electrically erasable programmable ROM
Programmed and erased electronically
typically by using higher than normal voltage
can program and erase individual words

Better write ability


can be in-system programmable with built-in circuit to provide higher
than normal voltage
built-in memory controller commonly used to hide details from memory user

writes very slow due to erasing and programming


busy pin indicates to processor EEPROM still writing

can be erased and programmed tens of thousands of times

Similar storage permanence to EPROM (about 10 years)


Far more convenient than EPROMs, but more expensive

Flash Memory
Extension of EEPROM
Same floating gate principle
Same write ability and storage permanence

Fast erase
Large blocks of memory erased at once, rather than one word at a time
Blocks typically several thousand bytes large

Writes to single words may be slower


Entire block must be read, word updated, then entire block written back

Used with embedded systems storing large data items in nonvolatile memory
e.g., digital cameras, TV set-top boxes, cell phones
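
The single-word write sequence described above (read the block, update the word, write the block back) can be sketched as follows. This is a simplified model under stated assumptions: `BLOCK_SIZE`, `erase_block`, and `write_word` are hypothetical names, the block is far smaller than a real one, and erased words are assumed to read as all 1s.

```python
BLOCK_SIZE = 4  # words per erase block; real blocks are far larger


def erase_block(flash, block):
    """Erase a whole block at once; erased words are assumed to read 0xFF."""
    start = block * BLOCK_SIZE
    flash[start:start + BLOCK_SIZE] = [0xFF] * BLOCK_SIZE


def write_word(flash, address, value):
    """Update one word using the read, update, write-back sequence."""
    block = address // BLOCK_SIZE
    start = block * BLOCK_SIZE
    buffer = flash[start:start + BLOCK_SIZE]  # read entire block
    buffer[address - start] = value           # update the single word
    erase_block(flash, block)                 # erase works per block only
    flash[start:start + BLOCK_SIZE] = buffer  # write entire block back


flash = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
write_word(flash, 5, 0xAB)
print([hex(w) for w in flash])
```

Only the block containing address 5 is erased and rewritten; the other block is left untouched.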

RAM: Random-access memory


Typically volatile memory
bits are not held without power supply

Read and written to easily by embedded system during execution
Internal structure more complex than ROM
a word consists of several memory cells, each storing 1 bit
each input and output data line connects to each cell in its column
rd/wr connected to every cell
when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read

[Figure: external view of a 2^k × n read and write memory with r/w, enable, address inputs A0 ... Ak-1, and data lines Qn-1 ... Q0]
[Figure: internal view of a 4 × 4 RAM. A 2×4 decoder with enable and address inputs A0, A1 selects a row of memory cells; data inputs I3 I2 I1 I0 and outputs Q3 Q2 Q1 Q0 connect to each cell in their column, and rd/wr goes to every cell]


Basic types of RAM


SRAM: Static RAM
Memory cell uses flip-flop to store bit
Requires 6 transistors
Holds data as long as power supplied

DRAM: Dynamic RAM
Memory cell uses MOS transistor and capacitor to store bit
More compact than SRAM
Refresh required due to capacitor leak
word's cells refreshed when read
Typical refresh rate 15.625 microsec.
Slower to access than SRAM

[Figure: memory cell internals. SRAM cell with Data and Data' lines; DRAM cell with lines labeled Data and W]


RAM variations
PSRAM: Pseudo-static RAM
DRAM with built-in memory refresh controller
Popular low-cost high-density alternative to SRAM

NVRAM: Nonvolatile RAM


Holds data after external power removed
Battery-backed RAM
SRAM with own permanently connected battery
writes as fast as reads
no limit on number of writes unlike nonvolatile ROM-based memory

SRAM with EEPROM or flash


stores complete RAM contents on EEPROM or flash before power turned off


Example: HM6264 & 27C256 RAM/ROM devices
Low-cost low-capacity memory devices
Commonly used in 8-bit microcontroller-based embedded systems
First two numeric digits indicate device type
RAM: 62
ROM: 27
Subsequent digits indicate capacity in kilobits

Device characteristics
Device    Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
HM6264    85-100             .01                 15                 5
27C256    90                 .5                  100                5

[Figure: block diagrams. HM6264: data<7...0> (pins 11-13, 15-19), addr<15...0> (pins 2, 23, 21, 24, 25, 3-10), /OE (22), /WE (27), /CS1 (20), CS2 (26). 27C256: data<7...0> (pins 11-13, 15-19), addr<15...0> (pins 27, 26, 2, 23, 21, 24, 25, 3-10), /OE (22), /CS (20)]
[Figure: timing diagrams. Read operation: data, addr, OE, /CS1, CS2. Write operation: data, addr, WE, /CS1, CS2]


Example: TC55V2325FF-100 memory device
2-megabit synchronous pipelined burst SRAM memory device
Designed to be interfaced with 32-bit processors
Capable of fast sequential reads and writes as well as single byte I/O

Device characteristics
Device            Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
TC55V2325FF-100   10                 na                  1200               3.3

[Figure: block diagram of the TC55V2325FF-100 with data<31...0>, addr<15...0>, addr<10...0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, /ADV, and CLK]
[Figure: timing diagram of a single read operation showing CLK, /ADSP, /ADSC, /ADV, addr<15...0>, /WE, /OE, /CS1 and /CS2, CS3, and data<31...0>]


Composing memory

Memory size needed often differs from size of readily available memories
When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
When available memory is smaller, compose several smaller memories into one larger memory
Connect side-by-side to increase width of words
Connect top to bottom to increase number of words
added high-order address line selects smaller memory containing desired word using a decoder
Combine techniques to increase number and width of words
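
The two composition techniques can be modeled roughly as below. The function names and the two tiny 4 × 2 ROMs are hypothetical and exist only for illustration; each word is represented as a list of bits.

```python
def compose_wider(roms, address):
    """Side by side: every ROM sees the same address; words concatenate."""
    word = []
    for rom in roms:
        word.extend(rom[address])
    return word


def compose_deeper(roms, address):
    """Top to bottom: high-order address bits select one ROM (the decoder)."""
    words_per_rom = len(roms[0])
    select = address // words_per_rom  # decoder input
    local = address % words_per_rom    # low-order bits go to every ROM
    return roms[select][local]


# Two hypothetical 4 x 2 ROMs, each word a list of bits.
rom_a = [[0, 0], [0, 1], [1, 0], [1, 1]]
rom_b = [[1, 1], [1, 0], [0, 1], [0, 0]]

print(compose_wider([rom_a, rom_b], 1))   # behaves like a 4 x 4 memory
print(compose_deeper([rom_a, rom_b], 5))  # behaves like an 8 x 2 memory
```

Combining both functions (deeper memories built from wider ones) corresponds to increasing number and width of words together.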

[Figure: three compositions of 2^m × n ROMs. Increase number of words: a 2^(m+1) × n ROM built from two 2^m × n ROMs, with address lines A0 ... Am-1 shared and Am driving a 1×2 decoder that enables one of them. Increase width of words: a 2^m × 3n ROM built from three 2^m × n ROMs side by side, with outputs Q3n-1 ... Q0. Increase number and width of words: both techniques combined]


Memory hierarchy
Want inexpensive, fast
memory
Main memory
Large, inexpensive, slow
memory stores entire
program and data

Cache
Small, expensive, fast
memory stores copy of likely
accessed parts of larger
memory
Can be multiple levels of
cache
[Figure: memory hierarchy, top to bottom: processor, registers, cache, main memory, disk, tape]


Cache
Usually designed with SRAM
faster but more expensive than DRAM

Usually on same chip as processor


space limited, so much smaller than off-chip main memory
faster access (1 cycle vs. several cycles for main memory)

Cache operation:
Request for main memory access (read or write)
First, check cache for copy
cache hit
copy is in cache, quick access

cache miss
copy not in cache, read address and possibly its neighbors into cache

Several cache design choices
cache mapping, replacement policies, and write techniques


Cache mapping
Far fewer cache addresses available than main memory addresses
Are address contents in cache?
Cache mapping used to assign main memory address to cache
address and determine hit or miss
Three basic techniques:
Direct mapping
Fully associative mapping
Set-associative mapping

Caches partitioned into indivisible blocks or lines of adjacent memory addresses
usually 4 or 8 addresses per line


Direct mapping
Main memory address divided into 2 fields
Index
cache address
number of bits determined by cache size
Tag
compared with tag stored in cache at address indicated by index
if tags match, check valid bit

Valid bit
indicates whether data in slot has been loaded from memory

Offset
used to find particular word in cache line

[Figure: direct-mapped cache. The address is split into Tag, Index, and Offset; the index selects a cache entry holding valid bit (V), tag (T), and data (D); the stored tag is compared (=) with the address tag to produce Valid, and the offset selects the Data word]

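
To make the direct-mapping lookup concrete, here is a minimal sketch of the tag, index, and offset split and the hit test. The field widths (`OFFSET_BITS`, `INDEX_BITS`) and the class name `DirectMappedCache` are assumptions chosen for illustration, and stored data is omitted.

```python
OFFSET_BITS = 2  # assumed: 4 words per cache line
INDEX_BITS = 3   # assumed: 8 cache lines


def split_address(address):
    """Split a main memory address into (tag, index, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        # One (valid, tag) pair per line; data is omitted for brevity.
        self.lines = [(False, None)] * (1 << INDEX_BITS)

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        tag, index, _ = split_address(address)
        valid, stored_tag = self.lines[index]
        if valid and stored_tag == tag:
            return True
        self.lines[index] = (True, tag)
        return False


cache = DirectMappedCache()
print(cache.access(0x1234), cache.access(0x1235), cache.access(0x1254))
# prints False True False
```

The second access hits because it falls in the same line as the first; the third maps to the same index with a different tag, so it misses and replaces the line.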

Fully associative mapping


Complete main memory address stored in each cache address
All addresses stored in cache simultaneously compared with
desired address
Valid bit and offset same as direct mapping
[Figure: fully associative cache. The address is split into Tag and Offset; the tag is compared (=) simultaneously with the tag (T) of every entry (V T D) to produce Valid and select Data]


Set-associative mapping
Compromise between direct mapping and fully associative mapping
Index same as in direct mapping
But each cache address contains content and tags of 2 or more memory address locations
Tags of that set simultaneously compared as in fully associative mapping
Cache with set size N called N-way set-associative
2-way, 4-way, 8-way are common

[Figure: set-associative cache. The address is split into Tag, Index, and Offset; the index selects a set of entries (V T D each), whose tags are compared (=) in parallel with the address tag to produce Valid and select Data]


Cache-replacement policy
Technique for choosing which block to replace
when fully associative cache is full
when set-associative cache's line is full

Direct mapped cache has no choice


Random
replace block chosen at random

LRU: least-recently used


replace block not accessed for longest time

FIFO: first-in-first-out
push block onto queue when accessed
choose block to replace by popping queue
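
As one possible illustration of the LRU policy listed above, the sketch below models a single set of an N-way set-associative cache. The class name `LRUSet` is hypothetical, and using Python's `OrderedDict` to track recency is a software convenience, not how hardware implements it.

```python
from collections import OrderedDict


class LRUSet:
    """One set of an N-way set-associative cache with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data, least recent first

    def access(self, tag):
        """Return True on a hit, False on a miss (after loading the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None              # data omitted in this sketch
        return False


cache_set = LRUSet(ways=2)
hits = [cache_set.access(t) for t in ["A", "B", "A", "C", "B", "A"]]
print(hits)  # prints [False, False, True, False, False, False]
```

A FIFO variant would skip the `move_to_end` call, so a block's position would depend only on when it was loaded.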

Cache write techniques


When written, data cache must update main memory
Write-through

write to main memory whenever cache is written to


easiest to implement
processor must wait for slower main memory write
potential for unnecessary writes

Write-back
main memory only written when dirty block replaced
extra dirty bit for each block set when cache block written to
reduces number of slow main memory writes
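
The difference between the two techniques can be seen by counting main memory writes for a sequence of processor writes. This is a deliberately small model under assumptions: a direct-mapped cache with one word per line, a hypothetical write trace, and a hypothetical function name `count_memory_writes`. Dirty blocks still in the cache at the end are not flushed.

```python
def count_memory_writes(write_addresses, num_lines, policy):
    """Count main memory writes for a direct-mapped cache, 1 word per line."""
    lines = {}  # index -> (address, dirty)
    memory_writes = 0
    for address in write_addresses:
        index = address % num_lines
        resident = lines.get(index)
        if resident and resident[0] != address and resident[1]:
            memory_writes += 1  # write-back: flush dirty block on replacement
        if policy == "write-through":
            memory_writes += 1  # every cache write also goes to memory
            lines[index] = (address, False)
        else:
            lines[index] = (address, True)  # write-back: set the dirty bit
    return memory_writes


trace = [0, 0, 0, 4, 4, 0]  # hypothetical trace; 0 and 4 share a line
print(count_memory_writes(trace, 4, "write-through"))  # prints 6
print(count_memory_writes(trace, 4, "write-back"))     # prints 2
```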


Cache impact on system performance


Most important parameters in terms of performance:
Total size of cache
total number of data bytes cache can hold
tag, valid, and other housekeeping bits not included in total

Degree of associativity
Data block size

Larger caches achieve lower miss rates but higher access cost
e.g.,
2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)

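
The average-cost arithmetic in this example follows one formula, sketched here so other cache sizes can be tried. The function name `avg_access_cost` is illustrative; the numbers are the ones from the slide.

```python
def avg_access_cost(miss_rate, hit_cost, miss_cost):
    """Average cycles per access: hit share * hit cost + miss share * miss cost."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_cost


# (size, miss rate, hit cost in cycles) from the example; miss cost is 20 cycles.
configs = [("2 Kbyte", 0.15, 2), ("4 Kbyte", 0.065, 3), ("8 Kbyte", 0.05565, 4)]
for size, miss_rate, hit_cost in configs:
    print(size, round(avg_access_cost(miss_rate, hit_cost, 20), 4))
```

The 4 Kbyte cache gives the lowest average cost of the three, since the 8 Kbyte cache's higher hit cost outweighs its slightly lower miss rate.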

Cache performance trade-offs


Improving cache hit rate without increasing size
Increase line size
Change set-associativity
[Figure: plot of % cache miss (0 to 0.16) versus cache size (1 Kb to 128 Kb), with one curve each for 1-way, 2-way, 4-way, and 8-way set-associativity]


Advanced RAM
DRAMs commonly used as main memory in processor-based embedded systems
high capacity, low cost

Many variations of DRAMs proposed

need to keep pace with processor speeds


FPM DRAM: fast page mode DRAM
EDO DRAM: extended data out DRAM
SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
RDRAM: Rambus DRAM


Basic DRAM

Address bus multiplexed between row and column components
Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively
Refresh circuitry can be external or internal to DRAM device
strobes consecutive memory address periodically causing memory content to be refreshed
Refresh circuitry disabled during read or write operation

[Figure: basic DRAM block diagram. Inputs address, ras, cas, rd/wr, and data; row address buffer and row decoder, column address buffer and column decoder, refresh circuit (driven by cas, ras, clock), sense amplifiers, data in buffer, data out buffer, and bit storage array]


Fast Page Mode DRAM (FPM DRAM)

Each row of memory bit array is viewed as a page


Page contains multiple words
Individual words addressed by column address
Timing diagram:
row (page) address sent
3 words read consecutively by sending column address for each

Extra cycle eliminated on each read/write of words from same page


[Figure: FPM DRAM timing diagram showing ras, cas, address (row, then col, col, col), and data (three consecutive data words)]


Extended data out DRAM (EDO DRAM)


Improvement of FPM DRAM
Extra latch before output buffer
allows strobing of cas before data read operation completed

Reduces read/write latency by additional cycle

[Figure: EDO DRAM timing diagram showing ras, cas, address (row, then col, col, col), and data (three consecutive data words); speedup through overlap]


Synchronous (S) and Enhanced Synchronous (ES) DRAM
SDRAM latches data on active edge of clock
Eliminates time to detect ras/cas and rd/wr signals
A counter is initialized to column address then incremented on
active edge of clock to access consecutive memory locations
ESDRAM improves SDRAM
added buffers enable overlapping of column addressing
faster clocking and lower read/write latency possible
[Figure: SDRAM timing diagram showing clock, ras, cas, address (row, then col), and data (consecutive data words)]


Rambus DRAM (RDRAM)


More of a bus interface architecture than DRAM architecture
Data is latched on both rising and falling edge of clock
Broken into 4 banks each with own row decoder
can have 4 pages open at a time

Capable of very high throughput


DRAM integration problem


SRAM easily integrated on same chip as processor
DRAM more difficult
Different chip making process between DRAM and
conventional logic
Goal of conventional logic (IC) designers:
minimize parasitic capacitance to reduce signal propagation delays
and power consumption

Goal of DRAM designers:


create capacitor cells to retain stored information

Integration processes beginning to appear



Memory Management Unit (MMU)


Duties of MMU
Handles DRAM refresh, bus interface and arbitration
Takes care of memory sharing among multiple
processors
Translates logical memory addresses from processor to physical memory addresses of DRAM

Modern CPUs often come with MMU built-in


Single-purpose processors can be used


Embedded Systems Design: A Unified


Hardware/Software Introduction

Chapter 11: Design Technology

Outline

Automation: synthesis
Verification: hardware/software co-simulation
Reuse: intellectual property cores
Design process models


Introduction
Design task
Define system functionality
Convert functionality to physical implementation while
Satisfying constrained metrics
Optimizing other design metrics

Designing embedded systems is hard


Complex functionality
Millions of possible environment scenarios
Competing, tightly constrained metrics

Productivity gap
As low as 10 lines of code or 100 transistors produced per day


Improving productivity
Design technologies developed to improve productivity
We focus on technologies advancing hardware/software unified view
Automation
Program replaces manual design
Synthesis

Reuse
Predesigned components
Cores
General-purpose and single-purpose processors on single IC

Verification
Ensuring correctness/completeness of each design step
Hardware/software co-simulation

[Figure: design flow from specification to implementation, supported by automation, verification, and reuse]


Automation: synthesis

Early design mostly hardware
Software complexity increased with advent of general-purpose processor
Different techniques for software design and hardware design
Caused division of the two fields

Design tools evolve for higher levels of abstraction
Different rate in each field

Hardware/software design fields rejoining
Both can start from behavioral description in sequential program model
30 years longer for hardware design to reach this step in the ladder
Many more design dimensions
Optimization critical

[Figure: the codesign ladder. Both sides start from sequential program code (e.g., C, VHDL). Software side: compilers (1960s, 1970s) produce assembly instructions; assemblers, linkers (1950s, 1960s) produce machine instructions; implementation is a microprocessor plus program bits. Hardware side: behavioral synthesis (1990s) produces register transfers; RT synthesis (1980s, 1990s) produces logic equations / FSMs; logic synthesis (1970s, 1980s) produces logic gates; implementation is a VLSI, ASIC, or PLD implementation]


Hardware/software parallel evolution

Software design evolution
Machine instructions
Assemblers
convert assembly programs into machine instructions
Compilers
translate sequential programs into assembly

Hardware design evolution
Interconnected logic gates
Logic synthesis
converts logic equations or FSMs into gates
Register-transfer (RT) synthesis
converts FSMDs into FSMs, logic equations, predesigned RT components (registers, adders, etc.)
Behavioral synthesis
converts sequential programs into FSMDs

[Figure: the codesign ladder, from sequential program code (e.g., C, VHDL) down to implementation, with compilers (1960s, 1970s) and assemblers, linkers (1950s, 1960s) on the software side and behavioral synthesis (1990s), RT synthesis (1980s, 1990s), and logic synthesis (1970s, 1980s) on the hardware side]


Increasing abstraction level


Higher abstraction level focus of hardware/software design evolution
Description smaller/easier to capture
E.g., Line of sequential program code can translate to 1000 gates
Many more possible implementations available
(a) Like flashlight, the higher above the ground, the more ground illuminated
Sequential program designs may differ in performance/transistor count by orders of magnitude
Logic-level designs may differ by only power of 2
(b) Design process proceeds to lower abstraction level, narrowing in on single implementation

[Figure: (a) a cone from idea down to implementation, widest at the top; (b) the same cone with abstraction levels back-of-the-envelope, sequential program, register-transfers, logic; moving down, modeling cost increases and opportunities decrease]


Synthesis
Automatically converting system's behavioral description to a structural implementation
Complex whole formed by parts
Structural implementation must optimize design metrics

More expensive, complex than compilers


Cost = $100s to $10,000s
User controls 100s of synthesis options
Optimization critical
Otherwise could use software

Optimizations different for each user


Run time = hours, days


Gajski's Y-chart

Each axis represents type of description
Behavioral
Defines outputs as function of inputs
Algorithms but no implementation
Structural
Implements behavior by connecting components with known behavior
Physical
Gives size/locations of components and wires on chip/board

Synthesis converts behavior at given level to structure at same level or lower
E.g.,
FSM → gates, flip-flops (same level)
FSM → transistors (lower level)
FSM X registers, FUs (higher level, so not synthesis)
FSM X processors, memories (higher level, so not synthesis)

[Figure: Y-chart with three axes. Behavior: sequential programs, register transfers, logic equations/FSM, transfer functions. Structural: processors, memories; registers, FUs, MUXs; gates, flip-flops; transistors. Physical: boards, chips, modules, cell layout]


Logic synthesis

Logic-level behavior to structural implementation


Logic equations and/or FSM to connected gates

Combinational logic synthesis


Two-level minimization (Sum of products/product of sums)
Best possible performance
Longest path = 2 gates (AND gate + OR gate/OR gate + AND gate)

Minimize size
Minimum cover
Minimum cover that is prime
Heuristics

Multilevel minimization
Trade performance for size
Pareto-optimal solution
Heuristics

FSM synthesis
State minimization
State encoding


Two-level minimization
Represent logic function as sum of products (or product of sums)
AND gate for each product
OR gate for each sum

Gives best possible performance
At most 2 gate delay

Goal: minimize size
Minimum cover
Minimum # of AND gates (sum of products)
Minimum cover that is prime
Minimum # of inputs to each AND gate (sum of products)

Sum of products
F = abc'd' + a'b'cd + a'bcd + ab'cd

Direct implementation
4 4-input AND gates and 1 4-input OR gate
40 transistors

[Figure: direct implementation of F from inputs a, b, c, d]


Minimum cover
Minimum # of AND gates (sum of products)
Literal: variable or its complement
a or a', b or b', etc.

Minterm: product of literals
Each literal appears exactly once
abc'd', a'b'cd, a'bcd, etc.

Implicant: product of literals
Each literal appears no more than once
a'b'cd, a'cd, etc.
Covers 1 or more minterms
a'cd covers a'bcd and a'b'cd

Cover: set of implicants that covers all minterms of function
Minimum cover: cover with minimum # of implicants


Minimum cover: K-map approach


Karnaugh map (K-map)
1 represents minterm
Circle represents implicant

Minimum cover
Covering all 1's with min # of circles
Example: direct vs. min cover
Fewer gates
4 vs. 5
Fewer transistors
28 vs. 40

K-map for F = abc'd' + a'b'cd + a'bcd + ab'cd (rows ab, columns cd)
ab\cd  00  01  11  10
00      0   0   1   0
01      0   0   1   0
11      1   0   0   0
10      0   0   1   0

K-map: sum of products
one circle per minterm (4 circles)
K-map: minimum cover
circles abc'd', a'cd (rows 00 and 01 of column 11), and ab'cd (3 circles)

Minimum cover
F = abc'd' + a'cd + ab'cd

Minimum cover implementation
2 4-input AND gates
1 3-input AND gate
1 3-input OR gate
28 transistors


Minimum cover that is prime


Minimum # of inputs to AND gates
Prime implicant
Implicant not covered by any other implicant
Max-sized circle in K-map

Minimum cover that is prime
Covering with min # of prime implicants
Min # of max-sized circles
Example: prime cover vs. min cover
Same # of gates
4 vs. 4
Fewer transistors
26 vs. 28

K-map: minimum cover that is prime (rows ab, columns cd)
circles abc'd', a'cd (rows 00 and 01 of column 11), and b'cd (rows 00 and 10 of column 11)

Minimum cover that is prime
F = abc'd' + a'cd + b'cd

Implementation
1 4-input AND gate
2 3-input AND gates
1 3-input OR gate
26 transistors

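
The three forms of F on these slides (direct, minimum cover, and minimum cover that is prime) can be checked against each other by brute force. The sketch below is one way to do that; the representation of product terms as dictionaries and the helper names are assumptions, and the transistor estimate assumes 2 transistors per gate input, the rule stated in the later multilevel example.

```python
from itertools import product

# Each product term maps a variable to the value it requires
# (1 = uncomplemented literal, 0 = complemented literal).
DIRECT = [dict(a=1, b=1, c=0, d=0), dict(a=0, b=0, c=1, d=1),
          dict(a=0, b=1, c=1, d=1), dict(a=1, b=0, c=1, d=1)]
MIN_COVER = [dict(a=1, b=1, c=0, d=0), dict(a=0, c=1, d=1),
             dict(a=1, b=0, c=1, d=1)]
PRIME_COVER = [dict(a=1, b=1, c=0, d=0), dict(a=0, c=1, d=1),
               dict(b=0, c=1, d=1)]


def evaluate(cover, assignment):
    """Sum of products: true if any term has all its literals satisfied."""
    return any(all(assignment[v] == val for v, val in term.items())
               for term in cover)


def equivalent(cover1, cover2):
    """Compare two covers on all 16 input combinations."""
    for bits in product([0, 1], repeat=4):
        assignment = dict(zip("abcd", bits))
        if evaluate(cover1, assignment) != evaluate(cover2, assignment):
            return False
    return True


def transistors(cover):
    """Assume 2 transistors per gate input: AND inputs plus OR inputs."""
    and_inputs = sum(len(term) for term in cover)
    or_inputs = len(cover)
    return 2 * (and_inputs + or_inputs)


print(equivalent(DIRECT, MIN_COVER), equivalent(DIRECT, PRIME_COVER))
print(transistors(DIRECT), transistors(MIN_COVER), transistors(PRIME_COVER))
# prints True True, then 40 28 26
```

The counts match the slides' 40, 28, and 26 transistors when the OR gate has one input per product term.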

Minimum cover: heuristics


K-maps give optimal solution every time
Functions with > 6 inputs too complicated
Use computer-based tabular method

Finds all prime implicants


Finds min cover that is prime
Also optimal solution every time
Problem: 2^n minterms for n inputs
32 inputs = 4 billion minterms
Exponential complexity

Heuristic
Solution technique where optimal solution not guaranteed
Hopefully comes close

Heuristics: iterative improvement


Start with initial solution
i.e., original logic equation

Repeatedly make modifications toward better solution


Common modifications
Expand
Replace each nonprime implicant with a prime implicant covering it
Delete all implicants covered by new prime implicant

Reduce
Opposite of expand

Reshape
Expands one implicant while reducing another
Maintains total # of implicants

Irredundant
Selects min # of implicants that cover from existing implicants

Synthesis tools differ in modifications used and the order they are used

Multilevel logic minimization


Trade performance for size
Increase delay for lower # of gates
Gray area represents all possible solutions
Circle with X represents ideal solution
Generally not possible
2-level gives best performance
max delay = 2 gates
Solve for smallest size

Multilevel gives Pareto-optimal solution
Minimum delay for a given size
Minimum size for a given delay

[Figure: plot of delay versus size; the gray area marks all possible solutions, a circle with X marks the ideal solution, and a point marks 2-level minimization]


Example
Minimized 2-level logic function:
F = adef + bdef + cdef + gh
Requires 5 gates with 18 total gate inputs
4 ANDS and 1 OR

After algebraic manipulation:


F = (a + b + c)def + gh
Requires only 4 gates with 11 total gate inputs
2 ANDS and 2 ORs

Fewer inputs per gate


Assume gate inputs = 2 transistors
Reduced by 14 transistors
36 (18 * 2) down to 22 (11 * 2)

Sacrifices performance for size


Inputs a, b, and c now have 3-gate delay

Iterative improvement heuristic commonly used
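
The algebraic manipulation in this example can be confirmed by exhaustive comparison, and the gate-input totals recomputed, with a short sketch like the following. The function names are illustrative; the 2-transistors-per-gate-input figure is the slide's own assumption.

```python
from itertools import product


def two_level(a, b, c, d, e, f, g, h):
    """F = adef + bdef + cdef + gh"""
    return ((a and d and e and f) or (b and d and e and f)
            or (c and d and e and f) or (g and h))


def multilevel(a, b, c, d, e, f, g, h):
    """F = (a + b + c)def + gh"""
    return ((a or b or c) and d and e and f) or (g and h)


# Exhaustively compare both forms over all 2^8 input combinations.
same = all(bool(two_level(*bits)) == bool(multilevel(*bits))
           for bits in product([False, True], repeat=8))

# Two-level: three 4-input ANDs, one 2-input AND, one 4-input OR.
two_level_inputs = 4 + 4 + 4 + 2 + 4
# Multilevel: one 3-input OR, one 4-input AND, one 2-input AND, one 2-input OR.
multilevel_inputs = 3 + 4 + 2 + 2

print(same, 2 * two_level_inputs, 2 * multilevel_inputs)  # prints True 36 22
```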


[Figure: gate-level circuits for the 2-level minimized and multilevel minimized forms of F, with inputs a, b, c, d, e, f, g, h]


FSM synthesis
FSM to gates
State minimization
Reduce # of states
Identify and merge equivalent states
Outputs, next states same for all possible inputs
Tabular method gives exact solution
Table of all possible state pairs
If n states, n^2 table entries
Thus, heuristics used with large # of states

State encoding

Unique bit sequence for each state


If n states, at least ceil(log2(n)) bits
n! possible encodings
Thus, heuristics common


Technology mapping
Library of gates available for implementation
Simple
only 2-input AND,OR gates

Complex
various-input AND,OR,NAND,NOR,etc. gates
Efficiently implemented meta-gates (e.g., AND-OR-INVERT, MUX)

Final structure consists of specified library's components only


If technology mapping integrated with logic synthesis
More efficient circuit
More complex problem
Heuristics required


Complexity impact on user

As complexity grows, heuristics used


Heuristics differ tremendously among synthesis tools
Computationally expensive

Higher quality results


Variable optimization effort settings
Long run times (hours, days)
Requires huge amounts of memory
Typically needs to run on servers, workstations

Fast heuristics

Lower quality results


Shorter run times (minutes, hours)
Smaller amount of memory required
Could run on PC

Super-linear-time (e.g., n^3) heuristics usually used
User can partition large systems to reduce run times/size
100^3 > 50^3 + 50^3 (1,000,000 > 250,000)


Integrating logic design and physical design


Past
Gate delay much greater than wire delay
Thus, performance evaluated as # of levels of gates only

Today
Gate delay shrinking as feature size shrinking
Wire delay increasing
Performance evaluation needs wire length

Transistor placement (needed for wire length) domain of physical design
Thus, simultaneous logic synthesis and physical design required for efficient circuits

[Figure: plot of delay versus reduced feature size, with one curve for wire delay and one for transistor delay]


Register-transfer synthesis
Converts FSMD to custom single-purpose processor
Datapath
Register units to store variables
Complex data types

Functional units
Arithmetic operations

Connection units
Buses, MUXs

FSM controller
Controls datapath

Key sub problems:


Allocation
Instantiate storage, functional, connection units

Binding
Mapping FSMD operations to specific units

Behavioral synthesis
High-level synthesis
Converts single sequential program to single-purpose processor
Does not require the program to schedule states

Key sub problems


Allocation
Binding
Scheduling
Assign sequential program's operations to states
Conversion template given in Ch. 2

Optimizations important
Compiler
Constant propagation, dead-code elimination, loop unrolling

Advanced techniques for allocation, binding, scheduling



System synthesis
Convert 1 or more processes into 1 or more processors (system)
For complex embedded systems
Multiple processes may provide better performance/power
May be better described using concurrent sequential programs

Tasks
Transformation

Can merge 2 exclusive processes into 1 process


Can break 1 large process into separate processes
Procedure inlining
Loop unrolling

Allocation
Essentially design of system architecture
Select processors to implement processes
Also select memories and busses

System synthesis
Tasks (cont.)
Partitioning
Mapping 1 or more processes to 1 or more processors
Variables among memories
Communications among buses

Scheduling
Multiple processes on a single processor
Memory accesses
Bus communications

Tasks performed in variety of orders


Iteration among tasks common


System synthesis
Synthesis driven by constraints
E.g.,
Meet performance requirements at minimum cost
Allocate as much behavior as possible to general-purpose processor
Low-cost/flexible implementation
Minimum # of SPPs used to meet performance

System synthesis for GPP only (software)


Common for decades
Multiprocessing
Parallel processing
Real-time scheduling

Hardware/software codesign
Simultaneous consideration of GPPs/SPPs during synthesis
Made possible by maturation of behavioral synthesis in 1990s

Temporal vs. spatial thinking


Design thought process changed by evolution of synthesis
Before synthesis
Designers worked primarily in structural domain
Connecting simpler components to build more complex systems
Connecting logic gates to build controller
Connecting registers, MUXs, ALUs to build datapath

"Capture and simulate" era


Capture using CAD tools
Simulate to verify correctness before fabricating

Spatial thinking
Structural diagrams
Data sheets


Temporal vs. spatial thinking


After synthesis
Designers work primarily in behavioral domain
"Describe and synthesize" era
Describe FSMDs or sequential programs
Synthesize into structure

Temporal thinking
States or sequential statements have relationship over time

Strong understanding of hardware structure still important


Behavioral description must synthesize to efficient structural
implementation


Verification
Ensuring design is correct and complete
Correct
Implements specification accurately

Complete
Describes appropriate output to all relevant input

Formal verification
Hard
For small designs or verifying certain key properties only

Simulation
Most common verification method


Formal verification
Analyze design to prove or disprove certain properties
Correctness example
Prove ALU structural implementation equivalent to behavioral
description
Derive Boolean equations for outputs
Create truth table for equations
Compare to truth table from original behavior

Completeness example
Formally prove elevator door can never open while elevator is moving
Derive conditions for door being open
Show conditions conflict with conditions for elevator moving
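
The correctness example above can be sketched in C on a much smaller circuit than an ALU. The sketch below assumes a 1-bit full adder (a hypothetical stand-in chosen for illustration, not taken from the slides): outputs are derived from Boolean equations, the full truth table is enumerated, and each row is compared against the behavioral description.

```c
#include <assert.h>

/* Structural view: outputs written as Boolean equations over gates. */
static void adder_structural(int a, int b, int cin, int *sum, int *cout) {
    assert((a | b | cin) <= 1);            /* inputs are single bits */
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (cin & (a ^ b));
}

/* Behavioral view: plain arithmetic on the three input bits. */
static void adder_behavioral(int a, int b, int cin, int *sum, int *cout) {
    int total = a + b + cin;
    *sum  = total % 2;
    *cout = total / 2;
}

/* Returns 1 if both descriptions agree on every row of the truth table. */
int adder_equivalent(void) {
    int row;
    for (row = 0; row < 8; row++) {
        int a = (row >> 2) & 1, b = (row >> 1) & 1, cin = row & 1;
        int s1, c1, s2, c2;
        adder_structural(a, b, cin, &s1, &c1);
        adder_behavioral(a, b, cin, &s2, &c2);
        if (s1 != s2 || c1 != c2) return 0;
    }
    return 1;
}
```

Enumerating the table works here only because there are 8 rows; for wider datapaths the same comparison is usually done symbolically rather than row by row.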


Simulation
Create computer model of design
Provide sample input
Check for acceptable output

Correctness example
ALU
Provide all possible input combinations
Check outputs for correct results

Completeness example
Elevator door closed when moving
Provide all possible input sequences
Check door always closed when elevator moving


Increases confidence
Simulating all possible input sequences impossible for most
systems
E.g., 32-bit ALU
2^32 * 2^32 = 2^64 possible input combinations
At 1 million combinations/sec
About half a million years to simulate
Sequential circuits even worse

Can only simulate tiny subset of possible inputs


Typical values
Known boundary conditions
E.g., 32-bit ALU
Both operands all 0s
Both operands all 1s

Increases confidence of correctness/completeness


Does not prove
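
The half-million-year estimate above follows from simple arithmetic, which the following sketch reproduces. The function name and the 365-day year are assumptions made for illustration.

```c
#include <assert.h>

/* Years needed to simulate every input combination of an ALU with two
   operands of `bits` bits each, at `rate` combinations per second. */
double exhaustive_sim_years(int bits, double rate) {
    double combos = 1.0;
    int i;
    assert(bits > 0 && rate > 0.0);
    for (i = 0; i < 2 * bits; i++)
        combos *= 2.0;                       /* 2^bits * 2^bits */
    return combos / rate / (365.0 * 24.0 * 3600.0);
}
```

Calling `exhaustive_sim_years(32, 1e6)` gives a value a little under 600,000 years, which is why only typical values and boundary conditions are simulated in practice.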

Advantages over physical implementation


Controllability
Control time
Stop/start simulation at any time

Control data values


Inputs or internal values

Observability
Examine system/environment values at any time

Debugging
Can stop simulation at any point and:
Observe internal values
Modify system/environment values before restarting

Can step through small intervals (e.g., 500 nanoseconds)



Disadvantages
Simulation setup time
Often has complex external environments
Could spend more time modeling environment than system

Models likely incomplete


Some environment behavior undocumented if complex environment
May not model behavior correctly

Simulation speed much slower than actual execution


Sequentializing parallel design
IC: gates operate in parallel
Simulation: analyze inputs, generate outputs for each gate 1 at time

Several programs added between simulated system and real hardware


1 simulated operation:
= 10 to 100 simulator operations
= 100 to 10,000 operating system operations
= 1,000 to 100,000 hardware operations


Simulation speed
Relative speeds of different types of simulation/emulation
1 hour actual execution of SOC
= 1.2 years instruction-set simulation
= 10,000,000 hours gate-level simulation

Implementation / model                    Slowdown       Time for 1 hour of SOC execution
IC                                        x1             1 hour
FPGA                                      x10            1 day
Hardware emulation                        x100           4 days
Throughput model                          x1,000         1.4 months
Instruction-set simulation                x10,000        1.2 years
Cycle-accurate simulation                 x100,000       12 years
Register-transfer-level HDL simulation    x1,000,000     >1 lifetime
Gate-level HDL simulation                 x10,000,000    1 millennium


Overcoming long simulation time


Reduce amount of real time simulated
1 msec execution instead of 1 hour
0.001 sec * 10,000,000 = 10,000 sec = about 3 hours

Reduced confidence
1 msec of cruise controller operation tells us little

Faster simulator
Emulators
Special hardware for simulations

Less precise/accurate simulators


Exchange speed for observability/controllability

Reducing precision/accuracy
Don't need gate-level analysis for all simulations
E.g., cruise control
Don't care what happens at every input/output of each logic gate

Simulating RT components ~10x faster


Cycle-based simulation ~100x faster
Accurate at clock boundaries only
No information on signal changes between boundaries

Faster simulator often combined with reduction in real time


If willing to simulate for 10 hours
Use instruction-set simulator
Real execution time simulated
10 hours * 1 / 10,000
= 0.001 hour
= 3.6 seconds
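
The trade-off between simulator slowdown and the amount of real time covered can be written as two small helpers. This is a sketch; the function names are hypothetical, and the slowdown factors are the ones quoted on these slides (10,000,000 for gate-level simulation, 10,000 for instruction-set simulation).

```c
#include <assert.h>

/* Wall-clock seconds needed to simulate `real_seconds` of execution
   on a simulator that runs `slowdown` times slower than the real IC. */
double wall_clock_seconds(double real_seconds, double slowdown) {
    assert(slowdown >= 1.0);
    return real_seconds * slowdown;
}

/* Seconds of real execution covered by a given wall-clock budget. */
double simulated_seconds(double budget_seconds, double slowdown) {
    assert(slowdown >= 1.0);
    return budget_seconds / slowdown;
}
```

For example, `wall_clock_seconds(0.001, 1e7)` corresponds to the 1 msec gate-level case, and `simulated_seconds(10 * 3600.0, 1e4)` to the 10-hour instruction-set simulation budget.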

Hardware/software co-simulation
Variety of simulation approaches exist
From very detailed
E.g., gate-level model

To very abstract
E.g., instruction-level model

Simulation tools evolved separately for hardware/software


Recall separate design evolution
Software (GPP)
Typically with instruction-set simulator (ISS)

Hardware (SPP)
Typically with models in HDL environment

Integration of GPP/SPP on single IC creating need for merging


simulation tools

Integrating GPP/SPP simulations


Simple/naive way
HDL model of microprocessor
Runs system software
Much slower than ISS
Less observable/controllable than ISS

HDL models of SPPs


Integrate all models

Hardware-software co-simulator

ISS for microprocessor


HDL model for SPPs
Create communication between simulators
Simulators run separately except when transferring data
Faster
Though, frequent communication between ISS and HDL model slows it down


Minimizing communication
Memory shared between GPP and SPPs
Where should memory go?
In ISS
HDL simulator must stall for memory access

In HDL?
ISS must stall when fetching each instruction

Model memory in both ISS and HDL


Most accesses by each model unrelated to other's accesses
No need to communicate these between models

Co-simulator ensures consistency of shared data


Huge speedups (100x or more) reported with this technique


Emulators
General physical device that system is mapped to
Microprocessor emulator
Microprocessor IC with some monitoring, control circuitry

SPP emulator
FPGAs (10s to 100s)

Usually supports debugging tasks

Created to help solve simulation disadvantages


Mapped relatively quickly
Hours, days

Can be placed in real environment


No environment setup time
No incomplete environment

Typically faster than simulation


Hardware implementation


Disadvantages
Still not as fast as real implementations
E.g., emulated cruise-control may not respond fast enough to
keep control of car

Mapping still time consuming


E.g., mapping complex SOC to 10 FPGAs
Just partitioning into 10 parts could take weeks

Can be very expensive


Top-of-the-line FPGA-based emulator: $100,000 to $1 million
Leads to resource bottleneck
Can maybe only afford 1 emulator
Groups wait days, weeks for other group to finish using

Reuse: intellectual property cores


Commercial off-the-shelf (COTS) components

Predesigned, prepackaged ICs


Implements GPP or SPP
Reduces design/debug time
Have always been available

System-on-a-chip (SOC)
All components of system implemented on single chip
Made possible by increasing IC capacities
Changing the way COTS components sold
As intellectual property (IP) rather than actual IC
Behavioral, structural, or physical descriptions
Processor-level components known as cores

SOC built by integrating multiple descriptions



Cores
Soft core
Synthesizable behavioral description
Typically written in HDL (VHDL/Verilog)

Firm core
Structural description
Typically provided in HDL

Hard core
Physical description
Provided in variety of physical layout file formats

[Figure: Gajski's Y-chart. Behavior axis: sequential programs, register transfers, logic equations/FSM, transfer functions. Structural axis: processors, memories; registers, FUs, MUXs; gates, flip-flops; transistors. Physical axis: cell layout, modules, chips, boards.]

Advantages/disadvantages of hard core


Ease of use
Developer already designed and tested core
Can use right away
Can expect to work correctly

Predictability
Size, power, performance predicted accurately

Not easily mapped (retargeted) to different process


E.g., core available for vendor X's 0.25 micrometer CMOS process
Can't use with vendor X's 0.18 micrometer process
Can't use with vendor Y

Advantages/disadvantages of soft/firm cores


Soft cores
Can be synthesized to nearly any technology
Can optimize for particular use
E.g., delete unused portion of core
Lower power, smaller designs

Requires more design effort


May not work in technology not tested for
Not as optimized as hard core for same processor

Firm cores
Compromise between hard and soft cores
Some retargetability
Limited optimization
Better predictability/ease of use

New challenges to processor providers


Cores have dramatically changed business model
Pricing models
Past
Vendors sold product as IC to designers
Designers must buy any additional copies
Could not (economically) copy from original

Today
Vendors can sell as IP
Designers can make as many copies as needed

Vendor can use different pricing models


Royalty-based model
Similar to old IC model
Designer pays for each additional copy
Fixed price model
One price for IP and as many copies as needed
Many other models used

IP protection
Past
Illegally copying IC very difficult
Reverse engineering required tremendous, deliberate effort
Accidental copying not possible

Today
Cores sold in electronic format

Deliberate/accidental unauthorized copying easier


Safeguards greatly increased
Contracts to ensure no copying/distributing
Encryption techniques
limit actual exposure to IP

Watermarking
determines if particular instance of processor was copied
whether copy authorized

New challenges to processor users

Licensing arrangements
Not as easy as purchasing IC
More contracts enforcing pricing model and IP protection
Possibly requiring legal assistance

Extra design effort


Especially for soft cores
Must still be synthesized and tested
Minor differences in synthesis tools can cause problems

Verification requirements more difficult


Extensive testing for synthesized soft cores and soft/firm cores mapped to particular
technology
Ensure correct synthesis
Timing and power vary between implementations

Early verification critical


Cores buried within IC
Cannot simply replace bad core

Design process model
Describes order that design steps are processed
Behavior description step
Behavior to structure conversion step
Mapping structure to physical implementation step

Waterfall model
Proceed to next step only after current step completed

Spiral model
Proceed through 3 steps in order but with less detail
Repeat 3 steps gradually increasing detail
Keep repeating until desired system obtained
Becoming extremely popular (hardware & software development)

[Figure: Waterfall design model (Behavioral, Structural, Physical) and spiral design model (repeated passes through Behavioral, Structural, Physical)]

Waterfall method
Not very realistic
Bugs often found in later steps that must be fixed in earlier step
E.g., forgot to handle certain input condition

Prototype often needed to know complete desired behavior
E.g., customer adds features after product demo

System specifications commonly change
E.g., to remain competitive by reducing power, size
Certain features dropped

Unexpected iterations back through 3 steps cause missed deadlines
Lost revenues
May never make it to market

[Figure: Waterfall design model: Behavioral, Structural, Physical]

Spiral method
First iteration of 3 steps incomplete
Much faster, though
End up with prototype
Use to test basic functions
Get idea of functions to add/remove

Original iteration experience helps in following iterations of 3 steps

Must come up with ways to obtain structure and physical implementations quickly
E.g., FPGAs for prototype, silicon for final product
May have to use more tools
Extra effort/cost

Could require more time than waterfall method
If correct implementation first time with waterfall

[Figure: Spiral design model: Behavioral, Structural, Physical]

General-purpose processor design models


Previous slides focused on SPPs
Can apply equally to GPPs
Waterfall model

Structure developed by particular company


Acquired by embedded system designer
Designer develops software (behavior)
Designer maps application to architecture
Compilation
Manual design

Spiral-like model
Beginning to be applied by embedded system designers

Spiral-like model
Designer develops or acquires architecture
Develops application(s)
Maps application to architecture
Analyzes design metrics
Now makes choice
Modify mapping
Modify application(s) to better suit architecture
Modify architecture to better suit application(s)

Not as difficult now
Maturation of synthesis/compilers
IPs can be tuned

Continue refining to lower abstraction level until particular implementation chosen

[Figure: Y-chart: Architecture and Application(s) feed Mapping, followed by Analysis]

Summary
Design technology seeks to reduce gap between IC
capacity growth and designer productivity growth
Synthesis has changed digital design
Increased IC capacity means sw/hw components
coexist on one chip
Design paradigm shift to core-based design
Simulation essential but hard
Spiral design process is popular

Embedded Systems Design: A Unified


Hardware/Software Introduction

Chapter 7 Digital Camera Example

Outline

Introduction to a simple digital camera


Designer's perspective
Requirements specification
Design
Four implementations


Introduction
Putting it all together
Instruction-set processor (GPP, ASIP)
Single-purpose processor
Custom
Standard

Memory
Interfacing

Knowledge applied to designing a simple digital camera
GPP/ASIP vs. single-purpose processors
Partitioning of functionality among different processor types

Introduction to a simple digital camera


Captures images
Stores images in digital format
No film
Multiple images stored in camera
Number depends on amount of memory and bits used per image

Downloads images to PC
Only recently possible
Systems-on-a-chip
Multiple processors and memories on one IC

High-capacity flash memory

Very simple description used for example


Many more features with real digital camera
Variable size images, image deletion, digital stretching, zooming in and out, etc.

Designer's perspective
Two key tasks
Processing images and storing in memory
When shutter pressed:
Image captured
Converted to digital form by charge-coupled device (CCD)
Compressed and archived in internal memory

Uploading images to PC
Digital camera attached to PC
Special software commands camera to transmit archived
images serially

Charge-coupled device (CCD)
Special sensor that captures a B/W image (8 bits/pixel, 16 bits/pixel, ...)
Light-sensitive silicon solid-state device composed of many cells

When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value where 0 represents no exposure while 255 represents very intense exposure of that cell to light.

The electromechanical shutter is activated to expose the cells to light for a brief moment.

Some of the columns are covered with a black strip of paint. The light intensity of these pixels is used for zero-bias adjustments of all the cells.

The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.

[Figure: CCD layout showing lens area, pixel rows, pixel columns, covered columns, electromechanical shutter, and electronic circuitry]

Zero-bias error
Manufacturing errors cause cells to measure slightly above or below actual
light intensity
Error is typically the same across columns, but is different across rows
Some of left most columns blocked by black paint to detect zero-bias error
Reading of other than 0 in blocked cells is zero-bias error
Each row is corrected by subtracting the average error found in blocked cells for
that row
Before zero-bias adjustment

Image cells                          Covered cells   Zero-bias adjustment
136 170 155 140 144 115 112 248      12 14           -13
145 146 168 123 120 117 119 147      12 10           -11
144 153 168 117 121 127 118 135       9  9            -9
176 183 161 111 186 130 132 133       0  0             0
144 156 161 133 192 153 138 139       7  7            -7
122 131 128 147 206 151 131 127       2  0            -1
121 155 164 185 254 165 138 129       4  4            -4
173 175 176 183 188 184 117 129       5  5            -5

After zero-bias adjustment

123 157 142 127 131 102  99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
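
The per-row correction described above can be sketched as a small C routine. The function name, the `int` element type, and the 8-cell row width are assumptions chosen to match the slide's example table rather than a real sensor.

```c
#include <assert.h>

#define IMG_COLS 8      /* image cells per row in the example table */
#define COVERED  2      /* covered (blocked) cells per row */

/* Subtracts the average reading of the covered cells from every image
   cell of one row. Integer division is used for the average. */
void zero_bias_adjust_row(int row[IMG_COLS], const int covered[COVERED]) {
    int i, sum = 0, bias;
    assert(row != 0 && covered != 0);
    for (i = 0; i < COVERED; i++)
        sum += covered[i];
    bias = sum / COVERED;
    for (i = 0; i < IMG_COLS; i++)
        row[i] -= bias;
}
```

Applied to the first row of the example (covered readings 12 and 14), the bias is 13, so 136 becomes 123 and 248 becomes 235.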

Compression
Store more images
Transmit image to PC in less time
JPEG (Joint Photographic Experts Group)
Popular standard format for representing digital images in a compressed
form
Provides for a number of different modes of operation
Mode used in this chapter provides high compression ratios using DCT
(discrete cosine transform)
Image data divided into blocks of 8 x 8 pixels
3 steps performed on each block
DCT
Quantization
Huffman encoding

DCT step
Transforms original 8 x 8 block into a cosine-frequency
domain
Upper-left corner values represent more of the essence of the image
Lower-right corner values represent finer details
Can reduce precision of these values and retain reasonable image quality

FDCT (Forward DCT) formula


C(h) = [ if (h == 0) then 1/sqrt(2) else 1.0 ]
Auxiliary function used in main function F(u,v)

F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D_{xy} \cos\left(\frac{\pi(2x+1)u}{16}\right) \cos\left(\frac{\pi(2y+1)v}{16}\right)
Gives encoded pixel at row u, column v
D_{xy} is original pixel value at row x, column y

IDCT (Inverse DCT)


Reverses process to obtain original block (not needed for this design)
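
As a reference point, the FDCT formula can be evaluated directly in floating point. The sketch below is not the chapter's implementation; it assumes a hard-coded quarter-wave table for cos(k*pi/16) (so no math library is needed), and the helper names `cos16` and `fdct` are hypothetical.

```c
#include <assert.h>

/* cos(k*pi/16) for integer k >= 0, from a quarter-wave table. */
static double cos16(int k) {
    static const double q[9] = {
        1.0, 0.9807852804, 0.9238795325, 0.8314696123, 0.7071067812,
        0.5555702330, 0.3826834324, 0.1950903220, 0.0
    };
    assert(k >= 0);
    k %= 32;                               /* period 2*pi is 32 steps */
    if (k > 16) k = 32 - k;                /* cos(2*pi - t) = cos(t)  */
    return (k <= 8) ? q[k] : -q[16 - k];   /* cos(pi - t) = -cos(t)   */
}

/* Auxiliary function C(h) from the formula. */
static double C(int h) { return h == 0 ? 0.7071067812 : 1.0; }

/* Direct evaluation of F(u,v) for one 8 x 8 block D. */
double fdct(int u, int v, short D[8][8]) {
    double sum = 0.0;
    int x, y;
    for (x = 0; x < 8; x++)
        for (y = 0; y < 8; y++)
            sum += D[x][y] * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
    return 0.25 * C(u) * C(v) * sum;
}
```

A useful sanity check: for a block in which every pixel has the same value c, F(0,0) comes out as 8c and every other coefficient is essentially zero, which illustrates why the upper-left corner carries most of the image's content.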

Quantization step
Achieve high compression ratio by reducing image
quality (lossy compression)
Reduce bit precision of encoded data
Fewer bits needed for encoding
One way is to divide all values by a power of 2
Simple right shifts can do this

Dequantization would reverse process for decompression


After being encoded using DCT

1150   39  -43  -10   26  -83   11   41
 -81   -3  115  -73   -6   -2   22   -5
  14  -11    1  -42   26   -3   17  -38
   2  -61  -13  -12   36  -23  -18    5
  44   13   37   -4   10  -21    7   -8
  36  -11   -9   -4   20  -28  -21   14
 -19   -7   21   -6    3    3   12  -21
  -5  -13  -11  -17   -4   -1    7   -4

Divide each cell's value by 8

After quantization

 144    5   -5   -1    3  -10    1    5
 -10    0   14   -9   -1    0    3   -1
   2   -1    0   -5    3    0    2   -5
   0   -8   -2   -2    5   -3   -2    1
   6    2    5   -1    1   -3    1   -1
   5   -1   -1   -1    3   -4   -3    2
  -2   -1    3   -1    0    0    2   -3
  -1   -2   -1   -2   -1    0    1   -1
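
The values in the example appear consistent with dividing by 8 and rounding to the nearest integer (halves rounded away from zero); that reading is an inference from the numbers, not something the slide states. A bare right shift of a negative value would typically round toward negative infinity instead, so the sketch below, with a hypothetical `quantize` helper, rounds explicitly.

```c
#include <assert.h>

/* Divides by 2^shift, rounding to nearest (halves away from zero). */
short quantize(short value, int shift) {
    int half, divisor;
    assert(shift > 0 && shift < 15);
    divisor = 1 << shift;
    half = divisor / 2;
    return (short)(value >= 0 ? (value + half) / divisor
                              : -((-value + half) / divisor));
}
```

With `shift` set to 3, the DC value 1150 maps to 144 and -81 maps to -10, matching the first column of the example.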


Huffman encoding step
Serialize 8 x 8 block of pixels
Values are converted into single list using zigzag pattern

Perform Huffman encoding
More frequently occurring pixels assigned short binary code
Longer binary codes left for less frequently occurring pixels

Each pixel in serial list converted to Huffman encoded values
Much shorter list, thus compression
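
The serialization step can be sketched as a walk over the block's anti-diagonals in alternating directions. The slides only say "zigzag pattern", so the exact traversal order used here (the common JPEG-style order) is an assumption, as is the function name.

```c
#include <assert.h>

/* Serializes an 8 x 8 block into out[64], walking anti-diagonals in
   alternating directions (JPEG-style order, assumed here). */
void zigzag_serialize(short block[8][8], short out[64]) {
    int d, n = 0;
    for (d = 0; d <= 14; d++) {
        int lo = d > 7 ? d - 7 : 0;     /* smallest valid row on diagonal */
        int hi = d < 7 ? d : 7;         /* largest valid row on diagonal  */
        int r;
        if (d % 2)                      /* odd diagonal: move down-left   */
            for (r = lo; r <= hi; r++) out[n++] = block[r][d - r];
        else                            /* even diagonal: move up-right   */
            for (r = hi; r >= lo; r--) out[n++] = block[r][d - r];
    }
    assert(n == 64);
}
```

Because quantization tends to leave small values toward the lower-right corner, this order groups the larger upper-left coefficients at the front of the list.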


Huffman encoding example
Pixel frequencies on left
Pixel value -1 occurs 15 times
Pixel value 14 occurs 1 time

Build Huffman tree from bottom up
Create one leaf node for each pixel value and assign frequency as node's value
Create an internal node by joining any two nodes whose sum is a minimal value
This sum is internal node's value
Repeat until complete binary tree

Traverse tree from root to leaf to obtain binary code for leaf's pixel value
Append 0 for left traversal, 1 for right traversal

Huffman encoding is reversible
No code is a prefix of another code

Pixel value   Frequency   Huffman code
-1            15x         00
0             8x          100
-2            6x          110
1             5x          010
2             5x          1110
3             5x          1010
5             5x          0110
-3            4x          11110
-5            3x          10110
-10           2x          01110
144           1x          111111
-9            1x          111110
-8            1x          101111
-4            1x          101110
6             1x          011111
14            1x          011110

[Figure: Huffman tree built from the pixel frequencies]
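
The bottom-up construction can be sketched without building explicit tree nodes: each time the two smallest weights are merged, every symbol beneath the new internal node gains one code bit, so the sum of all merged weights equals the total encoded length. The function name and the 64-symbol limit are assumptions; ties may be broken differently than in the slide's tree, so individual code lengths need not match the table.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the total encoded length in bits (sum of frequency times
   code length) of a Huffman code for the given symbol frequencies. */
unsigned long huffman_total_bits(const unsigned *freq, size_t n) {
    unsigned long w[64], total = 0;
    size_t count = n, i;
    assert(n >= 1 && n <= 64);
    for (i = 0; i < n; i++) w[i] = freq[i];
    while (count > 1) {
        size_t a = 0, b = 1;             /* a: smallest, b: second smallest */
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (i = 2; i < count; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) b = i;
        }
        w[a] += w[b];                    /* internal node value is the sum  */
        total += w[a];                   /* one more bit for all leaves below */
        w[b] = w[count - 1];             /* drop b by moving last entry in  */
        count--;
    }
    return total;
}
```

For the 64 pixels of the example block, the result can be compared against the 64 x 8 = 512 bits needed to store each pixel as a plain 8-bit value.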


Archive step
Record starting address and image size
Can use linked list

One possible way to archive images
If max number of images archived is N:
Set aside memory for N addresses and N image-size variables
Keep a counter for location of next available address
Initialize addresses and image-size variables to 0
Set global memory address to N x 4
Assuming addresses, image-size variables occupy N x 4 bytes
First image archived starting at address N x 4
Global memory address updated to N x 4 + (compressed image size)

Memory requirement based on N, image size, and average compression ratio
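
The table-based scheme above can be sketched in C as follows. The type and function names are hypothetical, N = 50 is an assumed value, and 2-byte entries are inferred from the slide's statement that the N addresses and N sizes together occupy N x 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define N 50   /* hypothetical maximum number of archived images */

typedef struct {
    uint16_t address[N];   /* starting address of each image       */
    uint16_t size[N];      /* compressed size of each image        */
    unsigned next;         /* counter: next free table entry       */
    uint16_t global;       /* global memory address for next image */
} Archive;

void ArchiveInit(Archive *a) {
    unsigned i;
    assert(a != 0);
    for (i = 0; i < N; i++) { a->address[i] = 0; a->size[i] = 0; }
    a->next = 0;
    a->global = N * 4;     /* image data starts after the two tables */
}

/* Records one image; returns its start address, or -1 if table is full. */
long ArchiveAdd(Archive *a, uint16_t compressedSize) {
    long start;
    assert(a != 0);
    if (a->next >= N) return -1;
    start = a->global;
    a->address[a->next] = a->global;
    a->size[a->next] = compressedSize;
    a->next++;
    a->global = (uint16_t)(a->global + compressedSize);
    return start;
}
```

With N = 50 the first image lands at address 200, and a 300-byte image moves the global address to 500. A real design would also check that the image fits in the remaining memory.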

Uploading to PC
When connected to PC and upload command received
Read images from memory
Transmit serially using UART
While transmitting
Reset pointers, image-size variables and global memory pointer
accordingly


Requirements Specification
System's requirements: what system should do
Nonfunctional requirements
Constraints on design metrics (e.g., should use 0.001 watt or less)

Functional requirements
System's behavior (e.g., output X should be input Y times 2)

Initial specification may be very general and come from marketing dept.
E.g., short document detailing market need for a low-end digital camera that:
captures and stores at least 50 low-res images and uploads to PC,
costs around $100 with single medium-size IC costing less than $25,
has as long a battery life as possible,
has expected sales volume of 200,000 if market entry < 6 months,
100,000 if between 6 and 12 months,
insignificant sales beyond 12 months


Nonfunctional requirements
Design metrics of importance based on initial specification

Performance: time required to process image


Size: number of elementary logic gates (2-input NAND gate) in IC
Power: measure of avg. electrical energy consumed while processing
Energy: battery lifetime (power x time)

Constrained metrics
Values must be below (sometimes above) certain threshold

Optimization metrics
Improved as much as possible to improve product

A metric can be both constrained and optimization


Nonfunctional requirements (cont.)

Performance
Must process image fast enough to be useful
1 sec reasonable constraint
Slower would be annoying
Faster not necessary for low-end of market

Therefore, constrained metric

Size
Must use IC that fits in reasonably sized camera
Constrained and optimization metric
Constraint could be 200,000 gates, but smaller would be cheaper

Power
Must operate below certain temperature (cooling fan not possible)
Therefore, constrained metric

Energy
Reducing power or time reduces energy
Optimization metric: want battery to last as long as possible


Informal functional specification
Flowchart breaks functionality down into simpler functions
Each function's details could then be described in English
Done earlier in chapter

Low quality image has resolution of 64 x 64 (only for example; typically 640 x 480 or more)

Mapping functions to a particular processor type not done at this stage

[Figure: Flowchart with steps CCD input, Zero-bias adjust, DCT, Quantize, Archive in memory, Transmit serially (serial output, e.g., 011010...), and yes/no decisions "More 8 x 8 blocks?" and "Done?"]

Refined functional specification
Refine informal specification into one that can actually be executed
Can use C/C++ code to describe each function
Called system-level model, prototype, or simply model
Also is first implementation

Can provide insight into operations of system
Profiling can find computationally intensive functions

Can obtain sample output used to verify correctness of final implementation

[Figure: Executable model of digital camera: image file, CCD.C, CCDPP.C, CNTRL.C, CODEC.C, UART.C, output file]

CCD module
Simulates real CCD
CcdInitialize is passed name of image file
CcdCapture reads image from file
CcdPopPixel outputs pixels one at a time

    #include <stdio.h>
    #define SZ_ROW 64
    #define SZ_COL (64 + 2)    /* 64 image columns plus 2 covered columns */
    static FILE *imageFileHandle;
    static char buffer[SZ_ROW][SZ_COL];
    static unsigned rowIndex, colIndex;

    void CcdInitialize(const char *imageFileName) {
        imageFileHandle = fopen(imageFileName, "r");
        rowIndex = -1;
        colIndex = -1;
    }

    /* Reads the whole image from the file into buffer */
    void CcdCapture(void) {
        int pixel;
        rewind(imageFileHandle);
        for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
            for(colIndex=0; colIndex<SZ_COL; colIndex++) {
                if( fscanf(imageFileHandle, "%i", &pixel) == 1 ) {
                    buffer[rowIndex][colIndex] = (char)pixel;
                }
            }
        }
        rowIndex = 0;
        colIndex = 0;
    }

    /* Returns the next pixel in row-major order */
    char CcdPopPixel(void) {
        char pixel;
        pixel = buffer[rowIndex][colIndex];
        if( ++colIndex == SZ_COL ) {
            colIndex = 0;
            if( ++rowIndex == SZ_ROW ) {
                colIndex = -1;
                rowIndex = -1;
            }
        }
        return pixel;
    }

CCDPP (CCD PreProcessing) module
Performs zero-bias adjustment
CcdppCapture uses CcdCapture and CcdPopPixel to obtain image
Performs zero-bias adjustment after each row read in

    #define SZ_ROW 64
    #define SZ_COL 64
    static char buffer[SZ_ROW][SZ_COL];
    static unsigned rowIndex, colIndex;

    void CcdppInitialize() {
        rowIndex = -1;
        colIndex = -1;
    }

    void CcdppCapture(void) {
        char bias;
        CcdCapture();
        for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
            /* read the 64 image pixels of this row */
            for(colIndex=0; colIndex<SZ_COL; colIndex++) {
                buffer[rowIndex][colIndex] = CcdPopPixel();
            }
            /* average the 2 covered pixels that follow */
            bias = (CcdPopPixel() + CcdPopPixel()) / 2;
            for(colIndex=0; colIndex<SZ_COL; colIndex++) {
                buffer[rowIndex][colIndex] -= bias;
            }
        }
        rowIndex = 0;
        colIndex = 0;
    }

    char CcdppPopPixel(void) {
        char pixel;
        pixel = buffer[rowIndex][colIndex];
        if( ++colIndex == SZ_COL ) {
            colIndex = 0;
            if( ++rowIndex == SZ_ROW ) {
                colIndex = -1;
                rowIndex = -1;
            }
        }
        return pixel;
    }

UART module

Actually a half UART


Only transmits, does not receive

UartInitialize is passed name of file to output to

UartSend transmits (writes to output file) one byte at a time

#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
fprintf(outputFileHandle, "%i\n", (int)d);
}


CODEC module
Models FDCT encoding
ibuffer holds original 8 x 8 block
obuffer holds encoded 8 x 8 block
CodecPushPixel called 64 times to fill ibuffer with original block
CodecDoFdct called once to transform 8 x 8 block
Explained in next slide
CodecPopPixel called 64 times to retrieve encoded block from obuffer

    static short ibuffer[8][8], obuffer[8][8], idx;

    void CodecInitialize(void) { idx = 0; }

    void CodecPushPixel(short p) {
        if( idx == 64 ) idx = 0;
        ibuffer[idx / 8][idx % 8] = p; idx++;
    }

    void CodecDoFdct(void) {
        int x, y;
        for(x=0; x<8; x++) {
            for(y=0; y<8; y++)
                obuffer[x][y] = FDCT(x, y, ibuffer);
        }
        idx = 0;
    }

    short CodecPopPixel(void) {
        short p;
        if( idx == 64 ) idx = 0;
        p = obuffer[idx / 8][idx % 8]; idx++;
        return p;
    }

CODEC (cont.)
Implementing FDCT formula
C(h) = if (h == 0) then 1/sqrt(2) else 1.0
F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D_{xy} \cos\left(\frac{\pi(2x+1)u}{16}\right) \cos\left(\frac{\pi(2y+1)v}{16}\right)

Only 64 possible inputs to COS, so table can be used to save performance time
Floating-point values multiplied by 32,768 and rounded to nearest integer
32,768 chosen in order to store each value using only 2 bytes of memory
Fixed-point representation explained more later

FDCT unrolls inner loop of summation, implements outer summation as two consecutive for loops

    /* COS_TABLE[xy][uv] holds cos(pi * (2*xy + 1) * uv / 16) scaled by 32768.
       Note: 32768 is one past the largest 16-bit signed value (32767). */
    static const short COS_TABLE[8][8] = {
        { 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
        { 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
        { 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
        { 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
        { 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
        { 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
        { 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
        { 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
    };

    static short ONE_OVER_SQRT_TWO = 23170;

    static double COS(int xy, int uv) {
        return COS_TABLE[xy][uv] / 32768.0;
    }

    static double C(int h) {
        return h ? 1.0 : ONE_OVER_SQRT_TWO / 32768.0;
    }

    static int FDCT(int u, int v, short img[8][8]) {
        double s[8], r = 0; int x;
        for(x=0; x<8; x++) {
            s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
                   img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
                   img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
                   img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
        }
        for(x=0; x<8; x++) r += s[x] * COS(x, u);
        return (short)(r * .25 * C(u) * C(v));
    }


CNTRL (controller) module

Heart of the system


CntrlInitialize for consistency with other modules only
CntrlCaptureImage uses CCDPP module to input
image and place in buffer
CntrlCompressImage breaks the 64 x 64 buffer into 8 x
8 blocks and performs FDCT on each block using the
CODEC module
Also performs quantization on each block
CntrlSendImage transmits encoded image serially
using UART module

#define SZ_ROW          64
#define SZ_COL          64
#define NUM_ROW_BLOCKS  (SZ_ROW / 8)
#define NUM_COL_BLOCKS  (SZ_COL / 8)

static short buffer[SZ_ROW][SZ_COL], i, j, k, l, temp;

void CntrlInitialize(void) {}

void CntrlCaptureImage(void) {
    CcdppCapture();
    for(i=0; i<SZ_ROW; i++)
        for(j=0; j<SZ_COL; j++)
            buffer[i][j] = CcdppPopPixel();
}

void CntrlCompressImage(void) {
    for(i=0; i<NUM_ROW_BLOCKS; i++)
        for(j=0; j<NUM_COL_BLOCKS; j++) {
            for(k=0; k<8; k++)
                for(l=0; l<8; l++)
                    CodecPushPixel(
                        (char)buffer[i * 8 + k][j * 8 + l]);
            CodecDoFdct();/* part 1 - FDCT */
            for(k=0; k<8; k++)
                for(l=0; l<8; l++) {
                    buffer[i * 8 + k][j * 8 + l] = CodecPopPixel();
                    /* part 2 - quantization */
                    buffer[i*8+k][j*8+l] >>= 6;
                }
        }
}

void CntrlSendImage(void) {
    for(i=0; i<SZ_ROW; i++)
        for(j=0; j<SZ_COL; j++) {
            temp = buffer[i][j];
            UartSend(((char*)&temp)[0]); /* send upper byte */
            UartSend(((char*)&temp)[1]); /* send lower byte */
        }
}

Putting it all together

Main initializes all modules, then uses CNTRL module to capture, compress, and transmit one image
Note: only for off-line test; no iterative real-time behavior (no while(1))
This system-level model can be used for extensive experimentation
Bugs much easier to correct here rather than in later models
int main(int argc, char *argv[]) {
char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
char *imageFileName = argc > 2 ? argv[2] : "image.txt";
/* initialize the modules */
UartInitialize(uartOutputFileName);
CcdInitialize(imageFileName);
CcdppInitialize();
CodecInitialize();
CntrlInitialize();
/* simulate functionality */
CntrlCaptureImage();
CntrlCompressImage();
CntrlSendImage();
}


Design

Determine system's architecture


Processors
Any combination of single-purpose (custom or standard) or general-purpose processors

Memories, buses

Map functionality to that architecture


Multiple functions on one processor
One function on one or more processors

Implementation
A particular architecture and mapping
Solution space is set of all implementations

Starting point
Low-end general-purpose processor connected to flash memory
All functionality mapped to software running on processor
Usually satisfies power, size, and time-to-market constraints
If timing constraint not satisfied then later implementations could:
use single-purpose processors for time-critical functions
rewrite functional specification


Implementation 1: Microcontroller alone

Low-end processor could be Intel 8051 microcontroller (core)


Total IC cost including (application) NRE about $5
Well below 200 mW power
Time-to-market about 3 months
However, one image per second not possible
12 MHz, 12 cycles per instruction
Executes one million instructions per second

CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
~100 assembly instructions each iteration
409,600 (4096 x 100) instructions per image
Nearly half of time budget for reading image alone

Would be over budget after adding compute-intensive DCT and Huffman encoding
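
The budget estimate above can be reproduced with a few lines of integer arithmetic. This is only a sketch: the constants are the figures quoted on the slide, and the function names are hypothetical helpers introduced here for illustration.

```c
/* Figures quoted on the slide. */
#define CLOCK_HZ               12000000L  /* 12 MHz clock */
#define CYCLES_PER_INSTRUCTION 12L        /* 12 cycles per instruction */
#define IMAGE_ROWS             64L
#define IMAGE_COLS             64L
#define INSTR_PER_ITERATION    100L       /* ~100 assembly instructions per pixel */

/* Instructions the processor can execute in one second. */
long InstructionsPerSecond(void) {
    return CLOCK_HZ / CYCLES_PER_INSTRUCTION;
}

/* Instructions CcdppCapture needs to read one 64 x 64 image. */
long CaptureInstructions(void) {
    return IMAGE_ROWS * IMAGE_COLS * INSTR_PER_ITERATION;
}

/* Share of a one-second budget spent on capture alone, in whole percent
   (integer division truncates). */
long CaptureBudgetPercent(void) {
    return (100L * CaptureInstructions()) / InstructionsPerSecond();
}
```

With these figures, capture alone takes about 41% of the one-second budget before any DCT or Huffman work is counted.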

Implementation 2: Microcontroller and CCDPP

[Block diagram: SOC containing 8051, EEPROM, RAM, UART, and CCDPP]

CCDPP function implemented on custom single-purpose processor
Improves performance: fewer microcontroller cycles
Increases NRE cost and time-to-market
Easy to implement
Simple datapath
Few states in controller

Simple UART easy to implement as standard single-purpose processor also


EEPROM for program memory and RAM for data memory added as well

Microcontroller

Synthesizable version of Intel 8051 available


Written in VHDL
Captured at register transfer level (RTL)

Fetches instruction from ROM


Decodes using Instruction Decoder
ALU executes arithmetic operations
Source and destination registers reside in
RAM

Special data movement instructions used to load and store externally
Special program generates VHDL description of ROM from output of C compiler/linker

[Block diagram of Intel 8051 processor core: 4K ROM, instruction decoder, controller, 128-byte RAM, and ALU, connected to the external memory bus]


UART
UART in idle mode until invoked
UART invoked when 8051 executes store instruction
with UART's enable register as target address
Memory-mapped communication between 8051 and
all single-purpose processors
Lower 8-bits of memory address for RAM
Upper 8-bits of memory address for memory-mapped
I/O devices

Start state transmits 0 indicating start of byte


transmission then transitions to Data state
Data state sends 8 bits serially then transitions to
Stop state
Stop state transmits 1 indicating transmission done
then transitions back to idle mode

FSMD description of UART
Idle: I = 0; when invoked, go to Start
Start: transmit LOW; go to Data
Data: transmit data(I), then I++; repeat while I < 8; when I = 8, go to Stop
Stop: transmit HIGH; go to Idle
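
The UART state machine can be modeled in software as a sketch like the following. The type and function names are hypothetical, the idle line level of 1 and the least-significant-bit-first order are assumptions (the slide only says 8 bits are sent serially), and each call to the step function stands for one bit time.

```c
/* States of the UART FSMD described on the slide. */
typedef enum { UART_IDLE, UART_START, UART_DATA, UART_STOP } UartState;

typedef struct {
    UartState state;
    unsigned char data;  /* byte being transmitted */
    int i;               /* bit index I from the FSMD */
} UartFsmd;

void UartFsmdReset(UartFsmd *u) {
    u->state = UART_IDLE;
    u->data = 0;
    u->i = 0;
}

/* Models the 8051 store that invokes the UART; ignored unless idle. */
void UartFsmdInvoke(UartFsmd *u, unsigned char d) {
    if (u->state == UART_IDLE) {
        u->data = d;
        u->i = 0;
        u->state = UART_START;
    }
}

/* Advances one bit time and returns the level driven on the serial line. */
int UartFsmdStep(UartFsmd *u) {
    int line = 1;                        /* assumption: idle line is HIGH */
    switch (u->state) {
    case UART_IDLE:
        break;
    case UART_START:
        line = 0;                        /* start bit: transmit LOW */
        u->state = UART_DATA;
        break;
    case UART_DATA:
        line = (u->data >> u->i) & 1;    /* assumption: LSB first */
        u->i++;
        if (u->i == 8) u->state = UART_STOP;
        break;
    case UART_STOP:
        line = 1;                        /* stop bit: transmit HIGH */
        u->state = UART_IDLE;
        break;
    }
    return line;
}
```

One byte therefore occupies ten bit times: one start bit, eight data bits, and one stop bit.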

CCDPP

Hardware implementation of zero-bias operations


Interacts with external CCD chip

CCD chip resides external to our SOC mainly because combining


CCD with ordinary logic not feasible

66 bytes: 64 pixels + 2 blacked-out pixels

FSMD description of CCDPP

Internal buffer, B, memory-mapped to 8051


Variables R, C are buffer's row, column indices
GetRow state reads in one row from CCD to B
ComputeBias state computes bias for that row and
stores in variable Bias
FixBias state iterates over same row subtracting
Bias from each element
NextRow transitions to GetRow for repeat of
process on next row or to Idle state when all 64
rows completed

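FSMD states and transitions:
Idle: R=0, C=0; when invoked, go to GetRow
GetRow: B[R][C]=Pxl, C=C+1; repeat while C < 66; when C = 66, go to ComputeBias
ComputeBias: Bias=(B[R][11] + B[R][10]) / 2, C=0; go to FixBias
FixBias: B[R][C]=B[R][C]-Bias; repeat while C < 64; when C = 64, go to NextRow
NextRow: R++, C=0; if R < 64, go to GetRow; if R = 64, go to Idle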
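
The ComputeBias and FixBias steps for a single row might look like this in software. It is a sketch under a stated assumption: the two blacked-out pixels are taken to be the last two bytes of the 66-byte row (indices 64 and 65), which the slide does not confirm. The function name is hypothetical.

```c
#define ROW_BYTES   66  /* 64 pixels + 2 blacked-out pixels per row */
#define ROW_PIXELS  64

/* Computes the bias of one row and subtracts it from the 64 image pixels.
   Assumption: the blacked-out pixels sit at indices 64 and 65.
   Returns the bias so the caller can inspect it. */
int ZeroBiasRow(int row[ROW_BYTES]) {
    int bias = (row[ROW_PIXELS] + row[ROW_PIXELS + 1]) / 2;  /* ComputeBias */
    int c;
    for (c = 0; c < ROW_PIXELS; c++)                         /* FixBias */
        row[c] -= bias;
    return bias;
}
```

Calling this once per row, for all 64 rows, corresponds to the GetRow, ComputeBias, FixBias, NextRow cycle of the FSMD.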


Connecting SOC components


Memory-mapped
All single-purpose processors and RAM are connected to 8051's memory bus

Read

Processor places address on 16-bit address bus


Asserts read control signal for 1 cycle
Reads data from 8-bit data bus 1 cycle later
Device (RAM or SPP) detects asserted read control signal
Checks address
Places and holds requested data on data bus for 1 cycle

Write

Processor places address and data on address and data bus


Asserts write control signal for 1 clock cycle
Device (RAM or SPP) detects asserted write control signal
Checks address bus
Reads and stores data from data bus


Software

System-level model provides majority of code


Module hierarchy, procedure names, and main program unchanged

Code for UART and CCDPP modules must be redesigned


Simply replace with memory assignments

xdata used to load/store variables over external memory bus


_at_ specifies memory address to store these variables
Byte sent to U_TX_REG by processor will invoke UART
U_STAT_REG used by UART to indicate it's ready for next byte
UART may be much slower than processor

Similar modification for CCDPP code

All other modules untouched


Rewritten UART module

static unsigned char xdata U_TX_REG _at_ 65535;
static unsigned char xdata U_STAT_REG _at_ 65534;
void UartInitialize(void) {}
void UartSend(unsigned char d) {
    while( U_STAT_REG == 1 ) {
        /* busy wait */
    }
    U_TX_REG = d;
}

Original code from system-level model

#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
    outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
    fprintf(outputFileHandle, "%i\n", (int)d);
}


Analysis

Entire SOC tested on VHDL simulator


Interprets VHDL descriptions and
functionally simulates execution of system
Recall program code translated to VHDL
description of ROM

Tests for correct functionality


Measures clock cycles to process one
image (performance)

Gate-level description obtained through


synthesis
Synthesis tool like compiler for SPPs
Simulate gate-level models to obtain data
for power analysis

[Figure: Obtaining design metrics of interest. The VHDL descriptions feed a VHDL simulator, which yields execution time; a synthesis tool turns each VHDL description into gates; a gate-level simulator and the power equation yield power; summing the gates yields chip area]

Number of times gates switch from 1 to 0 or 0 to 1
Count number of gates for chip area

Implementation 2:
Microcontroller and CCDPP
Analysis of implementation 2
Total execution time for processing one image:
9.1 seconds

Power consumption:
0.033 watt

Energy consumption:
0.30 joule (9.1 s x 0.033 watt)

Total chip area:


98,000 gates


Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
9.1 seconds still doesn't meet performance constraint of 1 second
DCT operation prime candidate for improvement
Execution of implementation 2 shows microprocessor
spends most cycles here
Could design custom hardware like we did for CCDPP
More complex so more design effort

Instead, will speed up DCT functionality by modifying


behavior


DCT floating-point cost


Floating-point cost

DCT uses ~260 floating-point operations per pixel transformation


4096 (64 x 64) pixels per image
1 million floating-point operations per image
No floating-point support with Intel 8051
Compiler must emulate
Generates procedures for each floating-point operation
mult, add
Each procedure uses tens of integer operations

Thus, > 10 million integer operations per image


Procedures increase code size

Fixed-point arithmetic can improve on this



Fixed-point arithmetic
Integer used to represent a real number
Constant number of integer's bits represents fractional portion of real number
More bits, more accurate the representation
Remaining bits represent portion of real number before decimal point

Translating a real constant to a fixed-point representation
Multiply real value by 2 ^ (# of bits used for fractional part)
Round to nearest integer
E.g., represent 3.14 as 8-bit integer with 4 bits for fraction
2^4 = 16
3.14 x 16 = 50.24 ≈ 50 = 00110010
16 (2^4) possible values for fraction, each represents 0.0625 (1/16)
Last 4 bits (0010) = 2
2 x 0.0625 = 0.125
3 (0011) + 0.125 = 3.125 ≈ 3.14 (more bits for fraction would increase accuracy)
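
The translation steps can be written as a pair of small conversion routines. This is a minimal sketch for non-negative values only; FRACTION_BITS, ToFixed and ToReal are hypothetical names chosen here, and the 4-bit fraction matches the slide's example.

```c
#define FRACTION_BITS 4  /* bits used for the fractional part, as in the example */

/* Real constant to fixed point: multiply by 2^FRACTION_BITS and round to
   the nearest integer (valid for non-negative inputs only). */
int ToFixed(double real) {
    return (int)(real * (1 << FRACTION_BITS) + 0.5);
}

/* Fixed point back to the real value it actually represents. */
double ToReal(int fixed) {
    return (double)fixed / (1 << FRACTION_BITS);
}
```

For 3.14 this gives the integer 50, which converts back to 3.125, the nearest value representable with a step of 1/16.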


Fixed-point arithmetic operations


Addition
Simply add integer representations
E.g., 3.14 + 2.71 = 5.85
3.14 → 50 = 0011.0010
2.71 → 43 = 0010.1011
50 + 43 = 93 = 0101.1101
5 (0101) + 13 (1101) x 0.0625 = 5.8125 ≈ 5.85

Multiply
Multiply integer representations
Shift result right by # of bits in fractional part
E.g., 3.14 * 2.71 = 8.5094
50 * 43 = 2150 = 1000.01100110
[ 2150 ≈ (3.14*16) * (2.71*16) = (3.14*2.71*16) * 16 ]
>> 4 = 1000.0110
8 (1000) + 6 (0110) x 0.0625 = 8.375 ≈ 8.5094

Range of real values used is limited by bit widths of possible resulting values
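
The two operations reduce to one integer instruction each, plus a shift for the multiply. The sketch below uses the slide's 4-bit fraction; FixedAdd, FixedMul and the two accessor helpers are hypothetical names introduced for illustration, and operands are assumed non-negative.

```c
#define FRACTION_BITS 4  /* same format as the worked example */

/* Addition: the integer representations are simply added. */
int FixedAdd(int a, int b) {
    return a + b;
}

/* Multiplication: the raw product carries 2 * FRACTION_BITS fraction bits,
   so it is shifted right once to return to the original format. */
int FixedMul(int a, int b) {
    return (a * b) >> FRACTION_BITS;
}

/* Integer part of a non-negative fixed-point value. */
int FixedWhole(int a) {
    return a >> FRACTION_BITS;
}

/* Fraction bits of a fixed-point value, in units of 1/16. */
int FixedFraction(int a) {
    return a & ((1 << FRACTION_BITS) - 1);
}
```

With 50 and 43 (3.14 and 2.71), the sum is 93 (5 and 13/16) and the product is 134 (8 and 6/16), matching the figures on the slide.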

Fixed-point implementation of CODEC

COS_TABLE gives 8-bit fixed-point representation of cosine values
6 bits used for fractional portion
Result of multiplications shifted right by 6

static const char code COS_TABLE[8][8] = {
    { 64,  62,  59,  53,  45,  35,  24,  12 },
    { 64,  53,  24, -12, -45, -62, -59, -35 },
    { 64,  35, -24, -62, -45,  12,  59,  53 },
    { 64,  12, -59, -35,  45,  53, -24, -62 },
    { 64, -12, -59,  35,  45, -53, -24,  62 },
    { 64, -35, -24,  62, -45, -12,  59, -53 },
    { 64, -53,  24,  12, -45,  62, -59,  35 },
    { 64, -62,  59, -53,  45, -35,  24, -12 }
};
/* value as printed on the slide; note that 1/sqrt(2) x 64 is about 45 */
static const char ONE_OVER_SQRT_TWO = 5;
static short xdata inBuffer[8][8], outBuffer[8][8], idx;

void CodecInitialize(void) { idx = 0; }

void CodecPushPixel(short p) {
    if( idx == 64 ) idx = 0;
    inBuffer[idx / 8][idx % 8] = p << 6; idx++;
}

static unsigned char C(int h) { return h ? 64 : ONE_OVER_SQRT_TWO;}

static int F(int u, int v, short img[8][8]) {
    long s[8], r = 0;
    unsigned char x, j;
    for(x=0; x<8; x++) {
        s[x] = 0;
        for(j=0; j<8; j++)
            s[x] += (img[x][j] * COS_TABLE[j][v] ) >> 6;
    }
    for(x=0; x<8; x++) r += (s[x] * COS_TABLE[x][u]) >> 6;
    return (short)((((r * (((16*C(u)) >> 6) *C(v)) >> 6)) >> 6) >> 6);
}

void CodecDoFdct(void) {
    unsigned short x, y;
    for(x=0; x<8; x++)
        for(y=0; y<8; y++)
            outBuffer[x][y] = F(x, y, inBuffer);
    idx = 0;
}

Implementation 3: Microcontroller and


CCDPP/Fixed-Point DCT
Analysis of implementation 3
Use same analysis techniques as implementation 2
Total execution time for processing one image:
1.5 seconds

Power consumption:
0.033 watt (same as 2)

Energy consumption:
0.050 joule (1.5 s x 0.033 watt)
Battery life 6x longer!!

Total chip area:


90,000 gates
8,000 fewer gates (less memory needed for code)

Implementation 4:
Microcontroller and CCDPP/DCT and CODEC

[Block diagram: SOC containing 8051, EEPROM, RAM, UART, CCDPP, and CODEC]

Performance close but not good enough


Must resort to implementing CODEC in hardware
Single-purpose processor to perform DCT on 8 x 8 block


CODEC design
4 memory mapped registers
C_DATAI_REG/C_DATAO_REG used to
push/pop 8 x 8 block into and out of
CODEC
C_CMND_REG used to command
CODEC
Writing 1 to this register invokes CODEC

C_STAT_REG indicates CODEC done


and ready for next block
Polled in software

Direct translation of C code to VHDL for


actual hardware implementation
Fixed-point version used

CODEC module in software changed


similar to UART/CCDPP in
implementation 2

Rewritten CODEC software


static unsigned char xdata C_STAT_REG _at_ 65527;
static unsigned char xdata C_CMND_REG _at_ 65528;
static unsigned char xdata C_DATAI_REG _at_ 65529;
static unsigned char xdata C_DATAO_REG _at_ 65530;
void CodecInitialize(void) {}
void CodecPushPixel(short p) { C_DATAO_REG = (char)p; }
short CodecPopPixel(void) {
return ((C_DATAI_REG << 8) | C_DATAI_REG);
}
void CodecDoFdct(void) {
C_CMND_REG = 1;
while( C_STAT_REG == 1 ) { /* busy wait */ }
}


Implementation 4:
Microcontr. and CCDPP/DCT and CODEC
Analysis of implementation 4
Total execution time for processing one image:
0.099 seconds (well under 1 sec)

Power consumption:
0.040 watt
Increase over 2 and 3 because SOC has another processor

Energy consumption:
0.0040 joule (0.099 s x 0.040 watt)
Battery life 12x longer than previous implementation!!

Total chip area:


128,000 gates
Significant increase over previous implementations

Summary of implementations

                       Implementation 2   Implementation 3   Implementation 4
Performance (second)   9.1                1.5                0.099
Power (watt)           0.033              0.033              0.040
Size (gate)            98,000             90,000             128,000
Energy (joule)         0.30               0.050              0.0040
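
The energy row follows from the other two: energy is power multiplied by execution time. A small sketch of that calculation, using the measured figures from the slides (the struct and function names are hypothetical):

```c
/* Measured figures for one implementation, as reported in the summary. */
typedef struct {
    const char *name;
    double seconds;  /* execution time for one image */
    double watts;    /* power consumption */
} Implementation;

static const Implementation IMPLEMENTATIONS[3] = {
    { "Implementation 2", 9.1,   0.033 },
    { "Implementation 3", 1.5,   0.033 },
    { "Implementation 4", 0.099, 0.040 }
};

/* Energy (joule) = power (watt) x execution time (second). */
double EnergyJoule(const Implementation *impl) {
    return impl->seconds * impl->watts;
}

/* How many times less energy per image the newer implementation uses. */
double EnergyRatio(const Implementation *older, const Implementation *newer) {
    return EnergyJoule(older) / EnergyJoule(newer);
}
```

The ratios come out at roughly 6 between implementations 2 and 3 and roughly 12.5 between implementations 3 and 4, which is where the battery-life claims come from.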

Implementation 3
Close in performance
Cheaper
Less time to build

Implementation 4
Great performance and energy consumption
More expensive and may miss time-to-market window
If DCT designed ourselves then increased NRE cost and time-to-market
If existing DCT purchased then increased IC cost (IP royalties)

Which is better?

Summary
Digital camera example
Specifications in English and executable language
Design metrics: performance, power and area

Several implementations

Microcontroller: too slow


Microcontroller and coprocessor: better, but still too slow
Fixed-point arithmetic: almost fast enough
Additional coprocessor for compression: fast enough, but
expensive and hard to design
Tradeoffs between hw/sw: this is the main design concern

Introduction to VHDL

Slides adapted from the


Introduction to VLSI course
GM University, VA, USA


VHDL
VHDL is a language for describing digital
hardware used by industry worldwide

VHDL is an acronym for VHSIC (Very High


Speed Integrated Circuit) Hardware
Description Language


Genesis of VHDL
State of art circa 1980
Multiple design entry methods and
hardware description languages in use
No or limited portability of designs
between CAD tools from different vendors
Objective: shortening the time from a
design concept to implementation from
18 months to 6 months


A Brief History of VHDL


June 1981: Woods Hole Workshop
July 1983: contract awarded to develop VHDL
Intermetrics
IBM
Texas Instruments
August 1985: VHDL Version 7.2 released
December 1987:
VHDL became IEEE Standard 1076-1987 and in
1988 an ANSI standard

Three versions of VHDL

VHDL-87
VHDL-93
VHDL-01


VHDL for Specification

VHDL for Simulation

VHDL for Synthesis

Levels of design description

Algorithmic level
Register Transfer Level

Level of description
most suitable for synthesis

Logic (gate) level


Circuit (transistor) level
Physical (layout) level


Register Transfer Logic (RTL) Design Description

Combinational
Logic

Combinational
Logic

Registers


Naming and Labeling (1)


VHDL is not case sensitive
Example:
Names or labels
databus
Databus
DataBus
DATABUS
are all equivalent


Naming and Labeling (2)


General rules of thumb (according to VHDL-87)
1. All names should start with an alphabetic character (a-z or A-Z)
2. Use only alphabetic characters (a-z or A-Z), digits (0-9), and underscore (_)
3. Do not use any punctuation or reserved characters within a name (!, ?, ., &, +, -, etc.)
4. Do not use two or more consecutive underscore characters (__) within a name (e.g., Sel__A is invalid)
5. All names and labels in a given entity and architecture must be unique

Free Format
VHDL is a free format language
No formatting conventions, such as spacing or
indentation imposed by VHDL compilers. Space
and carriage return treated the same way.
Example:
if (a=b) then

or
if (a=b)

then

or
if (a =
b) then

are all equivalent



Comments
Comments in VHDL are indicated with
a double dash, i.e., --
Comment indicator can be placed anywhere in the
line
Any text that follows in the same line is treated as
a comment
Carriage return terminates a comment
No method for commenting a block extending over
a couple of lines
Examples:
-- main subcircuit
Data_in <= Data_bus; -- reading data from the input FIFO

Design Entity
design entity

entity declaration

architecture 1

Design Entity - most basic


building block of a design.
One entity can have
many different architectures.

architecture 2
architecture 3


Entity Declaration
Entity Declaration describes the interface of the
component, i.e. input and output ports.

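ENTITY nand_gate IS
    PORT(
        a : IN STD_LOGIC;
        b : IN STD_LOGIC;
        z : OUT STD_LOGIC
    );
END nand_gate;

Annotations: entity name (nand_gate); port names (a, b, z); port modes, i.e. data flow directions (IN, OUT); port type (STD_LOGIC); reserved words (ENTITY, IS, PORT, IN, OUT, END); semicolon after each port declaration except the last one (no semicolon)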

Entity declaration simplified syntax

ENTITY entity_name IS
PORT (
port_name : signal_mode signal_type;
port_name : signal_mode signal_type;
.
port_name : signal_mode signal_type);
END entity_name;


Architecture
Describes an implementation of a design
entity.
Architecture example:

ARCHITECTURE model OF nand_gate IS


BEGIN
z <= a NAND b;
END model;


Architecture simplified syntax

ARCHITECTURE architecture_name OF entity_name IS


[ declarations ]
BEGIN
code
END architecture_name;


Entity Declaration & Architecture


nand_gate.vhd
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY nand_gate IS
    PORT(
        a : IN STD_LOGIC;
        b : IN STD_LOGIC;
        z : OUT STD_LOGIC);
END nand_gate;
ARCHITECTURE model OF nand_gate IS
BEGIN
    z <= a NAND b;
END model;

Mode In
Port signal

Entity

Driver resides
outside the entity


Mode out
Entity
Port signal

Driver resides
inside the entity

Can't read out within an entity

c <= z

Mode out with signal


Entity
Port signal

Signal X can be
read inside the entity

Driver resides
inside the entity

z <= x
c <= x


Mode inout
Entity

Port signal

Signal can be
read inside the entity

Driver may reside


both inside and outside
of the entity


Mode buffer
Entity
Port signal

z
c
Driver resides
inside the entity

Port signal Z can be


read inside the entity

c <= z


Port Modes
The Port Mode of the interface describes the direction in which data travels with
respect to the component
In: Data comes in this port and can only be read within the entity. It can
appear only on the right side of a signal or variable assignment.
Out: The value of an output port can only be updated within the entity. It
cannot be read. It can only appear on the left side of a signal
assignment.
Inout: The value of a bi-directional port can be read and updated within
the entity model. It can appear on both sides of a signal assignment.
Buffer: Used for a signal that is an output from an entity. The value of the
signal can be used inside the entity, which means that in an assignment
statement the signal can appear on the left and right sides of the <=
operator

Library declarations
LIBRARY ieee;                   -- library declaration
USE ieee.std_logic_1164.all;    -- use all definitions from the package std_logic_1164
ENTITY nand_gate IS
    PORT(
        a : IN STD_LOGIC;
        b : IN STD_LOGIC;
        z : OUT STD_LOGIC);
END nand_gate;
ARCHITECTURE model OF nand_gate IS
BEGIN
    z <= a NAND b;
END model;

Library declarations - syntax

LIBRARY library_name;
USE library_name.package_name.package_parts;


Fundamental parts of a library


LIBRARY
PACKAGE 1

PACKAGE 2

TYPES
CONSTANTS
FUNCTIONS
PROCEDURES
COMPONENTS

TYPES
CONSTANTS
FUNCTIONS
PROCEDURES
COMPONENTS


Libraries
ieee (needs to be explicitly declared)
Specifies multi-level logic system, including STD_LOGIC, and STD_LOGIC_VECTOR data types

std (visible by default)
Specifies pre-defined data types (BIT, BOOLEAN, INTEGER, REAL, etc.), arithmetic operations, basic type conversion functions, basic text i/o functions, etc.

work (visible by default)
Current designs after compilation

STD_LOGIC
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY nand_gate IS
    PORT(
        a : IN STD_LOGIC;
        b : IN STD_LOGIC;
        z : OUT STD_LOGIC);
END nand_gate;
ARCHITECTURE model OF nand_gate IS
BEGIN
z <= a NAND b;
END model;

What is STD_LOGIC you ask?



STD_LOGIC type demystified


Value   Meaning
'X'     Forcing (Strong driven) Unknown
'0'     Forcing (Strong driven) 0
'1'     Forcing (Strong driven) 1
'Z'     High Impedance
'W'     Weak (Weakly driven) Unknown
'L'     Weak (Weakly driven) 0. Models a pull down.
'H'     Weak (Weakly driven) 1. Models a pull up.
'-'     Don't Care

More on STD_LOGIC Meanings (1)

X
Contention on the bus

X
0


More on STD_LOGIC Meanings (2)

0
0


More on STD_LOGIC Meanings (3)


VDD
VDD

H
1
0


More on STD_LOGIC Meanings (4)


-

Do not care.
Can be assigned to outputs for the case of invalid
inputs(may produce significant improvement in
resource utilization after synthesis).
Use with caution
'1' = '-' gives FALSE


Resolving logic levels

      X  0  1  Z  W  L  H  -
X     X  X  X  X  X  X  X  X
0     X  0  X  0  0  0  0  X
1     X  X  1  1  1  1  1  X
Z     X  0  1  Z  W  L  H  X
W     X  0  1  W  W  W  W  X
L     X  0  1  L  W  L  W  X
H     X  0  1  H  W  W  H  X
-     X  X  X  X  X  X  X  X
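
Resolution is a pure table lookup: when two drivers are connected to the same signal, their values index a row and a column, and the entry is the value seen on the wire. The C model below is only an illustration of that lookup, not VHDL. The enum and function names are hypothetical, and it covers just the eight values shown in the table.

```c
/* The eight logic values from the table; SL_DC stands for '-'. */
typedef enum { SL_X, SL_0, SL_1, SL_Z, SL_W, SL_L, SL_H, SL_DC } StdLogic;

/* Resolution table, rows and columns in the order X 0 1 Z W L H -. */
static const StdLogic RESOLUTION[8][8] = {
    /* X */ { SL_X, SL_X, SL_X, SL_X, SL_X, SL_X, SL_X, SL_X },
    /* 0 */ { SL_X, SL_0, SL_X, SL_0, SL_0, SL_0, SL_0, SL_X },
    /* 1 */ { SL_X, SL_X, SL_1, SL_1, SL_1, SL_1, SL_1, SL_X },
    /* Z */ { SL_X, SL_0, SL_1, SL_Z, SL_W, SL_L, SL_H, SL_X },
    /* W */ { SL_X, SL_0, SL_1, SL_W, SL_W, SL_W, SL_W, SL_X },
    /* L */ { SL_X, SL_0, SL_1, SL_L, SL_W, SL_L, SL_W, SL_X },
    /* H */ { SL_X, SL_0, SL_1, SL_H, SL_W, SL_W, SL_H, SL_X },
    /* - */ { SL_X, SL_X, SL_X, SL_X, SL_X, SL_X, SL_X, SL_X }
};

/* Value on a wire driven simultaneously with a and b. */
StdLogic Resolve(StdLogic a, StdLogic b) {
    return RESOLUTION[a][b];
}
```

For example, a forcing 0 against a forcing 1 resolves to X (contention), while a high-impedance driver next to a pull-up resolves to H.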


Signals
SIGNAL a : STD_LOGIC;

a
1

wire

SIGNAL b : STD_LOGIC_VECTOR(7 DOWNTO 0);

b
8

bus


Standard Logic Vectors


SIGNAL a: STD_LOGIC;
SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL c: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL d: STD_LOGIC_VECTOR(7 DOWNTO 0);
SIGNAL e: STD_LOGIC_VECTOR(15 DOWNTO 0);
SIGNAL f: STD_LOGIC_VECTOR(8 DOWNTO 0);
...
a <= '1';
b <= "0000";          -- Binary base assumed by default
c <= B"0000";         -- Binary base explicitly specified
d <= B"0110_0111";    -- You can use '_' to increase readability
e <= X"AF67";         -- Hexadecimal base
f <= O"723";          -- Octal base

Vectors and Concatenation


SIGNAL a: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL c, d, e: STD_LOGIC_VECTOR(7 DOWNTO 0);
a <= "0000";
b <= "1111";
c <= a & b;              -- c = "00001111"
d <= '0' & "0001111";    -- d = "00001111"
e <= '0' & '0' & '0' & '0' & '1' & '1' & '1' & '1';    -- e = "00001111"

VHDL Design Styles


VHDL Design
Styles

structural

dataflow
Concurrent
statements

Components and
interconnects

behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.

Subset most suitable for synthesis



xor3 Example

Entity xor3
ENTITY xor3 IS
    PORT(
        A : IN STD_LOGIC;
        B : IN STD_LOGIC;
        C : IN STD_LOGIC;
        Result : OUT STD_LOGIC
    );
END xor3;

Dataflow Architecture (xor3 gate)


ARCHITECTURE dataflow OF xor3 IS
SIGNAL U1_out: STD_LOGIC;
BEGIN
U1_out <=A XOR B;
Result <=U1_out XOR C;
END dataflow;

Dataflow Description
Describes how data moves through the system
and the various processing steps.
Data Flow uses series of concurrent statements
to realize logic. Concurrent statements are
evaluated at the same time; thus, order of these
statements doesn't matter.
Data Flow is most useful style when series of
Boolean equations can represent a logic.


Structural Architecture (xor3 gate)


ARCHITECTURE structural OF xor3 IS
    SIGNAL U1_OUT: STD_LOGIC;
    COMPONENT xor2 IS
        PORT(
            I1 : IN STD_LOGIC;
            I2 : IN STD_LOGIC;
            Y : OUT STD_LOGIC
        );
    END COMPONENT;
BEGIN
    U1: xor2 PORT MAP (I1 => A,
                       I2 => B,
                       Y => U1_OUT);
    U2: xor2 PORT MAP (I1 => U1_OUT,
                       I2 => C,
                       Y => Result);
END structural;

[Figure: XOR3 built from two XOR2 components (ports I1, I2, Y); U1 combines A and B into U1_OUT, and U2 combines U1_OUT and C into Result]

Component and Instantiation (1)


Named association connectivity
(recommended)
COMPONENT xor2 IS
PORT(
I1 : IN STD_LOGIC;
I2 : IN STD_LOGIC;
Y : OUT STD_LOGIC
);
END COMPONENT;
U1: xor2 PORT MAP (I1 => A,
I2 => B,
Y => U1_OUT);


Component and Instantiation (2)


Positional association connectivity
(not recommended)
COMPONENT xor2 IS
PORT(
I1 : IN STD_LOGIC;
I2 : IN STD_LOGIC;
Y : OUT STD_LOGIC
);
END COMPONENT;
U1: xor2 PORT MAP (A, B, U1_OUT);


Structural Description
Structural design is the simplest to understand.
This style is the closest to schematic capture and
utilizes simple building blocks to compose logic
functions.
Components are interconnected in a hierarchical
manner.
Structural descriptions may connect simple gates
or complex, abstract components.
Structural style is useful when expressing a
design that is naturally composed of sub-blocks.

Behavioral Architecture (xor3 gate)


ARCHITECTURE behavioral OF xor3 IS
BEGIN
xor3_behave: PROCESS (A,B,C)
BEGIN
IF ((A XOR B XOR C) = '1') THEN
Result <= '1';
ELSE
Result <= '0';
END IF;
END PROCESS xor3_behave;
END behavioral;

Behavioral Description
It accurately models what happens on the inputs
and outputs of the black box (no matter what is
inside and how it works).
This style uses PROCESS statements in VHDL.


Testbench Block Diagram

[Block diagram: within the testbench, processes generating stimuli drive the Design Under Test (DUT), which produces the observed outputs]

Testbench Defined
Testbench applies stimuli (drives the inputs) to
the Design Under Test (DUT) and (optionally)
verifies expected outputs.
The results can be viewed in a waveform window
or written to a file.
Since Testbench is written in VHDL, it is not
restricted to a single simulation tool (portability).
The same Testbench can be easily adapted to
test different implementations (i.e. different
architectures) of the same design.

Testbench Anatomy
ENTITY tb IS
    --TB entity has no ports
END tb;
ARCHITECTURE arch_tb OF tb IS
    --Local signals and constants
    COMPONENT TestComp --All Design Under Test component declarations
        PORT ( );
    END COMPONENT;
-----------------------------------------------------
BEGIN
    testSequence: PROCESS
        -- Input stimuli
    END PROCESS;
    DUT:TestComp PORT MAP( -- Instantiations of DUTs
    );
END arch_tb;

Testbench for XOR3 (1)


LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY xor3_tb IS
END xor3_tb;
ARCHITECTURE xor3_tb_architecture OF xor3_tb IS
-- Component declaration of the tested unit
COMPONENT xor3
PORT(
A : IN STD_LOGIC;
B : IN STD_LOGIC;
C : IN STD_LOGIC;
Result : OUT STD_LOGIC );
END COMPONENT;
-- Stimulus signals - signals mapped to the input and inout ports of tested entity
SIGNAL test_vector: STD_LOGIC_VECTOR(2 DOWNTO 0);
SIGNAL test_result : STD_LOGIC;


Testbench for XOR3 (2)


BEGIN
    UUT : xor3
        PORT MAP (
            A => test_vector(0),
            B => test_vector(1),
            C => test_vector(2),
            Result => test_result
        );
    Testing: PROCESS
    BEGIN
        test_vector <= "000";
        WAIT FOR 10 ns;
        test_vector <= "001";
        WAIT FOR 10 ns;
        test_vector <= "010";
        WAIT FOR 10 ns;
        test_vector <= "011";
        WAIT FOR 10 ns;
        test_vector <= "100";
        WAIT FOR 10 ns;
        test_vector <= "101";
        WAIT FOR 10 ns;
        test_vector <= "110";
        WAIT FOR 10 ns;
        test_vector <= "111";
        WAIT FOR 10 ns;
    END PROCESS;
END xor3_tb_architecture;

Constants
Syntax:
CONSTANT name : type := value;

Examples:
CONSTANT init_value : STD_LOGIC_VECTOR(3 downto 0) := "0100";
CONSTANT ANDA_EXT : STD_LOGIC_VECTOR(7 downto 0) := X"B4";
CONSTANT counter_width : INTEGER := 16;
CONSTANT buffer_address : INTEGER := 16#FFFE#;
CONSTANT clk_period : TIME := 20 ns;
CONSTANT strobe_period : TIME := 333.333 ms;


Constants - features
Constants can be declared in a
PACKAGE, ENTITY, ARCHITECTURE
When declared in a PACKAGE, the constant
is truly global, for the package can be used
in several entities.
When declared in an ARCHITECTURE, the
constant is local, i.e., it is visible only within this architecture.
When declared in an ENTITY declaration, the constant
can be used in all architectures associated with this entity.
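
The three declaration places can be put side by side in one small design. The following is a minimal sketch; cfg_pkg, clr_gate and all constant names are hypothetical and chosen only for illustration.

```vhdl
-- Package-level constant: visible wherever the package is used.
PACKAGE cfg_pkg IS
    CONSTANT data_width : INTEGER := 8;
END cfg_pkg;

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.cfg_pkg.all;

ENTITY clr_gate IS
    PORT ( clear : IN  STD_LOGIC;
           d     : IN  STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
           q     : OUT STD_LOGIC_VECTOR(data_width-1 DOWNTO 0) );
    -- Entity-level constant: usable in every architecture of clr_gate.
    CONSTANT clear_value : STD_LOGIC_VECTOR(data_width-1 DOWNTO 0)
        := (OTHERS => '0');
END clr_gate;

ARCHITECTURE dataflow OF clr_gate IS
    -- Architecture-level constant: visible only inside this architecture.
    CONSTANT active : STD_LOGIC := '1';
BEGIN
    q <= clear_value WHEN clear = active ELSE d;
END dataflow;
```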

Physical data types


Types representing physical quantities,
such as time, voltage, capacitance, etc., are
referred to in VHDL as physical data types.
TIME is the only predefined physical data
type.
A value of a physical data type is called a
physical literal.

Time values (physical literals) - Examples


7 ns
1 min
min
10.65 us
10.65 fs

Each literal consists of a numeric value, a space, and a unit of time (dimension).

TIME values
Numeric value can be an integer or
a floating point number.
Numeric value is optional. If not given, 1 is
implied.
Numeric value and dimension MUST be
separated by a space.


Units of time
Unit    Definition

Base Unit
fs      femtoseconds (10^-15 seconds)

Derived Units
ps      picoseconds (10^-12 seconds)
ns      nanoseconds (10^-9 seconds)
us      microseconds (10^-6 seconds)
ms      milliseconds (10^-3 seconds)
sec     seconds
min     minutes (60 seconds)
hr      hours (3600 seconds)

Values of the type TIME


Value of a physical literal is defined in terms
of integral multiples of the base unit, e.g.
10.65 us = 10,650,000,000 fs
10.65 fs = 10 fs
Smallest available resolution in VHDL is 1 fs.
Smallest available resolution in simulation can be
set using a simulator command or parameter.

Arithmetic operations on values of the


type TIME
Examples:
7 ns + 10 ns = 17 ns
1.2 ns - 12.6 ps = 1187400 fs
5 ns * 4.3 = 21.5 ns
20 ns / 5 ns = 4
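
These rules can be checked directly in simulation. The sketch below uses only whole-nanosecond values so that it does not depend on the simulator's resolution setting; the entity name time_arith_tb and the constant are hypothetical.

```vhdl
ENTITY time_arith_tb IS
END time_arith_tb;

ARCHITECTURE behavioral OF time_arith_tb IS
    CONSTANT clk_period : TIME := 20 ns;
BEGIN
    check : PROCESS
    BEGIN
        -- TIME + TIME gives TIME
        ASSERT 7 ns + 10 ns = 17 ns
            REPORT "addition failed" SEVERITY ERROR;
        -- TIME * INTEGER and TIME / INTEGER give TIME
        ASSERT clk_period * 3 = 60 ns
            REPORT "multiplication failed" SEVERITY ERROR;
        ASSERT clk_period / 2 = 10 ns
            REPORT "division by integer failed" SEVERITY ERROR;
        -- TIME / TIME gives a plain integer
        ASSERT 20 ns / 5 ns = 4
            REPORT "division by time failed" SEVERITY ERROR;
        -- Units are related by fixed factors
        ASSERT 1 us = 1000 ns
            REPORT "unit conversion failed" SEVERITY ERROR;
        WAIT;   -- run the checks once
    END PROCESS;
END behavioral;
```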


VHDL Design Styles


VHDL Design
Styles

dataflow
Concurrent
statements

structural
Components and
interconnects

behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.


Data-flow VHDL
Major instructions
Concurrent statements

concurrent signal assignment (<=)


conditional concurrent signal assignment
(when-else)
selected concurrent signal assignment
(with-select-when)
generate scheme for equations
(for-generate)


Data-flow VHDL: Example (Full adder)


(a) Truth table

ci  xi  yi | ci+1  si
 0   0   0 |   0    0
 0   0   1 |   0    1
 0   1   0 |   0    1
 0   1   1 |   1    0
 1   0   0 |   0    1
 1   0   1 |   1    0
 1   1   0 |   1    0
 1   1   1 |   1    1

(b) Karnaugh maps

si           xiyi
         00  01  11  10
ci = 0    0   1   0   1
ci = 1    1   0   1   0

si = xi XOR yi XOR ci

ci+1         xiyi
         00  01  11  10
ci = 0    0   0   1   0
ci = 1    0   1   1   1

ci+1 = xi yi + xi ci + yi ci

(c) Circuit: inputs xi, yi, ci; outputs si, ci+1


Data-flow VHDL: Example (1)

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY fulladd IS
PORT ( x    : IN  STD_LOGIC ;
       y    : IN  STD_LOGIC ;
       cin  : IN  STD_LOGIC ;
       s    : OUT STD_LOGIC ;
       cout : OUT STD_LOGIC ) ;
END fulladd ;


Data-flow VHDL: Example (2)

ARCHITECTURE fulladd_dataflow OF fulladd IS


BEGIN
s <= x XOR y XOR cin ;
cout <= (x AND y) OR (cin AND x) OR (cin AND y) ;
END fulladd_dataflow ;


Logic Operators
Logic operators
and   or   nand   nor   xor   not   xnor (only in VHDL-93)

Logic operators precedence
Highest:  not
Lowest:   and   or   nand   nor   xor   xnor


No Implied Precedence
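Wanted: y = ab + cd
Incorrect
y <= a and b or c and d ;
"and" does not bind tighter than "or": the binary logic operators all
share one precedence level, and VHDL requires parentheses when different
ones are mixed, so compilers reject this line. Read left to right it is
equivalent to
y <= ((a and b) or c) and d ;
equivalent to
y = (ab + c)d
Correct
y <= (a and b) or (c and d) ;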

Concatenation
SIGNAL a: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL c, d, e, f: STD_LOGIC_VECTOR(7 DOWNTO 0);
a <= "0000";
b <= "1111";
c <= a & b;                               -- c = "00001111"
d <= '0' & "0001111";                     -- d = "00001111"
e <= '0' & '0' & '0' & '0' & '1' & '1' &
     '1' & '1';                           -- e = "00001111"
f <= ('0','0','0','0','1','1','1','1');   -- f = "00001111"

Rotations in VHDL
a <<< 1 (rotate left by one position)

before:  a(3) a(2) a(1) a(0)
after:   a(2) a(1) a(0) a(3)

a_rotL <= a(2 downto 0) & a(3);
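
The same slicing-and-concatenation idea gives the other rotations and shifts. A minimal sketch, assuming a 4-bit input; the entity rotate4 and the output names a_rotR and a_shL are hypothetical:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY rotate4 IS
    PORT ( a      : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
           a_rotL : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
           a_rotR : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
           a_shL  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END rotate4;

ARCHITECTURE dataflow OF rotate4 IS
BEGIN
    -- Rotate left: the MSB wraps around to the LSB position.
    a_rotL <= a(2 DOWNTO 0) & a(3);
    -- Rotate right: the LSB wraps around to the MSB position.
    a_rotR <= a(0) & a(3 DOWNTO 1);
    -- Logical shift left: the MSB is dropped and a '0' is shifted in.
    a_shL  <= a(2 DOWNTO 0) & '0';
END dataflow;
```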

Arithmetic Operators in VHDL (1)


To use basic arithmetic operations involving
std_logic_vectors you need to include the
following library packages:
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;
or
USE ieee.std_logic_signed.all;


Arithmetic Operators in VHDL (2)


You can use standard +, - operators
to perform addition and subtraction:
signal A : STD_LOGIC_VECTOR(3 downto 0);
signal B : STD_LOGIC_VECTOR(3 downto 0);
signal C : STD_LOGIC_VECTOR(3 downto 0);

C <= A + B;


Data-flow VHDL
Major instructions
Concurrent statements

concurrent signal assignment (<=)


conditional concurrent signal assignment
(when-else)
selected concurrent signal assignment
(with-select-when)
generate scheme for equations
(for-generate)


Conditional concurrent signal assignment


When - Else
target_signal <= value1 when condition1 else
value2 when condition2 else
. . .
valueN-1 when conditionN-1 else
valueN;

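[Figure: chain of 2-to-1 multiplexers. Condition 1 selects Value 1 for the Target Signal; otherwise Condition 2 selects Value 2, and so on down to Condition N-1 and Value N-1, with Value N as the default.]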

Operators
Relational operators
=   /=   <   <=   >   >=

Logic and relational operators precedence
Highest:  not
          =   /=   <   <=   >   >=
Lowest:   and   or   nand   nor   xor   xnor


Priority of logic and relational operators


compare a = bc
Incorrect
when a = b and c else
equivalent to
when (a = b) and c else
Correct
when a = (b and c) else
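
As a complete design unit, the correct form might look like the sketch below. The entity cmp_and and the port match are hypothetical names. Note that for STD_LOGIC or STD_LOGIC_VECTOR operands the unparenthesized form is not just a different function: (a = b) yields a BOOLEAN, which cannot be ANDed with c, so it does not compile.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY cmp_and IS
    PORT ( a, b, c : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
           match   : OUT STD_LOGIC );
END cmp_and;

ARCHITECTURE dataflow OF cmp_and IS
BEGIN
    -- Parentheses force the AND to be evaluated before the comparison.
    match <= '1' WHEN a = (b AND c) ELSE
             '0';
END dataflow;
```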


Tri-state Buffer example (1)


LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY tri_state IS
PORT ( enable: IN STD_LOGIC;
input: IN STD_LOGIC_VECTOR(7 downto 0);
output: OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
);
END tri_state;


Tri-state Buffer example (2)


ARCHITECTURE tri_state_dataflow OF tri_state IS
BEGIN
output <= input WHEN (enable = '0') ELSE
(OTHERS => 'Z');
END tri_state_dataflow;


Data-flow VHDL
Major instructions
Concurrent statements

concurrent signal assignment (<=)


conditional concurrent signal assignment
(when-else)
selected concurrent signal assignment
(with-select-when)
generate scheme for equations
(for-generate)


Selected concurrent signal assignment


With Select-When
with choice_expression select
target_signal <= expression1 when choices_1,
expression2 when choices_2,
. . .
expressionN when choices_N;

[Figure: multiplexer. The choice expression selects which of expression1 ... expressionN (inputs labelled choices_1 ... choices_N) drives target_signal.]

Allowed formats of choices_k

WHEN value
WHEN value_1 to value_2
WHEN value_1 | value_2 | .... | value N


Allowed formats of choice_k - example

WITH sel SELECT


y <= a WHEN "000",
b WHEN "011" to "110",
c WHEN "001" | "111",
d WHEN OTHERS;


MLU: Block Diagram


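[Figure: MLU block diagram. NEG_A and NEG_B optionally invert the inputs to give A1 and B1; four gates produce MUX_0 to MUX_3, which feed inputs IN0 to IN3 of a 4-to-1 multiplexer (MUX_4_1) selected by L1 and L0 (SEL1, SEL0); its OUTPUT is Y1, which NEG_Y optionally inverts to give Y.]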


MLU: Entity Declaration


LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY mlu IS
PORT(
NEG_A : IN STD_LOGIC;
NEG_B : IN STD_LOGIC;
NEG_Y : IN STD_LOGIC;
A : IN STD_LOGIC;
B : IN STD_LOGIC;
L1 : IN STD_LOGIC;
L0 : IN STD_LOGIC;
Y : OUT STD_LOGIC
);
END mlu;


MLU: Architecture Declarative Section


ARCHITECTURE mlu_dataflow OF mlu IS
SIGNAL A1 : STD_LOGIC;
SIGNAL B1 : STD_LOGIC;
SIGNAL Y1 : STD_LOGIC;
SIGNAL MUX_0 : STD_LOGIC;
SIGNAL MUX_1 : STD_LOGIC;
SIGNAL MUX_2 : STD_LOGIC;
SIGNAL MUX_3 : STD_LOGIC;
SIGNAL L : STD_LOGIC_VECTOR(1 DOWNTO 0);


MLU - Architecture Body


BEGIN
A1<= NOT A WHEN (NEG_A='1') ELSE
A;
B1<= NOT B WHEN (NEG_B='1') ELSE
B;
Y <= NOT Y1 WHEN (NEG_Y='1') ELSE
Y1;
MUX_0 <= A1 AND B1;
MUX_1 <= A1 OR B1;
MUX_2 <= A1 XOR B1;
MUX_3 <= A1 XNOR B1;

L <= L1 & L0;

with (L) select
Y1 <= MUX_0 WHEN "00",
      MUX_1 WHEN "01",
      MUX_2 WHEN "10",
      MUX_3 WHEN OTHERS;

END mlu_dataflow;

Data-flow VHDL
Major instructions
Concurrent statements

concurrent signal assignment (<=)


conditional concurrent signal assignment
(when-else)
selected concurrent signal assignment
(with-select-when)
generate scheme for equations
(for-generate)


For Generate Statement


For - Generate
label: FOR identifier IN range GENERATE
BEGIN
{Concurrent Statements}
END GENERATE;


PARITY: Entity Declaration


LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY parity IS
PORT(
parity_in : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
parity_out : OUT STD_LOGIC
);
END parity;


PARITY: Block Diagram


xor_out(1)

xor_out(2)

xor_out(3)

xor_out(4)

xor_out(5) xor_out(6)


PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
SIGNAL xor_out: std_logic_vector (6 downto 1);
BEGIN
xor_out(1) <= parity_in(0) XOR parity_in(1);
xor_out(2) <= xor_out(1) XOR parity_in(2);
xor_out(3) <= xor_out(2) XOR parity_in(3);
xor_out(4) <= xor_out(3) XOR parity_in(4);
xor_out(5) <= xor_out(4) XOR parity_in(5);
xor_out(6) <= xor_out(5) XOR parity_in(6);
parity_out <= xor_out(6) XOR parity_in(7);

END parity_dataflow;

PARITY: Block Diagram (2)


xor_out(0)

xor_out(1)

xor_out(2)

xor_out(3)

xor_out(4)

xor_out(5) xor_out(6)

xor_out(7)


PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
SIGNAL xor_out: STD_LOGIC_VECTOR (7 downto 0);
BEGIN
xor_out(0) <= parity_in(0);
xor_out(1) <= xor_out(0) XOR parity_in(1);
xor_out(2) <= xor_out(1) XOR parity_in(2);
xor_out(3) <= xor_out(2) XOR parity_in(3);
xor_out(4) <= xor_out(3) XOR parity_in(4);
xor_out(5) <= xor_out(4) XOR parity_in(5);
xor_out(6) <= xor_out(5) XOR parity_in(6);
xor_out(7) <= xor_out(6) XOR parity_in(7);
parity_out <= xor_out(7);

END parity_dataflow;

PARITY: Architecture (2)


ARCHITECTURE parity_dataflow OF parity IS
SIGNAL xor_out: STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
xor_out(0) <= parity_in(0);
G2: FOR i IN 1 TO 7 GENERATE
xor_out(i) <= xor_out(i-1) XOR parity_in(i);
end generate G2;
parity_out <= xor_out(7);
END parity_dataflow;


Left vs. right side of the assignment


Forms of assignment:  <=,  <= when-else,  with-select <=

Left side
Internal signals (defined in a given architecture)
Ports of the mode
- out
- inout
- buffer

Right side
Expressions including:
Internal signals (defined in a given architecture)
Ports of the mode
- in
- inout
- buffer
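
One practical consequence: a port of mode out cannot appear on the right side, so a value that is both output and reused internally is usually kept in an internal signal. A minimal sketch with hypothetical names (and_or, ab_int):

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY and_or IS
    PORT ( a, b, c : IN  STD_LOGIC;
           ab      : OUT STD_LOGIC;
           y       : OUT STD_LOGIC );
END and_or;

ARCHITECTURE dataflow OF and_or IS
    SIGNAL ab_int : STD_LOGIC;   -- internal copy that may be read
BEGIN
    ab_int <= a AND b;
    ab     <= ab_int;            -- drive the out port
    y      <= ab_int OR c;       -- "y <= ab OR c" would read an out port,
                                 -- which is rejected before VHDL-2008
END dataflow;
```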

Arithmetic operations
Synthesizable arithmetic operations:
Addition, +
Subtraction, -
Comparisons, >, >=, <, <=
Multiplication, *
Division by a power of 2, /2**6
(equivalent to right shift)
Shifts by a constant, SHL, SHR
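
For an unsigned value, division by 2**6 can also be written without any arithmetic package, as a plain slice. The sketch below is one way to express it; the entity name div64 is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY div64 IS
    PORT ( x : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
           q : OUT STD_LOGIC_VECTOR(15 DOWNTO 0) );
END div64;

ARCHITECTURE dataflow OF div64 IS
BEGIN
    -- x / 2**6 for unsigned x: drop the six low bits, fill with zeros.
    -- This costs only wiring, no logic gates.
    q <= "000000" & x(15 DOWNTO 6);
END dataflow;
```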

Arithmetic operations
The result of synthesis of an arithmetic
operation is a
- combinational circuit
- without pipelining.
The exact internal architecture used
(and thus delay and area of the circuit)
may depend on the timing constraints specified
during synthesis (e.g., the requested maximum
clock frequency).

Operations on Unsigned Numbers


For operations on unsigned numbers
USE ieee.std_logic_unsigned.all
and
signals (inputs/outputs) of the type
STD_LOGIC_VECTOR
OR
USE ieee.std_logic_arith.all
and
signals (inputs/outputs) of the type
UNSIGNED

Operations on Signed Numbers


For operations on signed numbers
USE ieee.std_logic_signed.all
and
signals (inputs/outputs) of the type
STD_LOGIC_VECTOR
OR
USE ieee.std_logic_arith.all
and
signals (inputs/outputs) of the type
SIGNED

Signed and Unsigned Types


Behave exactly like
STD_LOGIC_VECTOR
plus, they determine whether a given vector
should be treated as a signed or unsigned number.
Require
USE ieee.std_logic_arith.all;
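
The difference shows up as soon as the same bit pattern is compared or converted. The following sketch uses the same std_logic_arith package as the slides; the testbench name sign_interp_tb and the constants are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY sign_interp_tb IS
END sign_interp_tb;

ARCHITECTURE behavioral OF sign_interp_tb IS
    CONSTANT bits_a : STD_LOGIC_VECTOR(3 DOWNTO 0) := "1111";
    CONSTANT bits_b : STD_LOGIC_VECTOR(3 DOWNTO 0) := "0001";
BEGIN
    check : PROCESS
    BEGIN
        -- As unsigned numbers: "1111" is 15, which is greater than 1.
        ASSERT UNSIGNED(bits_a) > UNSIGNED(bits_b)
            REPORT "unsigned comparison failed" SEVERITY ERROR;
        ASSERT CONV_INTEGER(UNSIGNED(bits_a)) = 15
            REPORT "unsigned conversion failed" SEVERITY ERROR;
        -- As signed (two's complement) numbers: "1111" is -1, less than 1.
        ASSERT SIGNED(bits_a) < SIGNED(bits_b)
            REPORT "signed comparison failed" SEVERITY ERROR;
        ASSERT CONV_INTEGER(SIGNED(bits_a)) = -1
            REPORT "signed conversion failed" SEVERITY ERROR;
        WAIT;   -- run the checks once
    END PROCESS;
END behavioral;
```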


Addition of Unsigned Numbers


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_unsigned.all ;
ENTITY adder16 IS
PORT ( Cin  : IN  STD_LOGIC ;
       X, Y : IN  STD_LOGIC_VECTOR(15 DOWNTO 0) ;
       S    : OUT STD_LOGIC_VECTOR(15 DOWNTO 0) ;
       Cout : OUT STD_LOGIC ) ;
END adder16 ;

ARCHITECTURE Behavior OF adder16 IS


SIGNAL Sum : STD_LOGIC_VECTOR(16 DOWNTO 0) ;
BEGIN
Sum <= ('0' & X) + Y + Cin ;
S <= Sum(15 DOWNTO 0) ;
Cout <= Sum(16) ;
END Behavior ;

Addition of Unsigned Numbers


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_arith.all ;
ENTITY adder16 IS
PORT ( Cin  : IN  STD_LOGIC ;
       X, Y : IN  UNSIGNED(15 DOWNTO 0) ;
       S    : OUT UNSIGNED(15 DOWNTO 0) ;
       Cout : OUT STD_LOGIC ) ;
END adder16 ;

ARCHITECTURE Behavior OF adder16 IS


SIGNAL Sum : UNSIGNED(16 DOWNTO 0) ;
BEGIN
Sum <= ('0' & X) + Y + Cin ;
S <= Sum(15 DOWNTO 0) ;
Cout <= Sum(16) ;
END Behavior ;

Addition of Signed Numbers (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_signed.all ;
ENTITY adder16 IS
PORT ( Cin            : IN  STD_LOGIC ;
       X, Y           : IN  STD_LOGIC_VECTOR(15 DOWNTO 0) ;
       S              : OUT STD_LOGIC_VECTOR(15 DOWNTO 0) ;
       Cout, Overflow : OUT STD_LOGIC ) ;
END adder16 ;

ARCHITECTURE Behavior OF adder16 IS


SIGNAL Sum : STD_LOGIC_VECTOR(16 DOWNTO 0) ;
BEGIN
Sum <= ('0' & X) + Y + Cin ;
S <= Sum(15 DOWNTO 0) ;
Cout <= Sum(16) ;
Overflow <= Sum(16) XOR X(15) XOR Y(15) XOR Sum(15) ;
END Behavior ;

Addition of Signed Numbers (2)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_arith.all ;
ENTITY adder16 IS
PORT ( Cin            : IN  STD_LOGIC ;
       X, Y           : IN  SIGNED(15 DOWNTO 0) ;
       S              : OUT SIGNED(15 DOWNTO 0) ;
       Cout, Overflow : OUT STD_LOGIC ) ;
END adder16 ;

ARCHITECTURE Behavior OF adder16 IS


SIGNAL Sum : SIGNED(16 DOWNTO 0) ;
BEGIN
Sum <= ('0' & X) + Y + Cin ;
S <= Sum(15 DOWNTO 0) ;
Cout <= Sum(16) ;
Overflow <= Sum(16) XOR X(15) XOR Y(15) XOR Sum(15) ;
END Behavior ;

Multiplication of signed and unsigned


numbers (1)
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all ;
entity multiply is
port(
a : in STD_LOGIC_VECTOR(15 downto 0);
b : in STD_LOGIC_VECTOR(7 downto 0);
cu : out STD_LOGIC_VECTOR(23 downto 0);
cs : out STD_LOGIC_VECTOR(23 downto 0)
);
end multiply;
architecture dataflow of multiply is
SIGNAL sa: SIGNED(15 downto 0);
SIGNAL sb: SIGNED(7 downto 0);
SIGNAL sres: SIGNED(23 downto 0);
SIGNAL ua: UNSIGNED(15 downto 0);
SIGNAL ub: UNSIGNED(7 downto 0);
SIGNAL ures: UNSIGNED(23 downto 0);


Multiplication of signed and unsigned


numbers (2)
begin
-- signed multiplication
sa <= SIGNED(a);
sb <= SIGNED(b);
sres <= sa * sb;
cs <= STD_LOGIC_VECTOR(sres);
-- unsigned multiplication
ua <= UNSIGNED(a);
ub <= UNSIGNED(b);
ures <= ua * ub;
cu <= STD_LOGIC_VECTOR(ures);
end dataflow;


Integer Types
Operations on signals (variables)
of the integer types:
INTEGER, NATURAL,
and their subtypes, such as
TYPE day_of_month IS RANGE 0 TO 31;
are synthesizable in the range
-(2^31 - 1) .. 2^31 - 1 for INTEGERs and their subtypes
0 .. 2^31 - 1 for NATURALs and their subtypes


Integer Types
Operations on signals (variables)
of the integer types:
INTEGER, NATURAL,
are less flexible and more difficult to control
than operations on signals (variables) of the type
STD_LOGIC_VECTOR
UNSIGNED
SIGNED, and thus
are not recommended for beginners.

Addition of Signed Integers

ENTITY adder16 IS
PORT ( X, Y : IN  INTEGER RANGE -32767 TO 32767 ;
       S    : OUT INTEGER RANGE -32767 TO 32767 ) ;
END adder16 ;

ARCHITECTURE Behavior OF adder16 IS


BEGIN
S <= X + Y ;
END Behavior ;


VHDL Design Styles


VHDL Design
Styles

dataflow
Concurrent
statements

structural
Components and
interconnects

behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.


Structural VHDL
Major instructions
component instantiation (port map)
generate scheme for component instantiations
(for-generate)
component instantiation with generic
(generic map, port map)


Structural VHDL
Major instructions
component instantiation
(port map)
component instantiation with generic
(generic map, port map)
generate scheme for component instantiations
(for-generate)


Circuit built of medium scale components


[Figure: block diagram of the circuit. Signals shown: r(0) to r(5), s(0), s(1), p(0) to p(3), q(0), q(1), ena, z(0) to z(3), En, Clk, Clock. Components: two 2-to-1 multiplexers, priority, dec2to4, regn.]

2-to-1 Multiplexer

[Figure: (a) graphical symbol and (b) truth table of a 2-to-1 multiplexer with data inputs w0 and w1.]


VHDL code for a 2-to-1 Multiplexer


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY mux2to1 IS
PORT ( w0, w1, s : IN  STD_LOGIC ;
       f         : OUT STD_LOGIC ) ;
END mux2to1 ;

ARCHITECTURE dataflow OF mux2to1 IS


BEGIN
f <= w0 WHEN s = '0' ELSE w1 ;
END dataflow ;


Priority Encoder
Inputs: w0, w1, w2, w3. Outputs: y0, y1, z.

w3 w2 w1 w0 | y1 y0 | z
 0  0  0  0 |  d  d | 0
 0  0  0  1 |  0  0 | 1
 0  0  1  x |  0  1 | 1
 0  1  x  x |  1  0 | 1
 1  x  x  x |  1  1 | 1


VHDL code for a Priority Encoder


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY priority IS
PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
       y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       z : OUT STD_LOGIC ) ;
END priority ;

ARCHITECTURE dataflow OF priority IS


BEGIN
y <= "11" WHEN w(3) = '1' ELSE
"10" WHEN w(2) = '1' ELSE
"01" WHEN w(1) = '1' ELSE
"00" ;
z <= '0' WHEN w = "0000" ELSE '1' ;
END dataflow ;

2-to-4 Decoder

[Figure: (a) truth table with input columns En, w1, w0 and output columns y0, y1, y2, y3; (b) graphical symbol with inputs w0, w1, En and outputs y0 to y3.]


VHDL code for a 2-to-4 Decoder


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY dec2to4 IS
PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       En : IN  STD_LOGIC ;
       y  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END dec2to4 ;

ARCHITECTURE dataflow OF dec2to4 IS


SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
BEGIN
Enw <= En & w ;
WITH Enw SELECT
y <= "0001" WHEN "100",
"0010" WHEN "101",
"0100" WHEN "110",
"1000" WHEN "111",
"0000" WHEN OTHERS ;
END dataflow ;

N-bit register with enable


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY regn IS
GENERIC ( N : INTEGER := 8 ) ;
PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
       Enable, Clock : IN  STD_LOGIC ;
       Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END regn ;

ARCHITECTURE Behavior OF regn IS


BEGIN
PROCESS (Clock)
BEGIN
IF (Clock'EVENT AND Clock = '1' ) THEN
IF Enable = '1' THEN
Q <= D ;
END IF ;
END IF;
END PROCESS ;
END Behavior ;

[Figure: regn symbol with inputs Enable and Clock and output Q.]


Circuit built of medium scale components


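[Figure: circuit built of medium scale components. Two 2-to-1 multiplexers (r(0), r(1) selected by s(0); r(4), r(5) selected by s(1)) drive p(0) and p(3); r(2) and r(3) drive p(1) and p(2). The priority encoder maps p(0..3) to q(0..1) and ena; dec2to4 decodes q, enabled by ena, into z(0..3); regn, clocked by Clk and enabled by En, stores z and outputs t(0..3).]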

Structural description example (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY priority_resolver IS
PORT ( r   : IN  STD_LOGIC_VECTOR(5 DOWNTO 0) ;
       s   : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       clk : IN  STD_LOGIC ;
       en  : IN  STD_LOGIC ;
       t   : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END priority_resolver;
ARCHITECTURE structural OF priority_resolver IS
SIGNAL p   : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL q   : STD_LOGIC_VECTOR (1 DOWNTO 0) ;
SIGNAL z   : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL ena : STD_LOGIC ;


Structural description example (2)


COMPONENT mux2to1
    PORT ( w0, w1, s : IN  STD_LOGIC ;
           f         : OUT STD_LOGIC ) ;
END COMPONENT ;

COMPONENT priority
    PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           z : OUT STD_LOGIC ) ;
END COMPONENT ;

COMPONENT dec2to4
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END COMPONENT ;


Structural description example (3)


COMPONENT regn
GENERIC ( N : INTEGER := 8 ) ;
PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
       Enable, Clock : IN  STD_LOGIC ;
       Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END COMPONENT ;


Structural description example (4)


BEGIN
u1: mux2to1 PORT MAP (w0 => r(0) ,
w1 => r(1),
s => s(0),
f => p(0));
p(1) <= r(2);
p(2) <= r(3);
u2: mux2to1 PORT MAP (w0 => r(4) ,
w1 => r(5),
s => s(1),
f => p(3));
u3: priority PORT MAP (w => p,
y => q,
z => ena);
u4: dec2to4 PORT MAP (w => q,
En => ena,
y => z);

Structural description example (5)


u5: regn

GENERIC MAP (N => 4)


PORT MAP (D => z ,
Enable => En ,
Clock => Clk,
Q => t );

END structural;


Named association connectivity


recommended in the majority of cases,
prevents omissions and mistakes
COMPONENT dec2to4
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END COMPONENT ;

u4: dec2to4 PORT MAP (w => q,


En => ena,
y => z);


Positional association connectivity


allowed, especially in the cases of
- a small number of ports
- multiple instantiations of the same component
  in regular structures
COMPONENT dec2to4
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END COMPONENT ;

u4: dec2to4 PORT MAP (w, En, y);


Structural description with


positional association connectivity
BEGIN
u1: mux2to1 PORT MAP (r(0), r(1), s(0), p(0));
p(1) <= r(2);
p(2) <= r(3);
u2: mux2to1 PORT MAP (r(4) , r(5), s(1), p(3));
u3: priority PORT MAP (p, q, ena);
u4: dec2to4 PORT MAP (q, ena, z);
u5: regn GENERIC MAP(4) PORT MAP (z, En, Clk, t);
END structural;


Package example (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
PACKAGE GatesPkg IS
COMPONENT mux2to1
    PORT ( w0, w1, s : IN  STD_LOGIC ;
           f         : OUT STD_LOGIC ) ;
END COMPONENT ;

COMPONENT priority
    PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           z : OUT STD_LOGIC ) ;
END COMPONENT ;


Package example (2)


COMPONENT dec2to4
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END COMPONENT ;
COMPONENT regn
    GENERIC ( N : INTEGER := 8 ) ;
    PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           Enable, Clock : IN  STD_LOGIC ;
           Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END COMPONENT ;


Package example (3)


constant ADDAB : std_logic_vector(3 downto 0) := "0000";
constant ADDAM : std_logic_vector(3 downto 0) := "0001";
constant SUBAB : std_logic_vector(3 downto 0) := "0010";
constant SUBAM : std_logic_vector(3 downto 0) := "0011";
constant NOTA : std_logic_vector(3 downto 0) := "0100";
constant NOTB : std_logic_vector(3 downto 0) := "0101";
constant NOTM : std_logic_vector(3 downto 0) := "0110";
constant ANDAB : std_logic_vector(3 downto 0) := "0111";
END GatesPkg;


Package usage (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE work.GatesPkg.all;

ENTITY priority_resolver IS
PORT ( r   : IN  STD_LOGIC_VECTOR(5 DOWNTO 0) ;
       s   : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       clk : IN  STD_LOGIC ;
       en  : IN  STD_LOGIC ;
       t   : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END priority_resolver;
ARCHITECTURE structural OF priority_resolver IS
SIGNAL p   : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL q   : STD_LOGIC_VECTOR (1 DOWNTO 0) ;
SIGNAL z   : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL ena : STD_LOGIC ;


Package usage (2)


BEGIN
u1: mux2to1 PORT MAP (w0 => r(0) ,
w1 => r(1),
s => s(0),
f => p(0));
p(1) <= r(2);
p(2) <= r(3);
u2: mux2to1 PORT MAP (w0 => r(4) ,
w1 => r(5),
s => s(1),
f => p(3));
u3: priority PORT MAP (w => p,
y => q,
z => ena);
u4: dec2to4 PORT MAP (w => q,
En => ena,
y => z);

Package usage (3)


u5: regn

GENERIC MAP (N => 4)


PORT MAP (D => z ,
Enable => En ,
Clock => Clk,
Q => t );

END structural;


Configuration declaration
CONFIGURATION SimpleCfg OF priority_resolver IS
FOR structural
FOR ALL: mux2to1
USE ENTITY work.mux2to1(dataflow);
END FOR;
FOR u3: priority
USE ENTITY work.priority(dataflow);
END FOR;
FOR u4: dec2to4
USE ENTITY work.dec2to4(dataflow);
END FOR;
END FOR;
END SimpleCfg;


Configuration specification
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE work.GatesPkg.all;
ENTITY priority_resolver IS
PORT ( r : IN  STD_LOGIC_VECTOR(5 DOWNTO 0) ;
       s : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       z : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END priority_resolver;

ARCHITECTURE structural OF priority_resolver IS


SIGNAL p : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL q : STD_LOGIC_VECTOR (1 DOWNTO 0) ;
SIGNAL ena : STD_LOGIC ;

FOR ALL: mux2to1 USE ENTITY work.mux2to1(dataflow);


FOR u3: priority USE ENTITY work.priority(dataflow);
FOR u4: dec2to4 USE ENTITY work.dec2to4(dataflow);


Structural VHDL
Major instructions
component instantiation (port map)
component instantiation with generic
(generic map, port map)
generate scheme for component instantiations
(for-generate)


Example 1
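[Figure: 16-to-1 multiplexer built from five 4-to-1 multiplexers. Four first-level multiplexers take w0-w3, w4-w7, w8-w11 and w12-w15 and share the select inputs s0, s1; a fifth multiplexer, selected by s2, s3, produces f.]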

A 4-to-1 Multiplexer
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY mux4to1 IS
PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
       s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       f              : OUT STD_LOGIC ) ;
END mux4to1 ;

ARCHITECTURE Dataflow OF mux4to1 IS


BEGIN
WITH s SELECT
f <= w0 WHEN "00",
w1 WHEN "01",
w2 WHEN "10",
w3 WHEN OTHERS ;
END Dataflow ;


Straightforward code for Example 1


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY Example1 IS
PORT ( w : IN  STD_LOGIC_VECTOR(0 TO 15) ;
       s : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
       f : OUT STD_LOGIC ) ;
END Example1 ;


Straightforward code for Example 1


ARCHITECTURE Structure OF Example1 IS
COMPONENT mux4to1
    PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
           s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f              : OUT STD_LOGIC ) ;
END COMPONENT ;

SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
Mux1: mux4to1 PORT MAP ( w(0),  w(1),  w(2),  w(3),  s(1 DOWNTO 0), m(0) ) ;
Mux2: mux4to1 PORT MAP ( w(4),  w(5),  w(6),  w(7),  s(1 DOWNTO 0), m(1) ) ;
Mux3: mux4to1 PORT MAP ( w(8),  w(9),  w(10), w(11), s(1 DOWNTO 0), m(2) ) ;
Mux4: mux4to1 PORT MAP ( w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3) ) ;
Mux5: mux4to1 PORT MAP ( m(0),  m(1),  m(2),  m(3),  s(3 DOWNTO 2), f ) ;
END Structure ;


Modified code for Example 1


ARCHITECTURE Structure OF Example1 IS
COMPONENT mux4to1
    PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
           s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f              : OUT STD_LOGIC ) ;
END COMPONENT ;

SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
G1: FOR i IN 0 TO 3 GENERATE
Muxes: mux4to1 PORT MAP (
w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i) ) ;
END GENERATE ;
Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;


Example 2
[Figure: 4-to-16 decoder built from five 2-to-4 decoders. One decoder takes w2, w3 and En; each of its outputs enables one of four decoders that share the inputs w0, w1 and produce y0-y3, y4-y7, y8-y11 and y12-y15.]

A 2-to-4 binary decoder


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY dec2to4 IS
PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       En : IN  STD_LOGIC ;
       y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END dec2to4 ;

ARCHITECTURE Dataflow OF dec2to4 IS


SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
BEGIN
Enw <= En & w ;
WITH Enw SELECT
y <= "1000" WHEN "100",
"0100" WHEN "101",
"0010" WHEN "110",
"0001" WHEN "111",
"0000" WHEN OTHERS ;
END Dataflow ;

VHDL code for Example 2 (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY dec4to16 IS
PORT ( w  : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
       En : IN  STD_LOGIC ;
       y  : OUT STD_LOGIC_VECTOR(0 TO 15) ) ;
END dec4to16 ;


VHDL code for Example 2 (2)


ARCHITECTURE Structure OF dec4to16 IS
COMPONENT dec2to4
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END COMPONENT ;

SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
G1: FOR i IN 0 TO 3 GENERATE
Dec_ri: dec2to4 PORT MAP ( w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3) );
G2: IF i=3 GENERATE
Dec_left: dec2to4 PORT MAP ( w(i DOWNTO i-1), En, m ) ;
END GENERATE ;
END GENERATE ;
END Structure ;


Mixed Style Modeling


architecture ARCHITECTURE_NAME of ENTITY_NAME is
    Here you can declare signals, constants, functions, procedures
    Component declarations
    No variable declarations !!
begin
    Concurrent statements:
        Concurrent simple signal assignment
        Conditional signal assignment
        Selected signal assignment
        Generate statement
        Component instantiation statement
        Process statement
            inside process you can use only sequential statements
end ARCHITECTURE_NAME;
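
A small architecture that mixes these statement kinds might look as follows. This is only a sketch; the entity reg_xor and its signals are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY reg_xor IS
    PORT ( a, b, sel, clk : IN  STD_LOGIC;
           q              : OUT STD_LOGIC );
END reg_xor;

ARCHITECTURE mixed OF reg_xor IS
    SIGNAL x, d : STD_LOGIC;   -- signals are declared here, variables are not
BEGIN
    -- Concurrent simple signal assignment (dataflow)
    x <= a XOR b;

    -- Conditional signal assignment (dataflow)
    d <= x WHEN sel = '1' ELSE
         a;

    -- Process statement (behavioral): only sequential statements inside
    reg : PROCESS (clk)
    BEGIN
        IF clk'EVENT AND clk = '1' THEN
            q <= d;
        END IF;
    END PROCESS;
END mixed;
```

The three statements in the architecture body run concurrently; their textual order does not matter.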

VHDL Design Styles


VHDL Design
Styles

dataflow
Concurrent
statements

structural
Components and
interconnects

behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.


Anatomy of a Process
OPTIONAL

[label:] process [(sensitivity list)]


[declaration part]
begin
statement part
end process [label];


Statement Part
Contains Sequential Statements to be
Executed Each Time the Process Is
Activated
Analogous to Conventional Programming
Languages


What is a PROCESS?
A process is a sequence of instructions referred to as
sequential statements.
A process can be given a unique name
using an optional LABEL.
This is followed by the keyword
PROCESS.
The keyword BEGIN is used to indicate
the start of the process.
All statements within the process are
executed SEQUENTIALLY. Hence, the
order of statements is important.
A process must end with the keywords
END PROCESS.

Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT FOR 10 ns;
END PROCESS;


Execution of statements in a PROCESS

The execution of statements
continues sequentially till the
last statement in the process.
After execution of the last
statement, the control is again
passed to the beginning of the
process, i.e., to the first
statement after BEGIN.

Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT FOR 10 ns;
END PROCESS;


PROCESS with a WAIT Statement

The last statement in the
PROCESS is a WAIT instead of
WAIT FOR 10 ns.
This will cause the PROCESS
to suspend indefinitely when
the WAIT statement is
executed.
This form of WAIT can be used
in a process included in a
testbench when all possible
combinations of inputs have
been tested or a non-periodical
signal has to be generated.

Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT;   -- program execution stops here
END PROCESS;


WAIT FOR vs. WAIT


WAIT FOR: waveform will keep repeating
itself forever

WAIT: waveform will keep its state after
the last wait instruction.


Sequential Statements (1)


If Statement
if boolean expression then
statements
elsif boolean expression then
statements
else
statements
end if;

else and elsif are optional
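
As an illustration, the priority encoder can be written with an if statement inside a process. This is a sketch under stated assumptions: the entity name priority_if is hypothetical, and the process uses a sensitivity list on w so that it re-evaluates whenever the input changes.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY priority_if IS
    PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
           y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
           z : OUT STD_LOGIC );
END priority_if;

ARCHITECTURE behavioral OF priority_if IS
BEGIN
    encode : PROCESS (w)
    BEGIN
        -- Branches are tested in order, so w(3) has the highest priority.
        IF w(3) = '1' THEN
            y <= "11";
        ELSIF w(2) = '1' THEN
            y <= "10";
        ELSIF w(1) = '1' THEN
            y <= "01";
        ELSE
            y <= "00";
        END IF;

        -- An if without elsif: z flags whether any input is active.
        IF w = "0000" THEN
            z <= '0';
        ELSE
            z <= '1';
        END IF;
    END PROCESS;
END behavioral;
```

Every branch assigns y and z, so the process describes purely combinational logic.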



If Statement - Example
SELECTOR: process
begin
WAIT UNTIL Clock'EVENT AND Clock = '1' ;
IF Sel = "00" THEN
f <= x1;
ELSIF Sel = "10" THEN
f <= x2;
ELSE
f <= x3;
END IF;
end process;


Loop Statement
Loop Statement
FOR i IN range LOOP
statements
END LOOP;

Repeats a Section of VHDL Code


Example: process every element in an array in
the same way
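
For instance, a parity bit can be computed by visiting every element of a vector in a loop. In this sketch the entity name parity_loop and the variable acc are hypothetical; the ports mirror the parity entity used earlier.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY parity_loop IS
    PORT ( parity_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
           parity_out : OUT STD_LOGIC );
END parity_loop;

ARCHITECTURE behavioral OF parity_loop IS
BEGIN
    compute : PROCESS (parity_in)
        VARIABLE acc : STD_LOGIC;   -- running XOR, updated immediately
    BEGIN
        acc := '0';
        FOR i IN 0 TO 7 LOOP
            acc := acc XOR parity_in(i);   -- same operation on each element
        END LOOP;
        parity_out <= acc;
    END PROCESS;
END behavioral;
```

A variable is used for the running value because a signal assigned inside the loop would not take its new value until the process suspends.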


Loop Statement Example (1)

Testing: PROCESS
BEGIN
test_vector<="000";
FOR i IN 0 TO 7 LOOP
WAIT FOR 10 ns;
test_vector<=test_vector+"001";
END LOOP;
END PROCESS;


Loop Statement Example (2)


Testing: PROCESS
BEGIN
test_ab<="00";
test_sel<="00";
FOR i IN 0 TO 3 LOOP
FOR j IN 0 TO 3 LOOP
WAIT FOR 10 ns;
test_ab<=test_ab+"01";
END LOOP;
test_sel<=test_sel+"01";
END LOOP;
END PROCESS;


PROCESS with a SENSITIVITY LIST


List of signals to which the
process is sensitive.
Whenever there is an
event on any of the
signals in the sensitivity
list, the process fires.
Every time the process
fires, it will run in its
entirety.
WAIT statements are
NOT ALLOWED in a
process with a
SENSITIVITY LIST.

label: process (sensitivity list)


declaration part
begin
statement part
end process;
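
A 2-to-1 multiplexer written this way is a minimal sketch of the idea; the entity name mux2to1_proc is hypothetical. Every signal the process reads is listed, so the output is recomputed on any input change, and no WAIT statement appears.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY mux2to1_proc IS
    PORT ( w0, w1, s : IN  STD_LOGIC;
           f         : OUT STD_LOGIC );
END mux2to1_proc;

ARCHITECTURE behavioral OF mux2to1_proc IS
BEGIN
    -- Fires on an event on w0, w1 or s and runs to the end each time.
    mux : PROCESS (w0, w1, s)
    BEGIN
        IF s = '0' THEN
            f <= w0;
        ELSE
            f <= w1;
        END IF;
    END PROCESS;
END behavioral;
```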


Generating selected values of one input


SIGNAL test_vector : STD_LOGIC_VECTOR(2 downto 0);
BEGIN
.......
testing: PROCESS
BEGIN
test_vector <= "000";
WAIT FOR 10 ns;
test_vector <= "001";
WAIT FOR 10 ns;
test_vector <= "010";
WAIT FOR 10 ns;
test_vector <= "011";
WAIT FOR 10 ns;
test_vector <= "100";
WAIT FOR 10 ns;
END PROCESS;

........
END behavioral;

Generating all values of one input


SIGNAL test_vector : STD_LOGIC_VECTOR(3 downto 0):="0000";
BEGIN
.......
testing: PROCESS
BEGIN
WAIT FOR 10 ns;
test_vector <= test_vector + 1;
end process TESTING;
........
END behavioral;


Generating all possible values of two inputs


SIGNAL test_ab : STD_LOGIC_VECTOR(1 downto 0);
SIGNAL test_sel : STD_LOGIC_VECTOR(1 downto 0);
BEGIN
.......
double_loop: PROCESS
BEGIN
test_ab <="00";
test_sel <="00";
for I in 0 to 3 loop
for J in 0 to 3 loop
wait for 10 ns;
test_ab <= test_ab + 1;
end loop;
test_sel <= test_sel + 1;
end loop;
END PROCESS;

........
END behavioral;

.. .

168

Generating periodical signals, such as clocks


CONSTANT clk1_period : TIME := 20 ns;
CONSTANT clk2_period : TIME := 200 ns;
SIGNAL clk1 : STD_LOGIC;
SIGNAL clk2 : STD_LOGIC := '0';
BEGIN
.......
clk1_generator: PROCESS
BEGIN
clk1 <= '0';
WAIT FOR clk1_period/2;
clk1 <= '1';
WAIT FOR clk1_period/2;
END PROCESS;
clk2 <= not clk2 after clk2_period/2;
.......
END behavioral;

.. .

169

Generating one-time signals, such as resets


CONSTANT reset1_width : TIME := 100 ns;
CONSTANT reset2_width : TIME := 150 ns;
SIGNAL reset1 : STD_LOGIC;
SIGNAL reset2 : STD_LOGIC := '1';
BEGIN
.......
reset1_generator: PROCESS
BEGIN
reset1 <= '1';
WAIT FOR reset1_width;
reset1 <= '0';
WAIT;
END PROCESS;
reset2_generator: PROCESS
BEGIN
WAIT FOR reset2_width;
reset2 <= '0';
WAIT;
END PROCESS;
.......
END behavioral;
.. .

170

Typical error
SIGNAL test_vector : STD_LOGIC_VECTOR(2 downto 0);
SIGNAL reset : STD_LOGIC;
BEGIN
.......
generator1: PROCESS
BEGIN
reset <= '1';
WAIT FOR 100 ns;
reset <= '0';
test_vector <="000";
WAIT;
END PROCESS;
generator2: PROCESS
BEGIN
WAIT FOR 200 ns;
test_vector <="001"; -- error: test_vector is also driven by generator1
WAIT FOR 600 ns;
test_vector <="011";
END PROCESS;
.......
END behavioral;
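
In the listing above, test_vector is assigned in both generator1 and generator2, so the signal has two drivers whose values can conflict. One possible repair, sketched below under the assumption that a single stimulus sequence is intended, is to keep every assignment to a given signal in one process. The entity name single_driver_tb and the process labels are hypothetical.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical testbench: each signal is driven by exactly one process
ENTITY single_driver_tb IS
END single_driver_tb ;

ARCHITECTURE behavioral OF single_driver_tb IS
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
    SIGNAL reset       : STD_LOGIC ;
BEGIN
    -- Only this process drives reset
    reset_generator: PROCESS
    BEGIN
        reset <= '1' ;
        WAIT FOR 100 ns ;
        reset <= '0' ;
        WAIT ;
    END PROCESS ;

    -- Only this process drives test_vector
    vector_generator: PROCESS
    BEGIN
        test_vector <= "000" ;
        WAIT FOR 200 ns ;
        test_vector <= "001" ;
        WAIT FOR 600 ns ;
        test_vector <= "011" ;
        WAIT ;  -- stop here instead of restarting the sequence
    END PROCESS ;
END behavioral ;
```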
.. .

171

Register Transfer Level (RTL) Design Description

[Figure: block diagram with two combinational logic blocks and registers]

.. .

172

Component Equivalent of a Process


priority: PROCESS (clk)
BEGIN
IF w(3) = '1' THEN
y <= "11" ;
ELSIF w(2) = '1' THEN
y <= "10" ;
ELSIF w(1) = c THEN
y <= a and b;
ELSE
z <= "00" ;
END IF ;
END PROCESS ;

[Figure: component "priority" with inputs clk, w, a, b, c and output y]

All signals that appear on the left of a signal assignment
statement (<=) are outputs, e.g. y, z
All signals that appear on the right of a signal assignment
statement (<=) or in logic expressions are inputs, e.g. w, a, b, c
All signals that appear in the sensitivity list are inputs, e.g. clk
Note that not all inputs need to be included in the sensitivity list
.. .

173

Processes in VHDL
Processes Describe Sequential Behavior
Processes in VHDL Are Very Powerful
Statements
They allow you to define arbitrary behavior that may
be difficult to represent by a real circuit
Not every process can be synthesized

Use Processes with Caution in Code to Be Synthesized
Use Processes Freely in Testbenches and
algorithm specifications
.. .

174

D latch
Truth table

Clock  D  |  Q(t+1)
  0    x  |  Q(t)
  1    0  |  0
  1    1  |  1

[Figure: graphical symbol with inputs D and Clock]

[Figure: timing diagram of Clock, D, and Q over time, with instants t1 to t4 marked]

175

.. .

D flip-flop
Truth table

Clk  D  |  Q(t+1)
 ↑   0  |  0
 ↑   1  |  1
 0   x  |  Q(t)
 1   x  |  Q(t)

[Figure: graphical symbol with inputs D and Clock]

[Figure: timing diagram of Clock, D, and Q over time, with instants t1 to t4 marked]

.. .

176

D latch
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY latch IS
PORT ( D, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END latch ;

ARCHITECTURE Behavior OF latch IS


BEGIN
PROCESS ( D, Clock )
BEGIN
IF Clock = '1' THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior;

177

.. .

D flip-flop (1)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior_1 OF flipflop IS


BEGIN
PROCESS ( Clock )
BEGIN
IF Clock'EVENT AND Clock = '1' THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior_1 ;

.. .

178

D flip-flop (2)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior_1 OF flipflop IS


BEGIN
PROCESS ( Clock )
BEGIN
IF rising_edge(Clock) THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior_1 ;

179

.. .

D flip-flop (3)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior_2 OF flipflop IS


BEGIN
PROCESS
BEGIN
WAIT UNTIL Clock'EVENT AND Clock = '1' ;
Q <= D ;
END PROCESS ;
END Behavior_2 ;

.. .

180

D flip-flop (4)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior_2 OF flipflop IS


BEGIN
PROCESS
BEGIN
WAIT UNTIL rising_edge(Clock) ;
Q <= D ;
END PROCESS ;
END Behavior_2 ;

181

.. .

D flip-flop with asynchronous reset


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Resetn, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior OF flipflop IS


BEGIN
PROCESS ( Resetn, Clock )
BEGIN
IF Resetn = '0' THEN
Q <= '0' ;
ELSIF Clock'EVENT AND Clock = '1' THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior ;
.. .

182

D flip-flop with synchronous reset


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Resetn, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC) ;
END flipflop ;

ARCHITECTURE Behavior OF flipflop IS


BEGIN
PROCESS
BEGIN
WAIT UNTIL Clock'EVENT AND Clock = '1' ;
IF Resetn = '0' THEN
Q <= '0' ;
ELSE
Q <= D ;
END IF ;
END PROCESS ;
END Behavior ;
183

.. .

8-bit register with asynchronous reset


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY reg8 IS
PORT ( D : IN STD_LOGIC_VECTOR(7 DOWNTO 0) ;
       Resetn, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) ) ;
END reg8 ;

ARCHITECTURE Behavior OF reg8 IS


BEGIN
PROCESS ( Resetn, Clock )
BEGIN
IF Resetn = '0' THEN
Q <= "00000000" ;
ELSIF Clock'EVENT AND Clock = '1' THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior ;

Resetn
D

Clock

reg8

.. .

184

N-bit register with asynchronous reset


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY regn IS
GENERIC ( N : INTEGER := 16 ) ;
PORT ( D : IN STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
       Resetn, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END regn ;
ARCHITECTURE Behavior OF regn IS
BEGIN
PROCESS ( Resetn, Clock )
BEGIN
IF Resetn = '0' THEN
Q <= (OTHERS => '0') ;
ELSIF Clock'EVENT AND Clock = '1' THEN
Q <= D ;
END IF ;
END PROCESS ;
END Behavior ;

Resetn
D

Clock

regn

185

.. .

N-bit register with enable


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY regn IS
GENERIC ( N : INTEGER := 8 ) ;
PORT ( D : IN STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
       Enable, Clock : IN STD_LOGIC ;
       Q : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END regn ;

ARCHITECTURE Behavior OF regn IS


BEGIN
PROCESS (Clock)
BEGIN
IF (Clock'EVENT AND Clock = '1' ) THEN
IF Enable = '1' THEN
Q <= D ;
END IF ;
END IF;
END PROCESS ;
END Behavior ;

Enable
Q

Clock

regn

.. .

186

2-bit up-counter with synchronous reset


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_unsigned.all ;
ENTITY upcount IS
PORT ( Clear, Clock : IN STD_LOGIC ;
       Q : BUFFER STD_LOGIC_VECTOR(1 DOWNTO 0) ) ;
END upcount ;

ARCHITECTURE Behavior OF upcount IS


BEGIN
upcount: PROCESS ( Clock )
BEGIN
IF (Clock'EVENT AND Clock = '1') THEN
IF Clear = '1' THEN
Q <= "00" ;
ELSE
Q <= Q + "01" ;
END IF ;
END IF;
END PROCESS;
END Behavior ;

Clear

2
Q

upcount

Clock

.. .

187

4-bit up-counter with asynchronous reset (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_unsigned.all ;
ENTITY upcount IS
PORT ( Clock, Resetn, Enable : IN STD_LOGIC ;
       Q : OUT STD_LOGIC_VECTOR (3 DOWNTO 0)) ;
END upcount ;

Enable

Q
Clock

upcount

Resetn

.. .

188

4-bit up-counter with asynchronous reset (2)


ARCHITECTURE Behavior OF upcount IS
SIGNAL Count : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
BEGIN
PROCESS ( Clock, Resetn )
BEGIN
IF Resetn = '0' THEN
Count <= "0000" ;
ELSIF (Clock'EVENT AND Clock = '1') THEN
IF Enable = '1' THEN
Count <= Count + 1 ;
END IF ;
END IF ;
END PROCESS ;
Q <= Count ;
END Behavior ;

upcount

Resetn

.. .

189

Shift register

[Figure: 4-bit shift register with serial input Sin, outputs Q(3) to Q(0), and common Clock and Enable inputs]

.. .

190

Shift Register With Parallel Load


[Figure: 4-bit shift register with parallel load; inputs Load, Sin, D(3) to D(0), Clock, and Enable; outputs Q(3) to Q(0)]
.. .

191

4-bit shift register with parallel load (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY shift4 IS
PORT ( D : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
       Enable : IN STD_LOGIC ;
       Load : IN STD_LOGIC ;
       Sin : IN STD_LOGIC ;
       Clock : IN STD_LOGIC ;
       Q : BUFFER STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END shift4 ;

Enable
D
Q

Load
Sin

shift4

Clock
.. .

192

4-bit shift register with parallel load (2)


ARCHITECTURE Behavior_1 OF shift4 IS
BEGIN
PROCESS (Clock)
BEGIN
IF Clock'EVENT AND Clock = '1' THEN
IF Load = '1' THEN
Q <= D ;
ELSIF Enable = '1' THEN
Q(0) <= Q(1) ;
Q(1) <= Q(2);
Q(2) <= Q(3) ;
Q(3) <= Sin;
END IF ;
END IF ;
END PROCESS ;
END Behavior_1 ;

Enable
D
Q

Load
Sin

shift4

Clock

.. .

193

N-bit shift register with parallel load (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY shiftn IS
GENERIC ( N : INTEGER := 8 ) ;
PORT ( D : IN STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
       Enable : IN STD_LOGIC ;
       Load : IN STD_LOGIC ;
       Sin : IN STD_LOGIC ;
       Clock : IN STD_LOGIC ;
       Q : BUFFER STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END shiftn ;

Enable
D
Q

Load
Sin

shiftn

Clock
.. .

194

N-bit shift register with parallel load (2)


ARCHITECTURE Behavior OF shiftn IS
BEGIN
PROCESS (Clock)
BEGIN
IF (Clock'EVENT AND Clock = '1' ) THEN
IF Load = '1' THEN
Q <= D ;
ELSIF Enable = '1' THEN
Genbits: FOR i IN 0 TO N-2 LOOP
Q(i) <= Q(i+1) ;
END LOOP ;
Q(N-1) <= Sin ;
END IF;
END IF ;
END PROCESS ;
END Behavior ;

shiftn

Clock

.. .

195

Variable Example (1)


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY Numbits IS
PORT ( X : IN STD_LOGIC_VECTOR(1 TO 3) ;
       Count : OUT INTEGER RANGE 0 TO 3) ;
END Numbits ;

.. .

196

Variable Example (2)


ARCHITECTURE Behavior OF Numbits IS
BEGIN
PROCESS(X) -- count the number of bits in X equal to 1
VARIABLE Tmp: INTEGER;
BEGIN
Tmp := 0;
FOR i IN 1 TO 3 LOOP
IF X(i) = '1' THEN
Tmp := Tmp + 1;
END IF;
END LOOP;
Count <= Tmp;
END PROCESS;
END Behavior ;

.. .

197

Variables - features
Can only be declared within processes and
subprograms (functions & procedures)
Initial value can be explicitly specified in the
declaration
When assigned, take the new value
immediately
Variable assignments represent the desired
behavior, not the structure of the circuit
Should be avoided, or at least used with
caution in a synthesizable code
.. .

198

Delays
Delays are not synthesizable
Statements, such as
wait for 5 ns
a <= b after 10 ns
will not produce the required delay, and
should not be used in the code intended
for synthesis.

.. .

199

Initializations
Declarations of signals (and variables)
with initialized values, such as
SIGNAL a : STD_LOGIC := '0';
cannot be synthesized, and thus should
be avoided.
If present, they will be ignored by the
synthesis tools.
Use set and reset signals instead.

.. .

200

Reports and asserts


Reports and asserts, such as
report "Initialization complete";
assert initial_value <= max_value
report "initial value too large"
severity error;
cannot be synthesized, but they
can be freely used in the code intended for
synthesis.
They will be used during simulation and
ignored during synthesis.
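
As a small sketch of an assertion placed inside code intended for synthesis, the register below checks its generic during simulation. The entity name checked_reg and the message text are hypothetical; the register itself follows the style of the earlier N-bit register slides.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical entity: an N-bit register with a simulation-time check
ENTITY checked_reg IS
    GENERIC ( N : INTEGER := 8 ) ;
    PORT ( D     : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           Clock : IN  STD_LOGIC ;
           Q     : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END checked_reg ;

ARCHITECTURE Behavior OF checked_reg IS
BEGIN
    -- Concurrent assertion: evaluated by the simulator, no hardware is described
    ASSERT N >= 1
        REPORT "checked_reg: N must be at least 1"
        SEVERITY failure ;

    PROCESS ( Clock )
    BEGIN
        IF rising_edge(Clock) THEN
            Q <= D ;
        END IF ;
    END PROCESS ;
END Behavior ;
```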
.. .

201

Floating-point operations
Operations on signals (and variables)
of the type
real
are not synthesizable by the
current generation of synthesis tools.

.. .

202

Records Examples (1)


type opcodes is (add, sub, and_op, or_op);
type reg_number is range 0 to 8;
type instruction is record
opcode: opcodes;
source_reg1: reg_number;
source_reg2: reg_number;
dest_reg: reg_number;
displacement: integer;
end record instruction;
.. .

203

Records Examples (2)


type word is record
instr: instruction;
data: bit_vector(31 downto 0);
end record word;
constant add_instr_1_3: instruction:=
(opcode => add,
source_reg1 | dest_reg => 1,
source_reg2 => 3,
displacement => 0);
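
Individual record elements are selected with dot notation. The package below is a minimal sketch of that; every name in it (record_demo, opcode_t, reg_t, instr_t, swap_regs) is hypothetical and deliberately smaller than the instruction record shown above.

```vhdl
-- Hypothetical package illustrating access to record elements
PACKAGE record_demo IS
    TYPE opcode_t IS (op_add, op_sub) ;
    SUBTYPE reg_t IS INTEGER RANGE 0 TO 7 ;
    TYPE instr_t IS RECORD
        opcode    : opcode_t ;
        src, dest : reg_t ;
    END RECORD ;
    FUNCTION swap_regs ( i : instr_t ) RETURN instr_t ;
END record_demo ;

PACKAGE BODY record_demo IS
    -- Returns a copy of i with the source and destination registers exchanged
    FUNCTION swap_regs ( i : instr_t ) RETURN instr_t IS
        VARIABLE r : instr_t := i ;
    BEGIN
        r.src  := i.dest ;   -- element selected by name
        r.dest := i.src ;
        RETURN r ;
    END swap_regs ;
END record_demo ;
```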
.. .

204

2-to-4 Decoder

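En  w1  w0  |  y0  y1  y2  y3
 1   0   0  |   1   0   0   0
 1   0   1  |   0   1   0   0
 1   1   0  |   0   0   1   0
 1   1   1  |   0   0   0   1
 0   x   x  |   0   0   0   0

(a) Truth table

[Figure: decoder symbol with inputs w0, w1, En and outputs y0, y1, y2, y3]

(b) Graphical symbol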

.. .

205

Describing combinational logic using processes


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY dec2to4 IS
PORT ( w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
       En : IN STD_LOGIC ;
       y : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END dec2to4 ;

ARCHITECTURE Behavior OF dec2to4 IS
BEGIN
PROCESS ( w, En )
BEGIN
IF En = '1' THEN
CASE w IS
WHEN "00" => y <= "1000" ;
WHEN "01" => y <= "0100" ;
WHEN "10" => y <= "0010" ;
WHEN OTHERS => y <= "0001" ;
END CASE ;
ELSE
y <= "0000" ;
END IF ;
END PROCESS ;
END Behavior ;

.. .

206

Describing combinational logic using processes


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY seg7 IS
PORT ( bcd : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
       leds : OUT STD_LOGIC_VECTOR(1 TO 7) ) ;
END seg7 ;
ARCHITECTURE Behavior OF seg7 IS
BEGIN
PROCESS ( bcd )
BEGIN
CASE bcd IS                   -- abcdefg
WHEN "0000" => leds <= "1111110" ;
WHEN "0001" => leds <= "0110000" ;
WHEN "0010" => leds <= "1101101" ;
WHEN "0011" => leds <= "1111001" ;
WHEN "0100" => leds <= "0110011" ;
WHEN "0101" => leds <= "1011011" ;
WHEN "0110" => leds <= "1011111" ;
WHEN "0111" => leds <= "1110000" ;
WHEN "1000" => leds <= "1111111" ;
WHEN "1001" => leds <= "1110011" ;
WHEN OTHERS => leds <= "-------" ;
END CASE ;
END PROCESS ;
END Behavior ;
.. .

207

Describing combinational logic using processes


LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY compare1 IS
PORT ( A, B : IN STD_LOGIC ;
       AeqB : OUT STD_LOGIC ) ;
END compare1 ;

ARCHITECTURE Behavior OF compare1 IS


BEGIN
PROCESS ( A, B )
BEGIN
AeqB <= '0' ;
IF A = B THEN
AeqB <= '1' ;
END IF ;
END PROCESS ;
END Behavior ;
.. .

208

Incorrect code for combinational logic - Implied latch (1)

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY implied IS
PORT ( A, B : IN STD_LOGIC ;
       AeqB : OUT STD_LOGIC ) ;
END implied ;

ARCHITECTURE Behavior OF implied IS


BEGIN
PROCESS ( A, B )
BEGIN
IF A = B THEN
AeqB <= '1' ;
END IF ;
END PROCESS ;
END Behavior ;
.. .

209

Incorrect code for combinational logic - Implied latch (2)

[Figure: synthesized circuit with inputs A and B and output AeqB]

.. .

210

Describing combinational logic using processes


Rules that need to be followed:
1. All inputs to the combinational circuit should be included
in the sensitivity list
2. No other signals should be included
in the sensitivity list
3. None of the statements within the process
should be sensitive to rising or falling edges
4. All possible cases need to be covered in the internal
IF and CASE statements in order to avoid
implied latches

.. .

211

Covering all cases in the IF statement


Using ELSE
IF A = B THEN
AeqB <= '1' ;
ELSE
AeqB <= '0' ;
END IF ;
Using default values
AeqB <= '0' ;
IF A = B THEN
AeqB <= '1' ;
END IF ;

.. .

212

Covering all cases in the CASE statement


Using WHEN OTHERS
CASE y IS
WHEN S1 => Z <= "10";
WHEN S2 => Z <= "01";
WHEN OTHERS => Z <= "00";
END CASE;

CASE y IS
WHEN S1 => Z <= "10";
WHEN S2 => Z <= "01";
WHEN S3 => Z <= "00";
WHEN OTHERS => Z <= "--";
END CASE;

Using default values


Z <= "00";
CASE y IS
WHEN S1 => Z <= "10";
WHEN S2 => Z <= "10";
WHEN OTHERS => NULL; -- remaining choices keep the default value
END CASE;
.. .

213

One-dimensional arrays Examples (1)


type word_asc is array(0 to 31) of std_logic;
type word_desc is array(31 downto 0) of std_logic;
..
signal buffer_register: word_desc;
..
buffer_register(6) <= '1';
..
variable tmp : word_asc;
..
tmp(5) := '0';
.. .

214

One-dimensional arrays Examples (2)


type controller_state is (initial, idle, active, error);
type state_counts_imp is array(idle to error) of natural;
type state_counts_exp is array(controller_state range idle
to error) of natural;
type state_counts_full is array(controller_state) of natural;
..
variable counters: state_counts_exp;
..
counters(active) := 0;
..
counters(active) := counters(active) + 1;
.. .

215

Predefined Unconstrained Array Types


Predefined:
bit_vector - array of bits
string - array of characters

Defined in the ieee.std_logic_1164 package:
std_logic_vector - array of std_logic

.. .

216

Predefined Unconstrained Array Types

subtype byte is bit_vector(7 downto 0);


.
variable channel_busy : bit_vector(1 to 4);
.
constant ready_message : string := "ready";
.
signal memory_bus: std_logic_vector (31 downto 0);

.. .

217

User-defined Unconstrained Array Types


type sample is array (natural range <>) of
integer;
.
variable long_sample : sample(0 to 255);
.
constant look_up_table_1: sample :=
(127, -45, 63, 23, 76);
.
.. .

218

Array Attributes
A'left(N) - left bound of index range of dimension N of A
A'right(N) - right bound of index range of dimension N of A
A'low(N) - lower bound of index range of dimension N of A
A'high(N) - upper bound of index range of dimension N of A
A'range(N) - index range of dimension N of A
A'reverse_range(N) - reversed index range of dimension N of A
A'length(N) - length of index range of dimension N of A
A'ascending(N) - true if index range of dimension N of A
is an ascending range, false otherwise
.. .

219

Array Attributes - Examples


type A is array (1 to 4, 31 downto 0) of boolean;

A'left(1) = 1
A'right(2) = 0
A'low(1) = 1
A'high(2) = 31
A'range(1) = 1 to 4
A'length(2) = 32
A'ascending(2) = false
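
Array attributes are most useful when the bounds of an array are not known in advance. As a sketch, the hypothetical function parity below (package name parity_pkg is also an assumption) uses v'RANGE so that it accepts a vector of any length and direction.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical package: parity of an unconstrained vector
PACKAGE parity_pkg IS
    FUNCTION parity ( v : STD_LOGIC_VECTOR ) RETURN STD_LOGIC ;
END parity_pkg ;

PACKAGE BODY parity_pkg IS
    FUNCTION parity ( v : STD_LOGIC_VECTOR ) RETURN STD_LOGIC IS
        VARIABLE p : STD_LOGIC := '0' ;
    BEGIN
        -- v'RANGE adapts to whatever index range the actual parameter has
        FOR i IN v'RANGE LOOP
            p := p XOR v(i) ;
        END LOOP ;
        RETURN p ;
    END parity ;
END parity_pkg ;
```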
.. .

220

Subprograms
Include functions and procedures


Commonly used pieces of code
Can be placed in a library, and then reused and
shared among various projects
Abstract operations that are repeatedly
performed
Type conversions
Use only sequential statements, the same as
processes
221

.. .

Typical locations of subprograms


PACKAGE / PACKAGE BODY (in a LIBRARY): global FUNCTION / PROCEDURE
ENTITY: local for all architectures of a given entity
ARCHITECTURE declarative part: local for a given architecture


.. .

222

Functions basic features


Functions
always return a single value as a result
are called using formal and actual parameters the same
way as components
never modify parameters passed to them
parameters can only be constants (including generics) and
signals (including ports);
variables are not allowed; the default is a CONSTANT
when passing parameters, no range specification should be
included (for example no RANGE for INTEGERS, or
TO/DOWNTO for STD_LOGIC_VECTOR)
are always used in some expression, and not called on their
own
.. .

223

.. .

224

Function syntax
FUNCTION function_name
(<parameter_list>)
RETURN data_type IS
[declarations]
BEGIN
(sequential statements)
END function_name;

Function parameters - example


FUNCTION f1
(a, b: INTEGER; SIGNAL c: STD_LOGIC_VECTOR)
RETURN BOOLEAN IS
BEGIN
(sequential statements)
END f1;

.. .

225

Function calls - examples


x <= conv_integer(a);
IF x > maximum(a, b) THEN ....
WHILE minimum(a, b) LOOP
.......

.. .

226

Function Example 1
LIBRARY ieee;
USE ieee.std_logic_1164.all;
PACKAGE my_package IS
FUNCTION log2_ceil (CONSTANT s: INTEGER) RETURN INTEGER;
END my_package;
PACKAGE body my_package IS
FUNCTION log2_ceil (CONSTANT s: INTEGER) RETURN INTEGER IS
VARIABLE m,n : INTEGER;
BEGIN
m := 0;
n := 1;
WHILE (n < s) LOOP
m := m + 1;
n := n*2;
END LOOP;
RETURN m;
END log2_ceil;
END my_package;
.. .

227

Function call Example 1


LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all; -- provides conv_std_logic_vector
USE ieee.std_logic_unsigned.all;
USE work.my_package.all;
ENTITY log2_int IS
GENERIC (m: INTEGER :=20);
PORT (x: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
y: OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END log2_int;
ARCHITECTURE log2_int OF log2_int IS
CONSTANT l2m : INTEGER := log2_ceil (m);
SIGNAL r: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
r <= conv_std_logic_vector(l2m,4);
y <= x*r;
END log2_int;
.. .

228

Function Example 2
library IEEE;
use IEEE.std_logic_1164.all;
ENTITY powerOfFour IS
PORT(
X : IN INTEGER;
Y : OUT INTEGER
);
END powerOfFour;

.. .

229

.. .

230

Function Example 2
ARCHITECTURE behavioral OF powerOfFour IS
FUNCTION Pow ( SIGNAL N:INTEGER; Exp : INTEGER)
RETURN INTEGER IS
VARIABLE Result : INTEGER := 1;
BEGIN
FOR i IN 1 TO Exp LOOP
Result := Result * N;
END LOOP;
RETURN( Result );
END Pow;
BEGIN
Y <= Pow(X, 4);
END behavioral;

Package containing a function (1)


LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
PACKAGE specialFunctions IS
FUNCTION Pow( SIGNAL N: INTEGER; Exp : INTEGER)
RETURN INTEGER;
END specialFunctions;

.. .

231

Package containing a function (2)


PACKAGE BODY specialFunctions IS
FUNCTION Pow(SIGNAL N: INTEGER; Exp : INTEGER)
RETURN INTEGER IS
VARIABLE Result : INTEGER := 1;
BEGIN
FOR i IN 1 TO Exp LOOP
Result := Result * N;
END LOOP;
RETURN( Result );
END Pow;
END specialFunctions;

.. .

232

Type conversion function (1)

LIBRARY ieee;
USE ieee.std_logic_1164.all;
--------------------------------------------------
PACKAGE my_package IS
FUNCTION conv_integer (SIGNAL vector: STD_LOGIC_VECTOR)
RETURN INTEGER;
END my_package;
--------------------------------------------------

.. .

233

Type conversion function (2)


PACKAGE BODY my_package IS
FUNCTION conv_integer (SIGNAL vector: STD_LOGIC_VECTOR)
RETURN INTEGER IS
VARIABLE result: INTEGER RANGE 0 TO 2**vector'LENGTH - 1;
BEGIN
-- start with the most significant bit
IF (vector(vector'HIGH) = '1') THEN result := 1;
ELSE result := 0;
END IF;
-- shift in the remaining bits, from high to low
FOR i IN (vector'HIGH-1) DOWNTO (vector'LOW) LOOP
result := result*2;
IF (vector(i) = '1') THEN result := result+1;
END IF;
END LOOP;
RETURN result;
END conv_integer;
END my_package;

.. .

234

Type conversion function (3)


LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.my_package.all;
--------------------------------------------------
ENTITY conv_int2 IS
PORT ( a: IN STD_LOGIC_VECTOR (3 DOWNTO 0);
y: OUT INTEGER RANGE 0 TO 15);
END conv_int2;
--------------------------------------------------
ARCHITECTURE my_arch OF conv_int2 IS
BEGIN
y <= conv_integer(a);
END my_arch;

.. .

235

Procedures basic features


Procedures
do not return a value
are called using formal and actual parameters the same way
as components
may modify parameters passed to them
each parameter must have a mode: IN, OUT, INOUT
parameters can be constants (including generics), signals
(including ports), and variables;
the default for inputs (mode in) is a constant, the default for
outputs (modes out and inout) is a variable
when passing parameters, range specification should be
included (for example RANGE for INTEGERS, and
TO/DOWNTO for STD_LOGIC_VECTOR)
Procedure calls are statements on their own
.. .

236

Procedure syntax
PROCEDURE procedure_name
(<parameter_list>) IS
[declarations]
BEGIN
(sequential statements)
END procedure_name;

.. .

237

Procedure parameters - example


PROCEDURE p1
(a, b: IN INTEGER; SIGNAL c: OUT STD_LOGIC_VECTOR(7 DOWNTO 0)) IS
BEGIN
(sequential statements)
END p1;

.. .

238

Procedure calls - examples


compute_min_max(in1, in2, in3, out1, out2);
divide(dividend, divisor, quotient, remainder);
IF (a > b) THEN
compute_min_max(in1, in2, in3, out1, out2);
.......

.. .

239

Procedure example (1)


LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.decProcs.all;
ENTITY decoder IS port (
decIn: IN STD_LOGIC_VECTOR(1 DOWNTO 0);
decOut: OUT STD_LOGIC_VECTOR(3 DOWNTO 0)
);
END decoder;

.. .

240

Procedure example (2)


ARCHITECTURE simple OF decoder IS
PROCEDURE DEC2x4 (SIGNAL inputs : in STD_LOGIC_VECTOR(1 downto 0);
SIGNAL decode: out STD_LOGIC_VECTOR(3 downto 0)
) IS
BEGIN
-- decode is a signal parameter, so it is assigned with <=
CASE inputs IS
WHEN "11" =>
decode <= "1000";
WHEN "10" =>
decode <= "0100";
WHEN "01" =>
decode <= "0010";
WHEN "00" =>
decode <= "0001";
WHEN others =>
decode <= "0001";
END case;
END DEC2x4;
BEGIN
DEC2x4(decIn,decOut);
END simple;
.. .

241

Operator as a function (1)


LIBRARY ieee;
USE ieee.std_logic_1164.all;
--------------------------------------------------
PACKAGE my_package IS
FUNCTION "+" (a, b: STD_LOGIC_VECTOR)
RETURN STD_LOGIC_VECTOR;
END my_package;
--------------------------------------------------

.. .

242

Operator as a function (2)


PACKAGE BODY my_package IS
FUNCTION "+" (a, b: STD_LOGIC_VECTOR)
RETURN STD_LOGIC_VECTOR IS
VARIABLE result: STD_LOGIC_VECTOR(a'RANGE); -- assumes a and b share the same index range
VARIABLE carry: STD_LOGIC;
BEGIN
carry := '0';
-- ripple-carry addition, least significant bit first
FOR i IN a'REVERSE_RANGE LOOP
result(i) := a(i) XOR b(i) XOR carry;
carry := (a(i) AND b(i)) OR (a(i) AND carry) OR (b(i) AND carry);
END LOOP;
RETURN result;
END "+" ;
END my_package;

.. .

243

Operator overloading
Operator overloading allows different
argument types for a given operation
(function)
The VHDL tools resolve which of these
functions to select based on the types of
the inputs
This selection is transparent to the user as
long as the function has been defined for
the given argument types.
.. .

244

Different declarations for the same operator Example


Declarations in the package ieee.std_logic_unsigned:
function "+" ( L: std_logic_vector;
R: std_logic_vector)
return std_logic_vector;
function "+" ( L: std_logic_vector;
R: integer)
return std_logic_vector;
function "+" ( L: std_logic_vector;
R: std_logic)
return std_logic_vector;

.. .

245

Different declarations for the same operator Example


signal count: std_logic_vector(7 downto 0);
You can use:
count <= count + "00000001";
or
count <= count + 1;
or
count <= count + '1';
.. .

246

Notion of type
Type defines a set of values and a set of
applicable operations
Declaration of a type determines which values
can be stored in an object (signal, variable,
constant) of a given type
Every object can only assume values of its
nominated type
Each operation (e.g., and, +, *) includes the types
of values to which the operation may be applied,
and the type of the result
The goal of strong typing is a detection of errors
at an early stage of the design process
.. .

247

Example of strong typing


architecture incorrect of example1 is
type apples is range 0 to 100;
type oranges is range 0 to 100;
signal apple1: apples;
signal orange1: oranges;
begin
apple1 <= orange1;
end incorrect;
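
The assignment above is rejected because apples and oranges are distinct types, even though their ranges are identical. A sketch of one way to make the intent explicit is a type conversion between the two integer types; the architecture name corrected and the empty entity declaration are assumptions added to make the example self-contained.

```vhdl
-- Hypothetical empty entity so that the architecture can be analyzed on its own
ENTITY example1 IS
END example1 ;

ARCHITECTURE corrected OF example1 IS
    TYPE apples  IS RANGE 0 TO 100 ;
    TYPE oranges IS RANGE 0 TO 100 ;
    SIGNAL apple1  : apples ;
    SIGNAL orange1 : oranges ;
BEGIN
    -- Explicit conversion: allowed because both are integer types
    apple1 <= apples(orange1) ;
END corrected ;
```

The conversion documents that mixing the two types is deliberate, which is the point of strong typing.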
.. .

248

Integer type
Name: integer
Status: predefined
Contents: all integer numbers representable on a
particular host computer, but at least numbers in the
range $-(2^{31}-1)$ .. $2^{31}-1$

.. .

249

User defined integer types - Examples


type day_of_month is range 0 to 31;
type year is range 0 to 2100;
type set_index_range is range 999 downto 100;
constant number_of_bits: integer :=32;
type bit_index is range 0 to number_of_bits-1;
Values of bounds can be expressions, but
need to be known when the model is analyzed.
.. .

250

Predefined enumeration types (1)


boolean

(true, false)

bit

('0', '1')

character

VHDL-87:
128 7-bit ASCII characters
VHDL-93:
256 ISO 8859 Latin-1 8-bit characters

.. .

251

Predefined enumeration types (2)


severity_level

(note, warning, error, failure)

Predefined in VHDL-93 only:


file_open_kind
(read_mode, write_mode, append_mode)
file_open_status
(open_ok, status_error,
name_error, mode_error)
.. .

252

User-defined enumeration types Examples


type state is (S0, S1);
type alu_function is (disable, pass, add, subtract,
multiply, divide);
type octal_digit is ('0', '1', '2', '3', '4', '5', '6', '7');
type mixed is (lf, cr, ht, '-', '/', '\');

Each value in an enumeration type must be either


an identifier or a character literal
.. .

253

Floating point types


Used to represent real numbers
Numbers are represented using a significand
(mantissa) part and an exponent part
Conform to the IEEE standard 754 or 854
Minimum size of representation that must be
supported by the implementation of the VHDL
standard:
VHDL-2001:
64-bit representation
VHDL-87, VHDL-93: 32-bit representation
.. .

254

Real literals - examples


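23.1 = $23.1$
46E5 = $46 \times 10^{5}$
1E+12 = $1 \times 10^{12}$
1.234E09 = $1.234 \times 10^{9}$
34.0e-08 = $34.0 \times 10^{-8}$

2#0.101#E5 = $0.101_2 \times 2^{5} = (2^{-1} + 2^{-3}) \times 2^{5}$
8#0.4#E-6 = $0.4_8 \times 8^{-6} = (4 \times 8^{-1}) \times 8^{-6}$
16#0.a5#E-8 = $0.\mathrm{a5}_{16} \times 16^{-8} = (10 \times 16^{-1} + 5 \times 16^{-2}) \times 16^{-8}$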
.. .

255

The ANSI/IEEE standard floating-point


number representation formats

.. .

256

User-defined floating-point types Examples


type input_level is range -10.0 to +10.0;
type probability is range 0.0 to 1.0;
constant max_output: real := 1.0E6;
constant min_output: real := 1.0E-6;
type output_range is range max_output downto min_output;

.. .

257

Attributes of all scalar types


T'left - first (leftmost) value in T
T'right - last (rightmost) value in T
T'low - least value in T
T'high - greatest value in T

Not available in VHDL-87:
T'ascending - true if T is an ascending range, false otherwise
T'image(x) - a string representing the value of x
T'value(s) - the value in T that is represented by s
.. .

258

Attributes of all scalar types - examples


type index_range is range 21 downto 11;
index_range'left = 21
index_range'right = 11
index_range'low = 11
index_range'high = 21
index_range'ascending = false
index_range'image(14) = "14"
index_range'value("20") = 20

.. .

259

Attributes of discrete types


T'pos(x) - position number of x in T
T'val(n) - value in T at position n
T'succ(x) - value in T at position one greater than position of x
T'pred(x) - value in T at position one less than position of x
T'leftof(x) - value in T at position one to the left of x
T'rightof(x) - value in T at position one to the right of x
.. .

260

Attributes of discrete types - examples


type logic_level is (unknown, low, undriven, high);
logic_level'pos(unknown) = 0
logic_level'val(3) = high
logic_level'succ(unknown) = low
logic_level'pred(undriven) = low
logic_level'leftof(unknown) = error
logic_level'rightof(undriven) = high

.. .

261

Subtype
Defines a subset of a base type values
A condition that is used to determine which
values are included in the subtype is called
a constraint
All operations that are applicable to the
base type also apply to any of its subtypes
Base type and subtype can be mixed in the
operations, but the result must belong to
the subtype, otherwise an error is
generated.
.. .

262

Predefined subtypes
natural - integers ≥ 0
positive - integers > 0

Not predefined in VHDL-87:
delay_length - time ≥ 0

.. .

263

User-defined subtypes - Examples

subtype bit_index is integer range 31 downto 0;


subtype input_range is real range 1.0E-9 to
1.0E+12;

.. .

264

Operators (1)

.. .

265

.. .

266

Operators (2)

Operators (3)

.. .

267

Propagation delay in VHDL - Example


entity MAJORITY is
port
(A_IN, B_IN, C_IN : in STD_LOGIC;
Z_OUT : out STD_LOGIC);
end MAJORITY;
architecture DATA_FLOW of MAJORITY is
begin
Z_OUT <= (not A_IN and B_IN and C_IN) or
(A_IN and not B_IN and C_IN) or
(A_IN and B_IN and not C_IN) or
(A_IN and B_IN and C_IN) after 20 ns;
end DATA_FLOW;
.. .

268

Propagation delay - Example

.. .

269

Inertial delay model


Short pulses (spikes) are not passed to the
outputs of logic gates due to the inertia of
physical systems.
Logic gates behave like low pass filters and
effectively filter out high frequency input
changes as if they never occurred.

.. .

270

Inertial delay model - Example


SIG_OUT <= not SIG_IN after 7 ns;

.. .

271

VHDL-87 Inertial delay model


Any input signal change that does not persist
for at least a propagation delay of the device
is not reflected at the output.
inertial delay (pulse rejection limit) =
propagation delay

.. .

272

VHDL-93 Enhanced inertial delay model


VHDL-93 allows the inertial delay model to be declared
explicitly as well as implicitly.
Explicitly:
Z_OUT <= inertial (not A_IN and B_IN and C_IN) or
(A_IN and not B_IN and C_IN) or
(A_IN and B_IN and not C_IN) or
(A_IN and B_IN and C_IN) after 20 ns;
Implicitly:
Z_OUT <= (not A_IN and B_IN and C_IN) or
(A_IN and not B_IN and C_IN) or
(A_IN and B_IN and not C_IN) or
(A_IN and B_IN and C_IN) after 20 ns;
.. .

273

VHDL-93 Enhanced inertial delay model

VHDL-93 allows inertial delay, also called


a pulse rejection limit, to be different from the
propagation delay.
SIG_OUT <= reject 5 ns inertial not SIG_IN after 7 ns;

.. .

274

Transport delay model


With a transport delay model, all input signal
changes are reflected at the output, regardless of
how long the signal changes persist.
Transport delay model must be declared explicitly using the
keyword transport.
Inertial delay model is a default delay model because it
reflects better the actual behavior of logic components.
Transport delay model is used for high-level modeling.
.. .

275

Transport delay model - Example


SIG_OUT <= transport not SIG_IN after 7 ns;
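
To compare the delay models side by side, the sketch below applies one 3 ns pulse to three assignments that differ only in their delay mechanism. The entity delay_demo and all signal names are hypothetical; the 7 ns delay matches the slides, while the 2 ns rejection limit is chosen here so that the pulse is wider than the limit.

```vhdl
-- Hypothetical simulation-only model comparing the delay mechanisms
ENTITY delay_demo IS
END delay_demo ;

ARCHITECTURE sim OF delay_demo IS
    SIGNAL sig_in        : BIT := '0' ;
    SIGNAL out_inertial  : BIT ;
    SIGNAL out_reject    : BIT ;
    SIGNAL out_transport : BIT ;
BEGIN
    -- Default inertial model: pulses shorter than 7 ns are filtered out
    out_inertial  <= NOT sig_in AFTER 7 ns ;
    -- VHDL-93: pulse rejection limit (2 ns) smaller than the delay (7 ns)
    out_reject    <= REJECT 2 ns INERTIAL NOT sig_in AFTER 7 ns ;
    -- Transport model: every input change is propagated
    out_transport <= TRANSPORT NOT sig_in AFTER 7 ns ;

    stimulus: PROCESS
    BEGIN
        WAIT FOR 20 ns ;
        sig_in <= '1' ;      -- start of a 3 ns pulse
        WAIT FOR 3 ns ;
        sig_in <= '0' ;      -- end of the pulse
        WAIT ;
    END PROCESS ;
END sim ;
```

In a simulator, the pulse is expected to be absent from out_inertial and to appear, delayed by 7 ns, on out_reject and out_transport.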

.. .

276

Event-driven simulation

[Figure: event list ordered by time; each time tq points to the list of events scheduled to occur at that time, each holding a signal and its new value]

Event list as an array - Timing wheel

[Figure: array indexed by time; some slots hold no events; list of events scheduled to occur at time tc; each event records a signal and its new value]

Delta delay
A propagation delay of 0 time units is
equivalent to omitting the after clause and is
called a delta delay.
Used for functional simulation.
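
A small sketch of how delta delays order zero-delay assignments, assuming hypothetical names (DELTA_DEMO, signals A, B, C). B follows A one delta later and C follows B one delta after that, yet all three changes happen at the same simulation time.

```vhdl
-- Hypothetical example; names and the 10 ns stimulus are illustrative.
entity DELTA_DEMO is
end DELTA_DEMO;

architecture SIM of DELTA_DEMO is
  signal A, B, C : bit := '0';
begin
  A <= '1' after 10 ns;
  B <= A;   -- no after clause: same as "B <= A after 0 ns;"
  C <= B;   -- one more delta delay after B

  MONITOR : process (C)
  begin
    if C = '1' then
      -- Delta delays do not advance simulation time.
      assert now = 10 ns
        report "C should change at 10 ns" severity error;
      report "C became '1' at " & time'image(now);
    end if;
  end process;
end SIM;
```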


Two-dimensional aspect of time


Simulation engine algorithm


while (event list not empty)
begin
    t = next time in list
    process entries for time t
end

If the next time in the list equals the previous time,
then the previous iteration of the loop has advanced
time by one delta delay.


Signals vs Variables
architecture DUMMY_1 of JUNK is
  signal Y : bit := '0';
begin
  process
    variable X : bit := '0';
  begin
    wait for 10 ns;
    X := '1';        -- variable: updated immediately
    Y <= X;          -- schedules '1' on Y
    wait for 10 ns;
    -- What is Y at this point ? '1'
    ...
  end process;
end DUMMY_1;

architecture DUMMY_2 of JUNK is
  signal X, Y : bit := '0';
begin
  process
  begin
    wait for 10 ns;
    X <= '1';        -- signal: X is still '0' until the next delta
    Y <= X;          -- schedules the old value '0' on Y
    wait for 10 ns;
    -- What is Y at this point ? '0'
    ...
  end process;
end DUMMY_2;

Variable assignment is immediate; a signal assignment
with 0 delay takes effect only after a delta delay, i.e., in
the next simulation cycle.

Properties of signals
Signals represent a time-ordered list of values
denoting past, present and future values.
This time history of a signal is called a waveform.
A value/time pair (v, t) is called a transaction.
If a transaction changes the value of a signal, it is
called an event.


Signal attributes (1)


S'transaction - A signal of type bit that changes
value from '0' to '1' or vice versa each time there
is a transaction on S.
S'event - True if there is an event on S in the
current simulation cycle, false otherwise.
S'active - True if there is a transaction on S in the
current simulation cycle, false otherwise.
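
The difference between a transaction and an event can be checked with these attributes. This is a minimal sketch with hypothetical names (ATTR_DEMO, DRIVE, MONITOR): assigning the value a signal already holds produces a transaction but no event.

```vhdl
-- Hypothetical testbench; names and timings are illustrative.
entity ATTR_DEMO is
end ATTR_DEMO;

architecture SIM of ATTR_DEMO is
  signal S : bit := '0';
begin
  DRIVE : process
  begin
    wait for 10 ns;
    S <= '0';   -- same value: transaction, but no event
    wait for 10 ns;
    S <= '1';   -- new value: transaction and event
    wait;
  end process;

  MONITOR : process
  begin
    wait on S'transaction;   -- resumes at 10 ns, on the first transaction
    assert S'active and not S'event
      report "expected a transaction without an event" severity error;
    wait on S'transaction;   -- resumes at 20 ns, on the second transaction
    assert S'active and S'event
      report "expected a transaction that is also an event" severity error;
    wait;
  end process;
end SIM;
```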


Signal attributes (2)


S'last_event - The time interval since the last
event on S.
S'last_active - The time interval since the last
transaction on S.
S'last_value - The value of S just before the last
event on S.
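
One common use of these attributes is a timing check inside a behavioral model. The sketch below is an illustration only: the entity DFF_CHECKED, its ports, and the generic T_SETUP with its 2 ns default are assumptions, not part of the slides. It detects a rising clock edge with CLK'last_value and checks the setup time with D'last_event.

```vhdl
-- Hypothetical D flip-flop with a setup-time check; names and T_SETUP are illustrative.
entity DFF_CHECKED is
  generic (T_SETUP : time := 2 ns);
  port (CLK, D : in  bit;
        Q      : out bit);
end DFF_CHECKED;

architecture BEHAV of DFF_CHECKED is
begin
  process (CLK)
  begin
    -- Rising edge: an event on CLK, now '1', and '0' just before the event.
    if CLK'event and CLK = '1' and CLK'last_value = '0' then
      -- D must not have changed during the last T_SETUP before the edge.
      assert D'last_event >= T_SETUP
        report "setup violation on D" severity warning;
      Q <= D;
    end if;
  end process;
end BEHAV;
```

For type bit the CLK'last_value test is redundant; it matters for multi-valued types, where CLK could reach '1' from a value other than '0'.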


Signal attributes (3)


S'delayed(T) - A signal that takes on the same
value as S, but is delayed by time T.
S'stable(T) - A Boolean signal that is true if there
has been no event on S in the time interval T up
to the current time, otherwise false.
S'quiet(T) - A Boolean signal that is true if there
has been no transaction on S in the time interval
T up to the current time, otherwise false.
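
The three signal-valued attributes can be observed together. The following is a minimal self-checking sketch with assumed names and timings (IMPLICIT_DEMO, a 5 ns interval): S has an event at 10 ns and a transaction without an event at 20 ns.

```vhdl
-- Hypothetical testbench; names and timings are illustrative.
entity IMPLICIT_DEMO is
end IMPLICIT_DEMO;

architecture SIM of IMPLICIT_DEMO is
  signal S : bit := '0';
begin
  -- Event at 10 ns ('0' to '1'); transaction only at 20 ns ('1' again).
  S <= '1' after 10 ns, '1' after 20 ns;

  CHECK : process
  begin
    wait for 12 ns;   -- t = 12 ns: 2 ns after the event
    assert not S'stable(5 ns) and not S'quiet(5 ns)
      report "S changed less than 5 ns ago" severity error;
    assert S'delayed(5 ns) = '0'
      report "S'delayed(5 ns) should still show the old value" severity error;
    wait for 6 ns;    -- t = 18 ns: 8 ns after the event
    assert S'stable(5 ns) and S'quiet(5 ns)
      report "S has been unchanged and inactive for 5 ns" severity error;
    assert S'delayed(5 ns) = '1'
      report "S'delayed(5 ns) should now show the new value" severity error;
    wait for 4 ns;    -- t = 22 ns: 2 ns after the transaction at 20 ns
    assert S'stable(5 ns) and not S'quiet(5 ns)
      report "a transaction without an event affects only S'quiet" severity error;
    wait;
  end process;
end SIM;
```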