
VHDL Coding

Exercise 4: FIR Filter


Where to start?
[Figure: design flow from Algorithm to Architecture to RTL block diagram to VHDL code, with design-space exploration, optimization, and feedback between the steps]
Algorithm
• High-Level System Diagram
 Context of the design
 Inputs and Outputs
 Throughput/rates
 Algorithmic requirements
• Algorithm Description
 Mathematical Description: $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$
 Performance Criteria
 Accuracy
 Optimization constraints
 Implementation constraints
 Area
 Speed
[Figure: block labeled FIR with input x(k) and output y(k)]
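As a small worked instance of the mathematical description, the sum can be written out for a short filter. The order N = 2 is chosen here only for illustration and is not taken from the exercise:

```latex
y(k) = \sum_{i=0}^{2} b_i\, x(k-i) = b_0\, x(k) + b_1\, x(k-1) + b_2\, x(k-2)
```

Each output sample therefore needs N + 1 multiplications and N additions, which is the cost the following architectures trade off between area and speed.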
Architecture (1)
• Isomorphic Architecture:
 Straightforward implementation of the algorithm
[Figure: FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, and output y(k)]
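A minimal sketch of what an isomorphic (direct) implementation could look like in VHDL is given below. The entity name, port names, widths, the active-low reset, and the placeholder coefficients are all assumptions made for illustration; they are not taken from the exercise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical direct-form FIR: one multiplier per coefficient, one adder chain.
entity fir_direct is
  generic (
    N     : positive := 3;   -- filter order (N+1 taps), assumed value
    WIDTH : positive := 8;   -- sample and coefficient width, assumed value
    OUT_W : positive := 20   -- output width: 2*WIDTH plus guard bits (assumption)
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                   -- async reset, active low (assumption)
    XxDI    : in  signed(WIDTH-1 downto 0);    -- x(k)
    YxDO    : out signed(OUT_W-1 downto 0)     -- y(k)
  );
end entity fir_direct;

architecture rtl of fir_direct is
  type coeff_array is array (0 to N) of signed(WIDTH-1 downto 0);
  type delay_array is array (1 to N) of signed(WIDTH-1 downto 0);
  -- Placeholder coefficients b_0 .. b_N (all 1); replace with real values.
  constant COEFF : coeff_array := (others => to_signed(1, WIDTH));
  signal DelayxDP : delay_array;  -- DelayxDP(i) holds x(k-i)
begin

  -- Delay line: shifts the samples by one position every clock cycle.
  p_delay : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DelayxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      DelayxDP(1) <= XxDI;
      for i in 2 to N loop
        DelayxDP(i) <= DelayxDP(i-1);
      end loop;
    end if;
  end process p_delay;

  -- Combinational multipliers and adder chain.
  -- The variable only holds the intermediate sum.
  p_sum : process (XxDI, DelayxDP)
    variable Acc : signed(OUT_W-1 downto 0);
  begin
    Acc := resize(COEFF(0) * XxDI, OUT_W);
    for i in 1 to N loop
      Acc := Acc + resize(COEFF(i) * DelayxDP(i), OUT_W);
    end loop;
    YxDO <= Acc;
  end process p_sum;

end architecture rtl;
```

In this form the combinational path from x(k) to y(k) runs through one multiplier and the whole adder chain, which is what the pipelining and retiming steps on the next slides address.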
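Architecture (2)
• Pipelining/Retiming:
 Improve timing
 Insert register(s) at the inputs or outputs
 Increases latency
 Perform retiming (backward or forward):
 Move registers through the logic without changing functionality
[Figure: FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, and output y(k); insets labeled "Backwards" and "Forward" illustrate the two retiming directions]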
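Architecture (3)
• Retiming and simple transformation:
 Optimization
 Reverse the adder chain
 Perform retiming
[Figure: FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, and output y(k), shown step by step while the registers are moved]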
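Reversing the adder chain and moving the delay registers between the adders leads to a structure commonly known as the transposed form. The sketch below shows one possible VHDL description of that structure; whether it matches the final figure of the slides exactly is an assumption, and all names, widths, and coefficient values are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical transposed-form FIR: registers sit between the adders.
entity fir_transposed is
  generic (
    N     : positive := 3;   -- filter order (N+1 taps), assumed value
    WIDTH : positive := 8;   -- sample and coefficient width, assumed value
    OUT_W : positive := 20   -- output width including guard bits (assumption)
  );
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                   -- async reset, active low (assumption)
    XxDI    : in  signed(WIDTH-1 downto 0);    -- x(k)
    YxDO    : out signed(OUT_W-1 downto 0)     -- y(k)
  );
end entity fir_transposed;

architecture rtl of fir_transposed is
  type coeff_array is array (0 to N) of signed(WIDTH-1 downto 0);
  type sum_array   is array (1 to N) of signed(OUT_W-1 downto 0);
  -- Placeholder coefficients b_0 .. b_N (all 1); replace with real values.
  constant COEFF : coeff_array := (others => to_signed(1, WIDTH));
  signal SumxDP : sum_array;  -- partial sums stored between the adders
begin

  p_regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      SumxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      -- Last tap: only the product b_N * x(k).
      SumxDP(N) <= resize(COEFF(N) * XxDI, OUT_W);
      -- Inner taps: product plus the partial sum of the next tap.
      for i in 1 to N-1 loop
        SumxDP(i) <= resize(COEFF(i) * XxDI, OUT_W) + SumxDP(i+1);
      end loop;
    end if;
  end process p_regs;

  -- Output: b_0 * x(k) plus the registered partial sum.
  YxDO <= resize(COEFF(0) * XxDI, OUT_W) + SumxDP(1);

end architecture rtl;
```

Here every register-to-register path contains one multiplier and one adder, independent of N, while the input-output behaviour stays the same as in the direct form.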
Architecture (4)
• More pipelining:
 Add one pipelining stage to the retimed circuit
 The longest path is given by the multiplier
 Unbalanced: The delay from input to the first pipeline stage is much longer than the delay from the first to the second stage
[Figure: retimed FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, output y(k), and one additional pipeline stage]
Architecture (5)
• More pipelining:
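 Add one pipelining stage to the retimed circuit
 Move the pipeline registers into the multiplier:
 Paths between pipeline stages are balanced
 Improved timing
 $T_{clock} = (T_{add} + T_{mult})/2 + T_{reg}$
[Figure: retimed FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, output y(k), and the pipeline registers placed inside the multipliers]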
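To get a feeling for the gain from balancing, the clock period can be evaluated with some made-up delays. The values T_mult = 8 ns, T_add = 2 ns and T_reg = 1 ns are purely hypothetical, and the unbalanced estimate assumes that the multiplier alone forms the longest path, as stated for Architecture (4):

```latex
\begin{aligned}
T_{clock}^{\text{unbalanced}} &\approx T_{mult} + T_{reg} = 8\,\mathrm{ns} + 1\,\mathrm{ns} = 9\,\mathrm{ns} \\
T_{clock}^{\text{balanced}} &= \frac{T_{add} + T_{mult}}{2} + T_{reg} = \frac{2\,\mathrm{ns} + 8\,\mathrm{ns}}{2} + 1\,\mathrm{ns} = 6\,\mathrm{ns}
\end{aligned}
```

The balanced figure is an ideal case: it presumes the multiplier can be cut so that both pipeline stages have exactly the same delay.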
Architecture (6)
• Iterative Decomposition:
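 Reuse hardware
 Identify regularity and reusable hardware components
 Add control:
 multiplexers
 storage elements
 control
 Increases cycles/sample
[Figure: FIR structure with input x(k), coefficients b0, b1, b2, ..., bN-2, bN-1, bN, and output y(k), next to the iteratively decomposed datapath with input x(k), coefficients b0 ... bN, constant 0, and output y(k)]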
RTL-Design
• Choose an architecture under the following constraints:
 It meets ALL timing specifications/constraints:
 Throughput
 Latency
 It consumes the smallest possible area
 It requires the least possible amount of power
• Decide which additional functions are needed and how they can be implemented efficiently:
 Storage of samples x(k) => MEMORY
 Storage of coefficients b_i => LUT
 Address generators for MEMORY and LUT => COUNTERS
 Control => FSM
[Figure: datapath labeled "Iterative Decomposition" with input x(k), coefficients b0 ... bN, constant 0, and output y(k)]
RTL-Design
• RTL Block-diagram:
 Datapath: $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$
[Figure: datapath with input x(k), coefficients b0 ... bN, constant 0, and output y(k)]
• FSM:
 Interface protocols
 Datapath control
RTL-Design
• How it works: $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$
 IDLE:
 Wait for new sample
 Store to input register
 NEW DATA:
 Store new sample to memory
 RUN:
 $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$
 Store result to output register
 DATA OUT:
 Output result / Wait for ACK
 IDLE: …
Translation into VHDL
• Some basic VHDL building blocks:
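 Signal Assignments:
 Outside a process:
[Figure: AxD assigned to YxD]
• This is NOT allowed!!!
[Figure: both AxD and BxD assigned to YxD outside a process]
 Within a process (sequential execution):
[Figure: AxD and BxD assigned to YxD one after the other inside a process]
• Sequential execution
• The last assignment is kept when the process terminates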
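The code listings of this slide did not survive, so the following is only a sketch of the two cases, written with hypothetical entity and port names in the naming style of the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; names are assumptions for illustration.
entity assign_demo is
  port (
    AxDI  : in  std_logic;
    BxDI  : in  std_logic;
    Y1xDO : out std_logic;
    Y2xDO : out std_logic
  );
end entity assign_demo;

architecture rtl of assign_demo is
begin

  -- Outside a process: exactly one concurrent assignment per signal.
  Y1xDO <= AxDI;
  -- Adding a second concurrent assignment such as
  --   Y1xDO <= BxDI;
  -- would create a second driver for Y1xDO. This is the case that is
  -- marked as NOT allowed.

  -- Within a process: statements are executed sequentially and the last
  -- assignment is the one that takes effect when the process suspends.
  p_seq : process (AxDI, BxDI)
  begin
    Y2xDO <= AxDI;
    Y2xDO <= BxDI;  -- overrides the assignment above
  end process p_seq;

end architecture rtl;
```

With this description Y2xDO always follows BxDI; the first assignment in the process has no visible effect.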
Translation into VHDL
• Some basic VHDL building blocks:
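 Multiplexer:
[Figure: multiplexer with data inputs AxD, BxD, CxD, select signal SELxS, and output YxD; annotated "Default Assignment"]
 Conditional Statements:
[Figure: conditional logic with data inputs AxD, BxD, CxD, DxD, select signals SelAxS and SelBxS, state STATExDP, and output OUTxD]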
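One possible way to write such a multiplexer with a default assignment is sketched below. The data width, the select encoding, and the choice of CxD as the default are assumptions, since the original listing is not available.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical multiplexer demo; widths and encoding are assumptions.
entity mux_demo is
  port (
    AxDI   : in  std_logic_vector(7 downto 0);
    BxDI   : in  std_logic_vector(7 downto 0);
    CxDI   : in  std_logic_vector(7 downto 0);
    SELxSI : in  std_logic_vector(1 downto 0);
    YxDO   : out std_logic_vector(7 downto 0)
  );
end entity mux_demo;

architecture rtl of mux_demo is
begin

  p_mux : process (AxDI, BxDI, CxDI, SELxSI)
  begin
    -- Default assignment: guarantees that YxDO is driven on every path.
    YxDO <= CxDI;
    case SELxSI is
      when "00"   => YxDO <= AxDI;
      when "01"   => YxDO <= BxDI;
      when others => null;  -- keep the default
    end case;
  end process p_mux;

end architecture rtl;
```

Because the default is assigned first, branches that do nothing are harmless: the signal still gets a value in every evaluation of the process.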
Translation into VHDL
• Common mistakes with conditional statements:
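 Example:
[Figure: conditional logic with data inputs AxD and BxD, select signals SelAxS and SelBxS, state STATExDP, and output OUTxD; the undefined cases are marked "??"]
• NO default assignment
• NO else statement
• ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE!!! => Use FlipFlops!!!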
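The mistake and one possible fix can be sketched as follows. The example is simplified compared to the figure (no STATExDP), and the entity name, widths, and the all-zero default are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo of a combinational process without unassigned paths.
entity cond_demo is
  port (
    AxDI    : in  std_logic_vector(7 downto 0);
    BxDI    : in  std_logic_vector(7 downto 0);
    SelAxSI : in  std_logic;
    SelBxSI : in  std_logic;
    OUTxDO  : out std_logic_vector(7 downto 0)
  );
end entity cond_demo;

architecture rtl of cond_demo is
begin

  p_comb : process (AxDI, BxDI, SelAxSI, SelBxSI)
  begin
    -- Problematic version: OUTxDO is not assigned when both selects
    -- are '0', so the signal would have to remember its old value and
    -- synthesis typically infers a latch:
    --
    --   if SelAxSI = '1' then
    --     OUTxDO <= AxDI;
    --   elsif SelBxSI = '1' then
    --     OUTxDO <= BxDI;
    --   end if;

    -- Fixed version: the default assignment covers every path.
    OUTxDO <= (others => '0');
    if SelAxSI = '1' then
      OUTxDO <= AxDI;
    elsif SelBxSI = '1' then
      OUTxDO <= BxDI;
    end if;
  end process p_comb;

end architecture rtl;
```

If the old value really has to be kept, it belongs in a flip-flop, with the present value fed back as the default of the next-state signal.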
Translation into VHDL
• Some basic VHDL building blocks:
 Register:
[Figure: register with input DataREGxDN and output DataREGxDP]
 Register with ENABLE:
[Figure: two variants of a register with enable, each with input DataREGxDN and output DataREGxDP]
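A register with enable could be described as in the sketch below, using separate signals for the next state (DataREGxDN) and the present state (DataREGxDP). The entity, the port names other than those shown in the figures, the width, and the active-low asynchronous reset are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical register-with-enable demo.
entity reg_demo is
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;  -- async reset, active low (assumption)
    DataRegENxSI : in  std_logic;  -- enable
    DataxDI      : in  std_logic_vector(7 downto 0);
    DataxDO      : out std_logic_vector(7 downto 0)
  );
end entity reg_demo;

architecture rtl of reg_demo is
  signal DataREGxDN, DataREGxDP : std_logic_vector(7 downto 0);
begin

  -- Enable as a multiplexer in front of the flip-flops:
  -- load new data when enabled, otherwise feed back the present value.
  DataREGxDN <= DataxDI when DataRegENxSI = '1' else DataREGxDP;

  -- Flip-flops: only clock and reset appear in the sensitivity list.
  p_reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DataREGxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      DataREGxDP <= DataREGxDN;
    end if;
  end process p_reg;

  DataxDO <= DataREGxDP;

end architecture rtl;
```

A plain register without enable is the same process with DataREGxDN driven directly by the input. The enable never touches the clock; it only selects what is loaded.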
Translation into VHDL
• Common mistakes with sequential processes:
[Figure: register DataREGxDN/DataREGxDP with signals CLKxCI and DataRegENxS]
• Cannot be translated into hardware and is NOT allowed
[Figure: register DataREGxDN/DataREGxDP with a selection between 0 and 1]
• Clocks are NEVER generated within any logic
[Figure: register DataREGxDN/DataREGxDP with signals CLKxCI and DataRegENxS]
• Gated clocks are more complicated than this
• Avoid them!!!
Translation into VHDL
• Some basic rules:
 Sequential processes (FlipFlops)
 Only CLOCK and RESET in the sensitivity list
 Logic signals are NEVER used as clock signals
 Combinatorial processes
 Multiple assignments to the same signal are ONLY possible within
the same process => ONLY the last assignment is valid
 Something must be assigned to each signal in any case OR
There MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
 Use separate signals for the PRESENT state and the
NEXT state of every FlipFlop in your design.
 Use variables ONLY to store intermediate results or even
avoid them whenever possible in an RTL design.
Translation into VHDL
• Write the ENTITY definition of your design to specify:
 Inputs, Outputs and Generics
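An entity for the iterative FIR filter might look like the sketch below. All generics, port names, default values, and the request/acknowledge handshake signals are hypothetical; the actual interface is defined by the exercise, not by this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity declaration for an iterative FIR filter.
entity fir_iterative is
  generic (
    ORDER       : positive := 16;  -- filter order N (assumed default)
    DATA_WIDTH  : positive := 8;   -- width of the samples x(k)
    COEFF_WIDTH : positive := 8;   -- width of the coefficients b_i
    OUT_WIDTH   : positive := 21   -- width of y(k) including guard bits
  );
  port (
    CLKxCI    : in  std_logic;     -- global clock
    RSTxRBI   : in  std_logic;     -- global async reset, active low
    -- Input interface
    InxDI     : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    InReqxSI  : in  std_logic;     -- new sample available
    -- Output interface
    OutxDO    : out std_logic_vector(OUT_WIDTH-1 downto 0);
    OutReqxSO : out std_logic;     -- result available
    OutAckxSI : in  std_logic      -- result accepted (ACK)
  );
end entity fir_iterative;
```

Keeping widths and the filter order as generics lets the same description be reused for different filter sizes without touching the architecture.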
Translation into VHDL
• Describe the functional units in your block diagram one after another in the architecture section:
 Register with ENABLE
 Register with CLEAR
 Counter
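As a sketch of two such functional units, the following describes a register with a synchronous clear and a counter with clear and enable. The signal names, widths, and the priority of clear over enable are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo of a register with CLEAR and a counter.
entity units_demo is
  port (
    CLKxCI    : in  std_logic;
    RSTxRBI   : in  std_logic;               -- async reset, active low
    AccClrxSI : in  std_logic;               -- synchronous clear
    AccInxDI  : in  unsigned(15 downto 0);
    CntClrxSI : in  std_logic;               -- counter clear
    CntENxSI  : in  std_logic;               -- counter enable
    AccxDO    : out unsigned(15 downto 0);
    CntxDO    : out unsigned(3 downto 0)
  );
end entity units_demo;

architecture rtl of units_demo is
  signal AccxDN, AccxDP : unsigned(15 downto 0);
  signal CntxDN, CntxDP : unsigned(3 downto 0);
begin

  -- Register with CLEAR: load zero instead of the input when cleared.
  AccxDN <= (others => '0') when AccClrxSI = '1' else AccInxDI;

  -- Counter: clear has priority over enable, otherwise hold the value.
  p_cnt : process (CntxDP, CntClrxSI, CntENxSI)
  begin
    CntxDN <= CntxDP;                 -- default: keep present value
    if CntClrxSI = '1' then
      CntxDN <= (others => '0');
    elsif CntENxSI = '1' then
      CntxDN <= CntxDP + 1;           -- wraps around at 15
    end if;
  end process p_cnt;

  -- All flip-flops of the design in one sequential process.
  p_regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      AccxDP <= (others => '0');
      CntxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      AccxDP <= AccxDN;
      CntxDP <= CntxDN;
    end if;
  end process p_regs;

  AccxDO <= AccxDP;
  CntxDO <= CntxDP;

end architecture rtl;
```

In the filter, counters of this kind could serve as address generators for the sample memory and the coefficient LUT.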
Translation into VHDL
• The FSM is described with one sequential process and one combinatorial process
[Figure: FSM code listing, with parts of it annotated "MEALY"]
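The two-process style can be sketched for the states listed under "How it works". The control signal names, the handshake inputs, and the exact transition conditions are assumptions; only the state names and their order follow the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FSM for the iterative FIR filter (two-process style).
entity fir_fsm is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;  -- async reset, active low (assumption)
    InReqxSI    : in  std_logic;  -- new sample available
    CntDonexSI  : in  std_logic;  -- last coefficient processed
    OutAckxSI   : in  std_logic;  -- result accepted (ACK)
    InRegENxSO  : out std_logic;  -- store sample to input register
    MemWExSO    : out std_logic;  -- store sample to memory
    MacENxSO    : out std_logic;  -- run multiply-accumulate and counters
    OutRegENxSO : out std_logic;  -- store result to output register
    OutReqxSO   : out std_logic   -- result available
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_type is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal STATExDN, STATExDP : state_type;
begin

  -- Sequential process: state register only.
  p_state : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      STATExDP <= IDLE;
    elsif rising_edge(CLKxCI) then
      STATExDP <= STATExDN;
    end if;
  end process p_state;

  -- Combinatorial process: next state and outputs.
  p_next : process (STATExDP, InReqxSI, CntDonexSI, OutAckxSI)
  begin
    -- Default assignments for every signal driven here.
    STATExDN    <= STATExDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    MacENxSO    <= '0';
    OutRegENxSO <= '0';
    OutReqxSO   <= '0';

    case STATExDP is
      when IDLE =>
        if InReqxSI = '1' then
          InRegENxSO <= '1';      -- MEALY: depends on state and input
          STATExDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>
        MemWExSO <= '1';
        STATExDN <= RUN;
      when RUN =>
        MacENxSO <= '1';
        if CntDonexSI = '1' then
          OutRegENxSO <= '1';     -- MEALY: depends on state and input
          STATExDN    <= DATA_OUT;
        end if;
      when DATA_OUT =>
        OutReqxSO <= '1';
        if OutAckxSI = '1' then
          STATExDN <= IDLE;
        end if;
    end case;
  end process p_next;

end architecture rtl;
```

Outputs assigned unconditionally within a state (such as MemWExSO) are Moore-type outputs, while those assigned inside an IF on an input are Mealy-type outputs.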
Translation into VHDL
• Complete and check the code:
 Declare the signals and components
 Check and complete the sensitivity lists of ALL combinatorial processes with ALL signals that are:
 used as condition in any IF or CASE statement
 being assigned to any other signal
 used in any operation with any other signal
 Check that the sensitivity lists of ALL sequential processes contain:
 ONLY one global clock and one global async. reset signal
 no other signals
Other Good Ideas
• Keep things simple
• Partition the design (Divide et Impera):
 Example:
Start processing the next sample, while the previous
result is waiting in the output register:
 Just add a FIFO at the output of your filter
• Do NOT try to optimize each Gate or FlipFlop
• Do not try to save cycles if not necessary
• VHDL code
 Is usually long and that is good !!
 Is just a representation of your block diagram
 Does not mind hierarchy
