
E&CE 327: Digital Systems Engineering

Course Notes
(with Solutions)

2015t1 (Winter)
Instructor: Rodolfo Pellizzoni
Notes by: Mark Aagaard

University of Waterloo
Dept of Electrical and Computer Engineering


Contents
1 Fundamentals of VHDL
1.1 Introduction to VHDL
1.1.1 Levels of Abstraction
1.1.2 VHDL Origins and History
1.1.3 Semantics
1.1.4 Synthesis of a Simulation-Based Language
1.1.5 Solution to Synthesis Sanity
1.1.6 Standard Logic 1164
1.2 Comparison of VHDL to Other Hardware Description Languages
1.2.1 VHDL Disadvantages
1.2.2 VHDL Advantages
1.2.3 VHDL and Other Languages
1.2.3.1 VHDL vs Verilog
1.2.3.2 VHDL vs System Verilog
1.2.3.3 VHDL vs SystemC
1.2.3.4 Summary of VHDL Evaluation
1.3 Overview of Syntax
1.3.1 Syntactic Categories
1.3.2 Library Units
1.3.3 Entities and Architecture
1.3.4 Concurrent Statements
1.3.5 Component Declaration and Instantiations
1.3.6 Processes
1.3.7 Sequential Statements
1.3.8 A Few More Miscellaneous VHDL Features
1.4 Concurrent vs Sequential Statements
1.4.1 Concurrent Assignment vs Process
1.4.2 Conditional Assignment vs If Statements
1.4.3 Selected Assignment vs Case Statement
1.4.4 Coding Style
1.5 Overview of Processes
1.5.1 Combinational Process vs Clocked Process
1.5.2 Latch Inference
1.5.3 Combinational vs Flopped Signals

1.6 VHDL Execution: Delta-Cycle Simulation
1.6.1 Simple Simulation
1.6.2 Temporal Granularities of Simulation
1.6.3 Zero-Delay Simulation
1.6.4 Intuition Behind Delta-Cycle Simulation
1.6.4.1 Introduction to Delta-Cycle Simulation
1.6.4.2 Intuitive Rules for Delta-Cycle Simulation
1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers
1.6.4.4 Example of Projected Assignment: Back-to-Back Buffers
1.6.4.5 Example of Projected Assignment: Back-to-Back Flip-Flops
1.6.4.6 Example of Projected Assignment with Combinational Loop
1.6.5 VHDL Delta-Cycle Simulation
1.6.5.1 Informal Description of Algorithm
1.6.5.2 Example: VHDL Simulation of Back-to-Back Buffers
1.6.5.3 Definitions and Algorithm
1.6.5.4 Example: Delta-Cycle Simulation of Back-to-Back Flip-Flops
1.6.5.5 Example: VHDL Simulation of Combinational Loop
1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations
1.6.6 External Inputs and Flip-Flops
1.7 Register-Transfer-Level Simulation
1.7.1 Overview
1.7.2 Technique for Register-Transfer Level Simulation
1.7.3 Examples of RTL Simulation
1.7.3.1 RTL Simulation Example 1
1.8 Simple RTL Simulation in Software
1.8.1 Introductory Examples
1.8.2 Regs and Comb
1.8.3 Inputs
1.8.4 Pipeline
1.9 Variables in VHDL
1.9.1 Semantics
1.9.2 Usage of Variables
1.10 Delta-Cycle Simulation with Delays
1.10.1 Transport and Inertial Delay
1.10.2 Delayed Assignment Semantics
1.10.3 Simulation Examples
1.10.4 Waveform Expressions
1.11 VHDL and Hardware Building Blocks
1.11.1 Basic Building Blocks
1.11.2 Deprecated Building Blocks for RTL
1.11.2.1 An Aside on Flip-Flops and Latches
1.11.2.2 Deprecated Hardware
1.11.3 Hardware and Code for Flops
1.11.3.1 Flops with Waits and Ifs
1.11.3.2 Flops with Synchronous Reset


1.11.3.3 Flops with Chip-Enable
1.11.3.4 Flop with Chip-Enable and Mux on Input
1.11.3.5 Flops with Chip-Enable, Muxes, and Reset
1.11.4 An Example Sequential Circuit
1.12 Synthesizable vs Non-Synthesizable Code
1.12.1 Initial Values
1.12.2 Wait For
1.12.3 Variables
1.12.4 Bits and Booleans
1.12.5 Assignments before Wait Statement
1.12.6 Different Wait Conditions
1.12.7 Multiple if rising_edge in Process
1.12.8 if rising_edge and wait in Same Process
1.12.9 if rising_edge with else Clause
1.12.10 Loop with Both Comb and Clocked Paths
1.12.11 wait Inside of a for loop
1.13 Guidelines for Desirable Hardware
1.13.1 Know Your Hardware
1.13.2 Latches
1.13.3 Asynchronous Reset
1.13.4 Combinational Loops
1.13.5 Using a Data Signal as a Clock
1.13.6 Using a Clock Signal as Data
1.13.7 Tri-State Buffers and Signals
1.13.8 Multiple Drivers

2 Additional Features of VHDL
2.1 Literals
2.1.1 Numeric Literals
2.1.2 Bit-String Literals
2.2 Arrays and Vectors
2.2.1 Declarations
2.2.2 Indexing, Slicing, Concatenation, Aggregates
2.3 Arithmetic
2.3.1 Arithmetic Packages
2.3.2 Arithmetic Types
2.3.3 Overloading of Arithmetic
2.3.4 Widths for Addition and Subtraction
2.3.5 Overloading of Comparisons
2.3.6 Widths for Comparisons
2.3.7 Type Conversion
2.3.8 Shift and Rotate Operations
2.4 Types
2.4.1 Enumerated Types
2.4.2 Defining New Array Types


3 Overview of FPGAs
3.1 Generic FPGA Hardware
3.1.1 Generic FPGA Cell
3.1.2 Lookup Table
3.1.3 Interconnect for Generic FPGA
3.1.4 Blocks of Cells for Generic FPGA
3.1.5 Special Circuitry in FPGAs
3.2 Area Estimation for FPGAs
3.2.1 Area for Circuit with one Target
3.2.2 Algorithm to Allocate Gates to Cells
3.2.3 Area for Arithmetic Circuits


4 Intro to RTL Design with VHDL
4.1 Function Tables
4.1.1 Karnaugh Maps
4.1.2 Generalizations
4.1.3 Multi-Output Tables
4.1.4 Don't-Cares
4.1.5 Don't Cares on Inputs
4.1.6 Consistency and Unused
4.2 Finite State Machines in VHDL
4.2.1 HDL Coding Styles for State Machines
4.2.2 State Encodings
4.2.3 Traditional State-Machine Notation
4.2.4 Our State-Machine Notation
4.2.5 Bounce Example
4.2.6 Registered Assignments
4.2.7 Summary and Analysis of Explicit vs Implicit
4.2.8 More Notation
4.2.9 Semantic and Syntax Rules
4.2.10 VHDL Constructs and Patterns
4.2.11 Translating VHDL to FSM
4.2.12 Reset
4.3 Dataflow Diagrams
4.3.1 Dataflow Diagrams Overview
4.3.2 Dataflow Diagram Execution
4.3.3 Dataflow Diagrams, Hardware, and Behaviour
4.3.4 Performance Estimation
4.3.5 Area Estimation
4.3.6 Design Analysis
4.3.7 Parcels
4.3.8 Bubbles and Throughput
4.4 Hnatyshyn with Registered Outputs
4.4.1 Leftovers
4.4.2 Design Process


4.4.3 Requirements
4.4.4 Data-Dependency Graph
4.4.5 Initial Dataflow Diagram
4.4.6 Area Optimization
4.4.7 Assign Names to Registered Signals
4.4.8 VHDL #1: Big and Obviously Correct
4.4.9 Allocation
4.4.10 VHDL #2: Post-Allocation
4.4.11 Explicit State Machine
4.4.12 VHDL Implementation #3
4.5 Design Example: Hnatyshyn with Combinational Inputs and Outputs
4.5.1 Requirements
4.5.2 Dataflow Diagram
4.5.3 Maximum Throughput Design
4.5.4 Minimum Area Design
4.5.5 Minimum Area Design with ASAP Parcels
4.5.6 Minimum Area Design with Unpredictable Bubbles
4.6 Hnatyshyn with Registered Inputs and Combinational Output
4.6.1 Dataflow Diagram and Behaviour
4.7 Hnatyshyn with Registered Inputs and Outputs
4.7.1 Requirements
4.7.2 Data-Dependency Graph
4.7.3 Initial Dataflow Diagram
4.7.4 Area Optimization
4.7.5 Assign Names to Registered Signals
4.7.6 VHDL #1: Big and Obviously Correct
4.7.7 Tangent: Combinational Outputs
4.7.8 Allocation
4.7.9 VHDL #2: Post-Allocation
4.7.10 Separate Datapath and Control
4.8 Design Example: Hnatyshyn with Bubbles
4.8.1 Control Table Standard Method
4.8.2 Control Table Valid Bit Shortcut
4.9 Example: LeBlanc
4.9.1 System Description
4.9.2 Design for ASAP Parcels
4.9.2.1 Implicit State Machine
4.9.2.2 Explicit State Machine
4.9.2.3 Datapath Control
4.9.2.4 Final Implementation
4.9.2.5 Buggy Implementation
4.9.3 Design for Unpredictable Bubbles
4.9.3.1 Implicit State Machine
4.9.3.2 Explicit State Machine
4.9.3.3 Datapath Control


4.9.3.4 Final Implementation
4.9.3.5 Buggy Implementation

5 Intermediate RTL Design
5.1 Inter-Parcel Variables: Hnatyshyn with Internal State
5.1.1 Requirements and Goals
5.1.2 High-Level Model
5.1.3 Dataflow Diagrams and Waveforms
5.1.4 Implicit State Machine
5.1.5 Adding Reset
5.1.6 Control Tables
5.1.7 VHDL Implementation
5.1.8 Summary of Bubbles and Inter-Parcel Variables
5.2 Hnatyshyn for a Finite Sequence
5.2.1 Introduction to Hnatyshyn-Finite
5.2.2 Requirements, Goals, and Constraints
5.2.3 Pseudocode
5.2.4 High-Level Model
5.2.5 Transient-State Machine
5.2.6 Add Support for Bubbles
5.2.7 Linearize Control Flow
5.3 Memory Arrays and RTL Design
5.3.1 Memory Operations
5.3.2 Memory Arrays in VHDL
5.3.3 Data Dependencies
5.4 Design Example: Massey
5.5 Design Example: Vanier
5.5.1 Requirements
5.5.2 Algorithm
5.5.3 Initial Dataflow Diagram
5.5.4 Reschedule to Meet Requirements
5.5.5 Optimization: Reduce Inputs
5.5.6 Assign Names to Registered Values
5.5.7 VHDL Implementation #1
5.5.8 Tangent: Combinational Outputs
5.5.9 Allocation
5.5.10 VHDL Implementation #2
5.5.11 Separate Datapath and Control
5.5.12 Don't-Care Instantiations
5.5.13 VHDL Implementation #3
5.5.14 VHDL Implementation #4
5.5.15 Notes and Observations


6 Advanced RTL Design: Optimization
6.1 Pipelining
6.1.1 Introduction to Pipelining
6.1.2 Partially Pipelined
6.1.3 Terminology
6.1.4 Design Example: Pipelined Massey
6.1.5 Overlapping Pipeline Stages
6.2 Staggering
6.3 Retiming
6.4 General Optimizations
6.4.1 Strength Reduction
6.4.1.1 Arithmetic Strength Reduction
6.4.1.2 Boolean Strength Reduction
6.4.2 Replication and Sharing
6.4.2.1 Mux-Pushing
6.4.2.2 Common Subexpression Elimination
6.4.2.3 Computation Replication
6.4.3 Arithmetic
6.5 Customized State Encodings


7 Performance Analysis
7.1 Introduction
7.2 Defining Performance
7.3 Benchmarks
7.4 Comparing Performance
7.4.1 General Equations
7.4.2 Example: Performance of Printers
7.5 Clock Speed, CPI, Program Length, and Performance
7.5.1 Mathematics
7.5.2 Example: CISC vs RISC and CPI
7.5.3 Effect of Instruction Set on Performance
7.6 Effect of Time to Market on Relative Performance


8 Timing Analysis
8.1 Delays and Definitions
8.1.1 Background Definitions
8.1.2 Clock-Related Timing Definitions
8.1.2.1 Clock Latency
8.1.2.2 Clock Skew
8.1.2.3 Clock Jitter
8.1.3 Storage-Related Timing Definitions
8.1.3.1 Flops and Latches
8.1.4 Propagation Delays
8.1.5 Timing Constraints
8.1.6 Review: Timing Parameters
8.2 Timing Analysis of Simple Latches
8.2.1 Structure and Behaviour of Multiplexer Latch
8.2.2 Strategy for Timing Analysis of Storage Devices
8.2.3 Clock-to-Q Time of a Latch
8.2.4 From Load Mode to Store Mode
8.2.5 Setup Time Analysis
8.2.6 Hold Time of a Multiplexer Latch
8.2.7 Example of a Bad Latch
8.2.8 Summary
8.3 Advanced Timing Analysis of Storage Elements
8.4 Critical Path
8.4.1 Introduction to Critical and False Paths
8.4.1.1 Example of Critical Path in Full Adder
8.4.1.2 Longest Path and Critical Path
8.4.1.3 Criteria for Critical Path Algorithms
8.4.2 Longest Path
8.4.2.1 Algorithm to Find Longest Path
8.4.2.2 Longest Path Example
8.4.3 Monotone Speedup
8.5 False Paths
8.6 Analog Timing Model
8.6.1 Defining Delay
8.6.2 Modeling Circuits for Timing
8.6.3 Example: Two Buffers
8.6.4 Ex: Two Bufs with Both Caps
8.7 Elmore Delay Model
8.7.1 Elmore Delay as an Approximation
8.7.2 A More Complicated Example
8.8 Practical Usage of Timing Analysis


9 Power Analysis and Power-Aware Design
9.1 Overview
9.1.1 Importance of Power and Energy
9.1.2 Power vs Energy
9.1.3 Batteries, Power and Energy
9.1.3.1 Do Batteries Store Energy or Power?
9.1.3.2 Battery Life and Efficiency
9.1.3.3 Battery Life and Power
9.2 Power Equations
9.2.1 Switching Power
9.2.2 Short-Circuited Power
9.2.3 Leakage Power
9.2.4 Glossary
9.2.5 Note on Power Equations
9.3 Overview of Power Reduction Techniques
9.4 Voltage Reduction for Power Reduction
9.5 Data Encoding for Power Reduction
9.5.1 How Data Encoding Can Reduce Power
9.5.2 Example Problem: Sixteen Pulser
9.5.2.1 Problem Statement
9.5.2.2 Additional Information
9.5.2.3 Answer
9.6 Clock Gating
9.6.1 Introduction to Clock Gating
9.6.2 Implementing Clock Gating
9.6.3 Design Process
9.6.4 Effectiveness of Clock Gating
9.6.5 Example: Reduced Activity Factor with Clock Gating
9.6.6 Calculating PctBusy
9.6.6.1 Valid Bits and Busy
9.6.6.2 Calculating LenBusy
9.6.6.3 From LenBusy to PctBusy
9.6.7 Example: Pipelined Circuit with Clock-Gating
9.6.8 Clock Gating in ASICs
9.6.9 Alternatives to Clock Gating
9.6.9.1 Use Chip Enables
9.6.9.2 Operand Gating


10 Review
10.1 Overview of the Term
10.2 VHDL
10.2.1 VHDL Topics
10.2.2 VHDL Example Problems
10.3 RTL Design Techniques
10.3.1 Design Topics
10.3.2 Design Example Problems
10.4 Performance Analysis and Optimization
10.4.1 Performance Topics
10.4.2 Performance Example Problems
10.5 Timing Analysis
10.5.1 Timing Topics
10.5.2 Timing Example Problems
10.6 Power
10.6.1 Power Topics
10.6.2 Power Example Problems
10.7 Formulas to be Given on Final Exam


Chapter 1
Fundamentals of VHDL

1.1 Introduction to VHDL

1.1.1 Levels of Abstraction

There are many different levels of abstraction for working with hardware:
Quantum: Schrödinger's equations describe the movement of electrons and holes through material.
Energy band: 2-dimensional diagrams that capture the essential features of Schrödinger's equations. Energy-band diagrams are commonly used in nano-scale engineering.
Transistor: Signal values and time are continuous (analog). Each transistor is modeled by a
resistor-capacitor network. Overall behaviour is defined by differential equations in terms of
the resistors and capacitors. Spice is a typical simulation tool.
Switch: Time is continuous, but voltage may be either continuous or discrete. Linear equations are used, rather than differential equations. A rising edge may be modeled as a linear
rise over some range of time, or the time between a definite low value and a definite high
value may be modeled as having an undefined or rising value.
Gate: Transistors are grouped together into gates (e.g. AND, OR, NOT). Voltages are discrete
values such as pure Boolean (0 or 1) or IEEE Standard Logic 1164, which has representations
for different types of unknown or undefined values. Time may be continuous or may be
discrete. If discrete, a common unit is the delay through a single inverter (e.g. a NOT gate
has a delay of 1 and an AND gate has a delay of 2).

Register transfer level: The essential characteristic of the register transfer level is that the
behaviour of hardware is modeled as assignments to registers and combinational signals.
Equations are written where a register signal is a function of other signals (e.g. c = a
and b;). The assignments may be either combinational or registered. Combinational assignments happen instantaneously and registered assignments take exactly one clock cycle.
There are variations on the pure register-transfer level. For example, time may be measured
in clock phases rather than clock cycles, so as to allow assignments on either the rising or
falling edge of a clock. Another variation is to have multiple clocks that run at different
speeds: for example, a clock on a bus might run at half the speed of the primary clock for the chip.
Transaction level: The basic unit of computation is a transaction, such as executing an instruction on a microprocessor, transferring data across a bus, or accessing memory. Time
is usually measured as an estimate (e.g. a memory write requires 15 clock cycles, or a
bus transfer requires 250 ns). The building blocks of the transaction level are processors,
controllers, memory arrays, busses, and intellectual property (IP) blocks (e.g. UARTs). The
behaviour of the building blocks is described with software-like models, often written in
behavioural VHDL, SystemC, or SystemVerilog. The transaction level has many similarities
to a software model of a distributed system.
Electronic-system level: Looks at an entire electronic system, with both hardware and software.
In this course, we will focus on the register-transfer level. In the second half of the course, we will
look at how analog phenomena, such as timing and power, affect the register-transfer level. In
these chapters we will occasionally dip down into the transistor, switch, and gate levels.

1.1.2 VHDL Origins and History

VHDL = VHSIC Hardware Description Language
VHSIC = Very High Speed Integrated Circuit

"The VHSIC Hardware Description Language (VHDL) is a formal notation intended
for use in all phases of the creation of electronic systems. Because it is both machine
readable and human readable, it supports the development, verification, synthesis and
testing of hardware designs, the communication of hardware design data, and the
maintenance, modification, and procurement of hardware."
Language Reference Manual (IEEE Design Automation Standards Committee, 1993a)


[Figure: key terms of the quotation highlighted: development, verification, synthesis, and testing of hardware designs; communication, maintenance, modification, and procurement]

VHDL is a lot more than synthesis of digital hardware.

VHDL History

Developed by the United States Department of Defense as part of the very high speed integrated
circuit (VHSIC) program in the early 1980s.
The Department of Defense intended VHDL to be used for the documentation, simulation and
verification of electronic systems.
Goals:
improve design process over schematic entry
standardize design descriptions amongst multiple vendors
portable and extensible
Inspired by the Ada programming language
large: 97 keywords, 94 syntactic rules
verbose (designed by committee)
static type checking, overloading
complicated syntax: parentheses are used for expression grouping, array indexing, and aggregates
Example:
a <= b * (3 + c); -- integer
a <= (3 + c); -- still an integer: a 1-element array of integers must be written (0 => 3 + c)
Standardized by IEEE in 1987 (IEEE 1076-1987), revised in 1993, 2000.
In 1993 the IEEE standard VHDL package for model interoperability, STD_LOGIC_1164
(IEEE Standard 1164-1993), was developed.
std_logic_1164 defines 9 different values for signals
In 1997 the IEEE standard packages for arithmetic over std_logic and bit signals were
defined (IEEE Standard 1076.3-1997).
numeric_std defines arithmetic over std_logic vectors and integers.
Note:
This is the package that you should use for arithmetic. Don't
use std_logic_arith: it has less uniform support for mixed integer/signal arithmetic and has a greater tendency for differences between
tools.
numeric_bit defines arithmetic over bit vectors and integers. We won't use bit
signals in this course, so you don't need to worry about this package.
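
As a minimal sketch of arithmetic with numeric_std, the entity below adds a constant to an 8-bit vector. The entity name add_const, the port names, and the constant 3 are illustrative assumptions, not part of the course material.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: adds the constant 3 to an 8-bit input.
entity add_const is
  port (
    a : in  std_logic_vector(7 downto 0);
    z : out std_logic_vector(7 downto 0)
  );
end entity;

architecture main of add_const is
begin
  -- Interpret a as an unsigned number, add an integer, and convert back.
  -- numeric_std returns a result as wide as the unsigned operand,
  -- so the sum wraps around modulo 256.
  z <= std_logic_vector(unsigned(a) + 3);
end architecture;
```

The explicit conversions make the intended interpretation of the bits (unsigned here, signed if signed(a) were used instead) visible in the code.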

1.1.3 Semantics

The original goal of VHDL was to simulate circuits. The semantics of the language define circuit
behaviour.
[Figure: simulation of c <= a AND b; produces waveforms for a, b, and c]

But now, VHDL is used in both simulation and synthesis. Synthesis is concerned with the structure of
the circuit.
Synthesis: converts one type of description (behavioural) into another, lower-level description
(usually a netlist).

[Figure: synthesis of c <= a AND b; produces an AND gate with inputs a and b and output c]

Synthesis is a computer-aided design (CAD) technique that transforms a designer's concise, high-level description of a circuit into a structural description of a circuit.

CAD Tools

CAD Tools allow designers to automate lower-level design processes in implementing the desired
functionality of a system.
NOTE: EDA = Electronic Design Automation. In digital hardware design EDA = CAD.

Synthesis vs Simulation

For synthesis, we want the code we write to define the structure of the hardware that is generated.
The VHDL semantics define the behaviour of the hardware that is generated, not the structure
of the hardware. The scenario below complies with the semantics of VHDL, because the two
synthesized circuits produce the same behaviour. If the two synthesized circuits had different
behaviour, then the scenario would not comply with the VHDL Standard.

[Figure: the code c <= a AND b; is synthesized into two circuits with different structure; simulating the code and simulating each circuit gives the same behaviour for a, b, and c]

1.1.4 Synthesis of a Simulation-Based Language

Not all of VHDL is synthesizable


c <= a AND b; (synthesizable)
c <= a AND b AFTER 2 ns; (NOT synthesizable)
how do you build a circuit with exactly 2 ns of delay through an AND gate?
more examples of non-synthesizable code are in section 1.12
Different synthesis tools support different subsets of VHDL
Some tools generate erroneous hardware for some code
behaviour of hardware differs from VHDL semantics
Some tools generate unpredictable hardware (hardware that has the correct behaviour, but undesirable or weird structure).
There is an IEEE standard (1076.6) for a synthesizable subset of VHDL, but tool vendors do
not yet conform to it. (Most vendors still do not have full support for the 1993 extensions to
VHDL!) For more info, see http://www.vhdl.org/siwg/.
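
The two assignments above can be placed side by side in one small design, as a sketch. The entity name and_delay and the output names c_synth and c_sim are hypothetical. How a given synthesis tool treats the AFTER clause (ignore it with a warning, or reject it) is tool-dependent and is only described here as an assumption in the comments.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: the same AND gate with and without a delay clause.
entity and_delay is
  port (
    a, b    : in  std_logic;
    c_synth : out std_logic;
    c_sim   : out std_logic
  );
end entity;

architecture main of and_delay is
begin
  -- Synthesizable: describes only the function of the gate.
  c_synth <= a and b;

  -- Simulation only: the simulator schedules the new value 2 ns later.
  -- Assumption: a synthesis tool cannot build an exact 2 ns delay, so it
  -- will either ignore the delay or refuse the statement.
  c_sim <= a and b after 2 ns;
end architecture;
```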

1.1.5 Solution to Synthesis Sanity

Pick a high-quality synthesis tool and study its documentation thoroughly


Learn the idioms of the tool
Different VHDL code with same behaviour can result in very different circuits
Be careful if you have to port VHDL code from one tool to another
KISS: Keep It Simple, Stupid
VHDL examples will illustrate reliable coding techniques for the synthesis tools from Synopsys, Mentor Graphics, Altera, Xilinx, and most other companies as well.
Follow the coding guidelines and examples from lecture
As you write VHDL, think about the hardware you expect to get.
Note:
If you can't predict the hardware, then the hardware probably
won't be very good (small, fast, correct, etc.)

1.1.6 Standard Logic 1164

At the core of VHDL is a package named STANDARD that defines a type named bit with values
of '0' and '1'. For simulation, it is helpful to have additional values, such as "undefined" and
"high impedance". Many companies created their own (incompatible) definitions of signal types
for simulation. To regain compatibility amongst packages from different companies, the IEEE
defined std_logic_1164 to be the standard type for signal values in VHDL simulation.

'U'   uninitialized
'X'   strong unknown
'0'   strong 0
'1'   strong 1
'Z'   high impedance
'W'   weak unknown
'L'   weak 0
'H'   weak 1
'-'   don't care

The most common values are: 'U', 'X', '0', '1'.
If you see 'X' in a simulation, it usually means that there is a mistake in your code.
Every VHDL file that you write should begin with:
library ieee;
use ieee.std_logic_1164.all;

Note: std_logic vs boolean
The std_logic values '1' and '0' are not
the same as the boolean values true and false. For example, you must
write if a = '1' then .... The code if a then ... will not typecheck if a is of type std_logic.
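
A minimal sketch of the comparison style described in the note, assuming a hypothetical two-input selector named sel_demo. The remark about VHDL-2008 in the comment is outside the subset used in these notes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: select between b and c under control of a.
entity sel_demo is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;

architecture main of sel_demo is
begin
  process (a, b, c)
  begin
    -- a is std_logic, not boolean, so it is compared against '1'.
    -- Writing "if a then" is rejected by VHDL-87 and VHDL-93 typecheckers
    -- (VHDL-2008 relaxes this rule, but tool support varies).
    if a = '1' then
      z <= b;
    else
      z <= c;
    end if;
  end process;
end architecture;
```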


From a VLSI perspective, a weak value will come from a smaller gate. One aspect of VHDL that
we don't touch on in E&CE 327 is resolution, which describes how to determine the value of a signal
if the signal is driven by more than one process. (In E&CE 327, we restrict ourselves to having
each signal be driven by (be the target of) exactly one process.) The std_logic_1164 package provides
a resolution function to deal with situations where different processes drive the same signal with
different values. In this situation, a strong value (e.g. '1') will overpower a weak value (e.g. 'L').
If two processes drive the signal with different strong values (e.g. '1' and '0') the signal resolves
to a strong unknown ('X'). If a signal is driven with two different weak values (e.g. 'H' and 'L'),
the signal resolves to a weak unknown ('W').
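
The resolution rules above can be checked in simulation with a small self-checking sketch. The entity name resolve_demo and the labels are hypothetical. The code deliberately breaks the one-driver-per-signal guideline in order to show resolution, and it calls the resolved function that std_logic_1164 declares for std_ulogic_vector arguments.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only demo of std_logic resolution.
entity resolve_demo is
end entity;

architecture main of resolve_demo is
  signal s : std_logic;  -- resolved type, so more than one driver is legal
begin
  -- Two drivers on one signal: a strong 1 and a weak 0.
  drv1 : s <= '1';
  drv2 : s <= 'L';

  check : process
  begin
    wait for 1 ns;
    -- A strong value overpowers a weak value.
    assert s = '1'
      report "expected strong 1 to override weak L" severity error;
    -- Two different strong values resolve to strong unknown.
    assert resolved(std_ulogic_vector'("10")) = 'X'
      report "expected X for 1 against 0" severity error;
    -- Two different weak values resolve to weak unknown.
    assert resolved(std_ulogic_vector'("HL")) = 'W'
      report "expected W for H against L" severity error;
    wait;  -- stop the process
  end process;
end architecture;
```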

1.2 Comparison of VHDL to Other Hardware Description Languages

1.2.1 VHDL Disadvantages

Some VHDL programs cannot be synthesized


Different tools support different subsets of VHDL.
Different tools generate different circuits for same code
VHDL is verbose
Many characters to say something simple
VHDL is complicated and confusing
Many different ways of saying the same thing
Constructs that have similar purpose have very different syntax (case vs. select)
Constructs that have similar syntax have very different semantics (variables vs signals)
Hardware that is synthesized is not always obvious (when is a signal a flip-flop vs latch vs
combinational)
The infamous latch inference problem (See section 1.5.2 for more information)

1.2.2 VHDL Advantages

VHDL supports unsynthesizable constructs that are useful in writing high-level models, testbenches and other non-hardware or non-synthesizable artifacts that we need in hardware design.
VHDL can be used throughout a large portion of the design process in different capacities, from
specification to implementation to verification.
VHDL has static typechecking: many errors can be caught before synthesis and/or simulation.
(In this respect, it is more similar to Java than to C.)
VHDL has a rich collection of datatypes
VHDL is a full-featured language with a good module system (libraries and packages).
VHDL has a well-defined standard.

1.2.3 VHDL and Other Languages

1.2.3.1 VHDL vs Verilog

Verilog is a simpler language: smaller language, simple circuits are easier to write
VHDL has more features than Verilog
richer set of data types and strong type checking
VHDL offers more flexibility and expressivity for constructing large systems.
The VHDL Standard is more standard than the Verilog Standard
VHDL and Verilog have simulation-based semantics
Simulation vendors generally conform to VHDL standard
Some Verilog constructs give different behaviours in simulation and synthesis
VHDL is used more than Verilog in Europe and Japan
Verilog is used more than VHDL in North America
VHDL is used more in FPGAs than in ASICs
South-East Asia, India, South America: ?????

1.2.3.2 VHDL vs System Verilog

System Verilog is a superset of Verilog. It extends Verilog to make it a full object-oriented hardware modelling language
Syntax is based on Verilog and C++.
As of 2007, System Verilog is used almost exclusively for test benches and simulation. Very
few people are trying to use it to do hardware design.
System Verilog grew out of Superlog, a proposed language that was based on Verilog and C.
The basic core came from Verilog. C-like extensions were included to make the language more expressive and
powerful. It was originally developed by the company Co-Design Automation and then standardized
by Accellera, an organization aimed at standardizing EDA languages. Co-Design was purchased
by Synopsys and now Synopsys is the leading proponent of System Verilog.

1.2.3.3 VHDL vs SystemC

SystemC looks like C: familiar syntax
C is often used in algorithmic descriptions of circuits, so why not try to use it for synthesizable
code as well?
If you think VHDL is hard to synthesize, try C....
SystemC simulation is slower than advertised

1.2.3.4 Summary of VHDL Evaluation

VHDL is far from perfect and has lots of annoying characteristics


VHDL is a better language for education than Verilog because the static typechecking enforces
good software engineering practices
The richness of VHDL will be useful in creating concise high-level models and powerful testbenches

1.3 Overview of Syntax

This section is just a brief overview of the syntax of VHDL, focusing on the constructs that are
most commonly used. For more information, read a book on VHDL and use online resources.
(Look for VHDL under the Documentation tab in the E&CE 327 web pages.)

1.3.1 Syntactic Categories

There are five major categories of syntactic constructs.


(There are many, many minor categories and subcategories of constructs.)
Library units (section 1.3.2)
Top-level constructs (packages, entities, architectures)
Concurrent statements (section 1.3.4)
Statements executed at the same time (in parallel)
Sequential statements (section 1.3.7)
Statements executed in series (one after the other)
Expressions
Arithmetic (section 2.3), Boolean, vectors, etc.
Declarations
Components, signals, variables, types, functions, ...

1.3.2 Library Units

Library units are the top-level syntactic constructs in VHDL. They are used to define and include
libraries, declare and implement interfaces, define packages of declarations and otherwise bind
together VHDL code.
Package body
define the contents of a library
Packages
determine which parts of the library are externally visible
Use clause
use a library in an entity/architecture or another package
technically, use clauses are part of entities and packages, but they precede the entity/package
keyword, so we list them as top-level constructs
Entity (section 1.3.3)
define interface to circuit
Architecture (section 1.3.3)
define internal signals and gates of circuit

1.3.3 Entities and Architecture

Each hardware module is described with an Entity/Architecture pair

[Figure: a hardware module drawn as an entity (its interface) together with an architecture (its internals)]

Figure 1.1: Entity and Architecture

Entity: interface
names, modes (in / out), types of externally visible signals of circuit

Architecture: internals
structure and behaviour of module


library ieee;
use ieee.std_logic_1164.all;
entity and_or is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;

Figure 1.2: Example of an entity


The syntax of VHDL is defined using a variation on Backus-Naur forms (BNF).

[ { use_clause } ]
entity ENTITYID is
[ port (
{ SIGNALID : (in | out) TYPEID [ := expr ] ; }
);
]
[ { declaration } ]
[ begin
{ concurrent_statement } ]
end [ entity ] ENTITYID ;

Figure 1.3: Simplified grammar of entity

architecture main of and_or is
  signal x : std_logic;
begin
  x <= a AND b;
  z <= x OR (a AND c);
end architecture;

Figure 1.4: Example of architecture

[ { use_clause } ]
architecture ARCHID of ENTITYID is
[ { declaration } ]
begin
[ { concurrent_statement } ]
end [ architecture ] ARCHID ;

Figure 1.5: Simplified grammar of architecture


1.3.4 Concurrent Statements

An architecture contains concurrent statements


Concurrent statements execute in parallel
Concurrent statements make VHDL fundamentally different from most software languages.
Hardware (gates) naturally executes in parallel, and VHDL mimics the behaviour of real hardware.
At each infinitesimally small moment of time, each gate:
1. samples its inputs
2. computes the value of its output
3. drives the output

architecture main of bowser is
begin
  x1 <= a AND b;
  x2 <= NOT x1;
  z <= NOT x2;
end main;

architecture main of bowser is
begin
  z <= NOT x2;
  x2 <= NOT x1;
  x1 <= a AND b;
end main;

[Figure: the circuit described by both architectures, with internal signals x1 and x2]

Figure 1.6: The order of concurrent statements doesn't matter

conditional assignment
  . . . <= . . . when . . . else . . .;
  normal assignment (. . . <= . . .)
  if-then-else style (uses when)
  c <= a+b when sel='1' else a+c when sel='0' else "0000";

selected assignment
  with . . . select
    . . . <= . . . when . . . | . . . ,
             . . . when . . . | . . . ,
             ...
             . . . when . . . | . . . ;
  case/switch style assignment
  with color select d <= "00" when red , "01" when . . .;

component instantiation
  . . . : . . . port map ( . . . => . . . , . . . );
  use an existing circuit
  section 1.3.5
  add1 : adder port map( a => f, b => g, s => h, co => i);

for-generate
  . . . : for . . . in . . . generate
    ...
  end generate;
  replicate some hardware
  bgen: for i in 1 to 7 generate b(i)<=a(7-i); end generate;

if-generate
  . . . : if . . . generate
    ...
  end generate;
  conditionally create some hardware
  okgen : if optgoal /= fast generate
    result <= ((a and b) or (d and not e)) or g;
  end generate;
  fastgen : if optgoal = fast generate
    result <= '1';
  end generate;

process
  process . . . begin
    ...
  end process;
  the body of a process is executed sequentially
  sections 1.3.6, 1.6

1.3.5 Component Declaration and Instantiations

There are two different syntaxes for component declaration and instantiation. The VHDL-93 syntax is much more concise than the VHDL-87 syntax.
Not all tools support the VHDL-93 syntax. For E&CE 327, some of the tools that we use do not
support the VHDL-93 syntax, so we are stuck with the VHDL-87 syntax.
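
As a sketch of the difference between the two syntaxes, the code below instantiates the and_or circuit of Figures 1.2 and 1.4 inside a hypothetical wrapper named top. The wrapper, its port names, and the instance label u1 are assumptions made for illustration. The VHDL-87 form is the live code; the VHDL-93 direct-instantiation form is shown only as a comment.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_or is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;

architecture main of and_or is
  signal x : std_logic;
begin
  x <= a and b;
  z <= x or (a and c);
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that uses and_or as a component.
entity top is
  port (
    p, q, r : in  std_logic;
    y       : out std_logic
  );
end entity;

architecture main of top is
  -- VHDL-87 style: the component is declared before it is instantiated.
  component and_or
    port (
      a, b, c : in  std_logic;
      z       : out std_logic
    );
  end component;
begin
  -- Instantiation with named association (formal => actual).
  u1 : and_or port map (a => p, b => q, c => r, z => y);

  -- VHDL-93 style would drop the component declaration and write:
  --   u1 : entity work.and_or(main)
  --     port map (a => p, b => q, c => r, z => y);
end architecture;
```

The VHDL-87 form repeats the port list of the entity, which is the main source of its extra length.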

1.3.6 Processes

Processes are used to describe complex and potentially unsynthesizable behaviour


A process is a concurrent statement (section 1.3.4).
The body of a process contains sequential statements (section 1.3.7)
Processes are the most complex and difficult to understand part of VHDL (sections 1.5 and 1.6)

process (a, b, c)
begin
  y <= a AND b;
  if (a = '1') then
    z1 <= b AND c;
    z2 <= NOT c;
  else
    z1 <= b OR c;
    z2 <= c;
  end if;
end process;

process
begin
  y <= a AND b;
  z <= '0';
  wait until rising_edge(clk);
  if (a = '1') then
    z <= '1';
    y <= '0';
    wait until rising_edge(clk);
  else
    y <= a OR b;
  end if;
end process;

Figure 1.8: Examples of processes

Processes must have either a sensitivity list or at least one wait statement on each execution path
through the process.
Processes cannot have both a sensitivity list and a wait statement.

Sensitivity List

The sensitivity list contains the signals that are read in the process.
A process is executed when a signal in its sensitivity list changes value.
An important coding guideline to ensure consistent synthesis and simulation results is to include
all signals that are read in the sensitivity list. If you forget some signals, you will either end up
with unpredictable hardware and simulation results (different results from different programs) or
undesirable hardware (latches where you expected purely combinational hardware). For more on
this topic, see sections 1.5.2 and 1.6.
There is one exception to this rule: for a process that implements a flip-flop with an if rising_edge
statement, it is acceptable to include only the clock signal in the sensitivity list. Other signals
may be included, but are not needed.
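
The guideline and its exception can be sketched as two processes in one architecture. The entity name sens_demo, the port names, and the process labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: the same function as combinational and as clocked logic.
entity sens_demo is
  port (
    clk, a, b : in  std_logic;
    y_comb    : out std_logic;
    y_reg     : out std_logic
  );
end entity;

architecture main of sens_demo is
begin
  -- Combinational process: every signal that is read (a and b)
  -- appears in the sensitivity list.
  comb : process (a, b)
  begin
    y_comb <= a and b;
  end process;

  -- Clocked process: a and b are read, but only clk is needed in the
  -- sensitivity list, because y_reg can change only on a clock edge.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      y_reg <= a and b;
    end if;
  end process;
end architecture;
```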

[ PROCLAB : ] process ( sensitivity_list )


[ { declaration } ]
begin
{ sequential_statement }
end process [ PROCLAB ] ;

Figure 1.9: Simplified grammar of process

1.3.7 Sequential Statements

Used inside processes and functions.

wait               wait until . . . ;
signal assignment  . . . <= . . . ;
if-then-else       if . . . then . . . elsif . . . end if;
case               case . . . is
                     when . . . | . . . => . . . ;
                     when . . . => . . . ;
                   end case;
loop               loop . . . end loop;
while loop         while . . . loop . . . end loop;
for loop           for . . . in . . . loop . . . end loop;
next               next . . . ;
Figure 1.10: The most commonly used sequential statements

1.3.8 A Few More Miscellaneous VHDL Features

Some constructs that are useful and will be described in later chapters and sections:
report : print a message on stderr while simulating
assert : assertions about behaviour of signals, very useful with report statements.
generics : parameters to an entity that are defined at elaboration time.
attributes : predefined functions for different datatypes. For example: high and low indices of a
vector.
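
A minimal simulation-only sketch that touches all four features at once. The entity name misc_demo, the generic width, and the signal v are hypothetical names chosen for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity with one generic and no ports.
entity misc_demo is
  generic (
    width : natural := 8  -- generic: value is fixed at elaboration time
  );
end entity;

architecture main of misc_demo is
  signal v : std_logic_vector(width - 1 downto 0) := (others => '0');
begin
  process
  begin
    -- attributes: 'high and 'low give the index bounds of v.
    -- assert: the message is printed only if the condition is false.
    assert v'high = width - 1 and v'low = 0
      report "unexpected index range for v"
      severity failure;
    -- report: print a message during simulation.
    report "v has " & integer'image(v'length) & " bits";
    wait;  -- run once, then suspend forever
  end process;
end architecture;
```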

1.4 Concurrent vs Sequential Statements

All concurrent assignments can be translated into sequential statements. But, not all sequential
statements can be translated into concurrent statements.

1.4.1 Concurrent Assignment vs Process

The two code fragments below have identical behaviour:

architecture main of tiny is
begin
  b <= a;
end main;

architecture main of tiny is
begin
  process (a) begin
    b <= a;
  end process;
end main;

1.4.2 Conditional Assignment vs If Statements

The two code fragments below have identical behaviour:


Concurrent Statements
t <=
  <val1> when <cond>
  else <val2>;

Sequential Statements
if <cond> then
  t <= <val1>;
else
  t <= <val2>;
end if;

1.4.3 Selected Assignment vs Case Statement

The two code fragments below have identical behaviour:

Sequential Statements
case <expr> is
  when <choices1> =>
    t <= <val1>;
  when <choices2> =>
    t <= <val2>;
  when <choices3> =>
    t <= <val3>;
end case;

Concurrent Statements
with <expr> select
  t <= <val1> when <choices1>,
       <val2> when <choices2>,
       <val3> when <choices3>;

1.4.4 Coding Style

Code that's easy to write with sequential statements, but difficult with concurrent:

Sequential Statements
case <expr> is
  when <choice1> =>
    if <cond> then
      o <= <expr1>;
    else
      o <= <expr2>;
    end if;
  when <choice2> =>
    ...
end case;

Concurrent Statements

Overall structure:
with <expr> select
  t <= ... when <choice1>,
       ... when <choice2>;

Failed attempt:
with <expr> select
  t <= -- want to write:
       --   <val1> when <cond>
       --   else <val2>
       -- but conditional assignment
       -- is illegal here
       when c1,
       ...
       when c2;

Concurrent statement with correct behaviour, but messy:
t <=
  <expr1> when (expr = <choice1> AND <cond>)
  else <expr2> when (expr = <choice1> AND NOT <cond>)
  else . . .
  ;

1.5 Overview of Processes

Processes are the most difficult VHDL construct to understand. This section gives an overview of
processes. section 1.6 gives the details of the semantics of processes.
Within a process, statements are executed almost sequentially
Among processes, execution is done in parallel
Remember: a process is a concurrent statement!

entity ENTITYID is
  interface declarations
end ENTITYID;
architecture ARCHID of ENTITYID is
begin
  concurrent statements
  process begin
    sequential statements
  end process;
  concurrent statements
end ARCHID;

Figure 1.11: Sequential statements in a process


Key concepts in VHDL semantics for processes:
VHDL mimics hardware
Hardware (gates) execute in parallel
Processes execute in parallel with each other
All possible orders of executing processes must produce the same simulation results (waveforms)
If a signal is not assigned a value, then it holds its previous value

All orders of executing concurrent statements must produce the same waveforms
It doesn't matter whether you are running on a single-threaded operating system, on a multithreaded operating system, on a massively parallel supercomputer, or on a special hardware emulator with one FPGA chip per VHDL process: all simulations must be the same.
These concepts are the motivation for the semantics of executing processes in VHDL (section 1.6)
and lead to the phenomenon of latch-inference (section 1.5.2).

architecture
  procA: process
    stmtA1;
    stmtA2;
    stmtA3;
  end process;
  procB: process
    stmtB1;
    stmtB2;
  end process;

[Figure: three execution sequences of the statements A1, A2, A3 and B1, B2. Single threaded: procA before procB. Single threaded: procB before procA. Multithreaded: procA and procB in parallel.]
Figure 1.12: Different process execution sequences

Figure 1.13: All execution orders must have same behaviour

Sections 1.5.1 to 1.5.3 discuss the hardware generated by processes.
Sections 1.6 onward discuss the behaviour and execution of processes.

1.5.1 Combinational Process vs Clocked Process

Each well-written synthesizable process is either combinational or clocked. Some synthesizable
processes that do not conform to our coding guidelines are both combinational and clocked. For
example, in a flip-flop with an asynchronous reset, the output is a combinational function of the
reset signal and a clocked function of the data input signal. We will deal only with processes
that follow our coding conventions, and so we will continue to say that each process is either
combinational xor clocked.

Combinational process:
Executing the process takes part of one clock cycle
Target signals are outputs of combinational circuitry
A combinational process must have a sensitivity list
A combinational process must not have any wait statements
A combinational process must not have any calls to rising_edge or falling_edge
The hardware for a combinational process is just combinational circuitry

Clocked process:
Executing the process takes one (or more) clock cycles
Target signals are outputs of flops
Process contains one or more wait or if rising_edge statements
Hardware contains combinational circuitry and flip flops

Note:
Clocked processes are sometimes called "sequential processes",
but this can be easily confused with "sequential statements", so in E&CE 327
we'll refer to synthesizable processes as either "combinational" or "clocked".

Example Processes

Combinational Process
process (a,b,c)
begin
  p1 <= a;
  if (b = c) then
    p2 <= b;
  else
    p2 <= a;
  end if;
end process;

Clocked Processes
process
begin
  wait until rising_edge(clk);
  b <= a;
end process;

process (clk)
begin
  if rising_edge(clk) then
    b <= a;
  end if;
end process;

1.5.2 Latch Inference

The semantics of VHDL require that if a signal is assigned a value on some passes through a
process and not on other passes, then on a pass through the process when the signal is not assigned
a value, it must maintain its value from the previous pass.

process (a, b, c)
begin
  if (a = '1') then
    z1 <= b;
    z2 <= b;
  else
    z1 <= c;
  end if;
end process;

[Figure: waveforms for a, b, c, z1, and z2]

Figure 1.14: Example of latch inference

When a signal's value must be stored, VHDL infers a latch or a flip-flop in the hardware to store
the value.
If you want a latch or a flip-flop for the signal, then latch inference is good.
If you want combinational circuitry, then latch inference is bad.

Loop, Latch, Flop

[Figure: three circuits with inputs a and b and output z: a combinational loop, a latch (with enable input EN), and a flip-flop]

Question: Write VHDL code for each of the above circuits

Answer:

combinational loop
if a = '1' then
  z <= b;
else
  z <= z;
end if;

latch
if a = '1' then
  z <= b;
end if;

flop
if rising_edge(a) then
  z <= b;
end if;

Causes of Latch Inference

Usually, latch inference refers to the unintentional creation of latches.
The most common cause of unintended latch inference is missing assignments to signals in if-then-else and case statements.
Latch inference happens during elaboration. When using the Synopsys tools, look for:
Inferred memory devices
in the output or log files.
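
One common way to avoid a missing assignment is to give each target signal a default value at the top of the process. The sketch below applies this to the process of Figure 1.14. The entity name no_latch and the choice of '0' as the default for z2 are assumptions; the right default depends on the intended circuit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical rewrite of the Figure 1.14 process without a latch on z2.
entity no_latch is
  port (
    a, b, c : in  std_logic;
    z1, z2  : out std_logic
  );
end entity;

architecture main of no_latch is
begin
  process (a, b, c)
  begin
    -- Default assignment (assumed value): z2 now receives a value on
    -- every pass through the process, so nothing has to be stored.
    z2 <= '0';
    if a = '1' then
      z1 <= b;
      z2 <= b;  -- overrides the default on this path
    else
      z1 <= c;
    end if;
  end process;
end architecture;
```

Because the last assignment to a signal in a pass through a process is the one that takes effect, the default is used only on paths that do not assign z2 again.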

1.5.3 Combinational vs Flopped Signals

Signals assigned to in combinational processes are combinational.


Signals assigned to in clocked processes are outputs of flip-flops.

1.6 VHDL Execution: Delta-Cycle Simulation

In this section we go through the detailed semantics of how processes execute. These semantics
form the foundation for the simulation and synthesis of VHDL. The semantics define the simulation
behaviour, and the duty of synthesis is to produce hardware that has the same behaviour as the
simulation of the original VHDL code.

1.6.1 Simple Simulation

Throughout the discussion of simulation, we must keep in mind the fundamental observation about
the behaviour of hardware:
Hardware runs in parallel: At each infinitesimally small moment of time, each gate:
1. samples its inputs
2. computes the value of its output
3. drives the output
Before diving into the details of processes, we briefly review gate-level simulation with a simple
example, which we will then explore in excruciating detail through the semantics of VHDL.
With knowledge of just basic gate-level behaviour, we simulate the circuit below with waveforms
for a and b and calculate the behaviour for c, d, and e.

[Figure: example circuit and waveforms for signals a, b, c, d, and e, with time-axis marks at
0 ns, 10 ns, 12 ns, and 15 ns]

Different Programs, Same Behaviour


There are many different VHDL programs that will synthesize to this circuit. Three examples are:
-- Program 1: two processes
process (a,b)
begin
  c <= a and b;
end process;
process (b,c,d)
begin
  d <= not c;
  e <= b and d;
end process;

-- Program 2: one process
process (a,b,c,d)
begin
  c <= a and b;
  d <= not c;
  e <= b and d;
end process;

-- Program 3: three processes
process (a,b)
begin
  c <= a and b;
end process;
process (c)
begin
  d <= not c;
end process;
process (b,d)
begin
  e <= b and d;
end process;

The goal of the VHDL semantics is that all of these programs will have the same behaviour.
The two main challenges to make this happen are: a value change on a signal must propagate
instantaneously, and all gates must operate in parallel. We will return to these points in section 1.6.4.
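
One way to check the claim that these programs behave the same is a self-checking testbench that runs the one-process and three-process versions side by side. The sketch below is an illustration only; the testbench name and the suffixed signal names (c1, c3, and so on) are assumptions, and the 10 ns spacing between input changes is arbitrary.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity same_behaviour_tb is
end entity;

architecture sketch of same_behaviour_tb is
  signal a, b       : std_logic := '0';
  signal c1, d1, e1 : std_logic;  -- driven by the one-process version
  signal c3, d3, e3 : std_logic;  -- driven by the three-process version
begin
  -- One-process version
  one_proc : process (a, b, c1, d1)
  begin
    c1 <= a and b;
    d1 <= not c1;
    e1 <= b and d1;
  end process;

  -- Three-process version
  p_c : process (a, b)  begin c3 <= a and b;  end process;
  p_d : process (c3)    begin d3 <= not c3;   end process;
  p_e : process (b, d3) begin e3 <= b and d3; end process;

  -- Apply all four input combinations and compare the two versions
  stimulus : process
    constant vals : std_logic_vector(0 to 1) := "01";
  begin
    for i in vals'range loop
      for j in vals'range loop
        a <= vals(i);
        b <= vals(j);
        wait for 10 ns;  -- all delta cycles complete before time advances
        assert c1 = c3 and d1 = d3 and e1 = e3
          report "the two programs disagree" severity error;
      end loop;
    end loop;
    wait;
  end process;
end architecture;
```

The comparison is made only after time has advanced, so it checks the settled values at each moment of simulation time rather than the intermediate values inside individual delta cycles.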

1.6.2 Temporal Granularities of Simulation

There are several different granularities of time to analyze VHDL behaviour. In this course, we
will discuss three major granularities: clock cycles, timing simulation, and delta cycles.
register-transfer-level
  - smallest unit of time is a clock cycle
  - combinational logic has zero delay
  - flip-flops have a delay of one clock cycle
  - used for simulation early in the design cycle
  - fastest simulation run times
timing simulation
  - smallest unit of time is a nanosecond, picosecond, or femtosecond
  - combinational logic and wires have delay as computed by timing analysis tools
  - flip-flops have setup, hold, and clock-to-Q timing parameters
  - used for simulation when fine-tuning design and confirming that timing constraints are
    satisfied
  - slow simulation times for large circuits
delta cycles
  - units of time are artifacts of VHDL semantics and simulation software
  - simulation cycles, delta cycles, and simulation steps are infinitesimally small amounts of
    time
  - VHDL semantics are defined in terms of these concepts
For the remainder of section 1.6, we will look at only the delta-cycle view of the world.

1.6.3 Zero-Delay Simulation

Register-transfer-level and delta-cycle simulation are both examples of zero-delay simulation.


In zero-delay simulation, a sequence of dependent events must appear to happen instantaneously
(in zero time). In particular, the effect of an event must propagate instantaneously through combinational circuitry.
Zero-delay simulation might appear to be simpler than simulation with delays through gates (timing simulation), but in reality, zero-delay simulation algorithms are more complicated than algorithms for timing simulation. The reason is that in zero-delay simulation, a sequence of dependent
events must appear to happen instantaneously (in zero time).
There are two fundamental rules for zero-delay simulation:
1. Events appear to propagate through combinational circuitry instantaneously.
2. All of the gates appear to operate in parallel.
The rules for zero-delay simulation say "appear to operate in parallel", rather than "operate in
parallel", because software executes sequentially, or in serial, as opposed to in parallel (the next
paragraph discusses concurrent software languages). A simulator cannot simulate multiple gates
in parallel. Instead, the simulator must simulate the gates one at a time, but make the waveforms
appear as if all of the gates were simulated in parallel.
The characterization of software as purely sequential is, of course, just our simple-minded
hardware-oriented perspective on software. In reality, many software languages support multithreading and
other forms of parallel execution. However, even moderately sized circuits have more gates than
would make for a reasonable number of concurrent processes on even a massively parallel
supercomputer. So, all reasonable semantics for simulation must provide some mechanism for
sequential execution to appear to be parallel execution.
There are many different ways to implement zero-delay simulation. We will study two examples:
VHDL's delta-cycle simulation, and a register-transfer-level simulation algorithm.

1.6.4 Intuition Behind Delta-Cycle Simulation

1.6.4.1 Introduction to Delta-Cycle Simulation

To make it appear that events propagate instantaneously through combinational circuitry, VHDL
introduces the delta cycle:
  - Infinitesimally small artificial unit of time
  - In each delta cycle, every gate in the circuit
    1. samples its input signals
    2. computes its result value
    3. drives the result value on its output signal
To make it appear that gates operate in parallel, VHDL introduces the projected assignment:
  - the effect of simulating a gate remains invisible until the beginning of the next delta cycle
In each delta cycle, the simulator will simulate each gate whose input changed, and thus the output
of the gate must be recomputed to reflect the new input value. The change on the output will cause
the next combinational gate to be simulated in the next delta cycle. Events appear to propagate
instantaneously through combinational logic, because all of the delta cycles needed for an event to
propagate through the combinational logic are collapsed into a single moment in time.
Recall that at each infinitesimally small moment of time, each gate samples its inputs, computes
the value of its output, and then drives the output. This sequence of sample, compute, and drive
occurs within a delta cycle and is done in parallel for all of the gates. Within a delta cycle, gates
operate independently. Signals propagate from one gate to another in a sequence of delta cycles.
The simulator must create the appearance that the gates operate in parallel, or independently, within
a delta cycle, even though the simulator will in fact simulate one gate at a time. One way to define
that the sequential execution of a set of processes preserves the appearance that the processes are
executed independently is that the order in which the processes are executed does not matter. In
other words, the simulator gives the same results regardless of the order in which the processes
are executed. Because the simulator always gives the same results, regardless of the order of
processes, all orders of execution give the same results as the order in which all processes execute
simultaneously (i.e., truly concurrently).
To preserve the illusion that the gates ran in parallel within a delta cycle, with the projected assignment in VHDL, the effect of simulating a gate remains invisible until the beginning of the next
delta cycle. Thus, there are no dependencies between gates within a delta cycle.

1.6.4.2 Intuitive Rules for Delta-Cycle Simulation

1. Simulate a gate if any of its inputs changed. (If no input changed, then the current value of the
output is correct and the output can stay at the same value.)
2. Each gate is simulated at most once per delta cycle.
3. When a gate is executed, the projected (i.e., new) value of the output remains invisible until the
beginning of the next delta cycle.
4. Increment time when there is no need for another delta cycle (no gate had an input change value
in the current delta cycle).

1.6.4.3 Example of Delta-Cycles: Back-to-Back Buffers

Back-to-back buffers illustrate how VHDL simulation uses delta-cycles to achieve the illusion that
events propagate instantaneously through combinational circuitry. Without going into the details
of how delta-cycle simulation works, in this example, it takes three delta cycles for the rising edge
on a to propagate through the circuit. Because a delta-cycle is an infinitesimally small amount of
time, in real simulation time (the lower waveform), the rising edges on a, b, and c all appear to
happen at exactly 1 ns.
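process (a) begin
  b <= a;
end process;

process (b) begin
  c <= b;
end process;

[Figure: two waveforms for a, b, and c between 1 ns and 2 ns. Delta-cycle simulation: the rising
edge moves from a to b to c over three delta cycles at 1 ns. Simple simulation: a, b, and c all
rise at 1 ns.]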

1.6.4.4 Example of Projected Assignment: Back-to-Back Buffers

We now extend the back-to-back buffers to include the projected assignments.
Two copies of each signal:
  - projected value (not visible)
  - current value (visible)

To update a signal with new value:
  S  sample current value of inputs
  C  compute new projected value
  D  drive new value (make it visible) by copying from projected value to current value

[Figure: delta-cycle simulation with projected values for a, b, and c between 1 ns and 2 ns, with
the C and D steps marked in each delta cycle, above the simple simulation waveform for a, b, and c]

In VHDL, the current value of a signal is updated in the delta-cycle after the projected value is
computed. This requires one more delta cycle than in the previous, simplified, example of delta
cycle simulation with back-to-back buffers, which did not include projected assignment.
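
The delta-cycle behaviour of the back-to-back buffers can be observed in a simulator with a checker process that suspends for one delta cycle at a time using wait for 0 ns. The sketch below is an illustration under assumptions: the testbench and the checker process are not part of the original example, and the driver for a (drive '0', then '1' at 1 ns) is chosen to match the waveform described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity buffers_tb is
end entity;

architecture sketch of buffers_tb is
  signal a, b, c : std_logic;
begin
  proc_a : process begin
    a <= '0';
    wait for 1 ns;
    a <= '1';
    wait;
  end process;

  proc_b : process (a) begin
    b <= a;
  end process;

  proc_c : process (b) begin
    c <= b;
  end process;

  checker : process begin
    wait for 1 ns;   -- resumes in the same simulation cycle as proc_a
    assert a = '0' and b = '0' and c = '0'
      report "new value of a should not be visible yet" severity error;
    wait for 0 ns;   -- suspend for one delta cycle
    assert a = '1' and b = '0' and c = '0'
      report "only a should have changed" severity error;
    wait for 0 ns;
    assert b = '1' and c = '0'
      report "b should have changed, c should not" severity error;
    wait for 0 ns;
    assert c = '1' and now = 1 ns
      report "c should change while time is still 1 ns" severity error;
    wait;
  end process;
end architecture;
```

Each wait for 0 ns moves the checker to the next delta cycle, so the assertions see the edge move from a to b to c while the simulation time stays at 1 ns.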

1.6.4.5 Example of Projected Assignment: Back-to-Back Flip-Flops

Back-to-back flip-flops illustrate how VHDL uses projected assignments to create the illusion that
gates operate in parallel. Both processes (p_b and p_c) are sensitive to the clock signal, so both
processes will run in the delta-cycle after the clock changes value. It is this delta cycle that we
focus on. When we execute p_b, the process will see a='1' and will compute a new value of
'1' for b. Because p_b and p_c must appear to execute in parallel within a delta cycle, the new
value of b='1' must not be visible to p_c in this delta cycle.
When p_b runs, the new value for b remains invisible until the beginning of the next delta cycle.
Hence, p_c will see the old value of b, which is '0'. The value of '1' will propagate from b to
c on the next rising edge of the clock, which is not shown.
p_b: process (clk) begin
  if rising_edge(clk) then
    b <= a;
  end if;
end process;
p_c: process (clk) begin
  if rising_edge(clk) then
    c <= b;
  end if;
end process;

p_b and p_c appear to execute in parallel:
  - run both p_b and p_c in the same delta cycle
  - p_c must see old value of b

[Figure: back-to-back flip-flop circuit (a to b to c, clocked by clk) and waveforms for a, clk, b,
and c from 9 ns to 11 ns]

This example illustrates gates appearing to operate in parallel, because the two flip-flops appear
to execute at the same time, triggered by the rising edge of the clock. Using the definition of
appearing to execute independently, that the order in which we run the processes within a delta
cycle does not affect the values on the signals at the end of the delta cycle, we can see that regardless
of which order we run the processes, p_c sees the old value of b.
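
The claim that p_c sees the old value of b can be checked with a small self-checking testbench. This is a sketch under assumptions: the testbench name, the initial values ('0' for b and c, constant '1' for a), and the clock timing (rising edges at 10 ns and 30 ns) are chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity flops_tb is
end entity;

architecture sketch of flops_tb is
  signal clk  : std_logic := '0';
  signal a    : std_logic := '1';   -- assumed constant input
  signal b, c : std_logic := '0';   -- assumed initial flip-flop contents
begin
  -- Finite clock: rising edges at 10 ns and 30 ns
  proc_clk : process begin
    clk <= '0'; wait for 10 ns;
    clk <= '1'; wait for 10 ns;
    clk <= '0'; wait for 10 ns;
    clk <= '1'; wait;
  end process;

  p_b : process (clk) begin
    if rising_edge(clk) then
      b <= a;
    end if;
  end process;

  p_c : process (clk) begin
    if rising_edge(clk) then
      c <= b;
    end if;
  end process;

  checker : process begin
    wait for 15 ns;   -- after the first rising edge
    assert b = '1' and c = '0'
      report "c must capture the old value of b" severity error;
    wait for 20 ns;   -- 35 ns, after the second rising edge
    assert b = '1' and c = '1'
      report "the '1' should reach c on the second edge" severity error;
    wait;
  end process;
end architecture;
```

The assertions hold regardless of the order in which the simulator executes p_b and p_c, which is exactly the property that projected assignment is meant to guarantee.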

1.6.4.6 Example of Projected Assignment with Combinational Loop

This circuit demonstrates how projected assignment simulates combinational loops correctly.

[Figure: truly parallel simulation of the circuit at 1 ns, showing the delta cycles and the final
values]

We begin with a truly parallel simulation of the circuit. This figure uses a thick arrow to denote
when a value is being computed. All of the three gates that are simulated (b, c, and d) sample
their inputs at the same time, then compute their new value in parallel, and drive their result at the
same time. Because the computation is done in parallel, this figure uses a different notation (the
thick arrows) to denote the computation is done.

Note on the figure: the simulation of gates b, c, and d is done truly in parallel, therefore
there is no need for projected assignment.

Recall the following about the simulation done in a delta cycle:
  - The simulation of the gates must be done in parallel.
  - The execution of each gate must be independent of the other gates.
  - The choice of in which order to simulate the gates must not affect the result.
We now simulate the circuit using projected assignment with two different orders of gates and
then without projected assignment with two different orders. When we use projected assignment,
both orders give the same result, but without projected assignment, different orders give different
results.
The key is that, with projected assignment, values become visible at the beginning of the next
delta cycle, while with the incorrect semantics, the assignments are visible as soon as they are
executed. By making the assignments invisible until the next delta cycle, the order in which the
assignments are done does not matter.

Combinational Loop: Correct Simulation

[Figure: delta-cycle simulations at 1 ns with projected assignment, for execution orders b, c, d
and b, d, c. Both orders end with the same final values: a = 1, b = 0, c = 1, d = 1.]

Combinational Loop: Incorrect Simulation

[Figure: simulations at 1 ns without projected assignment, each shown with the circuit and its
final values. Execution order b, c, d ends with a = 1, c = 0, d = 0. Execution order b, d, c ends
with a = 1, c = 1, d = 1.]
Interestingly, even without projected assignment, the final values in each simulation are consistent
with the static functionality of the gates in the circuit (e.g., the output of the AND gate d is '1'
exactly when both inputs are '1'). This demonstrates that simply checking the final values of a
simulation is not a sufficient technique to determine if the simulation was done correctly.
This circuit, and other combinational loops, such as set-reset latches constructed from
cross-coupled NOR gates, also demonstrate some of the difficulties and counter-intuitive behaviour that
arise in comparing simulation results that use different notions of time. By assigning different
delay values to the gates, the final values on the gates might be either the same as or different
from those of the zero-delay simulation.

1.6.5 VHDL Delta-Cycle Simulation

We have already covered the two most important concepts in the delta-cycle simulation algorithm:
delta cycles as an infinitesimally small amount of time and projected assignments where the effect
of an assignment to a signal becomes visible at the beginning of the next delta cycle. This section
fleshes out these concepts by connecting them to the syntax of VHDL and filling in some details.
The algorithm presented here is a simplification of the actual algorithm in Section 12.6 of the
VHDL Standard. The two most significant simplifications are that this algorithm does not support
delayed assignments or resolution.
To support delayed assignments, each signal's projected value is generalized to a projected
waveform (more precisely, VHDL's term is projected output waveform), which is a list containing the
values and times for multiple projected assignments in the future.
Resolution allows multiple processes to write to the same signal. This is usually a mistake, but
it is useful for tri-state busses, where all but one of the processes write a value of 'Z', and the
one process that has been granted permission to write to the bus writes a '1' or a '0'. To support
resolution, each signal's projected value would become a set of values, where each value represents
the value written by one process. At the end of the simulation cycle, the set of values is resolved
into a single final value. The values of '1' and 'Z' resolve to '1'. Similarly, '0' and 'Z'
resolve to '0'. However, '1' and '0' resolve to 'X', indicating that the processes are driving
conflicting values.
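
The resolution behaviour described above can be seen with the resolved type std_logic from the ieee.std_logic_1164 package. The sketch below is illustrative only; the testbench, the signal name shared_wire, and the timing are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity resolution_tb is
end entity;

architecture sketch of resolution_tb is
  signal shared_wire : std_logic;   -- resolved type: multiple drivers allowed
begin
  -- First driver: releases the wire ('Z'), then drives '0' from 10 ns
  driver_1 : process begin
    shared_wire <= 'Z';
    wait for 10 ns;
    shared_wire <= '0';
    wait;
  end process;

  -- Second driver: always drives '1'
  driver_2 : process begin
    shared_wire <= '1';
    wait;
  end process;

  checker : process begin
    wait for 5 ns;
    assert shared_wire = '1'
      report "'Z' and '1' should resolve to '1'" severity error;
    wait for 10 ns;   -- 15 ns: both drivers are now active
    assert shared_wire = 'X'
      report "'0' and '1' should resolve to 'X'" severity error;
    wait;
  end process;
end architecture;
```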
In our presentation, we begin with an informal description of the delta-cycle simulation algorithm
and illustrate the algorithm with the back-to-back buffer example. We then give the definitions and
a somewhat more formal presentation of the delta-cycle simulation algorithm and do the
back-to-back flip-flops and combinational loop examples.

1.6.5.1 Informal Description of Algorithm

Processes have three modes:
  - Resumed: The process has work to do and is waiting its turn to execute.
  - Executing: The process is running.
  - Suspended: The process is idle and has no work to do.

A simulation run is initialization followed by a sequence of simulation rounds.

Initialization:
  - Each process starts off resumed. This gets the simulation started by giving each process a
    chance to execute.
  - Each signal starts off with its default value. The default value of a std_logic signal is 'U',
    unless the signal is given an initial value in its declaration
    (e.g., signal s : std_logic := '0';).
In each simulation round:
  - Increment time at the beginning of the simulation round. Time then remains constant until the
    next round.
  - Resume all processes that are waiting for the current time.
  - A simulation round is a sequence of simulation cycles.
In each simulation cycle:
  - Copy projected value of signals to current value.
  - Resume processes based on sensitivity lists and wait conditions.
  - Execute each resumed process.
  - If no projected assignment changed the value of a signal, then increment time and start next
    simulation round.
At the beginning of the simulation of a normal circuit, processes that drive the clock and other
external inputs will assign values to their signals, then the changes to these external inputs will
propagate through the circuit.

1.6.5.2 Example: VHDL Simulation of Back-to-Back Buffers

We repeat the back-to-back buffer example from section 1.6.4.4, but now add the time scales
(simulation rounds, simulation cycles, and simulation steps) and process modes, to make it a fully
detailed delta-cycle simulation.
proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;

proc_b : process (a)
begin
  b <= a;
end process;

proc_c : process (b)
begin
  c <= b;
end process;

[Legend: each signal change is shown either as text (old and new values, e.g. U to 0, U to U,
0 to 1) or as a graphical edge symbol]

  - Time measured in nanoseconds, simulation rounds, and simulation cycles
  - Process modes: R=resumed, E=executing, S=suspended
  - Initial values: processes=R, signals=U
  - Each column is a simulation step: process mode change or signal change.
  - First simulation cycle in each simulation round is not a delta cycle.
  - Simulation cycle ends when all processes are suspended.
  - Simulation round ends (and time increments) when simulation cycle has no assignments
    to projected values.

[Figure: delta-cycle simulation table with rows Time, Sim rounds, Sim cycles, proc_a, proc_b,
proc_c, a, b, and c, covering 0 ns to 2 ns]

1.6.5.3 Definitions and Algorithm

Notes on Simulation Algorithm


  - At a wait statement, the process will suspend even if the condition is true in the current
    simulation cycle. The process will resume the next time that a signal in the condition changes
    and the condition is true.
  - If we execute multiple assignments to the same signal in the same process in the same
    simulation cycle, only the last assignment actually takes effect; all but the last assignment
    are ignored.
  - In a simulation round, the first simulation cycle is not a delta cycle.
  - The mode of a process is determined implicitly by keeping track of the set of processes that
    are resumed (the resume set) and the process(es) that is (are) executing. All other processes
    are suspended.
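
The rule that only the last assignment in a simulation cycle takes effect can be checked with a small testbench. This is a sketch; the entity, signal, and process names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity last_assign_tb is
end entity;

architecture sketch of last_assign_tb is
  signal s : std_logic;
begin
  writer : process begin
    s <= '0';   -- replaced by the next assignment before it becomes visible
    s <= '1';   -- only this projected value is copied to the current value
    wait;
  end process;

  -- Runs at initialization and whenever s changes: s must never show '0'
  monitor : process (s) begin
    assert s /= '0'
      report "the first assignment should never become visible" severity error;
  end process;

  checker : process begin
    wait for 1 ns;
    assert s = '1'
      report "s should hold the last assigned value" severity error;
    wait;
  end process;
end architecture;
```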

VHDL Simulation Definitions

Definition simulation step: Executing one sequential assignment or process mode change.

Definition simulation cycle: The operations that occur in one iteration of the simulation
algorithm.

Definition delta cycle: A simulation cycle that does not advance simulation time.
Equivalently: A simulation cycle with zero-delay assignments where the assignment
causes a process to resume.

Definition simulation round: A sequence of simulation cycles that all have the same
simulation time.

Note: Official and unofficial terminology. "Simulation cycle" and "delta cycle"
are official definitions in the VHDL Standard. "Simulation step" and "simulation
round" are not standard definitions. We use them because we need words to
associate with the concepts that they describe.

More Formal Description of Algorithm

( initialization )
set all signals to default value;
add to resume set all processes;
set time to 0 ns;
( simulation loop )
while time < ∞ {
  ( begin simulation round )
  add to resume set all processes that are waiting for current time;
  while time does not change {
    ( begin simulation cycle )
    copy projected values of signals to current values;
    add to resume set any process that:
      is sensitive to a signal that changed value
      or whose wait-condition became true;
    execute all processes in resume set;
      ( assign to projected values of signals )
      ( execute until suspend on a wait statement or sensitivity list )
    clear resume set; ( resume set = ∅ )
    if none of the executing processes performed a signal assignment then {
      increment time to the minimum of the wait times for processes;
    }
  }
}

1.6.5.4 Example: Delta-Cycle Simulation of Back-to-Back Flip-Flops

We now do a full delta-cycle simulation of back-to-back flip flops. This expands on the simpler
simulation we did in section 1.6.4.5, where we did delta-cycles and projected assignments informally, but did not include the time scales or process modes.

proc_a : process
begin
  a <= '0';
  wait for 9 ns;
  a <= '1';
  wait;
end process;

proc_clk : process
begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

proc_flops : process
begin
  wait until rising_edge(clk);
  b <= a;
  c <= b;
end process;

[Figure: delta-cycle simulation table (re = rising edge) with rows Time, Sim rounds, Sim cycles,
proc_a, proc_clk, proc_flops, a, clk, b, and c, covering 0 ns, 9 ns, 10 ns, 20 ns, and 30 ns]

Back-to-Back Flip-Flops with If-Rising Edge

We now repeat the back-to-back flip-flops example, but compare if-rising-edge based code to
wait-until-rising-edge based code. The values on the flip-flops are the same. But, because the
if-rising-edge process has the clock in its sensitivity list, this process executes whenever there
is a change on the clock signal. In comparison, the wait-until-rising-edge process executes only
when there is a rising edge on the clock. When the if-rising-edge process executes after a falling
edge of the clock, the process suspends without executing any signal assignments, because the
assignments are within the then clause of the if-rising-edge.
proc_flops1 : process
begin
  wait until rising_edge(clk);
  b1 <= a;
  c1 <= b1;
end process;

proc_flops2 : process (clk)
begin
  if rising_edge(clk) then
    b2 <= a;
    c2 <= b2;
  end if;
end process;

[Figure: delta-cycle simulation table with rows Time, Sim rounds, Sim cycles, proc_a, proc_clk,
proc_flops1, proc_flops2, clk, b1, c1, b2, and c2, covering 10 ns, 20 ns, and 30 ns]

1.6.5.5 Example: VHDL Simulation of Combinational Loop

We now do a full VHDL delta-cycle simulation of the combinational loop example that we did
informally in section 1.6.4.6. Notice that the last simulation cycles at 0 ns and 1 ns do not contain
any signal assignments or process mode changes. These simulation cycles are needed, because the
VHDL simulation semantics require a simulation cycle to follow any simulation cycle in which a
projected assignment occurs, even if the projected assignment did not change the projected value
of the signal.
proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;

proc_b : process (a)
begin
  b <= not( a );
end process;

proc_c : process (a,b,d)
begin
  c <= not( a ) or b or d;
end process;

proc_d : process (a,c)
begin
  d <= a and c;
end process;

[Figure: delta-cycle simulation table with rows Time, Sim rounds, Sim cycles, proc_a, proc_b,
proc_c, proc_d, a, b, c, and d, covering 0 ns and 1 ns. An annotation on the final simulation
cycle reads: "These assignments cause this simulation cycle."]

1.6.5.6 Rules and Observations for Drawing Delta-Cycle Simulations

The VHDL Language Reference Manual gives only a textual description of the VHDL semantics.
The conventions for drawing the waveforms are just our own.
  - Each column is a simulation step.
  - In a simulation step, either exactly one process changes mode or exactly one signal changes
    value, except in the first two simulation steps of each simulation cycle, when multiple current
    values may be updated and multiple processes may resume.
  - If a projected assignment assigns the same value as the signal's current projected value, the
    assignment must still be shown, because this assignment will force another simulation cycle in
    the current simulation round.
  - If a signal's current value is updated with the same value as it currently has, this assignment
    is not shown, because it will not trigger any sensitivity lists.
  - Assignments to signals may be denoted by either the number/letter of the new value or one of
    the edge symbols.

[Table: edge symbols for each combination of old value (U, 0, 1) and new value (U, 0, 1)]

Some observations about delta-cycle simulation waveforms that can be helpful in checking that a
simulation is correct:
  - In the first simulation step of the first simulation cycle of a simulation round (i.e., the first
    simulation step of a simulation round), at least one process will resume. This is in contrast to
    the first simulation step of all other simulation cycles, where current values of signals are
    updated with projected values.
  - At the end of a simulation cycle all processes are suspended.
  - In the last simulation cycle of a simulation round either no signals change value, or any signal
    that changes value is not in the sensitivity list of any process.

1.6.6 External Inputs and Flip-Flops

In our work so far with delta-cycle simulation, we have worked through the mechanics of
simulation. This example applies knowledge of delta-cycle simulation at a conceptual level. We could
answer the question by thinking about the semantics of delta-cycle simulation or by mechanically
doing the simulation.

Question:

Do the signals b1 and b2 have the same behaviour from 10 ns to 20 ns?

architecture mathilde of sauve is
  signal clk, a1, a2, b1, b2 : std_logic;
begin
  process begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process;
  process begin
    wait for 10 ns;
    a1 <= '1';
  end process;
  process begin
    wait until rising_edge(clk);
    a2 <= '1';
  end process;
  process begin
    wait until rising_edge( clk );
    b1 <= a1;
    b2 <= a2;
  end process;
end architecture;


Answer:
The signals b1 and b2 will have the same behaviour if a1 and a2 have the
same behaviour. The difference in the code between a1 and a2 is that a1 is
waiting for 10 ns and a2 is waiting until a rising edge of the clock. There is a
rising edge of the clock at 10 ns, so we might be tempted to conclude
(incorrectly) that both a1 and a2 transition from 'U' to '1' at exactly 10 ns and
therefore have exactly the same behaviour.
The difference between the behaviour of a1 and a2 is that in the first
simulation cycle for 10 ns, the process for a1 resumes, while the process for
a2 resumes only after the rising edge of the clock.
The signal a1 is waiting for 10 ns, so in the first simulation cycle for 10 ns, the
process for a1 resumes and executes. Also in the first simulation cycle for
10 ns, the clock toggles from '0' to '1'. This rising edge causes the processes for
a2, b1, and b2 to resume and execute in the second simulation cycle.
In the second simulation cycle for 10 ns:
  - a2 changes from 'U' to '1'.
  - b1 sees the value of '1' for a1, because a1 became '1' in the first simulation
    cycle.
  - b2 sees the old value of 'U' for a2, because the process for a2 did not
    execute in the first simulation cycle.
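
The answer can be confirmed in a simulator by wrapping the same four processes in a testbench and adding a checker. This is a sketch: the entity name, the process labels, and the checker process with its sample times (15 ns and 35 ns) are assumptions added for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ext_inputs_tb is
end entity;

architecture sketch of ext_inputs_tb is
  signal clk, a1, a2, b1, b2 : std_logic;
begin
  proc_clk : process begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process;

  proc_a1 : process begin
    wait for 10 ns;
    a1 <= '1';
  end process;

  proc_a2 : process begin
    wait until rising_edge(clk);
    a2 <= '1';
  end process;

  proc_b : process begin
    wait until rising_edge(clk);
    b1 <= a1;
    b2 <= a2;
  end process;

  checker : process begin
    wait for 15 ns;   -- after the first rising edge at 10 ns
    assert b1 = '1'
      report "b1 should already see a1 = '1'" severity error;
    assert b2 = 'U'
      report "b2 should still see the old value of a2" severity error;
    wait for 20 ns;   -- 35 ns, after the second rising edge at 30 ns
    assert b1 = '1' and b2 = '1'
      report "b2 should catch up one clock cycle later" severity error;
    wait;
  end process;
end architecture;
```

The clock process never stops, so the simulation should be run for a bounded time (for example, 40 ns) rather than until no events remain.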

[Figure: delta-cycle simulation table with rows Time, Sim rounds, Sim cycles, proc_clk, proc_a1,
proc_a2, proc_b, clk, a1, a2, b1, and b2, covering 10 ns and 20 ns]

1.7 Register-Transfer-Level Simulation

Delta-cycle simulation is very tedious for both humans and computers. For many circuits, the
complexity of delta-cycle simulation is not needed and register-transfer-level simulation, which is
much simpler, can be used instead.
The major complexities of delta-cycle simulation come from running a process multiple times
within a single simulation round and keeping track of the modes of the processes.
Register-transfer-level simulation avoids both of these complexities. By evaluating each signal only once per
simulation round, an entire simulation round can be reduced to a single column in a timing diagram.
The disadvantage of register-transfer-level simulation is that it does not work for all VHDL
programs; in particular, it does not support combinational loops.

1.7.1 Overview

In delta-cycle simulations, we often simulated the same process multiple times within the same
simulation round. In looking at the circuit, though, we can mentally calculate the output value
by evaluating each gate only once per simulation round. For both humans and computers (or the
humans waiting for results from computers), it is desirable to avoid the wasted work of simulating
a gate when the output will remain at 'U' or will change again later in the same simulation round.
In register-transfer-level simulation, we evaluate each gate only once per simulation round.
Register-transfer-level simulation is simpler and faster than delta-cycle simulation, because it
avoids delta cycles and projected assignments.
In delta-cycle simulation, we evaluate a gate multiple times in a single simulation round if the
process that drives the gate is active in multiple simulation cycles, which happens when the process
is triggered in multiple simulation cycles. To avoid this, we must evaluate a signal only after all of
the signals that it depends on have stable values, that is, the signals will not change value later in
the simulation round.
A combinational loop is a circuit that contains a cyclic path through the circuit that includes only
combinational gates. Combinational loops can cause signals to oscillate, which in delta-cycle
simulation with zero-delay assignments, corresponds to an infinite sequence of delta cycles. We
immediately see that when doing zero-delay simulation of a combinational loop such as
a <= not(a);, the change on a will trigger the process to re-run and re-evaluate a an infinite
number of times. Hence, register-transfer-level simulation does not support combinational loops.
To make register-transfer simulation work, we preprocess the VHDL program and transform it so
that each process is dependent upon only those processes that appear before it. This dependency
ordering is called topological ordering. If a circuit has combinational loops, we cannot sort the
processes into a topological order.
The register-transfer level is a coarser level of temporal abstraction than the delta-cycle level.
In delta-cycle simulation, many delta-cycles can elapse without an increment in real time (e.g.
nanoseconds). In register-transfer-level simulation, all of the events that take place in the same
moment of real time take place at the same moment in the simulation. In other words, all of the
events that take place at the same time are drawn in the same column of the waveform diagram.
Register-transfer-level simulation can be done for legal VHDL code, either synthesizable or
unsynthesizable, so long as the code does not contain combinational loops. For any piece of VHDL code
without combinational loops, the register-transfer-level simulation and the delta-cycle simulation
will have the same value for each signal at the end of each simulation round.
By sorting the processes in topological order, when we execute a process, all of the signals that the
process depends on will have already been evaluated, and so we know that we are reading the final,
stable values that each signal will have for that moment in time. This is good, because for most
processes, we want to read the most recent values of signals. The exceptions are timed processes
that are dependent upon other timed processes running at the same moment in time and clocked
processes that are dependent upon other clocked processes.
process begin
  a <= '0';
  wait for 10 ns;
  a <= '1';
  ...
end process;

process begin
  b <= '0';
  wait for 10 ns;
  b <= a;
  ...
end process;

Question: In this code, what value should b have at 10 ns: does it read the new value of a or the old value?

Answer: Both processes will execute in the same simulation cycle at 10 ns. The statement b <= a will see the value of a from the previous simulation cycle, which is before a <= '1'; is evaluated. The signal b will be 0 at 10 ns.


As the above example illustrates, if a timed process reads the values of signals from processes
that resume at the same time, it must read the previous value of those signals. Similarly, if a
clocked process reads the values of signals from processes that are sensitive to the same clock,
those processes will all resume in the same simulation cycle: the cycle immediately after the
rising edge of the clock (assuming that the processes use if rising_edge or wait until
rising_edge statements). Because the processes run in the same simulation cycle, they all read
the previous values of the signals that they depend on. If this were not the case, then the VHDL
code for a pair of back-to-back flip-flops would not operate correctly, because the output of the first
flip-flop would appear immediately at the output of the second flip-flop.
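The back-to-back flip-flop argument can be checked with a small Python sketch. The function names and the input sequence are hypothetical; the first function reads previous values, the second deliberately reads the newly written value to show the failure.

```python
def shift_register_correct(d_inputs):
    """Two back-to-back flip-flops; both read the previous values."""
    q1, q2 = 0, 0
    trace = []
    for d in d_inputs:
        q1_next = d        # first flop samples its input
        q2_next = q1       # second flop reads the OLD value of q1
        q1, q2 = q1_next, q2_next
        trace.append((q1, q2))
    return trace


def shift_register_wrong(d_inputs):
    """Same circuit, but the second flop wrongly reads the NEW value of q1."""
    q1, q2 = 0, 0
    trace = []
    for d in d_inputs:
        q1 = d
        q2 = q1            # sees the value just written: the first flop is bypassed
        trace.append((q1, q2))
    return trace


inputs = [1, 0, 0, 0]
print(shift_register_correct(inputs))
print(shift_register_wrong(inputs))
```

In the correct version the pulse on the input reaches q2 one clock cycle after it reaches q1; in the wrong version it appears on both outputs in the same cycle.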
Simulation rounds begin with incrementing time, which triggers timed processes. Therefore, the
first processes in the topological order are the timed processes. Timed processes may be run in any
order, and they read the previous values of signals that they depend on. This gives the same effect
as in delta-cycle simulation, where the timed processes would run in the same simulation cycle and
read the values that signals had before the simulation cycle began.
We then sort the clocked and combinational processes based on their dependencies, so that each
process appears (is run) after all of the processes on which it depends.
Although a clocked process may read many signals, we say that a clocked process is dependent
upon only its clock signal. It is the change in the clock signal that causes the process to resume.
So, as long as the process is run after the clock signal is stable, we can be sure that it will not need
to be run again at this time step. Clocked processes may be run in any order. They read the current
value of their clock signal and the previous value of the other signals that they depend on. As
with timed processes, this gives the same effect as in delta-cycle simulation, where the clock edge
would trigger the clocked processes to run in the same simulation cycle and the processes would
read the values that signals had before the simulation cycle began.

1.7.2

Technique for Register-Transfer Level Simulation

1. Pre-processing
(a) Separate processes into timed, clocked, and combinational
(b) Decompose each combinational process into separate processes with one target signal
per process
(c) Sort combinational processes into topological order based on dependencies
2. For each moment of real time:
(a) Run timed processes in any order, reading old values of signals.
(b) Run clocked processes in any order, reading new values of timed signals and old values
of registered signals.
(c) Run combinational processes in topological order, reading new values of signals.
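The sorting step (1c) can be sketched in Python with the standard-library graphlib module (Python 3.9 or later). The function name, the dictionary representation of processes, and the example signals are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def topological_order(reads):
    """reads maps each combinational target signal to the signals it reads.

    Returns the targets in an order where every process runs after the
    processes that drive its inputs, or None if there is a combinational loop.
    Signals that are not keys (inputs, registers) impose no ordering.
    """
    graph = {target: {s for s in sources if s in reads}
             for target, sources in reads.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


# Single-target combinational processes (hypothetical example)
processes = {"e": ["b", "d"], "d": ["c"], "c": ["a", "b"]}
print(topological_order(processes))
# A combinational loop such as a <= not(a) has no topological order
print(topological_order({"a": ["a"]}))
```

Returning None for a loop mirrors the statement that circuits with combinational loops cannot be sorted into a topological order.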

Combinational Process Decomposition

Original code:

process (a, b, c) begin
  if a = '1' then
    d <= b;
    e <= c;
  else
    d <= not b;
    e <= b and c;
  end if;
end process;

After decomposition:

process (a, b, c) begin
  if a = '1' then
    d <= b;
  else
    d <= not b;
  end if;
end process;

process (a, b, c) begin
  if a = '1' then
    e <= c;
  else
    e <= b and c;
  end if;
end process;

1.7.3

Examples of RTL Simulation

1.7.3.1

RTL Simulation Example 1


We revisit an earlier example from delta-cycle simulation, but change the code slightly and do
register-transfer-level simulation.
1. Original code:
proc1: process (a, b, c) begin
  d <= NOT c;
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

proc3: process begin
  a <= '1';
  b <= '0';
  wait for 3 ns;
  b <= '1';
  wait for 99 ns;
end process;

2. Decompose combinational processes into single-target processes:

Decomposed:

proc1d: process (c) begin
  d <= NOT c;
end process;

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

3. To sort combinational processes into topological order, move proc1d after proc1c, because d depends on c.

Sorted:

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc1d: process (c) begin
  d <= NOT c;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

4. Run the timed process (proc3) until it suspends at wait for 3 ns;.
   The signal a gets 1 from 0 to 3 ns.
   The signal b gets 0 from 0 to 3 ns.
5. Run proc1c.
   The signal c gets a AND b (1 AND 0 = 0) from 0 to 3 ns.
6. Run proc1d.
   The signal d gets NOT c (NOT 0 = 1) from 0 to 3 ns.
7. Run proc2.
   The signal e gets b AND d (0 AND 1 = 0) from 0 to 3 ns.
8. Run the timed process until it suspends at wait for 99 ns;, which takes us from 3 ns to
   102 ns.
9. Run the combinational processes in topological order to calculate the values of c, d, and e from 3 ns to
   102 ns (c = 1, d = 0, e = 0).
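The steps above can be replayed in a few lines of Python. This is a sketch only: bits are modelled as the integers 0 and 1, and the function name run_comb and the stimulus list are assumptions made for illustration.

```python
def run_comb(a, b):
    """Run the sorted combinational processes once, reading new values."""
    c = a & b          # proc1c: c <= a AND b
    d = 1 - c          # proc1d: d <= NOT c
    e = b & d          # proc2:  e <= b AND d
    return c, d, e


# proc3 (timed process): (start time in ns, a, b)
stimulus = [(0, 1, 0), (3, 1, 1)]
for time_ns, a, b in stimulus:
    c, d, e = run_comb(a, b)
    print(time_ns, a, b, c, d, e)
```

Because the processes are called in topological order, each signal is computed exactly once per moment of real time.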

Waveforms

[Figure: waveform diagram for the signals a, b, c, d, and e, with time marks at 0 ns, 1 ns, 2 ns, 3 ns, and 102 ns. Every signal starts at U; from 0 ns to 3 ns, a = 1, b = 0, c = 0, d = 1, and e = 0.]


Example: Procs with Multiple Waits

The comment after each wait statement lists the times (in ns) at which the process resumes from that wait.

huey: process
begin
  clk <= '0';
  wait for 10 ns;        -- 10 30 50 70 90 110
  clk <= '1';
  wait for 10 ns;        -- 20 40 60 80 100
end process;

dewey: process
begin
  a <= to_unsigned(0,4);
  wait until re(clk);    -- 10 110
  while (a < 4) loop
    a <= a + 1;
    wait until re(clk);  -- 30 50 70 90
  end loop;
end process;

louie: process
begin
  d <= '1';
  wait until re(clk);    -- 10 30 50 90
  if (a >= 2) then
    d <= '0';
    wait until re(clk);  -- 70 110
  end if;
end process;

[Figure: waveform diagram for clk, a, and d from 0 ns to 120 ns. Each signal starts at U; at 0 ns, clk = 0, a = 0, and d = 1.]


A Related Simulation

Small changes to the code can cause significant changes to the behaviour.

riri: process
begin
  clk <= '1';
  wait for 10 ns;
  clk <= '0';
  wait for 10 ns;
end process;

loulou: process
begin
  wait until re(clk);
  d <= '1';
  if (a < 2) then
    d <= '0';
    wait until re(clk);
  end if;
end process;

fifi: process
begin
  a <= to_unsigned(0,4);
  wait until re(clk);
  while (a < 4) loop
    a <= a + 1;
    wait until re(clk);
  end loop;
end process;

[Figure: blank waveform diagram for clk, a, and d from 0 ns to 120 ns.]

1.8

Simple RTL Simulation in Software

This section describes how we can use a software programming language to write cycle-accurate
models of hardware systems. We use Python as our example language, but any imperative language
may be used.

1.8.1

Introductory Examples


Two Registers, Two Adders

#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s %s %s" % (t, a, b) )
    #-----------------
    a = a + b
    b = b + 1
#-----------------------

Cyclic Dependencies


We add a cyclic dependency between the registers a and b. Remember that registers must execute in parallel and that VHDL achieves the illusion of parallel execution through projected assignments. We mimic projected assignments in software by introducing a next copy of each
variable.
#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s %s %s" % (t, a, b) )
    #-----------------
    # projected assignments to next
    a_next = a + b
    b_next = a - b
    #-----------------
    # update current from next
    a = a_next
    b = b_next
    #-----------------

1.8.2

Regs and Comb

We explore several alternative coding styles to support circuits with both combinational and registered hardware with cyclic dependencies among the registers.

[Figure: circuit with registers a and b and combinational signals c and d.]

Regs and Comb (use next)

In our first approach, we follow the projected assignment coding style.

The disadvantage of this approach is that it requires us to initialize combinational variables.

#-----------------------
a = 1
b = 2
c = 2 * b
d = c + 1
#-----------------------
for t in range( 20 ) :
    #-----------------
    # execute registers
    a_next = a + d
    b_next = a - c
    #-----------------
    # drive registers
    a = a_next
    b = b_next
    #-----------------
    # execute comb
    c = 2 * b
    d = c + 1

Regs and Comb (best style)


We improve upon the previous version of the code by eliminating the need to initialize the combinational variables. We recognize that in the steady-state of the simulation run, the instructions in
the simulation loop just execute one after the other, and it does not matter which instruction is at
the top of the loop. We eliminate the combinational initialization code by rotating the instructions
in the loop such that the combinational datapath instructions are at the top of the loop. We can then
delete the combinational initialization without affecting the behaviour of the system.
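A quick way to convince ourselves that rotating the loop does not change the behaviour is to run both loop arrangements and compare the register values after every clock cycle. The sketch below uses the same circuit (c = 2b, d = c + 1); the function names and the trace representation are assumptions made for this check.

```python
def run_with_comb_init(steps):
    """Registers first, combinational last; c and d must be initialized."""
    a, b = 1, 2
    c = 2 * b
    d = c + 1
    trace = []
    for _ in range(steps):
        a_next = a + d
        b_next = a - c
        a, b = a_next, b_next
        c = 2 * b
        d = c + 1
        trace.append((a, b))
    return trace


def run_rotated(steps):
    """Combinational first; no initialization of c and d is needed."""
    a, b = 1, 2
    trace = []
    for _ in range(steps):
        c = 2 * b
        d = c + 1
        a_next = a + d
        b_next = a - c
        a, b = a_next, b_next
        trace.append((a, b))
    return trace


print(run_with_comb_init(3))
print(run_rotated(3))
```

Both functions print the same list of (a, b) pairs, because the rotated loop executes the same instructions in the same cyclic order.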

#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    # execute comb
    c = 2 * b
    d = c + 1
    #-----------------
    # execute registers
    a_next = a + d
    b_next = a - c
    #-----------------
    # drive registers
    a = a_next
    b = b_next

Regs and Comb (RTL sim style)


As an alternative to what we have seen so far, here we follow the style of our RTL simulation
algorithm, where registered variables read the old values of other registered variables. We add
variables that keep track of the old, or previous, values of registered variables. At the start of each
clock cycle, we copy the current value of each registered variable into its prev variable; when a
register assignment reads a registered variable, it reads the prev value. We show this approach to illustrate the duality
between next and prev values of variables: two mechanisms to achieve the same behaviour.
This approach has two disadvantages:
• We must initialize combinational variables.
• When reading variables, we must distinguish between registered and combinational variables.
It is generally preferable to distinguish between registered and combinational variables when writing to a variable rather than when reading it. Two main reasons are:
• For realistic-size circuits, we generally read a variable more times than we assign to it, so it
  requires less typing and is less susceptible to mistakes.
• In hardware, the distinction between combinational and registered signals is made in the circuitry
  that drives the signal, not the other gates that read the signal.

#-----------------------
# initialize registers
a = 1
b = 2
#-----------------------
# initialize comb
c = 2 * b
d = c + 1
#-----------------------
for t in range( 20 ) :
    #-----------------
    # hold old values
    a_prev = a
    b_prev = b
    #-----------------
    # execute registers
    a = a_prev + d
    b = a_prev - c
    #-----------------
    # execute comb
    c = 2 * b
    d = c + 1

1.8.3

Inputs

We model inputs by loading values into an array, then reading the values one-by-one from the
array.

#-----------------------
# load the input values (one integer per line;
# the file must have at least 20 lines)
i_c = []
with open( "input.txt", "r" ) as f :
    for v in f.readlines() :
        i_c.append( int( v.strip() ) )
#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    c = i_c[t]
    #-----------------
    print( "%s %s %s %s" % (t, a, b, c) )
    #-----------------
    a = a + b
    b = b + c

1.8.4

Pipeline

Pipelines are a special style of system that allows us to dispense with the next version of variables. By assigning values to registers in reverse topological order (from the end of the pipeline
back to the front), we can assign values directly to the current-value variables.
We first show a model for the pipeline written using next-variables. The assignments are done in
reverse order, from back to front. This is possible because the use of the next-variables ensures
that there are not any dependencies between these assignments, and so all orders of execution will
produce the same results.
With this order of assignments to the next-variables, there is no dependency between the assignment
to a next-variable in execution and the driving of the current variable. For example, we read from
c in the line before we write to c_next, and c is not read until we drive it from c_next. Thus,
we can remove c_next and the executing assignment can drive c directly.
We can see that this technique is correct also by remembering the rules for register-transfer-level simulation: registered signals read the old values of registered signals and combinational signals read
the new values of all signals. By doing the registered assignments in reverse order, each variable
sees the old values of the other registers.
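The claim can be tested with a small Python sketch of a three-register pipeline. The stage functions F, G, and H below are hypothetical placeholders chosen only so that the code runs; the third function assigns in forward order to show what goes wrong when the order is not reversed.

```python
def F(x): return x + 1      # hypothetical stage functions, chosen only
def G(x): return 2 * x      # so that the sketch can run
def H(x): return x - 3


def pipeline_with_next(a):
    b = c = d = 0
    trace = []
    for t in range(len(a)):
        d_next = H(c)
        c_next = G(b)
        b_next = F(a[t])
        b, c, d = b_next, c_next, d_next
        trace.append((b, c, d))
    return trace


def pipeline_reverse_order(a):
    b = c = d = 0
    trace = []
    for t in range(len(a)):
        d = H(c)            # reads the old value of c
        c = G(b)            # reads the old value of b
        b = F(a[t])
        trace.append((b, c, d))
    return trace


def pipeline_forward_order(a):
    """Wrong: each stage sees the value just written by the stage before it."""
    b = c = d = 0
    trace = []
    for t in range(len(a)):
        b = F(a[t])
        c = G(b)
        d = H(c)
        trace.append((b, c, d))
    return trace


inputs = [1, 2, 3, 4]
print(pipeline_with_next(inputs))
print(pipeline_reverse_order(inputs))
print(pipeline_forward_order(inputs))
```

The first two traces are identical. The forward-order trace differs from the first cycle onward, because the input passes through all three stages in a single cycle.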
With next variables:

#-----------------------
# initialization
b = 0
c = 0
d = 0
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % ... )
    #-----------------
    # execute regs
    d_next = H( c )
    c_next = G( b )
    b_next = F( a[t] )
    #-----------------
    # drive regs
    b = b_next
    c = c_next
    d = d_next
    #-----------------
    # execute comb
    e = I( d )

Without next variables:

#-----------------------
# initialization
b = 0
c = 0
d = 0
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % ... )
    #-----------------
    # execute and drive regs, from the end of the pipeline back to the front
    d = H( c )
    c = G( b )
    b = F( a[t] )
    # execute comb
    e = I( d )

Pipeline with Comb


When we introduce combinational variables (e.g., c below) into the design, the order of assignments becomes a bit more complicated, but is still systematic. We do the registers in reverse-order,
then within each stage (between adjacent registers), we do the combinational signals in topological
order.
We need to initialize the registers, but do not need to initialize the combinational variables, because
the combinational variables are executed before they are read.

#-----------------------
# initialize registers
b = 0
d = 0
e = 0
#-----------------------
for t in range( 20 ) :
    f = J( d, e )
    c = G( b )
    d = H( c )
    e = I( c )
    b = F( a )

Pipeline with Feedback


When we introduce feedback into a pipeline (e.g., from c to a below), we no longer can rely solely
on performing the registered assignments in reverse topological order.
The variable that is fed back to an earlier stage must use a projected assignment to break the
feedback loop. All other variables are unaffected.

#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % (t, ...) )
    #-----------------
    d      = I( c )
    c_next = H( b )
    b      = G( a )
    a      = F( c )
    #-----------------
    c = c_next

1.9

Variables in VHDL
This is an advanced section.
It is not covered in the course
and will not be tested.

Variables in VHDL have the same semantics as variables in a software language. Variables may
be declared inside processes, functions, and procedures. Variables should not be declared inside
architectures. For a variable to be declared in an architecture, it must be a shared variable, and
shared variables are not synthesizable.

1.9.1

Semantics

Variables are updated immediately. More precisely, in contrast to signals, variables have only a
current value, not a separate projected value and current value. The value of a variable is visible
(driven) in the same simulation cycle, immediately after the variable assignment statement is
executed. This behaviour causes variables to act like combinational hardware.
Variables hold their value until they are assigned a new value. In this respect, variables act like
registers or latches.

1.9.2

Usage of Variables

The inconsistent behaviour of variables, which act like both combinational hardware and registers/latches,
makes variables potentially risky to use in code that is intended to be synthesized.
• Difficult to predict what hardware will be synthesized.
• May get quite different hardware from different tools.
• Easy to write code that is synthesizable by some tools and not by others.
Any behaviour and circuit that can be modeled using variables can be modeled using only signals.
Variables are never necessary; they are only a convenience to be exploited when using signals
would be cumbersome.
Recommendation: use variables only when you need combinational hardware inside a clocked
process.
The example below illustrates the acceptable use of a variable, and an equivalent circuit using only
signals.
process
  variable v : std_logic;
begin
  wait until rising_edge(clk);
  r1 <= a;
  r2 <= b;
  v := r1 xor r2;
  r3 <= not v;
end process;

Intermediate variable

s <= r1 xor r2;

process
begin
  wait until rising_edge(clk);
  r1 <= a;
  r2 <= b;
  r3 <= not s;
end process;

Intermediate signal

The dual combinational/registered nature of variables can be seen in the program below, where
the variable v is synthesized into two separate pieces of hardware, one combinational and one
registered.

process
  variable v : std_logic;
begin
  wait until rising_edge(clk);
  if a = '1' then
    v := b;
  else
    v := c;
    wait until rising_edge(clk);
  end if;
  z <= v;
end process;

1.10

Delta-Cycle Simulation with Delays


This is an advanced section.
It is not covered in the course
and will not be tested.

Assignments with delays (e.g., b <= a after 2 ns;) are used to model delays through gates
and wires in circuits. Simulation with delays is often called timing simulation, because the simulation captures both values and the timing of the circuit.

1.10.1

Transport and Inertial Delay

Transport delay models the time it takes for an edge or value to propagate along the gates and
wires between the signals that are read and the target signal.
Inertial delay models the phenomenon that physical devices have inertia and cannot switch instantaneously from one value to another. Glitches or pulses that are shorter in duration than the
inertial delay are deleted.
[Figure: Assign a a value of 1 with transport and inertial delays. The figure marks the current time when the assignment is executed, the transport delay, the inertial delay, the rejection window, and the assigned value of 1. Existing values of a before the rejection window are unaffected by the assignment.]


The rejection window is the period of time before the new value arrives where old values will be
deleted if they would result in a glitch or pulse. The difference between the rejection window and
the inertial delay is that the inertial delay is a delay value that is relative to the transport delay, while the rejection window
is an absolute window of time (start time and stop time):
$T_p$ = transport delay
$T_i$ = inertial delay
$T_r$ = rejection window
$T_{r.begin} = T_p - T_i$
$T_{r.end} = T_p$
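As a small worked example of these definitions, the Python sketch below computes the rejection window as absolute times by adding the time at which the assignment is executed. The function name, the parameter names, and the check that the inertial delay does not exceed the transport delay are assumptions made for this sketch.

```python
def rejection_window(now, t_p, t_i):
    """Return (begin, end) of the rejection window as absolute times.

    now: time at which the assignment is executed
    t_p: transport delay
    t_i: inertial delay (all values in ns)
    """
    if t_i > t_p:
        # assumption of this sketch: the window cannot start before 'now'
        raise ValueError("inertial delay must not exceed transport delay")
    return now + t_p - t_i, now + t_p


# transport delay of 10 ns, inertial delay of 3 ns, executed at time 0
print(rejection_window(0, 10, 3))
```

For a transport delay of 10 ns and an inertial delay of 3 ns, an assignment executed at time 0 has a rejection window from 7 ns to 10 ns.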

A sample assignment with the value 1 showing a transport delay of 10 ns and an inertial delay
of 3 ns:

b <= reject 3 ns inertial '1' after 10 ns;

The default value for inertial delay is 0 ns. The two statements below are equivalent:

b <= '1' after 10 ns;
b <= reject 0 ns inertial '1' after 10 ns;

The keyword reject may be omitted if the inertial delay is equal to the transport delay. The
two statements below are equivalent:

b <= inertial '1' after 10 ns;
b <= reject 10 ns inertial '1' after 10 ns;

1.10.2

Delayed Assignment Semantics

The use of delayed assignments requires us to extend the notion of projected values (section 1.6) to
projected waveforms. The VHDL Language Reference Manual uses the phrase projected output
waveform, but for simplicity we use just projected waveform. Each signal has a projected
waveform, which describes the delayed assignments that are projected to happen in the future.
More precisely, a waveform is a sequence of transactions, and a transaction is a (value, time) pair.
When a signal assignment is executed, some of the target's existing transactions may be deleted,
according to the rules below:

Time relative to rejection window    Existing projected transactions
Before                               Preserved
During                               * (see below)
After                                Deleted

* Existing transactions that occur during the rejection window are preserved if they:
• have the same value as the first new transaction
• and are not followed by existing transactions with different values.
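The rules above can be sketched as a Python function that updates a projected waveform, represented as a time-ordered list of (value, time) transactions. The function name, the list representation, and the treatment of transactions that fall exactly on a window boundary are assumptions of this sketch, not definitions from the language standard.

```python
def assign(existing, now, value, t_p, t_i=0):
    """Return the new projected waveform after a delayed assignment.

    existing: list of (value, time) transactions, sorted by time
    now: time at which the assignment is executed
    t_p, t_i: transport and inertial delay
    """
    new_time = now + t_p
    window_begin = new_time - t_i
    # Before the rejection window: preserved
    before = [tx for tx in existing if tx[1] < window_begin]
    # During the rejection window: candidates for preservation
    during = [tx for tx in existing if window_begin <= tx[1] < new_time]
    # At or after the new transaction: deleted (not copied anywhere)
    kept = []
    for tx in reversed(during):
        if tx[0] != value:
            break              # followed (later) transactions all match; this one does not
        kept.append(tx)
    kept.reverse()
    return before + kept + [(value, new_time)]


print(assign([(1, 15)], now=14, value=0, t_p=5))          # no inertial delay
print(assign([(1, 15)], now=11, value=0, t_p=5, t_i=3))   # pulse is rejected
print(assign([(0, 9)], now=5, value=1, t_p=3))            # later transaction deleted
```

Walking the window backwards from the new transaction keeps only the trailing run of transactions with the same value, which is one way to express "not followed by existing transactions with different values".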

Simple Examples

The figure below shows some simple examples of how the target signal's projected waveform (the
lhs) is updated by the expression on the right-hand side of the assignment.

[Figure: a grid of example panels. Each panel shows the existing projected waveform of the target signal (lhs), the right-hand side of the assignment (rhs), and the resulting projected waveform (res).]


Complex Examples

Most of the complexities in understanding the rules for projected waveforms stem from issues of
which transactions are deleted from the target's existing projected waveform.

[Figure: a grid of example panels, each showing the waveforms lhs, rhs, and res.]


Sampling Signals in Expressions

Signals in the right-hand-side expression are evaluated at the time that the assignment is executed.

c <= a after 2 ns, b after 4 ns;

[Figure: the current values of a and b at time t, the current time when the assignment is executed, are placed in the projected waveform of c at t+2 ns and t+4 ns.]

1.10.3

Simulation Examples

In delta-cycle simulation, when we increment time, we need to check both for processes that need
to resume (as before), and signals that need to update their current value.
As we execute delayed assignments, the projected waveforms of the target signals evolve with the
addition and deletion of transactions.

Transport Delay 1

This example illustrates simulation with a transport delay. The signals a and b do not have delayed
assignments, so we simulate these signals exactly as we have done before. For the projected value
of c, we need to keep track of a sequence of projected values that will occur in the future, and so
we need to keep track of multiple (value, time) pairs.

process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 4 ns;
  b <= '1';
  wait;
end process;

process (a,b) begin
  c <= a xor b after 5 ns;
end process;

[Table: delta-cycle simulation at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns, with rows for simulation rounds, simulation cycles, proc_1, proc_2, and the signals a, b, and c. All signals start at U. The projected waveform of c is shown as (U,5), then (U,5)(0,5), then (1,15), then (1,15)(0,19), then (0,19).]

0 ns+1: This is the second assignment of a value to c at 5 ns, and so this assignment overwrites
the previous one.
5 ns: The projected value of 0 for c is copied to the visible value. This is an unusual simulation
round, because no processes executed.
14 ns+1: We have two transactions in our projected waveform for c: a value of 1 at 15 ns and
a value of 0 at 19 ns.
15 ns: We update the current value of c from its projected waveform and so delete the transaction
that was scheduled for 15 ns.
19 ns: Similar to 15 ns, we update the current value of c and delete the corresponding transaction.

Transport Delay 2

process begin
  a <= '0';
  b <= '1';
  c <= '1';
  wait for 5 ns;
  c <= '0';
  wait;
end process;

process (a,b,c) begin
  if c = '1' then
    d <= a after 9 ns;
  else
    d <= b after 3 ns;
  end if;
end process;

[Table: delta-cycle simulation at 0 ns, 3 ns, 5 ns, and 8 ns, with rows for simulation rounds, simulation cycles, proc_1, proc_2, and the signals a, b, c, and d. All signals start at U. The projected waveform of d is shown as (U,3), then (U,3)(0,9), then (0,9), then (1,8)(0,9).]

5 ns+1: We execute d <= b after 3 ns. The existing transaction for d is (0, 9 ns).
The new transaction is (1, 8 ns). Because the existing transaction is projected to
occur after the new transaction, the existing transaction is deleted.

Transport and Inertial Delay 1

Including an inertial delay does not affect the behaviour, so long as there are no pulses whose width
is less than the inertial delay. In this example, c has a 1 pulse that is 4 ns long and an inertial
delay of 3 ns.

process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 4 ns;
  b <= '1';
  wait;
end process;

process (a,b) begin
  c <= reject 3 ns inertial a xor b after 5 ns;
end process;

[Table: delta-cycle simulation at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns, with rows for simulation rounds, simulation cycles, proc_1, proc_2, and the signals a, b, and c. The projected waveform of c is shown as (U,5), then (U,5)(0,5), then (1,15), then (1,15)(0,19), then (0,19).]


Transport and Inertial Delay 2

In this second example of transport and inertial delay, there is only 1 ns between a change on a at
10 ns and a change on b at 11 ns, which would result in a 1 ns pulse on c. But, the inertial delay of
3 ns cancels out this pulse at 11 ns+1 when the assignment to b becomes visible.

process (a,b) begin
  c <= reject 3 ns inertial a xor b after 5 ns;
end process;

process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 1 ns;
  b <= '1';
  wait;
end process;

[Table: delta-cycle simulation at 0 ns, 5 ns, 10 ns, 11 ns, and 16 ns, with rows for simulation rounds, simulation cycles, proc_1, proc_2, and the signals a, b, and c. The projected waveform of c is shown as (U,5), then (U,5)(0,5), then (1,15), then (1,15)(0,16).]


1.10.4

Waveform Expressions

The right-hand-side expressions may contain multiple transactions:

a <= '0' after 2 ns, '1' after 5 ns;

[Figure: old projected waveform of target (lhs), waveform expression (rhs), and new projected waveform of target (res).]

There is only one inertial delay for the waveform expression:

a <= reject 1 ns inertial '0' after 2 ns, '1' after 5 ns;

[Figure: old projected waveform of target (lhs), waveform expression (rhs), and new projected waveform of target (res), with time marks at t+2 ns and t+5 ns.]

The delays between transactions in the waveform expression do not need to be consistent with the
inertial delay, in that the delays between transactions may be less than the inertial delay:

a <= reject 3 ns inertial '0' after 5 ns, '1' after 7 ns;

[Figure: old projected waveform of target (lhs), waveform expression (rhs), and new projected waveform of target (res), with time marks at t+2 ns, t+5 ns, and t+7 ns.]


1.11

VHDL and Hardware Building Blocks

This section outlines the building blocks for register transfer level design and how to write VHDL
code for the building blocks.

1.11.1

Basic Building Blocks

Hardware                              VHDL
AND, OR, NAND, NOR, XOR, XNOR         and, or, nand, nor, xor, xnor
multiplexer (2:1, also n-to-1)        if-then-else, case statement, selected assignment, conditional assignment
adder, subtracter, negater            +, -, -
shifter, rotater                      sll, srl, sla, sra, rol, ror
flip-flop                             wait until, if-then-else, rising_edge
memory array, register file, queue    2-d array or library component

Figure 1.15: RTL Building Blocks

1.11.2

Deprecated Building Blocks for RTL

Some of the common gates you have encountered in previous courses should be avoided when
synthesizing register-transfer-level hardware, particularly if FPGAs are the implementation technology.

1.11.2.1

An Aside on Flip-Flops and Latches

flip-flop: Edge sensitive: output only changes on rising (or falling) edge of clock
latch: Level sensitive: output changes whenever clock is high (or low)
A common implementation of a flip-flop is a pair of latches (Master/Slave flop).
Latches are sometimes called transparent latches, because they are transparent (input directly
connected to output) when the clock is high.
The clock to a latch is sometimes called the enable line.
There is more information in the course notes on timing analysis for storage devices (section 8.3).

1.11.2.2

Deprecated Hardware

Latches
• Use flops, not latches
• Latch-based designs are susceptible to timing problems
• The transparent phase of a latch can let a signal leak through a latch, causing the
  signal to affect the output one clock cycle too early
• It's possible for a latch-based circuit to simulate correctly, but not work in real hardware,
  because the timing delays on the real hardware don't match those predicted in synthesis
T, JK, SR, etc. flip-flops
• Limit yourself to D-type flip-flops
• Some FPGA and ASIC cell libraries include only D-type flip-flops. Others, such as Altera's APEX FPGAs, can be configured as D, T, JK, or SR flip-flops.
Tri-State Buffers
• Use multiplexers, not tri-state buffers
• Tri-state designs are susceptible to stability and signal integrity problems
• Getting tri-state designs to simulate correctly is difficult; some library components don't
  support tri-state signals
• Tri-state designs rely on the code never letting two signals drive the bus at the same time
• It can be difficult to check that bus arbitration will always work correctly
• Manufacturing and environmental variability can make real hardware not work correctly
  even if it simulates correctly
• Typical industrial practice is to avoid use of tri-state signals on a chip, but allow tri-state
  signals at the board level
Note: Unfortunately and surprisingly, PalmChip has been awarded a
US patent for using uni-directional busses (i.e. multiplexers) for system-on-chip designs. The patent was filed in 2000, so all fourth-year design
projects since 2000 that use muxes on FPGAs will need to pay royalties to
PalmChip.

1.11.3

Hardware and Code for Flops

1.11.3.1

Flops with Waits and Ifs

The two code fragments below synthesize to identical hardware (flops).


If

process (clk)
begin
  if rising_edge(clk) then
    q <= d;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  q <= d;
end process;

1.11.3.2

Flops with Synchronous Reset

The two code fragments below synthesize to identical hardware (flops with synchronous reset).
Notice that the synchronous reset is really nothing more than an AND gate on the input.
If

process (clk)
begin
  if rising_edge(clk) then
    if (reset = '1') then
      q <= '0';
    else
      q <= d;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if reset = '1' then
    q <= '0';
  else
    q <= d;
  end if;
end process;

1.11.3.3

Flops with Chip-Enable

The two code fragments below synthesize to identical hardware (flops with chip-enable lines).
If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      q <= d;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    q <= d;
  end if;
end process;

1.11.3.4

Flop with Chip-Enable and Mux on Input

The two code fragments below synthesize to identical hardware (flops with chip-enable lines and
muxes on inputs).
If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      if sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    if sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;
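
For reference, the if-style fragment above can be wrapped in a complete design unit that a simulator or synthesis tool can compile. This is only a minimal sketch: the entity name ce_mux_flop, the architecture name, and the port list are assumptions chosen for illustration and do not come from the notes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity; name and ports are illustrative assumptions
entity ce_mux_flop is
  port (
    clk, ce, sel : in  std_logic;
    d0, d1       : in  std_logic;
    q            : out std_logic
  );
end entity;

architecture main of ce_mux_flop is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then        -- chip-enable: q keeps its value when ce = '0'
        if sel = '1' then     -- mux on the flop input
          q <= d1;
        else
          q <= d0;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

Because q is not assigned when ce is '0', the flop holds its value, which is what maps onto the chip-enable pin.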

1.11.3.5

Flops with Chip-Enable, Muxes, and Reset

The two code fragments below synthesize to identical hardware (flops with chip-enable lines,
muxes on inputs, and synchronous reset). Notice that the synchronous reset is really nothing
more than a mux, or an AND gate on the input.
Note:
The specific combination and order of tests is important to guarantee
that the circuit synthesizes to a flop with a chip enable, as opposed to a level-sensitive latch testing the chip enable and/or reset followed by a flop.
Note:
The chip-enable pin on the flop is connected to both ce and reset.
If the chip-enable pin was not connected to reset, then the flop would ignore
reset unless chip-enable was asserted.
If

process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' or reset = '1' then
      if reset = '1' then
        q <= '0';
      elsif sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;

Wait

process
begin
  wait until rising_edge(clk);
  if ce = '1' or reset = '1' then
    if reset = '1' then
      q <= '0';
    elsif sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;

1.11.4

An Example Sequential Circuit

There are many ways to write VHDL code that synthesizes to the schematic in Figure 1.16. The
major choices are:
1. Categories of signals
(a) All signals are outputs of flip-flops or inputs (no combinational signals)
(b) Signals include both flopped and combinational
2. Number of flopped signals per process
(a) All flopped signals in a single process
(b) Some processes with multiple flopped signals
(c) Each flopped signal in its own process
3. Style of flop code
(a) Flops use if statements

(b) Flops use wait statements

Some examples of these different options are shown in Figures 1.17 to 1.20.

[Schematic: inputs sel, reset, and clk; output c]

entity and_not_reg is
  port (
    reset,
    clk,
    sel : in  std_logic;
    c   : out std_logic
  );
end;

Schematic and entity for examples of different code organizations in Figures 1.17 to 1.20

Figure 1.16: Schematic and entity for and_not_reg

One Process, Flops, Wait

architecture one_proc of and_not_reg is
  signal a : std_logic;
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      a <= '0';
    elsif sel = '1' then
      a <= not a;
    else
      a <= a;
    end if;
    c <= not a;
  end process;
end one_proc;

Figure 1.17: Implementation of Figure 1.16: all signals are flops, all flops in one process, flops use waits

Two Processes, Flops, Wait

architecture two_proc_wait of and_not_reg is
  signal a : std_logic;
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      a <= '0';
    elsif sel = '1' then
      a <= not a;
    else
      a <= a;
    end if;
  end process;
  process begin
    wait until rising_edge(clk);
    c <= not a;
  end process;
end two_proc_wait;

Figure 1.18: Implementation of Figure 1.16: all signals are flops, one flop per process, flops use waits


Two Processes with If-Then-Else

architecture two_proc_if of and_not_reg is
  signal a : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        a <= '0';
      elsif sel = '1' then
        a <= not a;
      else
        a <= a;
      end if;
    end if;
  end process;
  process (clk)
  begin
    if rising_edge(clk) then
      c <= not a;
    end if;
  end process;
end two_proc_if;

Figure 1.19: Implementation of Figure 1.16: all signals are flops, one flop per process, flops use if-then-else

Concurrent Statements

architecture comb of and_not_reg is
  signal a, b, d : std_logic;
begin
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        a <= '0';
      else
        a <= d;
      end if;
    end if;
  end process;
  process (clk) begin
    if rising_edge(clk) then
      c <= not a;
    end if;
  end process;
  d <= b when sel = '1' else a;
  b <= not a;
end comb;

Figure 1.20: Implementation of Figure 1.16: flopped and combinational signals, one flop per process, flops use if-then-else

1.12

Synthesizable vs Non-Synthesizable Code

For us to consider a VHDL program synthesizable, all of the conditions below must be satisfied:
the program must be theoretically implementable in hardware
the hardware that is produced must be consistent with the structure of the source code
the source code must be portable across a wide range of synthesis tools, in that the synthesis
tools all produce correct hardware
Synthesis is done by matching VHDL code against templates or patterns. It's important to use
idioms that your synthesis tools recognize. If you aren't careful, you could write code that has
the same behaviour as one of the idioms, but which results in inefficient or incorrect hardware.
Section 1.11 described common idioms and the resulting hardware.
Most synthesis tools agree on a large set of idioms, and will reliably generate hardware for these
idioms. This section is based on the idioms that Synopsys, Xilinx, Altera, and Mentor Graphics
are able to synthesize.

1.12.1

Initial Values

Initial values on signals (UNSYNTHESIZABLE)


signal bad_signal : std_logic := '0';

Reason: In most implementation technologies, when a circuit powers up, the values on signals
are completely random. Some FPGAs are an exception to this. For some FPGAs, when a chip is
powered up, all flip flops will be 0. For other FPGAs, the initial values can be programmed.
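
A portable way to obtain a known starting value is to assign it under reset instead of in the declaration. The following is a minimal sketch, assuming a synchronous, active-high reset; the entity name init_by_reset and the names good_signal, d, and q are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity init_by_reset is
  port (
    clk, reset, d : in  std_logic;
    q             : out std_logic
  );
end entity;

architecture main of init_by_reset is
  signal good_signal : std_logic;  -- no initial value in the declaration
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      good_signal <= '0';          -- known value is established by reset
    else
      good_signal <= d;
    end if;
  end process;
  q <= good_signal;
end architecture;
```

The value of good_signal is undefined until the first clock edge with reset asserted, in simulation and in hardware alike.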

1.12.2

Wait For

Wait for length of time (UNSYNTHESIZABLE)


wait for 10 ns;

Reason: Delays through circuits are dependent upon both the circuit and its operating environment,
particularly supply voltage and temperature.
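
When a delay must exist in hardware, it is usually expressed as a number of clock cycles counted by a register. The sketch below is one possible way to do this, not a prescribed idiom: the entity name delay_counter, the generic DELAY_CYCLES, the 8-bit counter width, and the start/done handshake are all assumptions made for illustration. DELAY_CYCLES must fit in 8 bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names, widths, and handshake are assumptions
entity delay_counter is
  generic (DELAY_CYCLES : natural := 10);   -- must be less than 256
  port (
    clk, start : in  std_logic;
    done       : out std_logic
  );
end entity;

architecture main of delay_counter is
  signal count : unsigned(7 downto 0);
begin
  process begin
    wait until rising_edge(clk);
    if start = '1' then
      count <= to_unsigned(DELAY_CYCLES, 8);  -- load the delay
      done  <= '0';
    elsif count /= 0 then
      count <= count - 1;                     -- still counting down
      done  <= '0';
    else
      done  <= '1';                           -- counter has reached zero
    end if;
  end process;
end architecture;
```

The real-time length of the delay is then the cycle count multiplied by the clock period, which the timing analysis tool can check.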

1.12.3

Variables

process
  variable bad : std_logic;
begin
  wait until rising_edge(clk);
  bad := not a;
  d <= bad and b;
  e <= bad or c;
end process;

Use signals, do not use variables


reason The intention of the creators of VHDL was for signals to be wires and variables to be
just for simulation. Some synthesis tools allow some uses of variables, but when using
variables, it is easy to create a design that works in simulation but not in real hardware.
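
The example above can be written without a variable by moving the intermediate value into a combinational signal. This is a sketch under assumptions: the entity name no_variable, the signal name not_a, and the port list are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity no_variable is
  port (
    clk, a, b, c : in  std_logic;
    d, e         : out std_logic
  );
end entity;

architecture main of no_variable is
  signal not_a : std_logic;   -- combinational signal replacing the variable
begin
  not_a <= not a;

  process begin
    wait until rising_edge(clk);
    d <= not_a and b;
    e <= not_a or c;
  end process;
end architecture;
```

Here not_a is a wire, and d and e are the only flip-flops, so the hardware is easy to predict from the code.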

1.12.4

Bits and Booleans

signal bad1 : bit;


signal bad2 : boolean;

Use std_logic signals, do not use bit or Boolean signals.


reason std_logic is the most commonly used signal type across synthesis tools and simulation tools.

1.12.5


Assignments before Wait Statement

If a synthesizable clocked process has a wait statement, then the process must begin with a wait
statement.
Unsynthesizable

process
begin
  c <= a;
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
end process;

Synthesizable

process
begin
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
  c <= a;
end process;

Reason: In simulation, any assignments before the first wait statement will be executed in the
first delta-cycle. In the synthesized circuit, the signals will be outputs of flip-flops and will first be
assigned values after the first rising-edge. To maintain equivalent behaviour between simulation
and synthesis, most synthesis tools require that no assignments appear before the first wait statement
in a process.

1.12.6

Different Wait Conditions

wait statements with different conditions in a process (UNSYNTHESIZABLE)


-- different clock signals
process
begin
  wait until rising_edge(clk1);
  x <= a;
  wait until rising_edge(clk2);
  x <= a;
end process;

-- different clock edges
process
begin
  wait until rising_edge(clk);
  x <= a;
  wait until falling_edge(clk);
  x <= a;
end process;

Detailed reason: processes with multiple wait statements are turned into finite state machines. The
wait statements denote transitions between states. The target signals in the process are outputs of
flip flops. Using different wait conditions would require the flip flops to use different clock signals
at different times. Multiple clock signals for a single flip flop would be difficult to synthesize,
inefficient to build, and fragile to operate.

1.12.7

Multiple if rising_edge in Process

Multiple if rising_edge statements in a process (UNSYNTHESIZABLE)

process (clk)
begin
  if rising_edge(clk) then
    q0 <= d0;
  end if;
  if rising_edge(clk) then
    q1 <= d1;
  end if;
end process;

Reason: The idioms for synthesis tools generally expect just a single if rising_edge statement in each process. The simpler the VHDL code is, the easier it is to synthesize hardware.
Programmers of synthesis tools make idiomatic restrictions to make their jobs simpler.
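
The two flops in the example above can be described with a single if rising_edge statement. A minimal sketch, where the entity name two_flops and the port list are assumptions for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity two_flops is
  port (
    clk, d0, d1 : in  std_logic;
    q0, q1      : out std_logic
  );
end entity;

architecture main of two_flops is
begin
  process (clk)
  begin
    if rising_edge(clk) then   -- single edge test for both flops
      q0 <= d0;
      q1 <= d1;
    end if;
  end process;
end architecture;
```

Placing each flop in its own process, each with one if rising_edge statement, is an equally acceptable alternative.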

1.12.8


if rising_edge and wait in Same Process

An if rising_edge statement and a wait statement in the same process (UNSYNTHESIZABLE)
process
begin
  if rising_edge(clk) then
    q0 <= d0;
  end if;
  wait until rising_edge(clk);
  q0 <= d1;
end process;

Reason: The idioms for synthesis tools generally expect just a single type of flop-generating statement in each process.

1.12.9

if rising_edge with else Clause

The if statement has a rising_edge condition and an else clause (UNSYNTHESIZABLE).
process (clk)
begin
  if rising_edge(clk) then
    q0 <= d0;
  else
    q0 <= d1;
  end if;
end process;

Reason: The idioms for the synthesis tools expect a signal to be either registered or combinational,
not both.

1.12.10

Loop with Both Comb and Clocked Paths

Loops where some paths are clocked and some are not (UNSYNTHESIZABLE)

process begin
  while c /= '1' loop
    if b = '1' then
      wait until rising_edge(clk);
      e <= d;
    else
      e <= not d;
    end if;
  end loop;
  e <= b;
end process;

Reason: if the loop condition is true and the if-then-else condition is false, then the combinational
path is taken and the process will get stuck in an infinite loop going through the combinational
path.

1.12.11


wait Inside of a for loop

wait statements in a for loop (UNSYNTHESIZABLE)


process
begin
  for i in 0 to 7 loop
    wait until rising_edge(clk);
    x <= to_unsigned(i,4);
  end loop;
end process;

Reason: Idiom of synthesis tools; while-loops with the same behaviour are synthesizable.

Synthesizable Alternative to Wait-Inside-For

while loop (synthesizable)

This is the synthesizable alternative to the wait statement in a for loop above.
process
begin
  -- output values from 0 to 4 on i
  -- sending one value out each clock cycle
  i <= to_unsigned(0,4);
  wait until rising_edge(clk);
  while (4 > i) loop
    i <= i + 1;
    wait until rising_edge(clk);
  end loop;
end process;

1.13

Guidelines for Desirable Hardware

It is possible to write code that is synthesizable, but undesirable. This section describes our
guidelines for writing synthesizable code that will result in desirable hardware. Our coding
guidelines are designed for creating circuits that will work well for a wide range of implementation
technologies from low-end FPGAs to high-speed ASICs.
Remember, there is a world of difference between getting a design to work in simulation and
getting it to work on a real FPGA. And there is also a huge difference between getting a design
to work in an FPGA for a few minutes of testing and getting thousands of products to work for
months at a time in thousands of different environments around the world.


Finally, note that there are exceptions to every rule. You might find yourself in a circumstance
where your particular situation (e.g. choice of tool, target technology, etc) would benefit from
bending or breaking a guideline here. Within E&CE 327, of course, there won't be any such
circumstances.
Our list of undesirable hardware features is:
latches
asynchronous resets
combinational loops
using a data signal as a clock
using a clock signal as data
tri-state buffers and signals
multiple drivers for a signal
We limit our definition of bad practice to code that produces undesirable hardware. The guidelines
do not address coding styles that lead to inefficient hardware. Inefficient or unoptimized hardware
might be useful in the early stages of the design process, when the focus is on functionality and not
optimality. As such, inefficient code is not considered bad practice. Poor coding styles that do not
affect the hardware, for example, including extraneous signals in a sensitivity list, should certainly
be avoided, but fall into the general realm of programming guidelines and will not be discussed.

1.13.1

Know Your Hardware

The most important guideline is: know what you want the synthesis tool to build for you.
For every signal in your design, know whether it should be a flip-flop or combinational. Check
the output of the synthesis tool to see if the flip-flops in your circuit match your expectations, and
to check that you do not have any latches in your design.
If you cannot predict what hardware the synthesis tool will generate, then you probably will be
unhappy with the result of synthesis.

1.13.2


Latches

Combinational if-then without else

process (a, b)
begin
  if (a = '1') then
    c <= b;
  end if;
end process;

For a combinational process, every signal that is assigned to, must be assigned to in every branch
of if-then and case statements.
reason If a signal is not assigned a value in a path through a combinational process, then that
signal will be a latch.
note For a clocked process, if a signal is not assigned a value in a clock cycle, then the flip-flop
for that signal will have a chip-enable pin. Chip-enable pins are fine; they are available on
flip-flops in essentially every cell library.
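
One common way to satisfy this rule is to give the signal a default assignment at the top of the process, so that every path assigns it a value. In the sketch below, the entity name no_latch is hypothetical, and the default value '0' is an assumption: the correct value when a is '0' depends on what the design is meant to do.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity no_latch is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

architecture main of no_latch is
begin
  process (a, b)
  begin
    c <= '0';            -- default value (assumed); assigned on every path
    if a = '1' then
      c <= b;            -- overrides the default when a = '1'
    end if;
  end process;
end architecture;
```

An explicit else branch that assigns c has the same effect; the default-assignment form scales better when a process has many branches.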

Signals Missing from Sensitivity List

process (a)
begin
  c <= a and b;
end process;

For a combinational process, the sensitivity list should contain all of the signals that are read in
the process.
reason Gives consistent results across different tools. Many synthesis tools will implicitly
include all signals that a process reads in its sensitivity list. This differs from the VHDL
Standard. A synthesis tool that adheres to the standard will either generate an error or will
create hardware with latches or flops clocked by data signals if not all signals that are read
from are included in the sensitivity list.
exception In a clocked process using an if rising_edge, it is acceptable to have only the
clock in the sensitivity list.
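
The corrected form of the process above lists every signal that it reads. A minimal sketch, where the entity name full_sensitivity is an assumption for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity full_sensitivity is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity;

architecture main of full_sensitivity is
begin
  -- Both a and b are read, so both appear in the sensitivity list.
  -- (VHDL-2008 also accepts "process (all)", if the tools support it.)
  process (a, b)
  begin
    c <= a and b;
  end process;
end architecture;
```

With the complete list, simulation re-evaluates c whenever either input changes, which matches the combinational gate that synthesis produces.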

1.13.3

Asynchronous Reset

In an asynchronous reset, the test for reset occurs outside of the test for the clock edge.
process (reset, clk)
begin
  if (reset = '1') then
    q <= '0';
  elsif rising_edge(clk) then
    q <= d;
  end if;
end process;

All reset signals should be synchronous.


reason If a reset occurs very close to a clock edge, some parts of the circuit might be reset in
one clock cycle and some in the subsequent clock cycle. This can lead the circuit to be out
of sync as it goes through the reset sequence, potentially causing erroneous internal state
and output values.

1.13.4

Combinational Loops

A combinational loop is a cyclic path of dependencies through one or more combinational processes.
process (a, b, c) begin
  if a = '0' then
    d <= b;
  else
    d <= c;
  end if;
end process;
process (d, e) begin
  b <= d and e;
end process;

If you need a signal to be dependent on itself, you must include a register somewhere in the
cyclic path.
reason Combinational loops are almost always unstable, in that the value on a signal in the
loop is unpredictable and can change over time, even if none of the inputs change.
note Registered loops are fine.
note Internally, the implementations of flip-flops and other storage devices use combinational
loops, but these loops are built and analyzed at the analog level to ensure that they are
stable.
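
The loop in the example above can be broken by registering one of the signals on the cyclic path. The sketch below registers b; this choice, the entity name registered_loop, and the output port d_out are assumptions made for illustration. Note that adding the register changes the timing: b now reflects the value that d and e had in the previous clock cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity registered_loop is
  port (
    clk, a, c, e : in  std_logic;
    d_out        : out std_logic
  );
end entity;

architecture main of registered_loop is
  signal b, d : std_logic;
begin
  -- combinational part of the path
  d <= b when a = '0' else c;

  -- the register on b breaks the combinational cycle d -> b -> d
  process begin
    wait until rising_edge(clk);
    b <= d and e;
  end process;

  d_out <= d;
end architecture;
```

Whether the extra cycle of latency is acceptable depends on the surrounding design.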

1.13.5


Using a Data Signal as a Clock

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;
process begin
  wait until rising_edge( count(5) );
  b <= a;
end process;

Data signals should be used only as data.


reason All data assignments should be synchronized to a clock. This ensures that the timing
analysis tool can determine the maximum clock speed accurately. Using data signals as
clock signals can lead to unpredictable delays between different assignments, which
makes it infeasible to do an accurate timing analysis.
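
A common alternative is to keep every flop on clk and turn the data signal into a chip-enable by detecting its rising edge. This is a sketch under assumptions: the entity name enable_not_clock, the signal count5_old, the 6-bit counter width, and the added synchronous reset are all hypothetical. In this version b is updated on the clk edge after count(5) goes high, rather than at the instant count(5) rises.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names and widths are illustrative assumptions
entity enable_not_clock is
  port (
    clk, reset, a : in  std_logic;
    b             : out std_logic
  );
end entity;

architecture main of enable_not_clock is
  signal count      : unsigned(5 downto 0);
  signal count5_old : std_logic;   -- count(5) delayed by one clock cycle
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      count      <= (others => '0');
      count5_old <= '0';
    else
      count      <= count + 1;
      count5_old <= count(5);
      -- count(5) is '1' now and was '0' one cycle earlier: a rising edge
      if count(5) = '1' and count5_old = '0' then
        b <= a;
      end if;
    end if;
  end process;
end architecture;
```

All three registers share the single clock clk, so the timing analysis tool sees one clock domain.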

1.13.6

Using a Clock Signal as Data

process begin
  wait until rising_edge(clk);
  count <= count + 1;
end process;
b <= a and clk;

Clock signals should be used only as clocks.


reason Clock signals have two defined values in a clock cycle and transition in the middle of
the clock cycle. At the register-transfer level, each signal has exactly one value in a clock
cycle and signals transition between values only at the boundary between clock cycles.

1.13.7

Tri-State Buffers and Signals

Z as a Signal Value

-- note: a conditional assignment inside a process requires VHDL-2008
process (sel, a0)
begin
  b <= a0 when sel = '0'
       else 'Z';
end process;
process (sel, a1)
begin
  b <= a1 when sel = '1'
       else 'Z';
end process;

Use multiplexers, not tri-state buffers.


reason Multiplexers are more robust than tri-state buffers, because tri-state buffers rely on analog effects such as drive-strength and voltages that are between 0 and 1. Multiplexers
require more area than tri-state buffers, but for the size of most busses, the advantage in a
more robust design is worth the cost in extra area.
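
The tri-state example above selects between a0 and a1, which a two-input multiplexer does with a single driver. A minimal sketch, where the entity name mux2 is an assumption for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity mux2 is
  port (
    sel, a0, a1 : in  std_logic;
    b           : out std_logic
  );
end entity;

architecture main of mux2 is
begin
  -- one driver for b; no 'Z' values are needed
  b <= a0 when sel = '0' else a1;
end architecture;
```

Because b always has exactly one driver, there is no bus arbitration to verify.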

Inout and Buffer Port Modes

entity bad is
  port (
    io_bad  : inout  std_logic;
    buf_bad : buffer std_logic
  );
end entity;

Use in or out, do not use inout or buffer


reason inout and buffer signals are tri-state.
note If you have an output signal that you also want to read from, you might be tempted to
declare the mode of the signal to be inout. A better solution is to create a new, internal,
signal that you both read from and write to. Then, your output signal can just read from
the internal signal.
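
The internal-signal approach described in the note can be sketched as follows. The entity name readable_output, the signal name q_int, and the toggling behaviour are assumptions chosen only to give the process something to read back.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and behaviour are illustrative assumptions
entity readable_output is
  port (
    clk, reset : in  std_logic;
    q          : out std_logic      -- mode out, not inout or buffer
  );
end entity;

architecture main of readable_output is
  signal q_int : std_logic;         -- internal copy that can be read
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      q_int <= '0';
    else
      q_int <= not q_int;           -- reads and writes the internal signal
    end if;
  end process;

  q <= q_int;                       -- the output port is only written
end architecture;
```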

1.13.8


Multiple Drivers

process begin
  wait until rising_edge(clk);
  if reset = '1' then
    y <= '0';
    z <= '0';
  end if;
end process;
process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if a = '1' then
      z <= b and c;
    else
      z <= d;
    end if;
  end if;
end process;
process begin
  wait until rising_edge(clk);
  if reset = '0' then
    if b = '1' then
      y <= c;
    end if;
  end if;
end process;

Each signal should be assigned to in only one process. This is often called the single assignment
rule.
reason Multiple processes driving the same signal is the same as having multiple gates driving
the same wire. This can cause contention, tri-state values, and other bad things.
exception Multiple drivers are acceptable for tri-state busses or if your implementation technology has wired-ANDs or wired-ORs. FPGAs do not have wired-ANDs or wired-ORs,
and many ASIC designers consider them to be risky and bad practice.
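
The example above can be rewritten so that y and z each have a single driver, by folding the reset into the process that assigns the signal. This sketch uses one process for both signals; the entity name single_driver and the port list are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative assumptions
entity single_driver is
  port (
    clk, reset : in  std_logic;
    a, b, c, d : in  std_logic;
    y, z       : out std_logic
  );
end entity;

architecture main of single_driver is
begin
  process begin
    wait until rising_edge(clk);
    if reset = '1' then
      y <= '0';
      z <= '0';
    else
      if a = '1' then
        z <= b and c;
      else
        z <= d;
      end if;
      if b = '1' then
        y <= c;        -- y holds its value when b = '0' (chip-enable)
      end if;
    end if;
  end process;
end architecture;
```

Splitting this into one process for y and one for z would also satisfy the rule, as long as each process contains the reset for its own signal.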
