Course Notes
(with Solutions)
2015t1 (Winter)
Instructor: Rodolfo Pellizzoni
Notes by:
Mark Aagaard
University of Waterloo
Dept of Electrical and Computer Engineering
Contents
1 Fundamentals of VHDL
  1.1 Introduction to VHDL
    1.1.1 Levels of Abstraction
    1.1.2 VHDL Origins and History
    1.1.3 Semantics
    1.1.4 Synthesis of a Simulation-Based Language
    1.1.5 Solution to Synthesis Sanity
    1.1.6 Standard Logic 1164
  1.2 Comparison of VHDL to Other Hardware Description Languages
    1.2.1 VHDL Disadvantages
    1.2.2 VHDL Advantages
    1.2.3 VHDL and Other Languages
      1.2.3.1 VHDL vs Verilog
      1.2.3.2 VHDL vs System Verilog
      1.2.3.3 VHDL vs SystemC
      1.2.3.4 Summary of VHDL Evaluation
  1.3 Overview of Syntax
    1.3.1 Syntactic Categories
    1.3.2 Library Units
    1.3.3 Entities and Architecture
    1.3.4 Concurrent Statements
    1.3.5 Component Declaration and Instantiations
    1.3.6 Processes
    1.3.7 Sequential Statements
    1.3.8 A Few More Miscellaneous VHDL Features
  1.4 Concurrent vs Sequential Statements
    1.4.1 Concurrent Assignment vs Process
    1.4.2 Conditional Assignment vs If Statements
    1.4.3 Selected Assignment vs Case Statement
    1.4.4 Coding Style
  1.5 Overview of Processes
    1.5.1 Combinational Process vs Clocked Process
    1.5.2 Latch Inference
    1.5.3 Combinational vs Flopped Signals
3 Overview of FPGAs
  3.1 Generic FPGA Hardware
    3.1.1 Generic FPGA Cell
    3.1.2 Lookup Table
    3.1.3 Interconnect for Generic FPGA
    3.1.4 Blocks of Cells for Generic FPGA
    3.1.5 Special Circuitry in FPGAs
  3.2 Area Estimation for FPGAs
    3.2.1 Area for Circuit with one Target
    3.2.2 Algorithm to Allocate Gates to Cells
    3.2.3 Area for Arithmetic Circuits
    4.4.3 Requirements
    4.4.4 Data-Dependency Graph
    4.4.5 Initial Dataflow Diagram
    4.4.6 Area Optimization
    4.4.7 Assign Names to Registered Signals
    4.4.8 VHDL #1: Big and Obviously Correct
    4.4.9 Allocation
    4.4.10 VHDL #2: Post-Allocation
    4.4.11 Explicit State Machine
    4.4.12 VHDL Implementation #3
  4.5 Design Example: Hnatyshyn with Combinational Inputs and Outputs
    4.5.1 Requirements
    4.5.2 Dataflow Diagram
    4.5.3 Maximum Throughput Design
    4.5.4 Minimum Area Design
    4.5.5 Minimum Area Design with ASAP Parcels
    4.5.6 Minimum Area Design with Unpredictable Bubbles
  4.6 Hnatyshyn with Registered Inputs and Combinational Output
    4.6.1 Dataflow Diagram and Behaviour
  4.7 Hnatyshyn with Registered Inputs and Outputs
    4.7.1 Requirements
    4.7.2 Data-Dependency Graph
    4.7.3 Initial Dataflow Diagram
    4.7.4 Area Optimization
    4.7.5 Assign Names to Registered Signals
    4.7.6 VHDL #1: Big and Obviously Correct
    4.7.7 Tangent: Combinational Outputs
    4.7.8 Allocation
    4.7.9 VHDL #2: Post-Allocation
    4.7.10 Separate Datapath and Control
  4.8 Design Example: Hnatyshyn with Bubbles
    4.8.1 Control Table: Standard Method
    4.8.2 Control Table: Valid-Bit Shortcut
  4.9 Example: LeBlanc
    4.9.1 System Description
    4.9.2 Design for ASAP Parcels
      4.9.2.1 Implicit State Machine
      4.9.2.2 Explicit State Machine
      4.9.2.3 Datapath Control
      4.9.2.4 Final Implementation
      4.9.2.5 Buggy Implementation
    4.9.3 Design for Unpredictable Bubbles
      4.9.3.1 Implicit State Machine
      4.9.3.2 Explicit State Machine
      4.9.3.3 Datapath Control
7 Performance Analysis
  7.1 Introduction
  7.2 Defining Performance
  7.3 Benchmarks
  7.4 Comparing Performance
    7.4.1 General Equations
    7.4.2 Example: Performance of Printers
  7.5 Clock Speed, CPI, Program Length, and Performance
    7.5.1 Mathematics
    7.5.2 Example: CISC vs RISC and CPI
    7.5.3 Effect of Instruction Set on Performance
  7.6 Effect of Time to Market on Relative Performance
8 Timing Analysis
  8.1 Delays and Definitions
    8.1.1 Background Definitions
    8.1.2 Clock-Related Timing Definitions
      8.1.2.1 Clock Latency
      8.1.2.2 Clock Skew
      8.1.2.3 Clock Jitter
    8.1.3 Storage-Related Timing Definitions
      8.1.3.1 Flops and Latches
    8.1.4 Propagation Delays
    8.1.5 Timing Constraints
    8.1.6 Review: Timing Parameters
  8.2 Timing Analysis of Simple Latches
    8.2.1 Structure and Behaviour of Multiplexer Latch
    8.2.2 Strategy for Timing Analysis of Storage Devices
    8.2.3 Clock-to-Q Time of a Latch
    8.2.4 From Load Mode to Store Mode
    8.2.5 Setup Time Analysis
    8.2.6 Hold Time of a Multiplexer Latch
    8.2.7 Example of a Bad Latch
    8.2.8 Summary
  8.3 Advanced Timing Analysis of Storage Elements
  8.4 Critical Path
    8.4.1 Introduction to Critical and False Paths
      8.4.1.1 Example of Critical Path in Full Adder
      8.4.1.2 Longest Path and Critical Path
      8.4.1.3 Criteria for Critical Path Algorithms
    8.4.2 Longest Path
      8.4.2.1 Algorithm to Find Longest Path
      8.4.2.2 Longest Path Example
    8.4.3 Monotone Speedup
  8.5 False Paths
  8.6 Analog Timing Model
    8.6.1 Defining Delay
    8.6.2 Modeling Circuits for Timing
    8.6.3 Example: Two Buffers
    8.6.4 Example: Two Buffers with Both Caps
  8.7 Elmore Delay Model
    8.7.1 Elmore Delay as an Approximation
    8.7.2 A More Complicated Example
  8.8 Practical Usage of Timing Analysis
10 Review
  10.1 Overview of the Term
  10.2 VHDL
    10.2.1 VHDL Topics
    10.2.2 VHDL Example Problems
  10.3 RTL Design Techniques
    10.3.1 Design Topics
    10.3.2 Design Example Problems
  10.4 Performance Analysis and Optimization
    10.4.1 Performance Topics
    10.4.2 Performance Example Problems
  10.5 Timing Analysis
    10.5.1 Timing Topics
    10.5.2 Timing Example Problems
  10.6 Power
    10.6.1 Power Topics
    10.6.2 Power Example Problems
  10.7 Formulas to be Given on Final Exam
Chapter 1
Fundamentals of VHDL
1.1 Introduction to VHDL

1.1.1 Levels of Abstraction
There are many different levels of abstraction for working with hardware:

Quantum: Schrödinger's equations describe the movement of electrons and holes through material.

Energy band: 2-dimensional diagrams that capture the essential features of Schrödinger's equations. Energy-band diagrams are commonly used in nano-scale engineering.

Transistor: Signal values and time are continuous (analog). Each transistor is modeled by a resistor-capacitor network. Overall behaviour is defined by differential equations in terms of the resistors and capacitors. Spice is a typical simulation tool.

Switch: Time is continuous, but voltage may be either continuous or discrete. Linear equations are used, rather than differential equations. A rising edge may be modeled as a linear rise over some range of time, or the time between a definite low value and a definite high value may be modeled as having an undefined or rising value.

Gate: Transistors are grouped together into gates (e.g. AND, OR, NOT). Voltages are discrete values, such as pure Boolean (0 or 1) or IEEE Standard Logic 1164, which has representations for different types of unknown or undefined values. Time may be continuous or discrete. If discrete, a common unit is the delay through a single inverter (e.g. a NOT gate has a delay of 1 and an AND gate has a delay of 2).
Register transfer level: The essential characteristic of the register-transfer level is that the behaviour of hardware is modeled as assignments to registers and combinational signals. Equations are written where a register signal is a function of other signals (e.g. c = a and b;). The assignments may be either combinational or registered. Combinational assignments happen instantaneously and registered assignments take exactly one clock cycle. There are variations on the pure register-transfer level. For example, time may be measured in clock phases rather than clock cycles, so as to allow assignments on either the rising or falling edge of a clock. Another variation is to have multiple clocks that run at different speeds: a clock on a bus might run at half the speed of the primary clock for the chip.
Transaction level: The basic unit of computation is a transaction, such as executing an instruction on a microprocessor, transferring data across a bus, or accessing memory. Time is usually measured as an estimate (e.g. a memory write requires 15 clock cycles, or a bus transfer requires 250 ns). The building blocks of the transaction level are processors, controllers, memory arrays, busses, and intellectual property (IP) blocks (e.g. UARTs). The behaviour of the building blocks is described with software-like models, often written in behavioural VHDL, SystemC, or SystemVerilog. The transaction level has many similarities to a software model of a distributed system.

Electronic-system level: Looks at an entire electronic system, with both hardware and software.

In this course, we will focus on the register-transfer level. In the second half of the course, we will look at how analog phenomena, such as timing and power, affect the register-transfer level. In these chapters we will occasionally dip down into the transistor, switch, and gate levels.
1.1.2 VHDL Origins and History
(Figure: uses of VHDL: development, verification, synthesis, testing, hardware designs, communication, maintenance, modification, procurement.)
Developed by the United States Department of Defense as part of the very high speed integrated circuit (VHSIC) program in the early 1980s.

The Department of Defense intended VHDL to be used for the documentation, simulation and verification of electronic systems.

Goals:
  improve design process over schematic entry
  standardize design descriptions amongst multiple vendors
  portable and extensible

Inspired by the Ada programming language
  large: 97 keywords, 94 syntactic rules
  verbose (designed by committee)
  static type checking, overloading
  complicated syntax: parentheses are used for both expression grouping and array indexing

Example:
  a <= b * (3 + c);  -- integer
  a <= (3 + c);      -- 1-element array of integers
Standardized by IEEE in 1987 (IEEE 1076-1987), revised in 1993 and 2000.

In 1993 the IEEE standard VHDL package for model interoperability, STD_LOGIC_1164 (IEEE Standard 1164-1993), was developed.
  std_logic_1164 defines 9 different values for signals.

In 1997 the IEEE standard packages for arithmetic over std_logic and bit signals were defined (IEEE Standard 1076.3-1997).
  numeric_std defines arithmetic over std_logic vectors and integers.
    Note: This is the package that you should use for arithmetic. Don't use std_logic_arith: it has less uniform support for mixed integer/signal arithmetic and has a greater tendency for differences between tools.
  numeric_bit defines arithmetic over bit vectors and integers. We won't use bit signals in this course, so you don't need to worry about this package.
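As a sketch of the recommended numeric_std style (the signal names a, b, and sum are hypothetical): convert the std_logic_vector to unsigned (or signed), do the arithmetic, and convert back.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- assuming: signal a, b, sum : std_logic_vector(7 downto 0);
sum <= std_logic_vector(unsigned(a) + unsigned(b));

-- numeric_std also supports mixed signal/integer arithmetic, e.g.:
--   sum <= std_logic_vector(unsigned(a) + 1);
```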
1.1.3 Semantics
The original goal of VHDL was to simulate circuits. The semantics of the language define circuit behaviour.

(Figure: simulating c <= a AND b; produces waveforms for a, b, and c.)
But now, VHDL is used in both simulation and synthesis. Synthesis is concerned with the structure of the circuit.

Synthesis: converts one type of description (behavioural) into another, lower-level, description (usually a netlist).

(Figure: synthesizing c <= a AND b; produces an AND gate with inputs a and b and output c.)
Synthesis is a computer-aided design (CAD) technique that transforms a designer's concise, high-level description of a circuit into a structural description of a circuit.
CAD Tools
............................................................................
CAD Tools allow designers to automate lower-level design processes in implementing the desired
functionality of a system.
NOTE: EDA = Electronic Design Automation. In digital hardware design EDA = CAD.
Synthesis vs Simulation
................................................................
For synthesis, we want the code we write to define the structure of the hardware that is generated.
The VHDL semantics define the behaviour of the hardware that is generated, not the structure
of the hardware. The scenario below complies with the semantics of VHDL, because the two
synthesized circuits produce the same behaviour. If the two synthesized circuits had different
behaviour, then the scenario would not comply with the VHDL Standard.
(Figure: c <= a AND b; can be synthesized into circuits with different structure; simulating the source code and simulating either synthesized circuit all produce the same behaviour.)
1.1.4 Synthesis of a Simulation-Based Language

1.1.5 Solution to Synthesis Sanity
1.1.6 Standard Logic 1164
At the core of VHDL is a package named STANDARD that defines a type named bit with values of '0' and '1'. For simulation, it is helpful to have additional values, such as undefined and high impedance. Many companies created their own (incompatible) definitions of signal types for simulation. To regain compatibility amongst packages from different companies, the IEEE defined std_logic_1164 as the standard type for signal values in VHDL simulation.
  'U'  uninitialized
  'X'  strong unknown
  '0'  strong 0
  '1'  strong 1
  'Z'  high impedance
  'W'  weak unknown
  'L'  weak 0
  'H'  weak 1
  '-'  don't care
From a VLSI perspective, a weak value will come from a smaller gate. One aspect of VHDL that we don't touch on in ece327 is resolution, which describes how to determine the value of a signal if the signal is driven by more than one process. (In ece327, we restrict ourselves to having each signal be driven by (be the target of) exactly one process.) The std_logic_1164 library provides a resolution function to deal with situations where different processes drive the same signal with different values. In this situation, a strong value (e.g. '1') will overpower a weak value (e.g. 'L'). If two processes drive the signal with different strong values (e.g. '1' and '0'), the signal resolves to a strong unknown ('X'). If a signal is driven with two different weak values (e.g. 'H' and 'L'), the signal resolves to a weak unknown ('W').
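As a hypothetical sketch of resolution (note: driving one signal from more than one process is exactly what we do not do in ece327):

```vhdl
-- two concurrent assignments act as two drivers of the resolved
-- std_logic signal s
s <= '1';  -- driver 1: strong 1
s <= 'L';  -- driver 2: weak 0

-- the resolution function combines the drivers:
--   '1' overpowers 'L', so s simulates as '1'
--   if the drivers had been '1' and '0', s would resolve to 'X'
--   if the drivers had been 'H' and 'L', s would resolve to 'W'
```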
1.2 Comparison of VHDL to Other Hardware Description Languages

1.2.1 VHDL Disadvantages

1.2.2 VHDL Advantages
VHDL supports unsynthesizable constructs that are useful in writing high-level models, testbenches, and other non-hardware or non-synthesizable artifacts that we need in hardware design.

VHDL can be used throughout a large portion of the design process in different capacities, from specification to implementation to verification.

VHDL has static typechecking: many errors can be caught before synthesis and/or simulation. (In this respect, it is more similar to Java than to C.)

VHDL has a rich collection of datatypes.

VHDL is a full-featured language with a good module system (libraries and packages).

VHDL has a well-defined standard.
1.2.3 VHDL and Other Languages

1.2.3.1 VHDL vs Verilog
Verilog is a simpler language: smaller language, simple circuits are easier to write
VHDL has more features than Verilog
richer set of data types and strong type checking
VHDL offers more flexibility and expressivity for constructing large systems.
The VHDL Standard is more standard than the Verilog Standard
VHDL and Verilog have simulation-based semantics
Simulation vendors generally conform to VHDL standard
Some Verilog constructs give different behaviours in simulation and synthesis
VHDL is used more than Verilog in Europe and Japan
Verilog is used more than VHDL in North America
VHDL is used more in FPGAs than in ASICs
South-East Asia, India, South America: ?????
1.2.3.2 VHDL vs System Verilog

1.2.3.3 VHDL vs SystemC

1.2.3.4 Summary of VHDL Evaluation
1.3 Overview of Syntax

This section is just a brief overview of the syntax of VHDL, focusing on the constructs that are most commonly used. For more information, read a book on VHDL and use online resources. (Look for VHDL under the Documentation tab on the E&CE 327 web pages.)
1.3.1 Syntactic Categories
1.3.2 Library Units
Library units are the top-level syntactic constructs in VHDL. They are used to define and include libraries, declare and implement interfaces, define packages of declarations, and otherwise bind together VHDL code.

Package body: defines the contents of a library.

Package: determines which parts of the library are externally visible.

Use clause: uses a library in an entity/architecture or another package. Technically, use clauses are part of entities and packages, but they precede the entity/package keyword, so we list them as top-level constructs.

Entity (section 1.3.3): defines the interface to a circuit.

Architecture (section 1.3.3): defines the internal signals and gates of a circuit.
1.3.3 Entities and Architecture

(Figure: an entity defines the interface to a module; an architecture defines the module's internals, that is, its structure and behaviour.)
library ieee;
use ieee.std_logic_1164.all;

entity and_or is
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end entity;
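A matching architecture for this entity might look like the following sketch; the internals are a guess based on the entity's name (an AND gate feeding an OR gate):

```vhdl
architecture main of and_or is
  signal x : std_logic;  -- internal signal between the two gates
begin
  x <= a and b;
  z <= x or c;
end architecture;
```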
[ { use_clause } ]
entity ENTITYID is
  [ port (
      { SIGNALID : (in | out) TYPEID [ := expr ] ; }
    );
  ]
  [ { declaration } ]
[ begin
  { concurrent_statement } ]
end [ entity ] ENTITYID ;

[ { use_clause } ]
architecture ARCHID of ENTITYID is
  [ { declaration } ]
begin
  [ { concurrent_statement } ]
end [ architecture ] ARCHID ;
1.3.4 Concurrent Statements

conditional assignment
  c <= a+b when sel='1' else a+c when sel='0' else "0000";

selected assignment
  with . . . select
    . . . <= . . . when . . . | . . . ,
           . . . when . . . | . . . ,
           ...
           . . . when . . . | . . . ;

process
  process . . . begin
    ...
  end process;
1.3.5 Component Declaration and Instantiations

There are two different syntaxes for component declaration and instantiation. The VHDL-93 syntax is much more concise than the VHDL-87 syntax.

Not all tools support the VHDL-93 syntax. For E&CE 327, some of the tools that we use do not support the VHDL-93 syntax, so we are stuck with the VHDL-87 syntax.
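As a sketch of the two styles, using the and_or entity from section 1.3.3 (the instance name u1 and the signals x1, x2, x3, and y are hypothetical):

```vhdl
-- VHDL-87 style: declare the component in the architecture's
-- declaration region...
component and_or
  port (
    a, b, c : in  std_logic;
    z       : out std_logic
  );
end component;

-- ...then instantiate it in the architecture body:
u1 : and_or
  port map (a => x1, b => x2, c => x3, z => y);

-- VHDL-93 style: direct entity instantiation,
-- no component declaration needed:
u1 : entity work.and_or
  port map (a => x1, b => x2, c => x3, z => y);
```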
1.3.6 Processes
process (a, b, c)
begin
y <= a AND b;
if (a = 1) then
z1 <= b AND c;
z2 <= NOT c;
else
z1 <= b OR c;
z2 <= c;
end if;
end process;
process
begin
  y <= a AND b;
  z <= '0';
  wait until rising_edge(clk);
  if (a = '1') then
    z <= '1';
    y <= '0';
    wait until rising_edge(clk);
  else
    y <= a OR b;
  end if;
end process;
Processes must have either a sensitivity list or at least one wait statement on each execution path
through the process.
Processes cannot have both a sensitivity list and a wait statement.
Sensitivity List
The sensitivity list contains the signals that are read in the process.
A process is executed when a signal in its sensitivity list changes value.
An important coding guideline to ensure consistent synthesis and simulation results is to include
all signals that are read in the sensitivity list. If you forget some signals, you will end up either
with unpredictable hardware and simulation results (different results from different programs) or
with undesirable hardware (latches where you expected purely combinational hardware). For more on
this topic, see sections 1.5.2 and 1.6.
There is one exception to this rule: for a process that implements a flip-flop with an if rising_edge
statement, it is acceptable to include only the clock signal in the sensitivity list; other signals
may be included, but are not needed.
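In the spirit of the Python models introduced in section 1.8, this guideline can be illustrated with a small sketch (our own hypothetical model, not a real simulator; all names are ours): a process re-runs only when a signal in its sensitivity list changes, so a forgotten signal leaves the output stale.

```python
# Hypothetical model of a sensitivity list (our own sketch, not a real
# simulator): the process re-runs only when a listed signal changes.

def run_process(signals):
    # combinational intent: y = a AND b
    signals["y"] = signals["a"] and signals["b"]

def simulate(events, sensitivity):
    signals = {"a": 0, "b": 0, "y": 0}
    run_process(signals)                 # initial evaluation
    for name, value in events:
        signals[name] = value
        if name in sensitivity:          # only listed signals trigger the process
            run_process(signals)
    return signals["y"]

events = [("a", 1), ("b", 1)]            # both inputs rise
print(simulate(events, {"a", "b"}))      # complete list: prints 1
print(simulate(events, {"a"}))           # "b" forgotten: prints stale 0
```

With the complete list the output settles to 1; with b forgotten, the model reports a stale 0 even though a AND b is 1, which is exactly the simulation/synthesis mismatch described above.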
1.3.7

Sequential Statements

wait
  wait until ... ;
signal assignment
  ... <= ... ;
if-then-else
  if ... then ... elsif ... end if;
case
  case ... is
    when ... | ... => ... ;
    when ... => ... ;
  end case;
loop
  loop ... end loop;
while loop
  while ... loop ... end loop;
for loop
  for ... in ... loop ... end loop;
next
  next ... ;

Figure 1.10: The most commonly used sequential statements
1.3.8
Some constructs that are useful and will be described in later chapters and sections:
report : print a message on stderr while simulating
assert : assertions about behaviour of signals, very useful with report statements.
generics : parameters to an entity that are defined at elaboration time.
attributes : predefined functions for different datatypes. For example: high and low indices of a
vector.
1.4 Concurrent vs Sequential Statements
All concurrent assignments can be translated into sequential statements. But not all sequential
statements can be translated into concurrent statements.
1.4.1

1.4.2

Concurrent Statements
t <= <val1> when <cond> else <val2>;

Sequential Statements
if <cond> then
  t <= <val1>;
else
  t <= <val2>;
end if;
1.4.3

Concurrent Statements
with <expr> select
  t <= <val1> when <choices1>,
       <val2> when <choices2>,
       <val3> when <choices3>;

Sequential Statements
case <expr> is
  when <choices1> =>
    t <= <val1>;
  when <choices2> =>
    t <= <val2>;
  when <choices3> =>
    t <= <val3>;
end case;
1.4.4
Coding Style
Code that's easy to write with sequential statements, but difficult with concurrent:
Sequential Statements
case <expr> is
  when <choice1> =>
    if <cond> then
      o <= <expr1>;
    else
      o <= <expr2>;
    end if;
  when <choice2> =>
    ...
end case;

Concurrent Statements
Overall structure:
with <expr> select
  t <= ... when <choice1>,
       ... when <choice2>;

Failed attempt:
with <expr> select
  t <= -- want to write:
       --   <val1> when <cond>
       --   else <val2>
       -- but conditional assignment
       -- is illegal here
       when c1,
       ...
       when c2;
1.5 Overview of Processes
Processes are the most difficult VHDL construct to understand. This section gives an overview of
processes. Section 1.6 gives the details of the semantics of processes.
Within a process, statements are executed almost sequentially
Among processes, execution is done in parallel
Remember: a process is a concurrent statement!
entity ENTITYID is
  interface declarations
end ENTITYID;

architecture ARCHID of ENTITYID is
begin
  concurrent statements
  process begin
    sequential statements
  end process;
  concurrent statements
end ARCHID;
procA: process
  stmtA1;
  stmtA2;
  stmtA3;
end process;

procB: process
  stmtB1;
  stmtB2;
end process;

[Waveforms: three execution sequences: single threaded with procA before procB, single threaded with procB before procA, and multithreaded with procA and procB in parallel]

Figure 1.12: Different process execution sequences
1.5.1
Combinational process:
Executing the process takes part of one clock cycle
Target signals are outputs of combinational circuitry
A combinational process must have a sensitivity list
A combinational process must not have any wait statements
A combinational process must not have any rising_edge or falling_edge conditions
The hardware for a combinational process is just combinational circuitry
Clocked process:
Note:
Clocked processes are sometimes called sequential processes,
but this can be easily confused with sequential statements, so in E&CE 327
we'll refer to synthesizable processes as either combinational or clocked.
Example Processes
Combinational Process
process (a, b, c)
begin
  p1 <= a;
  if (b = c) then
    p2 <= b;
  else
    p2 <= a;
  end if;
end process;
Clocked Processes
process
begin
wait until rising_edge(clk);
b <= a;
end process;
process (clk)
begin
if rising_edge(clk) then
b <= a;
end if;
end process;
1.5.2
Latch Inference
The semantics of VHDL require that if a signal is assigned a value on some passes through a
process and not on other passes, then on a pass through the process when the signal is not assigned
a value, it must maintain its value from the previous pass.
process (a, b, c)
begin
  if (a = '1') then
    z1 <= b;
    z2 <= b;
  else
    z1 <= c;
  end if;
end process;

[Waveforms: a, b, c, z1, z2]
When a signal's value must be stored, VHDL infers a latch or a flip-flop in the hardware to store
the value.
If you want a latch or a flip-flop for the signal, then latch inference is good.
If you want combinational circuitry, then latch inference is bad.
[Schematics: combinational loop; latch with data b, enable (EN) a, output z; flip-flop]
Question:
Answer:

combinational loop:
if a = '1' then
  z <= b;
else
  z <= z;
end if;

latch:
if a = '1' then
  z <= b;
end if;

flip-flop:
if rising_edge(a) then
  z <= b;
end if;
1.5.3
1.6
In this section we go through the detailed semantics of how processes execute. These semantics
form the foundation for the simulation and synthesis of VHDL. The semantics define the simulation
behaviour, and the duty of synthesis is to produce hardware that has the same behaviour as the
simulation of the original VHDL code.
1.6.1
Simple Simulation
Throughout the discussion of simulation, we must keep in mind the fundamental observation about
the behaviour of hardware:
Hardware runs in parallel: At each infinitesimally small moment of time, each gate:
1. samples its inputs
2. computes the value of its output
3. drives the output
Before diving into the details of processes, we briefly review gate-level simulation with a simple
example, which we will then explore in excruciating detail through the semantics of VHDL.
With knowledge of just basic gate-level behaviour, we simulate the circuit below with waveforms
for a and b and calculate the behaviour for c, d, and e.
[Waveforms: a, b, c, d, e, with events at 0 ns, 10 ns, 12 ns, and 15 ns]
There are many different VHDL programs that will synthesize to this circuit. Three examples are:
process (a,b)
begin
c <= a and b;
end process;
process (b,c,d)
begin
d <= not c;
e <= b and d;
end process;
process (a,b,c,d)
begin
c <= a and b;
d <= not c;
e <= b and d;
end process;
process (a,b)
begin
c <= a and b;
end process;
process (c)
begin
d <= not c;
end process;
process (b,d)
begin
e <= b and d;
end process;
The goal of the VHDL semantics is that all of these programs will have the same behaviour.
The two main challenges to make this happen are: a value change on a signal must propagate
instantaneously, and all gates must operate in parallel. We will return to these points in section 1.6.4.
1.6.2
There are several different granularities of time to analyze VHDL behaviour. In this course, we
will discuss three major granularities: clock cycles, timing simulation, and delta cycles.
register-transfer-level
smallest unit of time is a clock cycle
combinational logic has zero delay
flip-flops have a delay of one clock cycle
used for simulation early in the design cycle
fastest simulation run times
timing simulation
1.6.3
Zero-Delay Simulation
1.6.4
1.6.4.1
1.6.4.2
1. Simulate a gate if any of its inputs changed. (If no input changed, then the current value of the
output is correct and the output can stay at the same value.)
2. Each gate is simulated at most once per delta cycle.
3. When a gate is executed, the projected (i.e., new) value of the output remains invisible until the
beginning of the next delta cycle.
4. Increment time when there is no need for another delta cycle (no gate had an input change value
in the current delta cycle).
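The four rules can be sketched in Python (our own illustration, using the three-gate circuit c = a AND b, d = NOT c, e = b AND d from the earlier example; the gate table and helper names are ours):

```python
# Sketch of the four delta-cycle rules (our own illustration) for the
# circuit c = a AND b, d = NOT c, e = b AND d.

GATES = {
    "c": (["a", "b"], lambda s: s["a"] & s["b"]),
    "d": (["c"],      lambda s: 1 - s["c"]),
    "e": (["b", "d"], lambda s: s["b"] & s["d"]),
}

def delta_cycle(current, changed):
    projected = {}
    for out, (ins, fn) in GATES.items():     # rule 2: each gate at most once
        if any(i in changed for i in ins):   # rule 1: an input changed
            projected[out] = fn(current)     # rule 3: invisible until next cycle
    new_changed = {s for s, v in projected.items() if current[s] != v}
    current = dict(current, **projected)     # becomes visible next delta cycle
    return current, new_changed

sig = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 1}
sig["a"] = 1                                 # rising edge on a
changed = {"a"}
cycles = 0
while changed:                               # rule 4: advance time when quiet
    sig, changed = delta_cycle(sig, changed)
    cycles += 1
print(cycles, sig)   # the edge ripples through c, d, e, one gate per delta cycle
```

Three delta cycles propagate the edge through c, d, and e, and a final quiet cycle detects that no gate needs to be simulated, so time can advance.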
1.6.4.3
Back-to-back buffers illustrate how VHDL simulation uses delta-cycles to achieve the illusion that
events propagate instantaneously through combinational circuitry. Without going into the details
of how delta-cycle simulation works, in this example, it takes three delta cycles for the rising edge
on a to propagate through the circuit. Because a delta-cycle is an infinitesimally small amount of
time, in real simulation time (the lower waveform), the rising edges on a, b, and c all appear to
happen at exactly 1 ns.
[Waveforms: delta-cycle simulation of a, b, c at 1 ns, with each rising edge separated by one delta-cycle; simple simulation, where the rising edges on a, b, and c all appear at 1 ns]
1.6.4.4
We now extend the back-to-back buffers to include the projected assignments.
Two copies of each signal:
projected value (not visible)
current value (visible)
[Waveforms: delta-cycle simulation with the projected (not visible) and current (visible) values of a, b, c at 1 ns; simple simulation of a, b, c at 1 ns and 2 ns]
In VHDL, the current value of a signal is updated in the delta-cycle after the projected value is
computed. This requires one more delta cycle than in the previous, simplified, example of delta
cycle simulation with back-to-back buffers, which did not include projected assignment.
1.6.4.5
Back-to-back flip-flops illustrate how VHDL uses projected assignments to create the illusion that
gates operate in parallel. Both processes (p_b and p_c) are sensitive to the clock signal, so both
processes will run in the delta-cycle after the clock changes value. It is this delta cycle that we
focus on. When we execute p_b, the process will see a='1' and will compute a new value of
'1' for b. Because p_b and p_c must appear to execute in parallel within a delta cycle, the new
value of b='1' must not be visible to p_c in this delta cycle.
When p_b runs, the new value for b remains invisible until the beginning of the next delta cycle.
Hence, p_c will see the old value of b, which is '0'. The value of '1' will propagate from b to
c on the next rising edge of the clock, which is not shown.
[Waveforms: a, clk, b, c from 9 ns to 11 ns]
This example illustrates gates appearing to operate in parallel, because the two flip-flops appear
to execute at the same time, triggered by the rising edge of the clock. Using the definition of
appearing to execute in parallel (that the order in which we run the processes within a delta
cycle does not affect the values on the signals at the end of the delta cycle), we can see that regardless
of which order we run the processes, p_c sees the old value of b.
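The next-copy technique that makes this work can be mimicked in Python (our own sketch, in the style of section 1.8; the function name is ours): each flip-flop computes its next value from the old values, and both are updated together.

```python
# Sketch (our own): both flip-flops sample OLD values, so the new value
# of b is not visible to c until the following clock edge.

def clock_edge(a, b, c):
    next_b = a     # p_b samples a
    next_c = b     # p_c samples the old b, not next_b
    return next_b, next_c

a, b, c = 1, 0, 0
b, c = clock_edge(a, b, c)   # first rising edge
print(b, c)                   # prints: 1 0  (c still holds the old b)
b, c = clock_edge(a, b, c)   # second rising edge
print(b, c)                   # prints: 1 1  (the 1 has reached c)
```

Because both next values are computed before either signal is updated, swapping the two lines inside clock_edge does not change the result, which is the order-independence property described above.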
1.6.4.6
This circuit demonstrates how projected assignment simulates combinational loops correctly.
1ns
-cycle
-cycle
Final value
We begin with a truly parallel simulation of the circuit. This figure uses a thick arrow to denote
when a value is being computed. All three of the gates that are simulated (b, c, and d) sample
their inputs at the same time, compute their new values in parallel, and drive their results at the
same time.
[Waveforms: delta-cycle simulations of the combinational loop at 1 ns, with execution orders b, c, d and b, d, c, both with and without projected assignment; with projected assignment both orders reach the same final values, without it the two orders reach different final values]
Interestingly, even without projected assignment, the final values in each simulation are consistent
with the static functionality of the gates in the circuit (e.g., the output of the AND gate d is 1
exactly when both inputs are 1). This demonstrates that simply checking the final values of a
simulation is not a sufficient technique to determine whether the simulation was done correctly.
This circuit, and other combinational loops, such as set-reset latches constructed from cross-coupled
NOR gates, also demonstrate some of the difficulties and counter-intuitive behaviour that
arise in comparing simulation results that use different notions of time. By assigning different
delay values to the gates, the final values on the gates might be either the same as or different from
the zero-delay simulation.
1.6.5
We have already covered the two most important concepts in the delta-cycle simulation algorithm:
delta cycles as an infinitesimally small amount of time, and projected assignments, where the effect
of an assignment to a signal becomes visible at the beginning of the next delta cycle. This section
fleshes out these concepts by connecting them to the syntax of VHDL and filling in some details.
The algorithm presented here is a simplification of the actual algorithm in Section 12.6 of the
VHDL Standard. The two most significant simplifications are that this algorithm does not support
delayed assignments or resolution.
To support delayed assignments, each signal's projected value is generalized to a projected waveform
(more precisely, VHDL's term is projected output waveform), which is a list containing the
values and times for multiple projected assignments in the future.
Resolution allows multiple processes to write to the same signal. This is usually a mistake, but
it is useful for tri-state busses, where all but one of the processes write a value of 'Z', and the
one process that has been granted permission to write to the bus writes a '1' or a '0'. To support
resolution, each signal's projected value would become a set of values, where each value represents
the value written by one process. At the end of the simulation cycle, the set of values is resolved
into a single final value. The values '1' and 'Z' resolve to '1'. Similarly, '0' and 'Z'
resolve to '0'. However, '1' and '0' resolve to 'X', indicating that the processes are driving
conflicting values.
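A sketch of this resolution rule in Python (our own illustration; it covers only the three values '0', '1', and 'Z' discussed above, whereas the real std_logic resolution table covers all nine values):

```python
# Sketch of resolution (our own, values '0', '1', 'Z' only): each driver
# contributes a value and the set is resolved to one final value.
from functools import reduce

def resolve2(u, v):
    if u == 'Z':
        return v
    if v == 'Z':
        return u
    return u if u == v else 'X'   # conflicting drivers

def resolve(drivers):
    return reduce(resolve2, drivers, 'Z')

print(resolve(['1', 'Z', 'Z']))   # prints 1: one driver, others tri-stated
print(resolve(['0', 'Z']))        # prints 0
print(resolve(['1', '0']))        # prints X: conflicting drivers
```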
In our presentation, we begin with an informal description of the delta-cycle simulation algorithm
and illustrate the algorithm with the back-to-back buffer example. We then give the definitions and
a somewhat more formal presentation of the delta-cycle simulation algorithm and do the back-to-back
flip-flop and combinational loop examples.
1.6.5.1
1.6.5.2
We repeat the back-to-back buffer example from section 1.6.4.4, but now add the time scales
(simulation rounds, simulation cycles, and simulation steps) and process modes, to make it a fully
detailed delta-cycle simulation.
proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;
[Waveforms: fully detailed delta-cycle simulation of a and b from 0 ns to 2 ns, with process modes (R, E, S) and a legend of the graphical symbols for old/new value transitions among U, 0, and 1]
1.6.5.3
Definition simulation cycle: The operations that occur in one iteration of the simulation
algorithm.
Definition delta cycle: A simulation cycle that does not advance simulation time.
Equivalently: a simulation cycle with zero-delay assignments where the assignment
causes a process to resume.
Definition simulation round: A sequence of simulation cycles that all have the same
simulation time.
Note: Official and unofficial terminology.
Simulation cycle and delta cycle are official definitions in the VHDL Standard.
Simulation step and simulation round are not standard definitions. We use them
because we need words to associate with the concepts that they describe.
( initialization )
set all signals to default value;
add to resume set all processes;
set time to 0 ns;
( simulation loop )
while time < infinity {
  ( begin simulation round )
  add to resume set all processes that are waiting for current time;
  while time does not change {
    ( begin simulation cycle )
    copy projected values of signals to current values;
    add to resume set any process that:
      is sensitive to a signal that changed value
      or whose wait-condition became true;
    execute all processes in resume set;
      ( assign to projected values of signals )
      ( execute until suspend on a wait statement or sensitivity list )
    clear resume set;
    if none of the executing processes performed a signal assignment then {
      increment time to the minimum of the wait times for processes;
    }
  }
}
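The core of the simulation loop can be sketched in Python (our own simplification: one simulation round with zero-delay assignments, each process reduced to a sensitivity-set/action pair, illustrated with the back-to-back buffers b <= a and c <= b):

```python
# Sketch of one simulation round (our own simplification of the loop above):
# assignments go to projected values, which become current next cycle.

processes = [
    ({"a"}, lambda cur, proj: proj.update(b=cur["a"])),  # b <= a
    ({"b"}, lambda cur, proj: proj.update(c=cur["b"])),  # c <= b
]

current = {"a": 0, "b": 0, "c": 0}
projected = {"a": 1}      # a timed process assigned a this round
cycles = 0
while projected:          # a cycle follows every cycle with an assignment
    cycles += 1
    changed = {s for s, v in projected.items() if current[s] != v}
    current.update(projected)      # copy projected values to current values
    projected = {}
    for sens, action in processes:
        if sens & changed:         # resume processes sensitive to a change
            action(current, projected)
print(cycles, current)    # three delta cycles; a, b, c all end at 1
```

As in the full algorithm, a simulation cycle runs whenever some assignment occurred in the previous cycle, even if the assigned value equals the current value.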
1.6.5.4
We now do a full delta-cycle simulation of back-to-back flip-flops. This expands on the simpler
simulation we did in section 1.6.4.5, where we did delta-cycles and projected assignments
informally, but did not include the time scales or process modes.
proc_a : process
begin
  a <= '0';
  wait for 9 ns;
  a <= '1';
  wait;
end process;

proc_clk : process
begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

proc_flops : process
begin
  wait until rising_edge(clk);
  b <= a;
  c <= b;
end process;
[Waveforms: a, clk, b, c from 0 ns to 30 ns, with simulation rounds, simulation cycles, and process modes for proc_a, proc_clk, and proc_flops (re = rising edge)]
We now repeat the back-to-back flip-flops example, but compare if-rising-edge based code to
wait-until-rising-edge based code. The values on the flip-flops are the same. But, because the
if-rising-edge process has the clock in its sensitivity list, this process executes whenever there
is a change on the clock signal. In comparison, the wait-until-rising-edge process executes only
when there is a rising edge on the clock. When the if-rising-edge process executes after a falling
edge of the clock, the process suspends without executing any signal assignments, because the
assignments are within the then clause of the if-rising-edge.
proc_flops1 : process
begin
wait until rising_edge(clk);
b1 <= a;
c1 <= b1;
end process;
[Waveforms: clk, b1, c1, b2, c2 from 10 ns to 30 ns, with simulation rounds, simulation cycles, and process modes for proc_a, proc_clk, proc_flops1, and proc_flops2]
1.6.5.5
We now do a full VHDL delta-cycle simulation of the combinational loop example that we did
informally in section 1.6.4.6. Notice that the last simulation cycles at 0 ns and 1 ns do not contain
any signal assignments or process mode changes. These simulation cycles are needed, because the
VHDL simulation semantics require a simulation cycle to follow any simulation cycle in which a
projected assignment occurs, even if the projected assignment did not change the projected value
of the signal.
proc_a : process begin
  a <= '0';
  wait for 1 ns;
  a <= '1';
  wait;
end process;
[Waveforms: a, b, c, d at 0 ns and 1 ns, with simulation rounds, simulation cycles, and process modes for proc_a, proc_b, proc_c, and proc_d; the final projected assignments cause the extra simulation cycle]
1.6.5.6
The VHDL Language Reference Manual gives only a textual description of the VHDL semantics.
The conventions for drawing the waveforms are our own.
Each column is a simulation step.
In a simulation step, either exactly one process changes mode or exactly one signal changes
value, except in the first two simulation steps of each simulation cycle, when multiple current
values may be updated and multiple processes may resume.
If a projected assignment assigns the same value as the signal's current projected value, the
assignment must still be shown, because this assignment will force another simulation cycle in
the current simulation round.
If a signal's current value is updated with the same value as it currently has, this assignment is
not shown, because it will not trigger any sensitivity lists.
Assignments to signals may be denoted by either the number/letter of the new value or one of
the edge symbols.
[Legend: edge symbols for old value / new value transitions among U, 0, and 1]
Some observations about delta-cycle simulation waveforms that can be helpful in checking that a
simulation is correct:
In the first simulation step of the first simulation cycle of a simulation round (i.e., the first
simulation step of a simulation round), at least one process will resume. This is in contrast to the
first simulation step of all other simulation cycles, where current values of signals are updated
with projected values.
At the end of a simulation cycle, all processes are suspended.
In the last simulation cycle of a simulation round, either no signals change value, or any signal
that changes value is not in the sensitivity list of any process.
1.6.6
In our work so far with delta-cycle simulation, we have worked through the mechanics of
simulation. This example applies knowledge of delta-cycle simulation at a conceptual level. We could
answer the question by thinking about the semantics of delta-cycle simulation or by mechanically
doing the simulation.
Question:
Do the signals b1 and b2 have the same behaviour from 10 ns to 20 ns?
Answer:
The signals b1 and b2 will have the same behaviour if a1 and a2 have the
same behaviour. The difference in the code between a1 and a2 is that a1 is
waiting for 10 ns and a2 is waiting until a rising edge of the clock. There is a
rising edge of the clock at 10 ns, so we might be tempted to conclude
(incorrectly) that both a1 and a2 transition from 'U' to '1' at exactly 10 ns and
therefore have exactly the same behaviour.
The difference between the behaviour of a1 and a2 is that in the first
simulation cycle for 10 ns, the process for a1 resumes, while the process for
a2 resumes only after the rising edge of the clock.
The signal a1 is waiting for 10 ns, so in the first simulation cycle for 10 ns, the
process for a1 resumes and executes. Also in the first simulation cycle for
10 ns, the clock toggles from '0' to '1'. This rising edge causes the processes for
a2, b1, and b2 to resume and execute in the second simulation cycle.
In the second simulation cycle for 10 ns:
a2 changes from 'U' to '1'.
b1 sees the value of '1' for a1, because a1 became '1' in the first simulation
cycle.
b2 sees the old value of 'U' for a2, because the process for a2 did not
execute in the first simulation cycle.
[Waveforms: clk, a1, a2, b1, b2 from 10 ns to 20 ns, with simulation rounds, simulation cycles, and process modes for proc_clk, proc_a1, proc_a2, and proc_b]
1.7 Register-Transfer-Level Simulation
Delta-cycle simulation is very tedious for both humans and computers. For many circuits, the
complexity of delta-cycle simulation is not needed, and register-transfer-level simulation, which is
much simpler, can be used instead.
The major complexities of delta-cycle simulation come from running a process multiple times
within a single simulation round and keeping track of the modes of the processes. Register-transfer-level
simulation avoids both of these complexities. By evaluating each signal only once per simulation
round, an entire simulation round can be reduced to a single column in a timing diagram.
The disadvantage of register-transfer-level simulation is that it does not work for all VHDL
programs; in particular, it does not support combinational loops.
1.7.1
Overview
In delta-cycle simulations, we often simulated the same process multiple times within the same
simulation round. In looking at the circuit, though, we can mentally calculate the output value
by evaluating each gate only once per simulation round. For both humans and computers (or the
humans waiting for results from computers), it is desirable to avoid the wasted work of simulating
a gate when the output will remain at 'U' or will change again later in the same simulation round.
In register-transfer-level simulation, we evaluate each gate only once per simulation round.
Register-transfer-level simulation is simpler and faster than delta-cycle simulation, because it avoids delta
cycles and provisional assignments.
In delta-cycle simulation, we evaluate a gate multiple times in a single simulation round if the
process that drives the gate is active in multiple simulation cycles, which happens when the process
is triggered in multiple simulation cycles. To avoid this, we must evaluate a signal only after all of
the signals that it depends on have stable values, that is, after the signals will not change value later
in the simulation round.
A combinational loop is a circuit that contains a cyclic path through the circuit that includes only
combinational gates. Combinational loops can cause signals to oscillate, which in delta-cycle
simulation with zero-delay assignments corresponds to an infinite sequence of delta cycles. We
immediately see that when doing zero-delay simulation of a combinational loop such as
a <= not(a);, the change on a will trigger the process to re-run and re-evaluate a an infinite
number of times. Hence, register-transfer-level simulation does not support combinational loops.
To make register-transfer simulation work, we preprocess the VHDL program and transform it so
that each process is dependent upon only those processes that appear before it. This dependency
ordering is called topological ordering. If a circuit has combinational loops, we cannot sort the
processes into a topological order.
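Topological ordering and loop detection can be sketched with a depth-first search (our own illustration; the acyclic example is the c = a AND b, d = NOT c, e = b AND d circuit from section 1.6.1, and the cyclic one is a <= not(a) from above):

```python
# Sketch (our own): sort signals so each one comes after everything it
# reads; a combinational loop makes the sort impossible.

def topo_order(deps):
    """deps maps each driven signal to the signals its process reads."""
    order, visiting, done = [], set(), set()

    def visit(s):
        if s in done:
            return
        if s in visiting:
            raise ValueError("combinational loop through " + s)
        visiting.add(s)
        for d in deps.get(s, []):   # circuit inputs have no entry in deps
            visit(d)
        visiting.discard(s)
        done.add(s)
        order.append(s)

    for s in deps:
        visit(s)
    return order

print(topo_order({"c": ["a", "b"], "d": ["c"], "e": ["b", "d"]}))
try:
    topo_order({"a": ["a"]})        # a <= not(a)
except ValueError as err:
    print(err)                      # prints: combinational loop through a
```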
The register-transfer level is a coarser level of temporal abstraction than the delta-cycle level.
In delta-cycle simulation, many delta-cycles can elapse without an increment in real time (e.g.
nanoseconds). In register-transfer-level simulation, all of the events that take place in the same
moment of real time take place at the same moment in the simulation. In other words, all of the events
that take place at the same time are drawn in the same column of the waveform diagram.
Register-transfer-level simulation can be done for legal VHDL code, either synthesizable or
unsynthesizable, so long as the code does not contain combinational loops. For any piece of VHDL code
without combinational loops, the register-transfer-level simulation and the delta-cycle simulation
will have the same value for each signal at the end of each simulation round.
By sorting the processes in topological order, when we execute a process, all of the signals that the
process depends on will have already been evaluated, and so we know that we are reading the final,
stable values that each signal will have for that moment in time. This is good, because for most
processes, we want to read the most recent values of signals. The exceptions are timed processes
that are dependent upon other timed processes running at the same moment in time and clocked
processes that are dependent upon other clocked processes.
process begin
  a <= '0';
  wait for 10 ns;
  a <= '1';
  ...
end process;

process begin
  b <= '0';
  wait for 10 ns;
  b <= a;
  ...
end process;
Answer:
Both processes will execute in the same simulation cycle at 10 ns.
The statement b <= a will see the value of a from the previous
simulation cycle, which is before a <= '1'; is evaluated.
The signal b will be '0' at 10 ns.
As the above example illustrates, if a clocked process reads the values of signals from processes
that resume at the same time, it must read the previous value of those signals. Similarly, if a
clocked process reads the values of signals from processes that are sensitive to the same clock,
those processes will all resume in the same simulation cycle: the cycle immediately after the
rising edge of the clock (assuming that the processes use if rising_edge or wait until
rising_edge statements). Because the processes run in the same simulation cycle, they all read
the previous values of the signals that they depend on. If this were not the case, then the VHDL
code for a pair of back-to-back flip-flops would not operate correctly, because the output of the first
flip-flop would appear immediately at the output of the second flip-flop.
Simulation rounds begin with incrementing time, which triggers timed processes. Therefore, the
first processes in the topological order are the timed processes. Timed processes may be run in any
order, and they read the previous values of signals that they depend on. This gives the same effect
as in delta-cycle simulation, where the timed processes would run in the same simulation cycle and
read the values that signals had before the simulation cycle began.
We then sort the clocked and combinational processes based on their dependencies, so that each
process appears (is run) after all of the processes on which it depends.
Although a clocked process may read many signals, we say that a clocked process is dependent
upon only its clock signal. It is the change in the clock signal that causes the process to resume.
So, as long as the process is run after the clock signal is stable, we can be sure that it will not need
to be run again at this time step. Clocked processes may be run in any order. They read the current
value of their clock signal and the previous value of the other signals that they depend on. As
with timed processes, this gives the same effect as in delta-cycle simulation, where the clock edge
would trigger the clocked processes to run in the same simulation cycle and the processes would
read the values that signals had before the simulation cycle began.
1.7.2
1. Pre-processing
(a) Separate processes into timed, clocked, and combinational
(b) Decompose each combinational process into separate processes with one target signal
per process
(c) Sort combinational processes into topological order based on dependencies
2. For each moment of real time:
(a) Run timed processes in any order, reading old values of signals.
(b) Run clocked processes in any order, reading new values of timed signals and old values
of registered signals.
(c) Run combinational processes in topological order, reading new values of signals.
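The three run phases can be sketched in Python (our own illustration, reusing the circuit c = a AND b, d = NOT c, e = b AND d; the helper names are ours):

```python
# Sketch of one register-transfer-level simulation round (our own):
# timed, then clocked, then combinational processes in topological order.

def run_round(sig, timed, clocked, comb_sorted):
    old = dict(sig)              # previous values, read by timed/clocked
    for p in timed:              # (a) timed processes, any order
        p(old, sig)
    for p in clocked:            # (b) clocked processes, any order
        p(old, sig)
    for p in comb_sorted:        # (c) combinational, topological order
        p(sig, sig)              # read the new, stable values
    return sig

timed = [lambda old, new: new.update(a=1, b=0)]
clocked = []
comb = [                         # already sorted: c before d before e
    lambda rd, wr: wr.update(c=rd["a"] & rd["b"]),
    lambda rd, wr: wr.update(d=1 - rd["c"]),
    lambda rd, wr: wr.update(e=rd["b"] & rd["d"]),
]
sig = run_round({"a": 0, "b": 0, "c": 0, "d": 0, "e": 0},
                timed, clocked, comb)
print(sig)   # a=1, b=0, c=0, d=1, e=0 in a single column of the waveform
```

Each process runs exactly once, and the whole round collapses into one column of the timing diagram, which is the speedup over delta-cycle simulation described above.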
Original code:
process (a, b, c) begin
  if a = '1' then
    d <= b;
    e <= c;
  else
    d <= not b;
    e <= b and c;
  end if;
end process;

After decomposition:
process (a, b, c) begin
  if a = '1' then
    d <= b;
  else
    d <= not b;
  end if;
end process;

process (a, b, c) begin
  if a = '1' then
    e <= c;
  else
    e <= b and c;
  end if;
end process;

1.7.3

1.7.3.1
We revisit an earlier example from delta-cycle simulation, but change the code slightly and do
register-transfer-level simulation.
1. Original code:
proc1: process (a, b, c) begin
d <= NOT c;
c <= a AND b;
end process;
proc2: process (b, d) begin
e <= b AND d;
end process;
2. Decompose proc1 into two processes, proc1c (drives c) and proc1d (drives d).
[Code figures: Decomposed, Sorted]
3. To sort combinational processes into topological order, move proc1d after proc1c,
because d depends on c.
4. Run the timed process (proc3) until it suspends at wait for 3 ns;
The signal a gets '1' from 0 to 3 ns.
The signal b gets '0' from 0 to 3 ns.
5. Run proc1c.
The signal c gets a AND b ('1' AND '0' = '0') from 0 to 3 ns.
6. Run proc1d.
The signal d gets NOT c (NOT '0' = '1') from 0 to 3 ns.
7. Run proc2.
The signal e gets b AND d ('0' AND '1' = '0') from 0 to 3 ns.
8. Run the timed process until it suspends at wait for 99 ns;, which takes us from 3 ns to
102 ns.
9. Run the combinational processes in topological order to calculate the values on c, d, e from
3 ns to 102 ns.
Waveforms:
[Waveforms: a, b, c, d, e from 0 ns to 102 ns; at 0 ns, a = '1', b = '0', c = '0', d = '1', e = '0']
huey: process
begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

dewey: process
begin
  a <= to_unsigned(0,4);
  wait until re(clk);
  while (a < 4) loop
    a <= a + 1;
    wait until re(clk);
  end loop;
end process;

louie: process
begin
  d <= '1';
  wait until re(clk);
  if (a >= 2) then
    d <= '0';
    wait until re(clk);
  end if;
end process;

[Waveforms: clk, a, d from 0 ns to 120 ns, annotated with the times at which each statement executes]
A Related Simulation
Small changes to the code can cause significant changes to the behaviour.
riri: process
begin
  clk <= '1';
  wait for 10 ns;
  clk <= '0';
  wait for 10 ns;
end process;

fifi: process
begin
  a <= to_unsigned(0, 4);
  wait until re(clk);
  while (a < 4) loop
    a <= a + 1;
    wait until re(clk);
  end loop;
end process;

loulou: process
begin
  wait until re(clk);
  d <= '1';
  if (a < 2) then
    d <= '0';
    wait until re(clk);
  end if;
end process;

Waveforms (figure): clk, a, and d from 0 ns to 120 ns.
1.8
This section describes how we can use a software programming language to write cycle-accurate
models of hardware systems. We use Python as our example language, but any imperative language
may be used.
1.8.1
Introductory Examples
Cyclic Dependencies
#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s %s %s" % (t, a, b) )
    #-----------------
    a = a + b
    b = b + 1
#-----------------------
We add a cyclic dependency between the registers a and b. Remember that registers must execute in parallel and that VHDL achieves the illusion of parallel execution through projected assignments. We mimic projected assignments in software by introducing a next copy of each
variable.
#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s %s %s" % (t, a, b) )
    #-----------------
    # projected assignments to next
    a_next = a + b
    b_next = a - b
    #-----------------
    # update current from next
    a = a_next
    b = b_next
#-----------------
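To see why the next copies are needed, compare a naive in-place update against the projected version for one clock cycle (values chosen arbitrarily):

```python
# Naive sequential update: the assignment to b reads the ALREADY-UPDATED a,
# which does not match hardware, where both registers load at the same edge.
a, b = 1, 2
a = a + b            # a becomes 3
b = a - b            # reads the new a: 3 - 2 = 1  (wrong)

# Projected update: both right-hand sides read the old values.
a2, b2 = 1, 2
a2_next = a2 + b2    # 1 + 2 = 3
b2_next = a2 - b2    # 1 - 2 = -1  (matches hardware)
a2, b2 = a2_next, b2_next
```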
1.8.2
We explore several alternative coding styles to support circuits with both combinational and registered hardware, with cyclic dependencies among the registers.
Circuit (figure): registers a and b; combinational signals c = 2*b and d = c + 1.
#-----------------------
a = 1
b = 2
c = 2 * b
d = c + 1
#-----------------------
for t in range( 20 ) :
    #-----------------
    # execute registers
    a_next = a + d
    b_next = a - c
    #-----------------
    # drive registers
    a = a_next
    b = b_next
    #-----------------
    # execute comb
    c = 2 * b
    d = c + 1
We improve upon the previous version of the code by eliminating the need to initialize the combinational variables. We recognize that in the steady-state of the simulation run, the instructions in
the simulation loop just execute one after the other, and it does not matter which instruction is at
the top of the loop. We eliminate the combinational initialization code by rotating the instructions
in the loop such that the combinational datapath instructions are at the top of the loop. We can then
delete the combinational initialization without affecting the behaviour of the system.
#-----------------------
a = 1
b = 2
#-----------------------
for t in range( 20 ) :
    #-----------------
    # execute comb
    c = 2 * b
    d = c + 1
    #-----------------
    # execute registers
    a_next = a + d
    b_next = a - c
    #-----------------
    # drive registers
    a = a_next
    b = b_next
As an alternative to what we have seen so far, here we follow the style of our RTL simulation
algorithm, where registered variables read the old values of other registered variables. We add
variables that keep track of the old, or previous, values of registered variables. When we read a
variable, we read its old value and then update the current values with the old values at the end
of the clock cycle. This approach has some disadvantages. We show it to illustrate the duality
between next and prev values of variables: two mechanisms to achieve the same behaviour.
This approach has two disadvantages:
- We must initialize combinational variables.
- When reading variables, we must distinguish between registered and combinational variables.
It is generally preferable to distinguish between registered and combinational variables when writing to a variable rather than when reading it. Two main reasons are:
- For realistic-size circuits, we generally read a variable more times than we assign to it, so distinguishing at the write requires less typing and is less susceptible to mistakes.
- In hardware, the distinction between combinational and registered signals is made in the circuitry that drives the signal, not in the gates that read the signal.
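The prev-value style for the running example (registers a and b; combinational c = 2*b and d = c + 1) can be sketched as below. This listing is a reconstruction, not the notes' own code, and it runs only two cycles so the final values are easy to check:

```python
# Prev-value style: registers read the OLD (prev) values of registered
# variables; combinational variables read new values; prev catches up
# to current at the end of each clock cycle.
a, b = 1, 2
a_prev, b_prev = a, b
c = 2 * b       # disadvantage: combinational variables must be initialized
d = c + 1
for t in range(2):
    # execute registers, reading prev values (c and d carry b's old value)
    a = a_prev + d
    b = a_prev - c
    # execute combinational, reading new values
    c = 2 * b
    d = c + 1
    # end of clock cycle: prev catches up to current
    a_prev, b_prev = a, b
```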
1.8.3
Inputs
We model inputs by loading values into an array, then reading the values one-by-one from the
array.
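A minimal sketch of this style (the input name i_c comes from the figure; the values and the register update are assumptions):

```python
# Inputs are preloaded into an array; each clock cycle consumes one entry.
i_c = [3, 1, 4, 1, 5, 9, 2, 6]    # hypothetical input values
b = 0                              # registered accumulator
trace = []
for t in range(len(i_c)):
    trace.append(b)
    b_next = b + i_c[t]            # register reads this cycle's input
    b = b_next                     # drive the register
```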
(Figure: input i_c feeding register b.)
1.8.4
Pipeline
Pipelines are a special style of system that allows us to dispense with the next version of variables. By assigning values to registers in reverse topological order (from the end of the pipeline
back to the front), we can assign values directly to the current-value variables.
We first show a model for the pipeline written using next-variables. The assignments are done in
reverse order, from back to front. This is possible because the use of the next-variables ensures
that there are not any dependencies between these assignments, and so all orders of execution will
produce the same results.
With this order of assignments to the next-variables, there is no dependency between the assignment to a next-variable in the execute phase and the driving of the current variable. For example, we read from c in the line before we write to c_next, and c is not read again until we drive it from c_next. Thus, we can remove c_next and the executing assignment can drive c directly.
We can also see that this technique is correct by recalling the rules for register-transfer-level simulation: registered signals read the old values of registered signals, and combinational signals read the new values of all signals. By doing the registered assignments in reverse order, each register sees the old values of the other registers.
Pipeline (figure): a -> F -> b -> G -> c -> H -> d -> I -> e.
#-----------------------
# initialization
b = 0
c = 0
d = 0
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % ... )
    #-----------------
    # execute regs
    d_next = H( c )
    c_next = G( b )
    b_next = F( a[t] )
    #-----------------
    # drive regs
    b = b_next
    c = c_next
    d = d_next
    #-----------------
    # execute comb
    e = I( d )
#-----------------------
# initialization
b = 0
c = 0
d = 0
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % ... )
    #-----------------
    # execute and drive, registers in reverse order
    d = H( c )
    c = G( b )
    b = F( a[t] )
    e = I( d )
When we introduce combinational variables (e.g., c below) into the design, the order of assignments becomes a bit more complicated, but is still systematic. We do the registers in reverse-order,
then within each stage (between adjacent registers), we do the combinational signals in topological
order.
We need to initialize the registers, but do not need to initialize the combinational variables, because
the combinational variables are executed before they are read.
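The notes' listing for this case is not reproduced here; the sketch below illustrates the ordering with assumed stage functions F and G and a combinational signal x between the registers b and c:

```python
# Pipeline with a combinational signal inside a stage:
#   a[t] -> F -> b -> (x = b + 1) -> G -> c
# Registers are assigned in reverse topological order; within a stage,
# the combinational signal is computed just before the register that
# reads it, so x reads the OLD value of b.
def F(v): return 2 * v      # hypothetical stage functions
def G(v): return v - 3

a = [1, 2, 3, 4]
b = 0
c = 0
trace = []
for t in range(len(a)):
    trace.append((b, c))
    # last register first: c loads G(x), where x reads the old b
    x = b + 1
    c = G(x)
    # now it is safe to overwrite b
    b = F(a[t])
```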
When we introduce feedback into a pipeline (e.g., from c to a below), we can no longer rely solely on performing the registered assignments in reverse topological order. The variable that is fed back to an earlier stage must use a projected assignment to break the feedback loop. All other variables are unaffected.
Pipeline with feedback (figure): F -> a -> G -> b -> H -> c -> I -> d; c feeds back into F.
#-----------------------
for t in range( 20 ) :
    #-----------------
    print( "%s" % (t, ...) )
    #-----------------
    d      = I( c )
    c_next = H( b )
    b      = G( a )
    a      = F( c )
    #-----------------
    c = c_next
1.9
Variables in VHDL
This is an advanced section.
It is not covered in the course
and will not be tested.
Variables in VHDL have the same semantics as variables in a software language. Variables may
be declared inside processes, functions, and procedures. Variables should not be declared inside
architectures. For a variable to be declared in an architecture, it must be a shared variable, and
shared variables are not synthesizable.
1.9.1
Semantics
Variables are updated immediately. More precisely, in contrast to signals, variables have only a
current value, not a separate projected value and current value. The value of a variable is visible
(driven) in the same simulation cycle, immediately after the variable assignment statement is
executed. This behaviour causes variables to act like combinational hardware.
Variables hold their value until they are assigned a new value. In this respect, variables act like
registers or latches.
1.9.2
Usage of Variables
The inconsistent behaviour of variables acting like both combinational hardware and registers/latches makes variables potentially risky to use in code that is intended to be synthesized:
- Difficult to predict what hardware will be synthesized.
- May get quite different hardware from different tools.
- Easy to write code that is synthesizable by some tools and not by others.
Any behaviour and circuit that can be modeled using variables can be modeled using only signals.
Variables are never necessary; they are only a convenience to be exploited when using signals
would be cumbersome.
Recommendation: use variables only when you need combinational hardware inside a clocked
process.
The example below illustrates the acceptable use of a variable, and an equivalent circuit using only
signals.
process
variable v : std_logic;
begin
wait until rising_edge(clk);
r1 <= a;
r2 <= b;
v := r1 xor r2;
r3 <= not v;
end process;
Intermediate variable
Intermediate signal
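The behaviour of the variable version can be mimicked in Python (a sketch for one clock cycle; 0/1 integers stand in for std_logic): the registered signals use next-copies, while the variable takes effect immediately within the cycle.

```python
# One clock cycle of the process above.  Signals r1, r2, r3 use projected
# (next) assignments; the variable v updates immediately, so r3 is computed
# from the OLD r1 and r2 (they are flops) through purely combinational logic.
def cycle(a, b, r1, r2, r3):
    r1_next = a
    r2_next = b
    v = r1 ^ r2        # v := r1 xor r2, immediate
    r3_next = 1 - v    # r3 <= not v
    return r1_next, r2_next, r3_next
```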
The dual combinational/registered nature of variables can be seen in the program below, where
the variable v is synthesized into two separate pieces of hardware, one combinational and one
registered.
process
  variable v : std_logic;
begin
  wait until rising_edge(clk);
  if a = '1' then
    v := b;
  else
    v := c;
    wait until rising_edge(clk);
  end if;
  z <= v;
end process;
1.10
Assignments with delays (e.g., b <= a after 2 ns;) are used to model delays through gates
and wires in circuits. Simulation with delays is often called timing simulation, because the simulation captures both values and the timing of the circuit.
1.10.1
Transport delay models the time it takes for an edge or value to propagate along the gates and
wires between the signals that are read and the target signal.
Inertial delay models the phenomenon that physical devices have inertia and cannot switch instantaneously from one value to another. Glitches or pulses that are shorter in duration than the
inertial delay are deleted.
(Figure: transport delay vs. inertial delay for an assignment of the value '1' to a; the rejection window precedes the new value, existing values of a before the rejection window are unaffected by the assignment, and all times are measured from the current time when the assignment is executed.)
1.10.2
The rejection window is the period of time before the new value arrives during which old values will be deleted if they would result in a glitch or pulse. The difference between the rejection window and the inertial delay is that the inertial delay is a delay value relative to the transport delay, while the rejection window is an absolute window of time (start time and stop time):
Tp = transport delay
Ti = inertial delay
Tr = rejection window
Tr.begin = Tp - Ti
Tr.end   = Tp
A sample assignment of the value '1', with a transport delay of 10 ns and an inertial delay of 3 ns (waveform figure).
A transport-delay assignment has a rejection window of 0 ns. The two statements below are equivalent:
b <= transport '1' after 10 ns;
b <= reject 0 ns inertial '1' after 10 ns;
The keyword reject may be omitted if the inertial delay is equal to the transport delay. The two statements below are equivalent:
b <= inertial '1' after 10 ns;
b <= reject 10 ns inertial '1' after 10 ns;
The use of delayed assignments requires us to extend the notion of projected values (section 1.6) to
projected waveforms. The VHDL Language Reference Manual uses the phrase projected output
waveform, but for simplicity we use just projected waveform. Each signal has a projected
waveform, which describes the delayed assignments that are projected to happen in the future.
More precisely, a waveform is a sequence of transactions, and a transaction is a (value, time) pair.
When a signal assignment is executed, some of the target's existing transactions may be deleted, according to the rules below:
- Existing transactions that are projected to occur at or after the first new transaction are deleted.
- Existing transactions that occur during the rejection window are preserved if they have the same value as the first new transaction and are not followed by existing transactions with different values.
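These rules can be sketched as a Python function over projected waveforms represented as sorted lists of (time, value) transactions; the representation and function name are assumptions, and boundary handling at the edges of the rejection window is simplified:

```python
def schedule(waveform, now, value, transport, inertial=0):
    """Apply a delayed assignment to a projected waveform.

    waveform : sorted list of (time, value) transactions
    returns  : the updated waveform
    """
    t_new = now + transport
    # Transactions at or after the first new transaction are deleted.
    kept = [(t, v) for (t, v) in waveform if t < t_new]
    # Within the rejection window [t_new - inertial, t_new), a transaction
    # survives only if it has the new value and every later kept
    # transaction also has the new value.
    r_begin = t_new - inertial
    survivors = []
    for i, (t, v) in enumerate(kept):
        if t < r_begin or all(v2 == value for (_, v2) in kept[i:]):
            survivors.append((t, v))
    return survivors + [(t_new, value)]
```

For example, executing d <= b after 3 ns at 5 ns against an existing transaction (9, 0) yields just (8, 1), matching the transport-delay example later in this section.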
Simple Examples
The figure below shows some simple examples of how the target signal's projected waveform (the lhs) is updated by the expression on the right-hand side of the assignment.
(Figure: simple examples; each shows the existing projected waveform (lhs), the assignment (rhs), and the resulting waveform (res).)
Complex Examples
Most of the complexity in understanding the rules for projected waveforms stems from which transactions are deleted from the target's existing projected waveform.
(Figure: more complex examples of transaction deletion; each shows lhs, rhs, and res.)
Signals in the right-hand-side expression are evaluated at the time that the assignment is executed.
(Figure: a signal's current value and projected waveform, relative to the current time when the assignment is executed.)
1.10.3
Simulation Examples
In delta-cycle simulation, when we increment time, we need to check both for processes that need
to resume (as before), and signals that need to update their current value.
As we execute delayed assignments, the projected waveform of the target signals evolve with the
addition and deletion of transactions.
Transport Delay 1
This example illustrates simulation with a transport delay. The signals a and b do not have delayed assignments, so we simulate them exactly as we have done before. For the projected value of c, we need to keep track of a sequence of projected values that will occur in the future, and so we keep track of multiple (value, time) pairs.
process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 4 ns;
  b <= '1';
  wait;
end process;
Simulation trace (figure): rounds at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns; the projected waveform of c evolves through (U,5), (U,5)(0,5), (1,15), (1,15)(0,19), and (0,19).
0 ns+1 This is the second assignment of a value to c scheduled for 5 ns, and so this assignment overwrites the previous one.
5 ns The projected value of 0 for c is copied to the visible value. This is an unusual simulation
round, because no processes executed.
14 ns+1 We have two transactions in our projected waveform for c: a value of 1 at 15 ns and
a value of 0 at 19 ns.
15 ns We update the current value of c from its projected waveform and so delete the transaction
that was scheduled for 15 ns.
19 ns Similar to at 15 ns, we update the current value of c and delete the corresponding transaction.
Transport Delay 2
process begin
  a <= '0';
  b <= '1';
  c <= '1';
  wait for 5 ns;
  c <= '0';
  wait;
end process;

process (a, b, c) begin
  if c = '1' then
    d <= a after 9 ns;
  else
    d <= b after 3 ns;
  end if;
end process;
Simulation trace (figure): rounds at 0 ns, 3 ns, 5 ns, and 8 ns; the projected waveform of d evolves through (U,3), (U,3)(0,9), (0,9), and (1,8)(0,9).
5 ns+1 We execute d <= b after 3 ns. The existing transaction for d is (0,9 ns).
The new transaction is (1, 8 ns). Because the existing transaction is projected to
occur after the new transaction, the existing transaction is deleted.
Including an inertial delay does not affect the behaviour, so long as there are no pulses whose width is less than the inertial delay. In this example, c has a '1' pulse that is 4 ns long and an inertial delay of 3 ns.
process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 4 ns;
  b <= '1';
  wait;
end process;
Simulation trace (figure): rounds at 0 ns, 5 ns, 10 ns, 14 ns, 15 ns, and 19 ns; the projected waveform of c evolves through (U,5), (U,5)(0,5), (1,15), (1,15)(0,19), and (0,19), exactly as in the transport-delay example.
In this second example of transport and inertial delay, there is only 1 ns between a change on a at 10 ns and a change on b at 11 ns, which would result in a 1 ns pulse on c. But the inertial delay of 3 ns cancels out this pulse at 11 ns+1, when the assignment to b becomes visible.
process (a, b) begin
  c <= reject 3 ns inertial a xor b after 5 ns;
end process;

process begin
  a <= '0';
  b <= '0';
  wait for 10 ns;
  a <= '1';
  wait for 1 ns;
  b <= '1';
  wait;
end process;
Simulation trace (figure): rounds at 0 ns, 5 ns, 10 ns, 11 ns, and 16 ns; the projected waveform of c evolves through (U,5), (U,5)(0,5), (1,15), (1,15)(0,16); the inertial delay then deletes the (1,15) transaction.
1.10.4
Waveform Expressions
(Figure: assignments with waveform expressions; each shows lhs, the waveform expression (rhs), and res, with a transaction at t+5 ns.)
The delays between transactions in the waveform expression do not need to be consistent with the
inertial delay, in that the delays between transactions may be less than the inertial delay:
a <= reject 3 ns inertial '0' after 5 ns, '1' after 7 ns;
(Figure: the resulting waveform; the rejection window begins at t+2 ns, with transactions at t+5 ns and t+7 ns.)
1.11
This section outlines the building blocks for register transfer level design and how to write VHDL
code for the building blocks.
1.11.1
(Figure 1.15 also shows schematic symbols, including a memory with ports CE, WE, A, DI, and DO.)

Hardware                              VHDL
AND, OR, NAND, NOR, XOR, XNOR         and, or, nand, nor, xor, xnor
multiplexer                           if-then-else, case statement,
                                      selected assignment, conditional assignment
adder, subtracter, negater            +, -
shifter, rotater                      sll, srl, sla, sra, rol, ror
flip-flop                             wait until, if-then-else, rising_edge
memory array, register file, queue    2-d array or library component

Figure 1.15: RTL Building Blocks
1.11.2
Some of the common gates you have encountered in previous courses should be avoided when
synthesizing register-transfer-level hardware, particularly if FPGAs are the implementation technology.
1.11.2.1
flip-flop Edge sensitive: output only changes on rising (or falling) edge of clock
latch Level sensitive: output changes whenever clock is high (or low)
A common implementation of a flip-flop is a pair of latches (Master/Slave flop).
Latches are sometimes called transparent latches, because they are transparent (input directly
connected to output) when the clock is high.
The clock to a latch is sometimes called the enable line.
There is more information in the course notes on timing analysis for storage devices (section 8.3).
1.11.2.2
Deprecated Hardware
Latches
Use flops, not latches
Latch-based designs are susceptible to timing problems
The transparent phase of a latch can let a signal leak through the latch, causing the signal to affect the output one clock cycle too early.
It's possible for a latch-based circuit to simulate correctly but not work in real hardware, because the timing delays on the real hardware don't match those predicted in synthesis.
T, JK, SR, etc. flip-flops
Limit yourself to D-type flip-flops.
Some FPGA and ASIC cell libraries include only D-type flip-flops. Others, such as Altera's APEX FPGAs, can be configured as D, T, JK, or SR flip-flops.
Tri-State Buffers
Use multiplexers, not tri-state buffers
Tri-state designs are susceptible to stability and signal integrity problems
Getting tri-state designs to simulate correctly is difficult; some library components don't support tri-state signals.
Tri-state designs rely on the code never letting two signals drive the bus at the same time
It can be difficult to check that bus arbitration will always work correctly
Manufacturing and environmental variability can make real hardware not work correctly even if it simulates correctly.
Typical industrial practice is to avoid use of tri-state signals on a chip, but allow tri-state
signals at the board level
Note:
Unfortunately and surprisingly, PalmChip has been awarded a US patent for using uni-directional busses (i.e., multiplexers) for system-on-chip designs. The patent was filed in 2000, so all fourth-year design projects since 2000 that use muxes on FPGAs will need to pay royalties to PalmChip.
1.11.3
1.11.3.1
1.11.3.2
process
begin
wait until rising_edge(clk);
q <= d;
end process;
The two code fragments below synthesize to identical hardware (flops with synchronous reset).
Notice that the synchronous reset is really nothing more than an AND gate on the input.
If
process (clk)
begin
  if rising_edge(clk) then
    if (reset = '1') then
      q <= '0';
    else
      q <= d;
    end if;
  end if;
end process;
Wait
process
begin
  wait until rising_edge(clk);
  if reset = '1' then
    q <= '0';
  else
    q <= d;
  end if;
end process;
1.11.3.3
The two code fragments below synthesize to identical hardware (flops with chip-enable lines).
If
process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      q <= d;
    end if;
  end if;
end process;
1.11.3.4
Wait
process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    q <= d;
  end if;
end process;
The two code fragments below synthesize to identical hardware (flops with chip-enable lines and
muxes on inputs).
If
process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' then
      if sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;
Wait
process
begin
  wait until rising_edge(clk);
  if ce = '1' then
    if sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;
1.11.3.5
The two code fragments below synthesize to identical hardware (flops with chip-enable lines,
muxes on inputs, and synchronous reset). Notice that the synchronous reset is really nothing
more than a mux, or an AND gate on the input.
Note: The specific combination and order of tests is important to guarantee that the circuit synthesizes to a flop with a chip enable, as opposed to a level-sensitive latch testing the chip enable and/or reset, followed by a flop.
Note:
The chip-enable pin on the flop is connected to both ce and reset.
If the chip-enable pin was not connected to reset, then the flop would ignore
reset unless chip-enable was asserted.
If
process (clk)
begin
  if rising_edge(clk) then
    if ce = '1' or reset = '1' then
      if reset = '1' then
        q <= '0';
      elsif sel = '1' then
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end if;
end process;
Wait
process
begin
  wait until rising_edge(clk);
  if ce = '1' or reset = '1' then
    if reset = '1' then
      q <= '0';
    elsif sel = '1' then
      q <= d1;
    else
      q <= d0;
    end if;
  end if;
end process;
1.11.4
There are many ways to write VHDL code that synthesizes to the schematic in figure ??. The major choices are:
1. Categories of signals
(a) All signals are outputs of flip-flops or inputs (no combinational signals)
(b) Signals include both flopped and combinational
2. Number of flopped signals per process
(a) All flopped signals in a single process
(b) Some processes with multiple flopped signals
(c) Each flopped signal in its own process
3. Style of flop code
(a) Flops use if statements
(Figure: schematic with inputs sel, reset, and clk, and output c.)

entity and_not_reg is
  port (
    reset, clk, sel : in  std_logic;
    c               : out std_logic
  );
end;
Figure 1.17: Implementation of Figure ??: all signals are flops, all flops in one process, flops use waits
Figure 1.18: Implementation of Figure ??: all signals are flops, one flop per process, flops use waits
Figure 1.19: Implementation of Figure ??: all signals are flops, one flop per process, flops use if-then-else
Figure 1.20: Implementation of Figure ??: flopped and combinational signals, one flop per process, flops use if-then-else
1.12
For us to consider a VHDL program synthesizable, all of the conditions below must be satisfied:
the program must be theoretically implementable in hardware
the hardware that is produced must be consistent with the structure of the source code
the source code must be portable across a wide range of synthesis tools, in that the synthesis
tools all produce correct hardware
Synthesis is done by matching VHDL code against templates or patterns. It's important to use idioms that your synthesis tool recognizes. If you aren't careful, you could write code that has the same behaviour as one of the idioms, but which results in inefficient or incorrect hardware. Section 1.11 described common idioms and the resulting hardware.
Most synthesis tools agree on a large set of idioms, and will reliably generate hardware for these
idioms. This section is based on the idioms that Synopsys, Xilinx, Altera, and Mentor Graphics
are able to synthesize.
1.12.1
Initial Values
Reason: In most implementation technologies, when a circuit powers up, the values on signals
are completely random. Some FPGAs are an exception to this. For some FPGAs, when a chip is
powered up, all flip flops will be 0. For other FPGAs, the initial values can be programmed.
1.12.2
Wait For
Reason: Delays through circuits are dependent upon both the circuit and its operating environment,
particularly supply voltage and temperature.
1.12.3
Variables
process
variable bad : std_logic;
begin
wait until rising_edge(clk);
bad := not a;
d <= bad and b;
e <= bad or c;
end process;
1.12.4
1.12.5
If a synthesizable clocked process has a wait statement, then the process must begin with a wait
statement.
process
begin
  c <= a;
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
end process;

process
begin
  wait until rising_edge(clk);
  d <= b;
  wait until rising_edge(clk);
  c <= a;
end process;
Unsynthesizable
Synthesizable
Reason: In simulation, any assignments before the first wait statement will be executed in the first delta-cycle. In the synthesized circuit, the signals will be outputs of flip-flops and will first be assigned values after the first rising edge. To maintain equivalent behaviour between simulation and synthesis, most synthesis tools require that no assignments appear before the first wait statement in a process.
1.12.6
Detailed reason: processes with multiple wait statements are turned into finite state machines. The
wait statements denote transitions between states. The target signals in the process are outputs of
flip flops. Using different wait conditions would require the flip flops to use different clock signals
at different times. Multiple clock signals for a single flip flop would be difficult to synthesize,
inefficient to build, and fragile to operate.
1.12.7
process (clk)
begin
if rising_edge(clk) then
q0 <= d0;
end if;
if rising_edge(clk) then
q1 <= d1;
end if;
end process;
Reason: The idioms for synthesis tools generally expect just a single if rising_edge statement in each process. The simpler the VHDL code is, the easier it is to synthesize hardware. Programmers of synthesis tools make idiomatic restrictions to make their jobs simpler.
1.12.8
An if rising_edge statement and a wait statement in the same process (UNSYNTHESIZABLE)
process
begin
if rising_edge(clk) then
q0 <= d0;
end if;
wait until rising_edge(clk);
q0 <= d1;
end process;
Reason: The idioms for synthesis tools generally expect just a single type of flop-generating statement in each process.
1.12.9
The if statement has a rising_edge condition and an else clause (UNSYNTHESIZABLE).
process (clk)
begin
if rising_edge(clk) then
q0 <= d0;
else
q0 <= d1;
end if;
end process;
Reason: The idioms for the synthesis tools expect a signal to be either registered or combinational,
not both.
1.12.10
loops where some paths are clocked and some are not (UNSYNTHESIZABLE)
process begin
  while c /= '1' loop
    if b = '1' then
      wait until rising_edge(clk);
      e <= d;
    else
      e <= not d;
    end if;
  end loop;
  e <= b;
end process;
Reason: if the loop condition is true and the if-then-else condition is false, then the combinational
path is taken and the process will get stuck in an infinite loop going through the combinational
path.
1.12.11
Reason: Idiom of synthesis tools; while-loops with the same behaviour are synthesizable.
1.13
It is possible to write code that is synthesizable, but undesirable. This section describes our guidelines for writing synthesizable code that will result in desirable hardware. Our coding guidelines are designed for creating circuits that will work well for a wide range of implementation technologies, from low-end FPGAs to high-speed ASICs.
Remember, there is a world of difference between getting a design to work in simulation and
getting it to work on a real FPGA. And there is also a huge difference between getting a design
to work in an FPGA for a few minutes of testing and getting thousands of products to work for
months at a time in thousands of different environments around the world.
Finally, note that there are exceptions to every rule. You might find yourself in a circumstance where your particular situation (e.g., choice of tool, target technology) would benefit from bending or breaking a guideline here. Within E&CE 327, of course, there won't be any such circumstances.
Our list of undesirable hardware features is:
- latches
- asynchronous resets
- combinational loops
- using a data signal as a clock
- using a clock signal as data
- tri-state buffers and signals
- multiple drivers for a signal
We limit our definition of bad practice to code that produces undesirable hardware. The guidelines
do not address coding styles that lead to inefficient hardware. Inefficient or unoptimized hardware
might be useful in the early stages of the design process, when the focus is on functionality and not
optimality. As such, inefficient code is not considered bad practice. Poor coding styles that do not
affect the hardware, for example, including extraneous signals in a sensitivity list, should certainly
be avoided, but fall into the general realm of programming guidelines and will not be discussed.
1.13.1
The most important guideline is: know what you want the synthesis tool to build for you.
For every signal in your design, know whether it should be a flip-flop or combinational. Check the output of the synthesis tool to see if the flip-flops in your circuit match your expectations, and to check that you do not have any latches in your design.
If you cannot predict what hardware the synthesis tool will generate, then you probably will be
unhappy with the result of synthesis.
1.13.2
Latches
process (a, b)
begin
  if (a = '1') then
    c <= b;
  end if;
end process;
For a combinational process, every signal that is assigned to must be assigned to in every branch of if-then and case statements.
reason If a signal is not assigned a value in a path through a combinational process, then that
signal will be a latch.
note For a clocked process, if a signal is not assigned a value in a clock cycle, then the flip-flop
for that signal will have a chip-enable pin. Chip-enable pins are fine; they are available on
flip-flops in essentially every cell library.
process (a)
begin
c <= a and b;
end process;
For a combinational process, the sensitivity list should contain all of the signals that are read in
the process.
reason Gives consistent results across different tools. Many synthesis tools will implicitly
include all of the signals that a process reads in its sensitivity list. This differs from the VHDL
Standard. A synthesis tool that adheres to the Standard will either generate an error or will
create hardware with latches or flip-flops clocked by data signals if not all of the signals that
are read are included in the sensitivity list.
exception In a clocked process using an if rising_edge(clk) test, it is acceptable to have only the
clock in the sensitivity list.
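For the example above, the corrected combinational process lists both of the signals that it reads:
process (a, b)
begin
c <= a and b;
end process;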
1.13.3
Asynchronous Reset
In an asynchronous reset, the test for reset occurs outside of the test for the clock edge.
process (reset, clk)
begin
if (reset = '1') then
q <= '0';
elsif rising_edge(clk) then
q <= d;
end if;
end process;
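For comparison, a sketch of a synchronous reset, which moves the test for reset inside the test for the clock edge (assuming the same signals as the example above):
process (clk)
begin
if rising_edge(clk) then
if (reset = '1') then
q <= '0';
else
q <= d;
end if;
end if;
end process;
With a synchronous reset, q changes only on a clock edge; with the asynchronous reset above, q is cleared as soon as reset is asserted.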
1.13.4
Combinational Loops
A combinational loop is a cyclic path of dependencies through one or more combinational processes.
process (a, b, c) begin
if a = '0' then
d <= b;
else
d <= c;
end if;
end process;
process (d, e) begin
b <= d and e;
end process;
If you need a signal to be dependent on itself, you must include a register somewhere in the
cyclic path.
reason Combinational loops are almost always unstable, in that the value on a signal in the
loop is unpredictable and can change over time, even if none of the inputs change.
note Registered loops are fine.
note Internally, the implementations of flip-flops and other storage devices use combinational
loops, but these loops are built and analyzed at the analog level to ensure that they are
stable.
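If the feedback from d back to b in the example above is really needed, a register in the cyclic path makes the loop well behaved. A sketch, assuming a clock signal clk is available:
process begin
wait until rising_edge(clk);
b <= d and e;
end process;
The flip-flop on b breaks the combinational cycle: the new value of b takes effect only at the next clock edge.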
1.13.5
Data Signals as Clocks
process begin
wait until rising_edge(clk);
count <= count + 1;
end process;
process begin
wait until rising_edge( count(5) );
b <= a;
end process;
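To avoid clocking a flip-flop from the data signal count(5), a sketch of a common fix is to detect the rising edge of count(5) in the clock domain and use it as an enable (assuming a signal count5_prev of type std_logic is declared):
process begin
wait until rising_edge(clk);
count5_prev <= count(5);
if count(5) = '1' and count5_prev = '0' then
b <= a;
end if;
end process;
Both processes are now clocked by clk, so the synthesis and timing-analysis tools see a single clock domain.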
1.13.6
Clock Signals as Data
process begin
wait until rising_edge(clk);
count <= count + 1;
end process;
b <= a and clk;
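If the intent of b <= a and clk was for b to follow a only while the clock is high, a sketch of a safer alternative is to sample a in a clocked process rather than gating it with clk:
process begin
wait until rising_edge(clk);
b <= a;
end process;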
1.13.7
Z as a Signal Value
Avoid using 'Z' as a signal value, and avoid inout and buffer port modes within a design.
entity bad is
port (
io_bad : inout std_logic;
buf_bad : buffer std_logic
);
end entity;
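If tri-state behaviour is really needed, drive 'Z' only from a tri-state buffer at the chip boundary. A sketch, assuming an enable signal en, a data signal d, and a top-level output port y:
y <= d when en = '1' else 'Z';
Inside the chip, keep all signals driven with '0' or '1' and select among sources with multiplexers instead of tri-state buffers.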
1.13.8
Multiple Drivers
process begin
wait until rising_edge(clk);
if reset = '1' then
y <= '0';
z <= '0';
end if;
end process;
process begin
wait until rising_edge(clk);
if reset = '0' then
if a = '1' then
z <= b and c;
else
z <= d;
end if;
end if;
end process;
process begin
wait until rising_edge(clk);
if reset = '0' then
if b = '1' then
y <= c;
end if;
end if;
end process;
Each signal should be assigned to in only one process. This is often called the single assignment
rule.
reason Multiple processes driving the same signal is the same as having multiple gates driving
the same wire. This can cause contention, tri-state values, and other bad things.
exception Multiple drivers are acceptable for tri-state busses or if your implementation
technology has wired-ANDs or wired-ORs. FPGAs do not have wired-ANDs or wired-ORs,
and many ASIC designers consider them to be risky and bad practice.
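A sketch of a fix for the example above: merge the three processes into one, so that each of y and z is assigned in exactly one process and therefore has exactly one driver:
process begin
wait until rising_edge(clk);
if reset = '1' then
y <= '0';
z <= '0';
else
if a = '1' then
z <= b and c;
else
z <= d;
end if;
if b = '1' then
y <= c;
end if;
end if;
end process;
Because y is not assigned when b /= '1', the flip-flop for y will have a chip-enable, which is fine.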