ENGN3213
Project
V1.0
Copyright 2009 ANU Engineering
Contents
1 Introduction
4 Project Details
4.1 Keyboard and Display Interfaces
4.2 Implementation Levels of the RP Engine
4.2.1 RP Engine Level I
4.2.2 RP Engine Level II
4.2.3 RP Engine Level III
4.2.4 PEGASUS Board Peripherals
4.3 Project Rules
4.4 Assessment
1 Introduction
In this course I have emphasised the Register Transfer Level (RTL) description of complex
digital systems. An early example of the technique was demonstrated in the design
and implementation of the MU0 microprocessor (described in detail in the appendix).
By now you should be familiar with the details of the operation of MU0 and ready to
apply the same approach to other designs. The design and operation of MU0 is the best
indicator so far of how you should approach the present project. In the labs, you have
also been learning many new things about the implementation of hardware in VERILOG
that can now be applied in a real design.
The project will give you the opportunity to use the RTL technique for the design of a
system of modest complexity: a reverse polish calculator with 4 significant decimal
digits. The project has various milestones among the specifications to allow you to do a
top-down design and to tackle the project at various levels of complexity with plenty of
scope for individual creativity. A major aspect of the project will be to explore different
approaches to developing the hardware blocks, taking special account of meeting the
specification and of synthesis in hardware.
Additional information can be found on the course website:
http://engnet.anu.edu.au/DEcourses/engn3213/Documents/PROJECT
2 Reverse Polish Notation
The arrival of the HP-35 was a significant event given the market dominance of slide rules
and mechanical calculators for engineering computations. The HP-35 used a traditional
floating decimal display that automatically switched to scientific notation. The fifteen
digit LED display was capable of displaying a 10 digit mantissa plus its sign and a dec-
imal point and a two digit exponent plus its sign. The display was unique in that the
multiplexing was designed to illuminate a single LED segment at a time, rather than a
single LED digit, because HP research had shown that this method was perceived by the
human eye as brighter for equivalent power.
Architecturally, the calculator was a bit-serial machine that processed 56-bit floating-point
numbers, representing 14-digit BCD (Binary Coded Decimal) numbers. Figure 2 shows
the main board of the HP-35. As you can see, integrated dual in-line was the technology
of the day.
Figure 2: The HP 35 main board.
4 + 5 =
4 ENTER 5 +
There is just one operation key referred to as “ENTER”. Computations are performed
incrementally and results are stored in memory as we proceed. Here is a more complex
example,
(4 + 5 × 2) / 7 =
In RP we would do,
Note the logical manner in which the calculation proceeds and how parentheses and equals
signs are eliminated. To do the project, you will need to familiarise yourself with
Reverse Polish notation.
(4 + 2 × 5) / (1 + 2 × 3)
4 ENTER 2 ENTER 5 × + ENTER 1 ENTER 2 ENTER 3 × + /
In the following table the S1-S4 refer to the stack register levels. The register S1 would
be at the top of the stack in the document filing analogy. Hewlett-Packard [1] referred to
it as the bottom of the stack. From now on I refer to this as the input to the stack in
order to avoid confusion.
In the following example it is convenient to introduce an additional register that we
refer to as the KEY HOLDING register (KHR). Though the KHR has no role in
RP per se, it has several practical purposes here. One is to provide a register where
final output from the keyboard can be temporarily stored. Keyboards only provide one
digit at a time. Operands will therefore have to be built from digits prior to arithmetic
processing. Another application of the KHR is that it makes it easier to implement the
RP algorithm above.
Exactly how you handle input from the keyboard for RP processing is one of
the design decisions you will have to make in your project.
Input   KHR   S4   S3   S2   S1
4       4     .    .    .    .
ENTER   4     4    .    .    .
2       2     4    .    .    .
ENTER   2     2    4    .    .
5       5     2    4    .    .
*       10    4    .    .    .
+       14    .    .    .    .
ENTER   14    14   .    .    .
1       1     14   .    .    .
ENTER   1     1    14   .    .
2       2     1    14   .    .
ENTER   2     2    1    14   .
3       3     2    1    14   .
*       6     1    14   .    .
+       7     14   .    .    .
/       2     .    .    .    .
The effect of ENTER is to push numbers onto the stack while leaving the current digit
in the KHR. Note that in RPN operators are never stored on the stack. In the algorithm
described here the effect of an operator is that the RP controller pops the stack, triggers
the operation and places the result in the KHR.
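The behaviour in the table can be checked with a small software model before any hardware is written. The following Python sketch is only an illustration of the KHR-plus-stack scheme described here, not a prescribed design: the class name `RPEngine`, the `new_entry` flag (a fresh digit overwrites the KHR after ENTER or an operator), the unbounded stack and the use of integer division are all assumptions of the sketch.

```python
class RPEngine:
    """Behavioural model of the KHR-plus-stack RP scheme (illustrative only)."""

    # Operators take (value popped from the stack, value in the KHR).
    OPS = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,  # integer division: an assumption of this sketch
    }

    def __init__(self):
        self.khr = 0            # key holding register
        self.stack = []         # stack[0] is the level next to the KHR
        self.new_entry = True   # assumed flag: next digit starts a fresh number

    def press(self, key):
        if len(key) == 1 and key.isdigit():
            digit = int(key)
            # Build operands one digit at a time, as the keyboard delivers them.
            self.khr = digit if self.new_entry else self.khr * 10 + digit
            self.new_entry = False
        elif key == "CHS":
            self.khr = -self.khr
        elif key == "ENTER":
            # Push a copy of the KHR; the KHR itself is left unchanged.
            self.stack.insert(0, self.khr)
            self.new_entry = True
        elif key in self.OPS:
            # Pop the stack, operate, and place the result in the KHR.
            operand = self.stack.pop(0)
            self.khr = self.OPS[key](operand, self.khr)
            self.new_entry = True
        else:
            raise ValueError(f"unknown key: {key}")


def run(keys):
    """Feed a list of key names to a fresh engine and return it."""
    engine = RPEngine()
    for key in keys:
        engine.press(key)
    return engine


example = run("4 ENTER 2 ENTER 5 * + ENTER 1 ENTER 2 ENTER 3 * + /".split())
print(example.khr, example.stack)
```

Stepping through `press` one key at a time and printing `khr` and `stack` should reproduce the rows of the table, which makes the model a convenient reference when writing test benches.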
In this implementation of RP, the ENTER key has to be pressed whenever
there is further input after an operator so that the last result is stored on the
stack and not overwritten by new input to the KHR. As we shall see, this choice
of implementation is by no means unique: the HP-35 handles the storage of prior results
in a different manner.
Example 2
Input   KHR   S4   S3   S2   S1
4       4     .    .    .    .
CHS     -4    .    .    .    .
ENTER   -4    -4   .    .    .
54      54    -4   .    .    .
+       50    .    .    .    .
ENTER   50    50   .    .    .
1       1     50   .    .    .
ENTER   1     1    50   .    .
3       3     1    50   .    .
ENTER   3     3    1    50   .
7       7     3    1    50   .
ENTER   7     7    3    1    50
1       1     7    3    1    50
+       8     3    1    50   .
*       24    1    50   .    .
+       25    50   .    .    .
/       2     .    .    .    .
It should be clear from this example that in any RP calculation you will never
need to seek variables lower than the top level of the stack.
Figure 3: The HP 35 instruction sticker.
As you can see, the HP implementation differs in a couple of ways from the version
presented above. Firstly, the KHR in an HP-35 is actually the input register to the stack,
X (see Fig. 4), and the display is connected to this register. Secondly, after an
operation is executed, results are pushed onto the stack without the need for an ENTER
key. The HP-35 algorithm does allow the user to press the ENTER key after an
operator. However, this has exactly the same effect as not pressing the ENTER key, so
it is probably ignored. The reason for this design decision appears to be to reduce the
number of ENTER keystrokes used in lengthy calculations. In my experience one of the
greatest weaknesses in the engineering of the HP RP calculators was the tendency of keys
to stick after extended use.
Figure 4: The HP-35 RP implementation explained.
3 RPC Design and Specification
3.1 General
The overall block diagram of the project is shown in Figure 5.
Figure 5:
In addition to the reverse polish calculator RTL system (RP engine), the project also
involves interfacing to a PS/2 AT keyboard and a seven segment display. You should
treat the keyboard and seven segment interfaces as separate design projects from the
RTL design of the RP engine itself. In designing these three subsystems you will have
to decide how they will interface to each other. How you present data to the RP engine
from the keyboard affects the form and timing of the inputs to the RP engine. Similarly
how you process data in the RP engine affects what type of decimal encoding has to be
done before the seven segment display.
http://www.beyondlogic.org/keyboard/keybrd.htm
As described in this article, when a key is pressed the keyboard sends data frames, referred
to as scan codes, via an asynchronous serial protocol. As shown in Figs 6 and 7, these
scan codes are mostly 8 bit but in some cases (e.g. the Del, Ins, / and ENTER keys)
they are 16 bit. As we are using a subset of all available keys, you will need to design
your keyboard interface to deal with unused keys in a friendly way.
The PS/2 keyboard interface will involve two main parts.
Figure 6: Keyboard showing the RP function keys.
2. An output buffer to make KEY data available in a suitable format to the RP engine.
A sample serial protocol FSM is available on the project website. You may use this as
a starting point to develop code to receive data from the keyboard. This code follows
Wakerly’s coding style for VERILOG coding of FSMs. You will have to adapt it to the
specific PS/2 keyboard protocol shown in Fig. 8.
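As a reference for what the receiver FSM has to check, here is a minimal Python sketch of decoding one keyboard-to-host PS/2 frame. It assumes the standard frame layout (one start bit of 0, eight data bits sent least significant bit first, one odd parity bit, one stop bit of 1), which you should confirm against Fig. 8 and the article above. The function name `decode_ps2_frame` and the list-of-bits representation are illustrative choices only.

```python
def decode_ps2_frame(bits):
    """Decode one 11-bit PS/2 frame given as a list of 0/1 values in arrival order.

    Returns the 8-bit scan code byte, or None if the framing or parity is wrong.
    """
    if len(bits) != 11:
        raise ValueError("a PS/2 frame has exactly 11 bits")

    start, data_bits, parity, stop = bits[0], bits[1:9], bits[9], bits[10]

    # Framing check: start bit must be 0 and stop bit must be 1.
    if start != 0 or stop != 1:
        return None

    # Odd parity: data bits plus the parity bit contain an odd number of ones.
    if (sum(data_bits) + parity) % 2 != 1:
        return None

    # Data arrives least significant bit first.
    return sum(bit << position for position, bit in enumerate(data_bits))


# Frame carrying the byte 0x16: start, 0 1 1 0 1 0 0 0 (LSB first), parity 0, stop.
frame = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1]
print(hex(decode_ps2_frame(frame)))
```

In hardware the same checks map naturally onto a shift register plus a bit counter, with the parity accumulated as the bits arrive.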
The form in which you provide the keyboard output influences the design of the RP RTL
controller. For example we will provide the key identity in a coded format, KEY, (if a key
is pressed) or as a NOKEY symbol (if no key or a wrong key is pressed) on the posedge
of the RP system CLOCK. Using this technique, unused scan codes can be replaced with
the NOKEY code.
Question: Read through the above article and explain how the IDLE mode pull up to
+5V is obtained.
1. A BCD (binary coded decimal) or other encoder to convert the RP engine’s chosen
number format into a form suitable for driving the display.
2. A display driver that drives the anodes and cathodes of the seven segment display
given the decimal values of the digits to be displayed.
Dealing with the former issue is a big part of the project and the output format depends
on the implementation level you are trying to achieve. You should already be familiar
with code capable of implementing the display anode and cathode driver.
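If your RP engine works in plain binary, one possible way to obtain the decimal digits for the display is the shift-and-add-3 ("double dabble") method, which needs only shifts, comparisons and small adders. The Python sketch below is an illustration of that technique under the assumption of a 14-bit unsigned input no larger than 9999 and a four digit display; it is not the only option, and the function name `bin_to_bcd` is hypothetical.

```python
def bin_to_bcd(value, bits=14, digits=4):
    """Convert an unsigned binary value to a list of BCD digits, most significant first.

    Uses the shift-and-add-3 (double dabble) method.
    """
    if not 0 <= value < 10 ** digits:
        raise ValueError("value does not fit in the requested number of digits")

    bcd = 0
    for i in range(bits - 1, -1, -1):
        # Before each shift, add 3 to every BCD digit that is 5 or more,
        # so that doubling it carries correctly into the next digit.
        for d in range(digits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        # Shift left by one and bring in the next input bit (MSB first).
        bcd = (bcd << 1) | ((value >> i) & 1)

    return [(bcd >> (4 * d)) & 0xF for d in reversed(range(digits))]


print(bin_to_bcd(1234))
```

Each returned digit can then be passed to the display driver, which only needs to know the decimal value of each digit position.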
Further descriptions of the keyboard and seven segment displays can be found in the
PEGASUS and BASYS manuals on the project website:
http://engnet.anu.edu.au/DEcourses/engn3213/Documents/PROJECT
Figure: Block diagram with Reset and keyID (from the KB interface) inputs, the FSM, the
Key Holding Register (KHR), the Arithmetic Logic Unit, the stack (Stack In, Stack Out)
and the display interface.
that the calculator can be manually forced into the INIT state (commonly referred to as
“switching on the calculator”).
The keyID inputs are analogous to the commands stored in memory in MU0. These
determine the state transitions of the RP controller FSM.
The outputs of the controller are a set of enable and reset signals that control the
hardware blocks of the data path. As is the case for MU0, there should be no need to
send the data buses through the controller FSM (see Fig. 9).
In addition, the data path consists of well defined hardware blocks. In the
present example these are the key holding register, an arithmetic logic unit and a
stack.
4 Project Details
Now that we have an idea of what the project involves, let’s see what you have to do.
The project will consist of a design and implementation of various RP calculators for
the PEGASUS board with an XC2S50 FPGA (our target hardware). The aim will be to
design and implement a keyboard interface, a seven segment display interface and up to 3
designs and implementations of increasing complexity and functionality of the RP engine.
Figure 10: Keyboard input and output formats. The timing diagram shows how the
keyboard interface outputs keys on the posedge of the RP system clock.
KeyID   Representation
ZERO    5'h00
ONE     5'h01
TWO     5'h02
THREE   5'h03
FOUR    5'h04
FIVE    5'h05
SIX     5'h06
SEVEN   5'h07
EIGHT   5'h08
NINE    5'h09
ENTER   5'h0A
CHS     5'h1A
CLX     5'h0B
CLR     5'h1B
PLUS    5'h0C
MINUS   5'h1C
TIMES   5'h0D
DIV     5'h1D
DP      5'h0E
NOKEY   5'h1E
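The table can be captured directly as constants, and the replacement of unused scan codes by NOKEY then becomes a lookup with a default. The Python sketch below illustrates the idea for the digit keys only. The scan code values are the usual PS/2 scan code set 2 codes for the top-row digit keys and should be checked against the article referenced earlier; the mapping of the remaining function keys is deliberately left out because it is part of your design. The names `SCAN_TO_KEY`, `key_id_from_scan` and `is_digit` are hypothetical.

```python
# Key IDs exactly as listed in the table (5-bit codes).
KEY_ID = {
    "ZERO": 0x00, "ONE": 0x01, "TWO": 0x02, "THREE": 0x03, "FOUR": 0x04,
    "FIVE": 0x05, "SIX": 0x06, "SEVEN": 0x07, "EIGHT": 0x08, "NINE": 0x09,
    "ENTER": 0x0A, "CHS": 0x1A, "CLX": 0x0B, "CLR": 0x1B,
    "PLUS": 0x0C, "MINUS": 0x1C, "TIMES": 0x0D, "DIV": 0x1D,
    "DP": 0x0E, "NOKEY": 0x1E,
}

# Assumed scan code set 2 values for the top-row digit keys (verify before use).
SCAN_TO_KEY = {
    0x45: KEY_ID["ZERO"], 0x16: KEY_ID["ONE"], 0x1E: KEY_ID["TWO"],
    0x26: KEY_ID["THREE"], 0x25: KEY_ID["FOUR"], 0x2E: KEY_ID["FIVE"],
    0x36: KEY_ID["SIX"], 0x3D: KEY_ID["SEVEN"], 0x3E: KEY_ID["EIGHT"],
    0x46: KEY_ID["NINE"],
}


def key_id_from_scan(scan_code):
    """Translate a scan code to a key ID; anything unrecognised becomes NOKEY."""
    return SCAN_TO_KEY.get(scan_code, KEY_ID["NOKEY"])


def is_digit(key_id):
    """Digit keys are the only IDs from 0 to 9, so the ID is also the digit value."""
    return key_id <= 0x09


print(hex(key_id_from_scan(0x16)), hex(key_id_from_scan(0x1C)))
```

Note that in this encoding the key ID of a digit key is the digit itself, so the controller can load it into the KHR without any further decoding.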
2. A fixed point signed decimal calculator that also does multiplication and division.
3. A floating point signed decimal calculator that also does multiplication and division.
ALU. Much of this requires a good understanding of number systems and
representations. Those who have not done the COMP2300 course may find
the following lecture notes useful.
http://cs.anu.edu.au/student/comp2300/lectures/
4.2.1 RP Engine Level I
This is the simplest level. We confine ourselves to decimal integer addition and
subtraction. Key functions will be entered from the PS/2 keyboard and will be displayed on
the seven segment display. To illustrate the functionality at this level consider Figure 11
showing the front panel of a HP-35 calculator. The relevant keys are shown inside the
yellow squares.
The large blue key on the top left is the ENTER key. The operator keys − and + are in
blue at the left. The CHS button changes the sign of the current number on the display
and CLX clears the display to a 0. The CLR key clears the stack. At this level we will
not implement the keys that have a red cross through them. These include, among many
others, the EEX key which converts a number to scientific notation, the PI key which
stores the number π and the decimal point key.
The following table shows the meaning and AT keyboard designations of the HP-35 keys
of Figure 11.
HP-35 key   AT keyboard key   Meaning
−           "−"               Subtract
+           "+"               Add
At this level you develop your basic RTL design. This is the most important
project milestone. Try to make it extensible to the more complex designs.
The precision is to be the full 4.0, that is, four integer digits and no fractional digits.
4.2.2 RP Engine Level II
The aim is to implement fixed point arithmetic with fractional decimals: a very common
functionality in digital systems such as data radios and MPEG codecs. Fig. 12 shows
the HP-35 keys.
At this level we include multiplication, division and a fixed decimal point in
the middle of the display. The precision is 2.2.
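One common way to handle a fixed 2.2 format is to store every number as an integer count of hundredths, so that addition and subtraction are unchanged while multiplication and division need a rescaling step. The Python sketch below illustrates that idea under stated assumptions: values are non-negative, results are truncated rather than rounded, and the helper names (`fx_mul`, `fx_div`, `to_text`) are hypothetical.

```python
SCALE = 100  # two decimal places: the stored integer is the value times 100


def fx_add(a, b):
    """Addition needs no rescaling: both operands carry the same scale."""
    return a + b


def fx_mul(a, b):
    """The raw product is scaled by 100 * 100, so divide once by SCALE (truncating)."""
    return (a * b) // SCALE


def fx_div(a, b):
    """Pre-multiply the dividend by SCALE so the quotient keeps two decimals."""
    return (a * SCALE) // b


def to_text(x):
    """Render a non-negative stored value in the 2.2 display format."""
    return f"{x // SCALE:02d}.{x % SCALE:02d}"


# 2.50 * 1.50 = 3.75 and 1.00 / 3.00 = 0.33 (truncated)
print(to_text(fx_mul(250, 150)), to_text(fx_div(100, 300)))
```

The intermediate product in `fx_mul` is wider than either operand, which is worth keeping in mind when you size the ALU data path.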
The following table shows the meaning and keyboard designations of the HP-35 keys of
Figure 12.
Figure 12: HP-35 functionality for the level II and III systems.
HP-35 key   AT keyboard key   Meaning
−           "−"               Subtract
+           "+"               Add
×           "∗"               Multiply
/           "/"               Divide
4.2.3 RP Engine Level III
This is the most difficult level. We aim to implement floating point arithmetic (albeit
without scientific notation). The floating decimal point in the result adjusts itself to
the appropriate position on the display. Floating point will allow us to multiply decimal
numbers with larger dynamic range than fixed point.
The precision is 4 digits maximum before the decimal point and 3 digits maximum
after the decimal point.
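To make the idea of a self-adjusting decimal point concrete, the Python sketch below chooses the four digits to show and the number of digits after the decimal point for a given result. It is only one possible interpretation: it assumes the result is held as a non-negative integer count of thousandths, that excess fractional digits are truncated rather than rounded, and that the display has exactly four digits. The function name `to_display` is hypothetical.

```python
def to_display(value_thousandths):
    """Return (digits, fraction_digits) for a four digit display.

    value_thousandths is the value times 1000, e.g. 12.345 is passed as 12345.
    digits is a list of four decimal digits, most significant first;
    fraction_digits says how many of them lie after the decimal point.
    """
    if not 0 <= value_thousandths <= 9_999_999:
        raise OverflowError("value outside the 4.3 digit range")

    scaled = value_thousandths
    fraction_digits = 3
    # Drop low-order fractional digits until the number fits in four digits.
    while scaled > 9999:
        scaled //= 10
        fraction_digits -= 1

    digits = [int(c) for c in f"{scaled:04d}"]
    return digits, fraction_digits


print(to_display(1234))     # 1.234
print(to_display(12345))    # 12.345, shown as 12.34
print(to_display(9999999))  # 9999.999, shown as 9999
```

The second element of the result is what would select which decimal point on the display to illuminate.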
Since we will not be trying to implement exponents, the HP-35 functionality is the same
as in Fig. 12 followed by the same table above showing the keyboard designations.
4.2.4 PEGASUS Board Peripherals
Fig. 13 shows the PEGASUS board peripherals that will be used in the project.
You may also have noticed that it will not be possible to accurately represent
results that include minus signs if we confine ourselves to the four digit seven
segment display. In this project we will use the four decimal digits of the
display for numerical data and an illuminated LED for a negative result as
shown in Fig. 13.
I will leave it to you to decide how to deal with overflows. For example you may decide to
display a row of four minus signs. Interestingly the HP-35 fails to handle overflow properly.
The HP-35 rounds overflowing results down to the maximum number 9.999999999 × 10^99.
Dividing any two numbers larger than this by each other produces a 1.
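As an illustration of the sign LED and of one possible overflow policy, the Python sketch below splits a signed integer result into the fields the display interface would need. The row of four minus signs is just the example policy mentioned above, and the function name `display_fields` and the dictionary keys are hypothetical; you are free to choose a different scheme.

```python
def display_fields(result, max_magnitude=9999):
    """Split a signed integer result into sign LED, digit string and overflow flag."""
    negative = result < 0
    magnitude = abs(result)

    if magnitude > max_magnitude:
        # Example overflow policy: show a row of four minus signs.
        return {"neg_led": negative, "digits": "----", "overflow": True}

    # The four display digits carry the magnitude; the LED carries the sign.
    return {"neg_led": negative, "digits": f"{magnitude:04d}", "overflow": False}


print(display_fields(-42))
print(display_fields(12345))
```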
4.3 Project Rules
1. You may work in groups of one or two (maximum). Let me know the group members.
2. You hand in one report per group.
3. The length of the report should be < 30 pages. There is no pressure to produce a
big report and there will be no penalties for exceeding the limit.
4. Hand in the report in hardcopy and include the code and a softcopy of the report
on an attached CD.
5. You may not use any third party code. All VERILOG code is to be the
original work of the group save code offered for general use on the course
website.
6. You must follow the design conventions introduced in this document.
7. The project is worth 40% of the final mark and must be handed in to me
by C.O.B Friday June 5.
4.4 Assessment
Assessment will consist of the following tasks to be described in the report. The project
report will be worth 30 marks in total. In addition there will be a hardware test lab at
which marks will be awarded for successful implementations. The lab demonstrations will
be worth a maximum of 10 marks.
1. A short introduction to the project and the approach taken. A short description of
how the labour was divided amongst team members (if relevant). 2 marks
2. A description of the design of the keyboard and display interfaces. 4 marks
3. A description of the design of the RTL control path of the level I system including
next state tables and/or state diagrams, Karnaugh maps and appropriate excerpts
from the VERILOG. 4 marks
5. For the level I system provide test benches and simulation traces (e.g. GTKWAVE
or ISE XST) demonstrating individual working hardware blocks and a complete
working system. 4 marks
6. A description of the level II and III hardware blocks in the datapath. A detailed
description of the arithmetic logic unit, how it does its calculations and what design
trade-offs you have had to make to meet timing constraints and space limitations.
3 further marks each
7. For each level you attempt, provide the FPGA resource consumption through the
ISE synthesis reports. Provide and discuss the floorplanner output(s). flat 3 marks
total
9. A hardware demonstration of level I. Does the calculator work? Does the hardware
meet spec? 4 marks
10. A hardware demonstration of levels II and III. Does the calculator work? Does the
hardware meet spec? - 3 further marks each
Lab sessions to test hardware will be assigned closer to the final date.
A Appendix: A Description of the MU0 Microprocessor
A.1 Introduction
The definition of the instruction set shown in Fig. 14 and the requirement of two clock
cycles for an instruction forms the specification of MU0.
The first thing to do is to understand what goes on with these instructions. Note the
syntax of the commands. The symbol S refers to a memory address. The notation [S]
refers to the contents of the memory location.
Consider the datapath of Fig. 15. It shows the following hardware systems,
1. A program counter register (PC) which stores the address in the memory of the
current instruction. Exactly what is the current instruction and what is the next
instruction we’ll see in a minute. The addresses count from 0 upwards and in any
program, the instructions are stored in the first contiguous memory locations while
the data is stored in the subsequent locations. This is the basis of the Von Neumann
architecture wherein program and data are stored sequentially in memory.
Figure 15: MU0 architecture.
2. An instruction register (IR) which contains the instruction while it is being exe-
cuted.
5. Several multiplexers
In MU0 the data has 16 bits and the memory has storage locations that are 16 bits wide.
The data in memory is stored at locations that can be located by their address. These
addresses are represented by words that are 12 bits wide. That is, MU0’s memory has 2^12
(4096) memory locations where data can be stored.
It is interesting and entirely pertinent to note that an instruction word consists
of 16 bits and can therefore be stored in memory. The most significant four bits
([15:12] in VERILOG parlance) of the instruction word is referred to as the opcode. This
is the machine language symbol that represents an instruction. This is the hex number
F in the left most column of Fig. 14. There are 16 possible opcodes but only 8 are
implemented in MU0. The meanings of the instructions are also described in Fig. 14.
The remaining 12 bits in the instruction word are the address in memory of either the
operand that the instruction operates on (in the case of LDA, ADD and SUB), the
destination of the data in the ACC (STO), or the next instruction in the
case of the JUMP commands (JMP, JGE, JNE).
Fig. 16 shows the instruction word format.
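The split of the instruction word into opcode and address is just a shift and a mask. A minimal Python illustration follows; the function name `decode` is hypothetical.

```python
def decode(word):
    """Split a 16-bit MU0 instruction word into (opcode, address)."""
    opcode = (word >> 12) & 0xF   # bits [15:12]
    address = word & 0x0FFF       # bits [11:0]
    return opcode, address


print(decode(0x2005))
```

In VERILOG the same split is simply a pair of part-selects on the instruction register.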
The program counter is incremented every instruction. It is only controlled by the active
edges of the clock. Consequently MU0 automatically runs sequentially through the
addresses in memory. Reading an instruction occurs when that instruction appears in the IR
in the EXEC state.
The cunning in the design of the MU0 architecture is that each hardware block in the
datapath of Fig 15 is configured to execute these instructions by appropriate changes in
their control inputs. Examples of controls are PCen (enable PC), ACCen (enable
the ACC), Asel (choose the input that connects to the output of the address
MUX, a-mux), M (choose the function to be performed by the ALU), etc. The
bit values of these controls are the outputs of the control path FSM.
The second column shows the opcode, F. The opcode is MU0’s only input. MU0 obtains
the opcode when the controller FSM reads bits [15:12] of the IR. The third column is
the next state that the controller jumps to. Note that the opcode is not needed in the
FETCH state (0) because in this state the only steps are to mux the PC contents onto
the address bus through the a-mux and to set up the ALU input control, M, for a PC
increment. As a result, regardless of the F or opcode value, the next state is EXEC (1).
This explains the don’t cares, “XXX”, in the F cell in the table.
If you look at the next-state diagram you should be able to confirm the interconnects of
MU0 in Fig. 18 for the FETCH state. The grey tracks indicate connected paths in the
datapath.
Figure 17: MU0 next state diagram
1. The FETCH cycle occurs at the first positive edge of the clock.
2. In the FETCH state the a-mux input is connected to the PC output. The MUX is
a combinational device and so the PC contents should already be pointing to the
address of the next instruction in memory.
3. At the ensuing negative clock transition the memory transfers the contents of the
location whose address is in the PC to the Dbus. The Dbus is the output data line
of the memory whereas the Xbus is the input data line.
4. The contents of the Dbus are now present at the input to the IR.
5. In the FETCH state the PC contents are also routed to the ALU input through
the x-mux. The ALU M value is set so that the ALU increments the value on
this input. Since both the x-mux and the ALU are combinational devices, the
incremented PC value appears at the PC input without waiting for a clock edge. On the
next positive clock transition the PC is loaded with the incremented value, ready for
the next time the FSM is in the FETCH state.
From the second positive clock transition we are in the EXEC state. The sequence of
events that occur in this state is as follows.
• If the opcode is LDA then the ACC is enabled and the y-mux is set so that
the Dbus is connected to the ALU input on the Ybus. The ALU M value is set
for a through connection on its Ybus input. At the positive edge of the next
clock transition into the FETCH state, the ACC output will store the contents
of the Dbus.
• If the opcode is for ADD or SUB then the ACC is enabled and the Dbus is
again connected to the ALU via the Ybus through y-mux. The x-mux is set to
allow the contents of the ACC onto the Xbus and the ALU M value is set for
ADD or SUB. On the subsequent negative clock transition the contents of the
memory are transferred onto the Dbus. At the positive edge of the next clock
transition into the FETCH state, the ACC output will store the sum (or, for SUB,
the difference) of its previous value and that in the memory location.
• If the opcode is STO, the x-mux places the contents of the ACC on the input
to the Xbus which is also the memory input data line. The last 12 bits of the
contents of the IR are sent via the a-mux to the address bus of the memory.
On the next negative clock transition the memory stores the contents of the
Xbus (the contents of the ACC).
• In the case of the JUMP instructions, the last 12 bits of the instruction register
are sent via the y-mux to the Ybus and the ALU. The ALU is set for straight
through so that this new memory address is fed to the PC. At the ensuing
posedge of clock (FETCH) the PC is changed to the address which is the
operand of the JUMP instruction.
Figure 18: The MU0 datapath interconnects during FETCH and EXEC.
0004 (load (LDA) the contents of memory address 4 into the ACC)
2005 (add (ADD) the contents of memory address 5 to that in the ACC)
1006 (store (STO) the contents of the ACC in memory location 6)
7000 (STOP)
000A (data stored in memory location 4)
0001 (data stored in memory location 5)
0000 (data stored in memory location 6)
Notice how execution occurs in purely sequential fashion. MU0 does not know which
memory addresses contain instructions and which data. Its proper operation depends
entirely on proper programming and the march of the PC contents. The STOP command
terminates execution and prevents the processor from trying to perform a false opcode in
the first hex digit of the data at memory location 4.
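The program above can be traced with a few lines of software before looking at the simulation traces. The Python sketch below is an instruction-level model only: it ignores the clock-edge timing described earlier and simply alternates a fetch step and an execute step. The opcodes for LDA (0), STO (1), ADD (2) and STOP (7) follow from the listing above; the values used here for SUB, JMP, JGE and JNE are assumptions that should be checked against Fig. 14, as should the exact branch conditions. The function name `run_mu0` is hypothetical.

```python
# Opcode numbers: 0, 1, 2 and 7 come from the program listing; 3 to 6 are assumed.
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)


def run_mu0(memory, max_instructions=1000):
    """Run an MU0 program held in a list of 16-bit words; return (acc, pc, memory)."""
    mem = list(memory)
    pc, acc = 0, 0

    for _ in range(max_instructions):
        # FETCH: read the instruction at the PC and increment the PC.
        ir = mem[pc]
        pc = (pc + 1) & 0xFFF

        # EXEC: split the word and act on the opcode.
        opcode, s = (ir >> 12) & 0xF, ir & 0xFFF
        if opcode == LDA:
            acc = mem[s]
        elif opcode == STO:
            mem[s] = acc
        elif opcode == ADD:
            acc = (acc + mem[s]) & 0xFFFF
        elif opcode == SUB:
            acc = (acc - mem[s]) & 0xFFFF
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:      # assumed: jump if ACC is non-negative
                pc = s
        elif opcode == JNE:
            if acc != 0:          # assumed: jump if ACC is non-zero
                pc = s
        elif opcode == STP:
            break
        else:
            raise ValueError(f"unimplemented opcode {opcode:X}")

    return acc, pc, mem


program = [0x0004, 0x2005, 0x1006, 0x7000, 0x000A, 0x0001, 0x0000]
acc, pc, mem = run_mu0(program)
print(f"ACC={acc:04X} mem[6]={mem[6]:04X}")
```

Removing the STOP word from the program and re-running the model is a quick way to see why the data at location 4 would otherwise be interpreted as an instruction.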
Fig. 19 shows the complete GTKWAVE output from running MU0 with ICARUS VERILOG.
Fig. 20 expands the traces around the FETCH and EXEC states when the instruction
2005 is being executed. For instruction 2005 the PC is pointing to address 1 in memory.
Figure 19: GTKWAVE traces of the MU0 data during execution of the above program
During this instruction the contents of memory address 5 (0001) is added to the contents
of the accumulator which is by now 000A. Notice that the actual instruction 2005 does
not appear in the IR until the EXEC state is reached and that the contents of the ACC
do not register the sum, 000B, until the FETCH cycle of the following instruction.
Figure 20: Expected and GTKWAVE traces of MU0 ACC, IR and PC registers around
the execution of the 2005 instruction
seen a little assembly language with PICOBLAZE, and later in the course we will see
some more.
References
[1] Thomas M. Whitney, France Rode and Chung C. Tung, "The ’Powerful Pocketful’: an
Electronic Calculator Challenges the Slide Rule", Hewlett Packard Journal, 1972.
[2] David S. Cochran, "Algorithms and Accuracy in the HP-35", Hewlett Packard Journal,
1972.
[3] M. Ercegovac, T. Lang and J.H. Moreno, Introduction to Digital Systems, Wiley,
1999.
[4] John F. Wakerly, Digital Design: Principles and Practices, Prentice-Hall, 2000.