[Figure: block diagram with MEMORY, MICROPROCESSOR, INPUT and OUTPUT blocks]
The first microprocessors emerged in the early 1970s and were used for electronic
calculators, using binary-coded decimal (BCD) arithmetic on 4-bit words. Other
embedded uses of 4-bit and 8-bit microprocessors, such as terminals, printers, various
kinds of automation etc., followed soon after. Affordable 8-bit microprocessors with 16-
bit addressing also led to the first general-purpose microcomputers from the mid-1970s
on.
During the 1960s, computer processors were often constructed out of small and
medium-scale ICs containing from tens to a few hundred transistors. The integration of a
whole CPU onto a single chip greatly reduced the cost of processing power. From these
humble beginnings, continued increases in microprocessor capacity have rendered other
forms of computers almost completely obsolete (see history of computing hardware),
with one or more microprocessors used in everything from the smallest embedded
systems and handheld devices to the largest mainframes and supercomputers.
Since the early 1970s, the increase in capacity of microprocessors has followed
Moore's law, which suggests that the number of transistors that can be fitted onto a chip
doubles every two years. Although originally calculated as a doubling every year, Moore
later refined the period to two years. It is often incorrectly quoted as a doubling of
transistors every 18 months.
In both cases, the higher the value, the more powerful the CPU. For example, a
32-bit microprocessor that runs at 50MHz is more powerful than a 16-bit microprocessor
that runs at 25MHz. In addition to bandwidth and clock speed, microprocessors are
classified as being either RISC (reduced instruction set computer) or CISC (complex
instruction set computer).
The microprocessors used in systems are mainly of two types:
(1) microcontrollers, which include all the components shown in the figure above, and
(2) general-purpose microprocessors, used with the discrete components shown in the figure above.
Fig 1.2 Harvard and Von-Neumann architecture
A recent trend has been to take the coarse-grained architectural approach a step
further by combining the logic blocks and interconnects of traditional FPGAs with
embedded microprocessors and related peripherals to form a complete "system on a
programmable chip". An alternate approach to using hard-macro processors is to make
use of soft processor cores that are implemented within the FPGA logic.
FPGAs are beneficial in industrial designs:
i. Design integration with user's choice of intellectual property (IP) and software
stacks.
ii. Flexibility to change design to keep pace with evolving protocols and new feature
requirements.
iii. Performance scaling with embedded processors and IP blocks within the FPGA.
CHAPTER 2: VHDL BASICS
2.1 Introduction
VHDL stands for VHSIC (Very High Speed Integrated Circuits) Hardware
Description Language. In the mid-1980s the U.S. Department of Defense and the IEEE
sponsored the development of this hardware description language with the goal of
developing very high-speed integrated circuits. It has now become one of the industry's
standard languages used to describe digital systems.
Although hardware description languages look similar to conventional programming
languages, there are some important differences. A hardware description language is
inherently parallel, i.e. commands, which correspond to logic gates, are executed
(computed) in parallel, as soon as a new input arrives. An HDL program mimics the
behavior of a physical, usually digital, system. It also allows the incorporation of timing
specifications (gate delays) and the description of a system as an interconnection of
different components.
VHDL allows one to describe a digital system at the structural or the
behavioral level. The behavioral level can be further divided into two kinds of styles:
Data flow and Algorithmic. The dataflow representation describes how data moves
through the system. This is typically done in terms of data flow between registers. The
data flow model makes use of concurrent statements that are executed in parallel as soon
as data arrives at the input. On the other hand, sequential statements are executed in the
sequence that they are specified. VHDL allows both concurrent and sequential signal
assignments that will determine the manner in which they are executed.
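The difference between concurrent and sequential assignments can be sketched with a small 2-to-1 multiplexer. This is only an illustration: the entity name mux2_demo and the port names y_df and y_seq are hypothetical and do not come from the design described in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 multiplexer, used only to contrast the two styles.
entity mux2_demo is
  port (a, b, sel : in  std_logic;
        y_df      : out std_logic;   -- driven by a concurrent statement
        y_seq     : out std_logic);  -- driven from inside a process
end mux2_demo;

architecture compare_styles of mux2_demo is
begin
  -- Dataflow style: a concurrent assignment, re-evaluated on any input event.
  y_df <= (a and not sel) or (b and sel);

  -- Algorithmic style: statements inside the process run in written order.
  seq_mux : process (a, b, sel)
  begin
    y_seq <= a;            -- default assignment
    if sel = '1' then
      y_seq <= b;          -- a later assignment overrides the earlier one
    end if;
  end process seq_mux;
end compare_styles;
```

For sel equal to '0' or '1' both outputs select the same input; the order of the two concurrent statements in the architecture body does not matter, while the order of the statements inside the process does.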
schedules, for low volume products, and for first production runs even with high volume
products.
When designing with larger capacity CPLDs and FPGAs of 500 to more than
100,000 gates, Boolean equations or gate-level descriptions can no longer be used to
complete a design quickly and efficiently. VHDL provides high-level language constructs
that enable designers to describe large circuits and bring products to market rapidly. It
supports the creation of design libraries in which to store components for reuse in
subsequent designs. Because it is a standard language (IEEE standard 1076), VHDL
provides portability of code between synthesis and simulation tools, as well as
device-independent design.
An appropriate design methodology is one that increases the efficiency
of designers. At a slightly more detailed level, it facilitates capturing, understanding and
maintaining a design; it is not open to interpretation, but is well defined; it is an open
standard accepted by industry; it allows designs to be ported from one EDA environment
to another, so that modules can be packaged and reused; it supports complex designs
with hierarchy and gate-level to system-level design; it may be used for the description,
simulation and synthesis of logic circuits; and it supports multiple levels of design
description. VHDL satisfies all of these requirements for digital logic design. For
the combined purpose of documentation, synthesis and simulation for both devices and
systems, VHDL is an excellent choice.
VHDL is a product of the VHSIC program funded by the Department of
Defense in the 1970s and 1980s. VHDL was established as the IEEE 1076 standard in 1987.
In 1993, the IEEE standard was updated and an additional VHDL standard, IEEE 1164,
was adopted. In 1996, IEEE 1076.3 became a VHDL synthesis standard.
creation of reusable components. It provides the design hierarchies to create modular
designs.
2) Device-Independent Design:
VHDL permits a design to be created without first choosing a device
for implementation. With one design description, many device architectures can
be targeted. VHDL also permits multiple styles of design description.
E.g.:
Netlists:
U1: xor2 port map (a(0), b(0), x(0));
U2: xor2 port map (a(1), b(1), x(1));
U3: nor2 port map (x(0), x(1), aeqb);
Boolean Equations:
aeqb <= (a(0) XOR b(0)) NOR (a(1) XOR b(1));
Concurrent Statements:
aeqb <= '1' when a = b else '0';
Sequential Statements:
if a = b then aeqb <= '1';
else aeqb <= '0';
end if;
3) Portability:
VHDL's portability permits simulation of the same design description that is
synthesized. Simulating a several-thousand-gate design description before
synthesizing it can save considerable time: a design flaw discovered at this stage
can be corrected before the design implementation stage. Because VHDL is a
standard, one design description can be taken from one simulator to another, one
synthesis tool to another and one platform to another.
4) Benchmarking Capabilities:
Device-independent design and portability allow benchmarking of a
design using different device architectures and different synthesis tools. A
completed design description can be taken and synthesized, creating
logic for the architecture of choice. The results can be evaluated and
the device that best fits the design requirement can be chosen. The same can be
done with synthesis tools to measure the quality of the synthesis.
5) ASIC Migration:
The efficiency that VHDL generates allows the product to hit the market
quickly if the design is synthesized to a CPLD or FPGA. When production
volumes reach appropriate levels, VHDL facilitates the development of an ASIC.
Sometimes, the exact code used with the PLD can be used with an ASIC.
6) Quick Time-to-Market and Low Cost:
VHDL and programmable logic pair well together to facilitate a speedy
design process. VHDL permits designs to be described quickly.
Programmable logic eliminates NRE expenses and facilitates quick design
iterations. Synthesis makes it all possible. VHDL and programmable logic
combine as a powerful vehicle to bring products to markets in a very short time.
An entity always starts with the keyword entity, followed by its name and
the keyword is. Next are the port declarations using the keyword port. An entity
declaration always ends with the keyword end, optionally followed by the name
of the entity.
ii. Architecture body
The architecture body specifies how the circuit operates and how it is
implemented. As discussed earlier, an entity or circuit can be specified in a
variety of ways, such as behavioral, structural (interconnected components), or a
combination of the above.
The architecture body looks as follows,
architecture architecture_name of NAME_OF_ENTITY is
-- Declarations
-- components declarations
-- signal declarations
-- constant declarations
-- function declarations
-- procedure declarations
-- type declarations
begin
-- Statements
:
end architecture_name;
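As a sketch of how an entity and an architecture body fit together, the following design unit implements the 2-bit equality comparator from the Boolean equation given earlier. The names eqcomp2, dataflow and the internal signal x are assumptions made for illustration; the library and use clauses at the top are explained in the next subsection.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Entity: only the interface (ports) of the circuit.
entity eqcomp2 is
  port (a, b : in  std_logic_vector(1 downto 0);
        aeqb : out std_logic);
end eqcomp2;

-- Architecture: how the circuit operates.
architecture dataflow of eqcomp2 is
  -- Declarations go between "is" and "begin".
  signal x : std_logic_vector(1 downto 0);
begin
  -- Statements go between "begin" and "end".
  x    <= a xor b;         -- bitwise difference of the two inputs
  aeqb <= x(0) nor x(1);   -- '1' only when no bit differs
end dataflow;
```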
iii. Library and Packages: library and use keywords
A library can be considered as a place where the compiler stores
information about a design project. A VHDL package is a file or module that
contains declarations of commonly used objects, data types, component
declarations, signals, procedures and functions that can be shared among
different VHDL models.
For example, std_logic is defined in the package ieee.std_logic_1164 in the
ieee library. In order to use std_logic one needs to specify the library and
package. This is done at the beginning of the VHDL file using the library and the
use keywords as follows:
library ieee;
use ieee.std_logic_1164.all;
The .all suffix makes every declaration in the ieee.std_logic_1164 package visible.
One can add other libraries and packages. The syntax to declare a package is as
follows:
-- Package declaration
package name_of_package is
package declarations
end package name_of_package;
-- Package body declarations
package body name_of_package is
package body declarations
end package body name_of_package;
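A minimal sketch of a user-defined package following this template is given below. The package name demo_pkg and everything declared in it (DATA_WIDTH, byte_t, ZERO, is_zero) are hypothetical names chosen for illustration, not part of the processor design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: what other design units are allowed to see.
package demo_pkg is
  constant DATA_WIDTH : natural := 8;
  subtype byte_t is std_logic_vector(DATA_WIDTH - 1 downto 0);
  constant ZERO : byte_t := (others => '0');
  function is_zero (v : byte_t) return std_logic;
end package demo_pkg;

-- Package body: the implementation of the subprograms declared above.
package body demo_pkg is
  function is_zero (v : byte_t) return std_logic is
  begin
    if v = ZERO then
      return '1';
    else
      return '0';
    end if;
  end function is_zero;
end package body demo_pkg;
```

Assuming the package is compiled into the default work library, a design unit would make it visible with "use work.demo_pkg.all;" placed before its entity declaration.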
case statement
exit statement
if statement
loop statement
next statement
null statement
procedure call
wait statement
end process [process_label];
An example of a positive edge-triggered D flip-flop is as follows.
library ieee;
use ieee.std_logic_1164.all;
entity DFF_CLEAR is
port (CLK, CLEAR, D : in std_logic;
Q : out std_logic);
end DFF_CLEAR;
architecture BEHAV_DFF of DFF_CLEAR is
begin
DFF_PROCESS: process (CLK, CLEAR)
begin
if (CLEAR = '1') then
Q <= '0';
elsif (CLK'event and CLK = '1') then
Q <= D;
end if;
end process;
end BEHAV_DFF;
A process is declared within an architecture and is a concurrent statement.
However, the statements inside a process are executed sequentially. Like other
concurrent statements, a process reads and writes signals and values of the
interface (input and output) ports to communicate with the rest of the architecture.
One can thus make assignments to signals that are defined externally to the
process, such as the Q output of the flip-flop in the above example. The
expression CLK'event and CLK = '1' checks for a positive clock edge.
The sensitivity list is a set of signals to which the process is sensitive. Any
change in the value of the signals in the sensitivity list will cause immediate
execution of the process. If the sensitivity list is not specified, one has to include a
wait statement to make sure that the process will halt.
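A process without a sensitivity list can be sketched as follows. This is a simplified variant of the flip-flop above without the clear input; the names dff_wait and behav are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff_wait is
  port (CLK, D : in  std_logic;
        Q      : out std_logic);
end dff_wait;

architecture behav of dff_wait is
begin
  process                                -- no sensitivity list
  begin
    wait until CLK'event and CLK = '1';  -- suspend until a rising edge
    Q <= D;                              -- then sample D
  end process;                           -- and loop back to the wait
end behav;
```

A process may have either a sensitivity list or wait statements, but not both.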
ii. If Statements
The if statement executes a sequence of statements whose sequence
depends on one or more conditions. The syntax is as follows:
if condition then
sequential statements
[elsif condition then
sequential statements ]
[else
sequential statements ]
end if;
Each condition is a Boolean expression. The if statement is performed by
checking each condition in the order they are presented until a "true" is found.
Nesting of if statements is allowed.
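Because the conditions are checked in order, an if statement naturally describes priority logic. The 4-input priority encoder below is a sketch of this; the entity prio_enc4 and its ports req, code and valid are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity prio_enc4 is
  port (req   : in  std_logic_vector(3 downto 0);
        code  : out std_logic_vector(1 downto 0);
        valid : out std_logic);
end prio_enc4;

architecture behav of prio_enc4 is
begin
  process (req)
  begin
    valid <= '1';                -- default, overridden in the else branch
    if req(3) = '1' then         -- checked first: highest priority
      code <= "11";
    elsif req(2) = '1' then
      code <= "10";
    elsif req(1) = '1' then
      code <= "01";
    elsif req(0) = '1' then
      code <= "00";
    else                         -- no request active
      code  <= "00";
      valid <= '0';
    end if;
  end process;
end behav;
```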
iii. Case statements
The case statement executes one of several sequences of statements,
based on the value of a single expression. The syntax is as follows,
case expression is
when choices =>
sequential statements
when choices =>
sequential statements
-- branches are allowed
[ when others => sequential statements ]
end case;
The expression must evaluate to an integer, an enumerated type, or a
one-dimensional array, such as a bit_vector. The case statement evaluates the
expression and compares the value to each of the choices. The when clause
corresponding to the matching choice will have its statements executed.
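A 4-to-1 multiplexer is a typical use of the case statement. In the sketch below the names mux4_case, d, sel and y are assumptions; the others branch is needed because a std_logic_vector can take values other than the four listed bit patterns.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux4_case is
  port (d   : in  std_logic_vector(3 downto 0);
        sel : in  std_logic_vector(1 downto 0);
        y   : out std_logic);
end mux4_case;

architecture behav of mux4_case is
begin
  process (d, sel)
  begin
    case sel is
      when "00"   => y <= d(0);
      when "01"   => y <= d(1);
      when "10"   => y <= d(2);
      when "11"   => y <= d(3);
      when others => y <= 'X';   -- covers 'U', 'X', 'Z', ... on sel
    end case;
  end process;
end behav;
```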
iv. Loop statements
A loop statement is used to repeatedly execute a sequence of sequential
statements. The syntax for a loop is as follows:
[ loop_label :] iteration_scheme loop
sequential statements
[next [label] [when condition];]
[exit [label] [when condition];]
end loop [loop_label];
Labels are optional but are useful when writing nested loops. The next and
exit statements are sequential statements that can only be used inside a loop.
The next statement terminates the rest of the current loop iteration and execution
will proceed to the next loop iteration. The exit statement skips the rest of the
statements, terminating the loop entirely, and continues with the next statement
after the exited loop.
There are three types of iteration schemes:
a. Basic Loop statement
This loop has no iteration scheme. It will be executed continuously
until it encounters an exit statement.
[ loop_label :] loop
sequential statements
[next [label] [when condition];]
[exit [label] [when condition];]
end loop [ loop_label];
The basic loop (as well as the while-loop) must have at least one wait
statement.
b. While-Loop statement
The while … loop evaluates a Boolean iteration condition. When
the condition is TRUE, the loop repeats; otherwise the loop is skipped and
execution continues with the statement after the loop. The syntax for the
while…loop is as follows,
[ loop_label :] while condition loop
sequential statements
[next [label] [when condition];]
[exit [label] [when condition];]
end loop [ loop_label ];
The condition of the loop is tested before each iteration, including
the first iteration. If it is false, the loop is terminated.
c. For-Loop statement
The for-loop uses an integer iteration scheme that determines the number
of iterations. The syntax is as follows,
[ loop_label :] for identifier in range loop
sequential statements
[next [label] [when condition];]
[exit [label] [when condition];]
end loop [ loop_label ];
i. The identifier (index) is automatically declared by the loop itself, so
one does not need to declare it separately. The value of the identifier
can only be read inside the loop and is not available outside its loop.
One cannot assign or change the value of the index.
ii. The range must be a computable integer range in one of the following
forms, in which integer_expression must evaluate to an integer:
integer_expression to integer_expression
integer_expression downto integer_expression
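The for-loop can be illustrated with a parity generator that XORs all bits of a byte. This is a sketch only; the names parity8, data, odd and the variable p are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity parity8 is
  port (data : in  std_logic_vector(7 downto 0);
        odd  : out std_logic);   -- '1' when data holds an odd number of '1's
end parity8;

architecture behav of parity8 is
begin
  process (data)
    variable p : std_logic;
  begin
    p := '0';
    for i in 0 to 7 loop         -- i is declared by the loop itself
      p := p xor data(i);        -- i may be read but not assigned
    end loop;
    odd <= p;
  end process;
end behav;
```

A variable is used for the running result because a variable assignment takes effect immediately, whereas a signal assignment inside the loop would only take effect after the process suspends.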
v. Wait statement
The wait statement will halt a process until an event occurs. There are
several forms of the wait statement,
wait until condition;
wait for time expression;
wait on signal;
wait;
The condition in the "wait until" statement must be TRUE for the process to
resume.
A few examples follow.
wait until CLK='1';
wait until CLK='0';
For the first example the process will wait until a positive-going clock edge
occurs, while for the second example, the process will wait until a negative-going clock
edge arrives.
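The four forms of the wait statement can be seen together in a small simulation-only test bench. The sketch below is not synthesizable and is not part of the processor; the names wait_demo_tb, clock_gen, observer and done, and the 10 ns half period, are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wait_demo_tb is
end wait_demo_tb;

architecture sim of wait_demo_tb is
  signal CLK  : std_logic := '0';
  signal done : boolean   := false;
begin
  -- Clock generator: a while-loop whose body contains wait statements.
  clock_gen : process
  begin
    while not done loop
      CLK <= '0';
      wait for 10 ns;            -- wait for a time
      CLK <= '1';
      wait for 10 ns;
    end loop;
    wait;                        -- wait forever: the process never resumes
  end process clock_gen;

  observer : process
  begin
    wait until CLK = '1';        -- resumes on the first rising edge
    wait on CLK;                 -- resumes on the next event on CLK
    assert CLK = '0' report "expected a falling edge" severity error;
    done <= true;                -- lets the clock generator stop
    wait;
  end process observer;
end sim;
```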
Behavioral modeling can be done with sequential statements using the process
construct or with concurrent statements. The latter method is usually called dataflow
modeling. Dataflow modeling describes a circuit in terms of its function and the flow
of data through the circuit. Concurrent signal assignments are event triggered and
executed as soon as an event on one of the signals occurs.
The syntax for the conditional signal assignment is as follows:
Target_signal <= expression when Boolean_condition else
expression when Boolean_condition else
expression;
The target signal will receive the value of the first expression whose
Boolean condition is TRUE. If no condition is found to be TRUE, the target signal will
receive the value of the final expression. If more than one condition is true, the value of
the first condition that is TRUE will be assigned.
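A short sketch of a conditional signal assignment follows. The entity prio_sel and its ports are hypothetical; the point is only that the conditions are tested in the order written.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity prio_sel is
  port (d0, d1, d2 : in  std_logic;
        s0, s1     : in  std_logic;
        y          : out std_logic);
end prio_sel;

architecture dataflow of prio_sel is
begin
  y <= d0 when s0 = '1' else   -- tested first, so s0 has priority over s1
       d1 when s1 = '1' else
       d2;                     -- final expression: no condition was TRUE
end dataflow;
```

If s0 and s1 are both '1', y receives d0, since its condition is the first one that is TRUE.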
iii. Selected Signal assignments
The selected signal assignment is similar to the conditional one described
above. The syntax is as follows,
with choice_expression select
target_name <= expression when choices,
expression when choices,
:
expression when choices;
The target is a signal that will receive the value of an expression whose choice
includes the value of the choice_expression.
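As an illustration of the selected signal assignment, here is a sketch of a 2-to-4 decoder. The names dec2to4, sel and y are assumptions. Note that the target appears only once, and that the choices must cover every possible value of the choice expression, which is why the final others choice is present.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dec2to4 is
  port (sel : in  std_logic_vector(1 downto 0);
        y   : out std_logic_vector(3 downto 0));
end dec2to4;

architecture dataflow of dec2to4 is
begin
  with sel select
    y <= "0001" when "00",
         "0010" when "01",
         "0100" when "10",
         "1000" when "11",
         "0000" when others;   -- any sel value containing 'X', 'Z', etc.
end dataflow;
```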
iv. Structural Modeling
A structural way of modeling describes a circuit in terms of components
and their interconnections. Each component is supposed to be defined earlier (e.g. in
a package) and can be described as a structural, a behavioral or a dataflow model. At the
lowest level of the hierarchy each component is described as a behavioral model, using
the basic logic operators defined in VHDL. In general, structural modeling is well suited
to describing complex digital systems through a set of components in a hierarchical fashion.
A structural description can best be compared to a schematic block diagram that can be
described by the components and the interconnections. VHDL provides a formal way to
do this:
a. Declare a list of components being used
b. Declare signals which define the nets that interconnect components
c. Label multiple instances of the same component so that each instance is
uniquely defined.
The components and signals are declared within the architecture body,
architecture architecture_name of NAME_OF_ENTITY is
-- Declarations
component declarations
signal declarations
begin
-- Statements
component instantiation and connections
:
end architecture_name;
v. Component declaration
Before components can be instantiated they need to be declared in the
architecture declaration section or in the package declaration. The component
declaration consists of the component name and the interface (ports). The syntax is as
follows:
component component_name [is]
[port (port_signal_names: mode type;
port_signal_names: mode type;
port_signal_names: mode type);]
end component [component_name];
The list of interface ports gives the name, mode and type of each port, just as
in the entity declaration.
vi. Component Instantiation and interconnections
The component instantiation statement references a component that can be
i. Previously defined at the current level of the hierarchy or
ii. Defined in a technology library (vendor's library).
The syntax for the component instantiation is as follows,
instance_name : component_name
port map (port1 => signal1, port2 => signal2, … portn => signaln);
The instance name or label can be any legal identifier and is the name of
this particular instance. The component name is the name of the component declared
earlier using the component declaration statement. The port name is the name of the
port and signal is the name of the signal to which the specific port is connected. The
above port map associates the ports to the signals through named association.
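Component declaration, signal declaration and labelled instantiation can be put together in a structural version of a 2-bit equality comparator built from two XOR gates and one NOR gate. This is a sketch under assumptions: the gate entities xor2 and nor2 are defined here with hypothetical port names i1, i2 and o so that the example is self-contained, and the top-level name eqcomp2_struct is also invented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
entity xor2 is
  port (i1, i2 : in std_logic; o : out std_logic);
end xor2;
architecture dataflow of xor2 is
begin
  o <= i1 xor i2;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;
entity nor2 is
  port (i1, i2 : in std_logic; o : out std_logic);
end nor2;
architecture dataflow of nor2 is
begin
  o <= i1 nor i2;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;
entity eqcomp2_struct is
  port (a, b : in  std_logic_vector(1 downto 0);
        aeqb : out std_logic);
end eqcomp2_struct;

architecture struct of eqcomp2_struct is
  -- a. components being used
  component xor2
    port (i1, i2 : in std_logic; o : out std_logic);
  end component;
  component nor2
    port (i1, i2 : in std_logic; o : out std_logic);
  end component;
  -- b. nets that interconnect the components
  signal x : std_logic_vector(1 downto 0);
begin
  -- c. uniquely labelled instances, connected by named association
  U1 : xor2 port map (i1 => a(0), i2 => b(0), o => x(0));
  U2 : xor2 port map (i1 => a(1), i2 => b(1), o => x(1));
  U3 : nor2 port map (i1 => x(0), i2 => x(1), o => aeqb);
end struct;
```

Because each component has the same name and ports as an entity compiled into the same library, the default binding rules connect the instances to those entities without an explicit configuration.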
CHAPTER 3: STEPS IN PROCESSOR DEVELOPMENT
[Figure: design flow with VHDL DESIGN, RTL SIMULATION and VHDL SYNTHESIS stages]
The design starts with a VHDL specification, which specifies the behavior expected in
the final design.
In the next step, an RTL (Register Transfer Level) description is created in which the
clock-by-clock behavior of the design is described.
The correctness of the VHDL is verified by RTL simulation using test vectors. The
output of this stage is a waveform display.
After the cells are placed, the router makes the appropriate connections.
The output is a set of data files used to implement the chip, which describe
the connections required to make the FPGA macrocells implement the
required functionality, and a timing file, which describes the timing of the
programmed FPGA.
Place and route simulation verifies the results of the above process.
After completing hardware platform design entry, generate the
bitstream (BIT) file that represents the completed hardware platform.
To create the BIT file for the implemented design, one must first set up the User
Constraints File (UCF).
The UCF specifies pinouts and timing constraints. It can also control a
variety of other hardware implementation features, such as the
configurable electrical characteristics of the FPGA I/O signals.
To make the design permanent, carry the above steps over to an ASIC device.
The place and route tools for an ASIC device can be obtained from the
corresponding ASIC vendor or EDA (Electronic Design Automation)
vendor.
CHAPTER 4: DESIGN OF 8 BIT PROCESSOR
[Figure: top-level block of the Processor with Clock and Reset inputs and an 8-bit Output]
4.2 Functional description of modules
The various components or modules of our processor were designed, tested and
simulated separately for operational correctness. This section briefly describes each
module, its functions, features and design characteristics.
4.2.1 ALU
The Arithmetic and Logic unit of our processor was the first module to be
developed. In computing, the arithmetic logic unit (ALU) is a digital circuit that performs
arithmetic and logical operations. The ALU is a fundamental building block of the CPU
of a computer and even the simplest microprocessors contain one for purposes such as
maintaining timers. The processors found inside modern CPUs and GPUs accommodate
very powerful and very complex ALUs; a single component may contain a number of
ALUs.
An ALU must process numbers using the same format as the rest of the digital
circuit. The format of modern processors is almost always the two's complement binary
number representation. Early computers used a variety of number systems, including
one's complement, sign-magnitude format and even true decimal systems with ten tubes
per digit.
ALUs for each one of these numeric systems had different designs, and that influenced the
current preference for two's complement, as this is the representation that makes it easier
for the ALUs to calculate additions and subtractions.
Most ALUs can perform the following operations:
Integer arithmetic operations (addition, subtraction and sometimes multiplication
and division, though this is more expensive)
Bitwise logic operations (AND, NOT, OR, XOR)
Bit-shifting operations (shifting or rotating a word by a specified number of bits
to the left or right, with or without sign extension). Shifts can be interpreted as
multiplications by 2 and divisions by 2.
An engineer can design an ALU to calculate any operation, however complicated it is. The
problem is that the more complex the operation, the more expensive the ALU is, the more
space it uses in the processor and the more power it dissipates. Therefore, engineers
always calculate a compromise to provide for the processor (or other circuits) an ALU
powerful enough to make the processor fast, yet not so complex as to become
prohibitive. Suppose the square root of a number must be calculated; the digital
engineer will examine the following options to implement this operation:
1. Design an extraordinarily complex ALU that calculates the square root of any
number in a single step. This is called calculation in a single clock.
2. Design a very complex ALU that calculates the square root of any number in
several steps, with the intermediate results passing through a series of circuits that are
arranged in a line, like a factory production line. That makes the ALU capable of
accepting new numbers to calculate even before it has finished calculating the previous
ones, and able to produce numbers as fast as a single-clock
ALU, although the results start to flow out of the ALU only after an initial delay.
This is called a calculation pipeline.
3. Design a complex ALU that calculates the square root through several steps. This
is called iterative calculation and usually relies on control from a complex
control unit with built-in microcode.
4. Design a simple ALU in the processor and sell a separate specialized and costly
processor that the customer can install just beside this one and that implements one of
the options above. This is called the co-processor.
5. Tell the programmers that there is no co-processor and there is no emulation, so
they will have to write their own algorithms to calculate square roots in software.
This is performed by software libraries.
6. Emulate the existence of the co-processor, that is, whenever a program attempts to
perform the square root calculation, make the processor check if there is a
co-processor present and use it if there is one; if there isn't one, interrupt the
processing of the program and invoke the operating system to perform the square
root calculation through some software algorithm. This is called software
emulation.
The options above go from the fastest and most expensive one to the slowest and least
expensive one. Therefore, while even the simplest computer can calculate the most
complicated formula, the simplest computers will usually take a long time doing that
because of the several steps for calculating the formula. The inputs to the ALU are the
data to be operated on (called operands) and a code from the control unit indicating
which operation to perform. Its output is the result of the computation. In many designs
the ALU also takes or generates as inputs or outputs a set of condition codes from or to a
status register. These codes are used to indicate cases such as carry-in or carry-out,
overflow, divide-by-zero, etc.
The ALU of our processor has two 8 bit data inputs, 'a' and 'b'. The 'a' input is
taken directly from the accumulator, and the 'b' input is attached to the data bus. Besides
these inputs, we have the following input signals into the ALU:
clk:in std_logic; (main clock signal)
r:in std_logic; (reset input)
and_enable:in std_logic;
or_enable:in std_logic;
not_enable:in std_logic;
xor_enable: in std_logic;
shiftleft_enable:in std_logic;
shiftright_enable:in std_logic;
add_enable:in std_logic;
subtract_enable:in std_logic;
multiply_enable:in std_logic;
compare_enable:in std_logic;
The output of the ALU is directly connected to the accumulator. Thus, any
processed data is stored in the accumulator after instruction execution. In addition, the
ALU has two output flag signals, the zset (zero) and the cset (carry) flags. They
are connected to the control unit for detection of conditions during certain instruction
decoding operations.
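The interface described above can be sketched as follows. This is not the ALU of the report, only a reduced illustration covering six of the enable signals. Several details are assumptions that the text does not state: the output port name result, an active-high asynchronous reset, rising-edge clocking, a registered output, the priority order among the enable signals, and the use of cset as carry out for addition and borrow for subtraction.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity alu_sketch is
  port (clk, r          : in  std_logic;
        a, b            : in  std_logic_vector(7 downto 0);
        and_enable      : in  std_logic;
        or_enable       : in  std_logic;
        not_enable      : in  std_logic;
        xor_enable      : in  std_logic;
        add_enable      : in  std_logic;
        subtract_enable : in  std_logic;
        result          : out std_logic_vector(7 downto 0);  -- assumed name
        zset, cset      : out std_logic);
end alu_sketch;

architecture behav of alu_sketch is
begin
  process (clk, r)
    variable tmp : unsigned(8 downto 0);   -- bit 8 holds carry or borrow
  begin
    if r = '1' then                        -- assumed active-high reset
      result <= (others => '0');
      zset   <= '0';
      cset   <= '0';
    elsif rising_edge(clk) then            -- assumed clock edge
      tmp := '0' & unsigned(a);            -- default: pass 'a' through
      if and_enable = '1' then
        tmp := '0' & unsigned(a and b);
      elsif or_enable = '1' then
        tmp := '0' & unsigned(a or b);
      elsif not_enable = '1' then
        tmp := '0' & unsigned(not a);
      elsif xor_enable = '1' then
        tmp := '0' & unsigned(a xor b);
      elsif add_enable = '1' then
        tmp := ('0' & unsigned(a)) + ('0' & unsigned(b));
      elsif subtract_enable = '1' then
        tmp := ('0' & unsigned(a)) - ('0' & unsigned(b));
      end if;
      result <= std_logic_vector(tmp(7 downto 0));
      cset   <= tmp(8);
      if tmp(7 downto 0) = 0 then
        zset <= '1';
      else
        zset <= '0';
      end if;
    end if;
  end process;
end behav;
```

Extending the operands to nine bits before adding or subtracting is one simple way to obtain the carry flag without a separate carry chain.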
4.2.2 Accumulator
The processor has an 8 bit accumulator. The accumulator has the following input
signals:
acc_rd_alu : in STD_LOGIC;
acc_wr_alu : in STD_LOGIC;
eni : in STD_LOGIC;
eno : in STD_LOGIC;
clk : in STD_LOGIC;
r : in STD_LOGIC;
d : in STD_LOGIC_VECTOR (7 downto 0);
alu_in : in STD_LOGIC_VECTOR (7 downto 0);
The clk and r inputs are the clock and reset respectively. Besides these, there are two 8 bit
inputs to the accumulator, the 'd' input and the 'alu_in' input. The 'd' input is connected
directly to the databus of the processor. The 'alu_in' input is connected to the ALU, and
is used to store the results of arithmetic or logical operations after execution. The various
control signal inputs are also shown. acc_rd_alu and acc_wr_alu control the
reading of data from and the writing of data to the ALU. The eni and eno signals control
the writing and reading of data from the databus. The outputs are 'q', which is connected
to the databus, and 'alu_out', which is connected directly to the 'a' input of the ALU.
The accumulator, like all registers and RAM of the processor, is operated with clock and
reset sensitivity, but is not negative edge triggered. This provision enables
the reading and writing of data into the accumulator in the same clock cycle. The
execution of instructions in a single clock cycle depends greatly on this provision.
and also a high impedance buffer. The registers are addressed for input or output of data
directly from the control unit. The various input signals to the reg_bank are:
B_rd : in STD_LOGIC;
B_wr : in STD_LOGIC;
C_rd : in STD_LOGIC;
C_wr : in STD_LOGIC;
D_rd : in STD_LOGIC;
D_wr : in STD_LOGIC;
clk : in STD_LOGIC;
r : in STD_LOGIC;
d : in STD_LOGIC_VECTOR (7 downto 0);
The 8 bit data input 'd' is common to all registers, and is connected to the databus. The
input control signals, besides the clk (clock) and r (reset), are the read and write enable
signals for the individual registers. They are easily identified by their names. The
output signals are 'qb', 'qc' and 'qd'. They are 8 bits each, and are also latched and
connected to the databus.
r: in std_logic;
clk: in std_logic;
en: in std_logic;
The 'a' input is used in addressing, and the 'rd' (read), 'wr' (write) and 'en' (enable) are
used to control the reading out and writing in of data during program execution.
4.2.6 Program Counter
The program counter is used in addressing the program memory to fetch the
instructions. The program counter is an 8 bit counter with synchronised load and reset.
The counter is cleared upon reset. The counter outputs the current value of the count, and
increments it upon application of the 'count' high signal. The counter also has a 'load'
signal. The load signal is used to load the address of the instruction location in the ROM
during jump operations. The load signal causes the internal signal to inherit the load
value from the 'immediate' field of the instruction word, and the address is output from
the counter on the next 'count' signal high. The program counter is negative edge
triggered. The program counter is checked for working and timing accuracy, as the proper
functioning of the counter is central to the timing of the processor instruction execution.
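A counter of this kind can be sketched as below. It is a simplification, not the report's program counter: the loaded value is driven onto the output directly instead of appearing on the next count, the reset is assumed to be active high, reset is given priority over load and load over count, and the names pc_sketch, addr and value are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pc_sketch is
  port (clk, r    : in  std_logic;
        count     : in  std_logic;
        load      : in  std_logic;
        immediate : in  std_logic_vector(7 downto 0);
        addr      : out std_logic_vector(7 downto 0));  -- assumed output name
end pc_sketch;

architecture behav of pc_sketch is
  signal value : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if falling_edge(clk) then          -- negative edge triggered
      if r = '1' then                  -- synchronous reset, assumed active high
        value <= (others => '0');
      elsif load = '1' then            -- jump: take the immediate field
        value <= unsigned(immediate);
      elsif count = '1' then           -- normal fetch: next address
        value <= value + 1;
      end if;
    end if;
  end process;

  addr <= std_logic_vector(value);
end behav;
```

Only clk appears in the sensitivity list because both the reset and the load are synchronous, so nothing can change between clock edges.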
4.2.7 Databus
In computer architecture, a bus is a subsystem that transfers data between
computer components inside a computer or between computers. Unlike a point-to-point
connection, a bus can logically connect several peripherals over the same set of wires.
Each bus defines its set of connectors to physically plug devices, cards or cables
together. Early computer buses were literally parallel electrical buses with multiple
connections, but the term is now used for any physical arrangement that provides the
same logical functionality as a parallel electrical bus. Modern computer buses can use
both parallel and bit-serial connections and can be wired in either a multi-drop (electrical
parallel) or daisy-chain topology, or connected by switched hubs, as in the case of USB.
At one time, "bus" meant an electrically parallel system, with electrical
conductors similar or identical to the pins on the CPU. This is no longer the case, and
modern systems are blurring the lines between buses and networks. Buses can be parallel
buses, which carry data words in parallel on multiple wires, or serial buses, which carry data
in bit-serial form. The addition of extra power and control connections, differential drivers and
data connections in each direction usually means that most serial buses have more
conductors than the minimum of one used in the 1-Wire serial bus. As data rates increase,
the problems of timing skew, power consumption, electromagnetic interference and
crosstalk across parallel buses become more and more difficult to circumvent. One partial
solution to this problem has been to double pump the bus. Often, a serial bus can actually be
operated at higher overall data rates than a parallel bus, despite having fewer electrical
connections, because a serial bus inherently has no timing skew or crosstalk. USB,
FireWire and Serial ATA are examples of this. Multi-drop connections do not work well for
fast serial buses, so most modern serial buses use daisy-chain or hub designs.
Most computers have both internal and external buses. An internal bus connects
all the internal components of a computer to the motherboard (and thus to the CPU and
internal memory). These types of buses are also referred to as local buses, because they
are intended to connect to local devices, not to those in other machines or external to the
computer. An external bus connects external peripherals to the motherboard. Network
connections such as Ethernet are not generally regarded as buses, although the difference
is largely conceptual rather than practical. The arrival of technologies such as InfiniBand
and HyperTransport is further blurring the boundaries between networks and buses.
Even the lines between internal and external are sometimes fuzzy: I²C can be used as
either an internal bus or an external bus (where it is known as ACCESS.bus), and
InfiniBand is intended to replace both internal buses like PCI and external ones like
Fibre Channel.
The processor has an 8-bit databus. The databus is simply a signal, which can be
defined in VHDL as a global signal or, with the help of the Xilinx ISE schematic editor,
simply drawn as a connector. The databus terminates in the output port and is
connected to all the registers and memory banks. Care must be taken while connecting
the databus to the various components: each component output and input must be
latched and connected to the databus through a high-impedance buffer. Without such a
buffer, the value on the bus becomes 'undefined' during simulation because multiple
sources drive it. During simulation, the Xilinx tool automatically uses wired-OR gates
while connecting the components to the databus when it detects multiple
input and output terminals for the same signal.
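The buffering rule described above can be illustrated with a minimal VHDL sketch. It is not taken from our core: the entity name reg_tristate, the port names rd and wr, and the use of the falling clock edge are assumptions made only for illustration. The register latches the bus when wr is high and drives the bus only while rd is high, releasing it to high impedance ('Z') otherwise.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 8-bit register attached to a shared databus
-- through a high-impedance (tri-state) output buffer.
entity reg_tristate is
    port (
        clk     : in    STD_LOGIC;
        wr      : in    STD_LOGIC;  -- assumed: latch the bus into the register
        rd      : in    STD_LOGIC;  -- assumed: drive the register onto the bus
        databus : inout STD_LOGIC_VECTOR(7 downto 0)
    );
end reg_tristate;

architecture Behavioral of reg_tristate is
    signal q : STD_LOGIC_VECTOR(7 downto 0) := (others => '0');
begin
    -- Latch the bus value on the falling clock edge when wr is asserted.
    process (clk)
    begin
        if falling_edge(clk) then
            if wr = '1' then
                q <= databus;
            end if;
        end if;
    end process;

    -- Drive the bus only while rd is asserted; otherwise release it.
    databus <= q when rd = '1' else (others => 'Z');
end Behavioral;
```

If two such registers had rd asserted at the same time with different contents, the resolved STD_LOGIC value on the bus would be 'X', which is the undefined bus value seen in simulation when the buffer is missing.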
size of addressable memory elements, determines how much memory can be accessed.
For example, a 16-bit-wide address bus (commonly used in the 8-bit processors of the
1970s and early 1980s) reaches across $2^{16}$ = 65,536 = 64 Ki memory locations, whereas
a 32-bit address bus (common in PC processors as of 2004) can address
$2^{32}$ = 4,294,967,296 = 4 Gi locations. In most microcomputers each such addressable
"location" is an 8-bit byte. In such a case, the above examples translate to 64 kibibytes
(KiB) and 4 gibibytes (GiB) respectively.
The address bus of our processor is simply a global signal. It is defined using the schematic
editor, just like the databus. Our processor is based on the Harvard architecture and
therefore has two address buses. The program address bus connects the ROM (program
memory) and the program counter, and the data address bus connects the
control unit and the RAM (data memory). Both buses are 8 bits wide.
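The relation between address bus width and the number of addressable locations can be written out as a short worked calculation. The symbols n (bus width in bits) and N (number of locations) are introduced here only for illustration.

```latex
\[
N = 2^{n}
\]
\begin{align*}
n = 8  &: \quad N = 2^{8}  = 256 \\
n = 16 &: \quad N = 2^{16} = 65{,}536 \\
n = 32 &: \quad N = 2^{32} = 4{,}294{,}967{,}296
\end{align*}
```

With 8-bit address buses, the program memory and the data memory of our processor can therefore each hold at most 256 locations.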
accomplished by an extra "I/O" pin on the CPU's physical interface or an entire bus
dedicated to I/O. A device's direct memory access (DMA) is not affected by those
CPU-to-device communication methods; in particular, it is not affected by memory
mapping. This is because, by definition, DMA is a memory-to-device communication
method that bypasses the CPU.
The hardware interrupt is yet another communication method between the CPU and
peripheral devices. However, it is always treated separately, for a number of reasons. It is
device-initiated, as opposed to the methods mentioned above, which are CPU-initiated. It
is also unidirectional, as information flows only from device to CPU. Lastly, each
interrupt line carries only one bit of information with a fixed meaning, namely "there is
an interrupt".
The main advantage of using port-mapped I/O is on CPUs with a limited
addressing capability. Because port-mapped I/O separates I/O access from memory
access, the full address space can be used for memory. It is also obvious to a person
reading an assembly language program listing (or even, in rare instances, analyzing
machine language) when I/O is being performed, due to special instructions that can only
be used for that purpose. The advantage of using memory-mapped I/O is that, by
discarding the extra complexity that port I/O brings, a CPU requires less internal logic
and is thus cheaper, faster, easier to build, consumes less power and can be physically
smaller; this follows the basic tenets of reduced instruction set computing and is also
advantageous in embedded systems. As 16-bit processors have become obsolete and
have been replaced with 32-bit and 64-bit processors in general use, reserving ranges of
memory address space for I/O is less of a problem. The fact that regular memory
instructions are used to address devices also means that all of the CPU's addressing
modes are available for I/O as well as for memory.
Memory-mapped I/O hogs the address and data buses, as the mapped
device is usually slower than main memory. Port-mapped I/O does not, if it operates via a
dedicated I/O bus.
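As a rough illustration of memory-mapped I/O, the sketch below reserves part of an 8-bit address space for device registers. The address map (F0H to FFH for I/O), the entity name mmio_decode and the select signals are hypothetical and are not part of our design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical address decoder for memory-mapped I/O.
-- Assumed map: F0H..FFH select device registers, 00H..EFH select RAM.
entity mmio_decode is
    port (
        addr    : in  STD_LOGIC_VECTOR(7 downto 0);
        ram_sel : out STD_LOGIC;  -- high when the address falls in RAM
        io_sel  : out STD_LOGIC   -- high when the address falls in the I/O range
    );
end mmio_decode;

architecture Dataflow of mmio_decode is
begin
    -- The upper nibble alone decides which region is addressed.
    io_sel  <= '1' when addr(7 downto 4) = "1111" else '0';
    ram_sel <= '0' when addr(7 downto 4) = "1111" else '1';
end Dataflow;
```

Because the devices are reached through ordinary addresses, the same load and store instructions serve both regions; the cost is that the 16 reserved addresses are no longer available to memory, which matters more on a small address space like this one.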
The processor has one output port. It is defined as an output marker connected to
the databus during the final core design using the schematic editor. The output port of
the FPGA is mapped during the implementation stage to the appropriate pins of the
piggyback board, for display using LEDs.
prg_count         : out STD_LOGIC;
r_count           : out STD_LOGIC;
acc_rd            : out STD_LOGIC;
acc_wr            : out STD_LOGIC;
B_rd              : out STD_LOGIC;
B_wr              : out STD_LOGIC;
C_rd              : out STD_LOGIC;
C_wr              : out STD_LOGIC;
D_rd              : out STD_LOGIC;
D_wr              : out STD_LOGIC;
and_enable        : out STD_LOGIC;
or_enable         : out STD_LOGIC;
not_enable        : out STD_LOGIC;
xor_enable        : out STD_LOGIC;
shiftleft_enable  : out STD_LOGIC;
shiftright_enable : out STD_LOGIC;
add_enable        : out STD_LOGIC;
subtract_enable   : out STD_LOGIC;
multiply_enable   : out STD_LOGIC;
compare_enable    : out STD_LOGIC;
The control unit is sensitive to the clock and is negative-edge triggered. Instruction
decoding was simulated and tested by applying different instruction words to the
controller and studying the output signals generated.
be added to our processor design than what we have currently developed. The
instructions are directly fed into the control unit, which decodes them according to the
format and type of the instruction word. The following figures give the instruction
formats of both types of instructions.
[Figure: 16-bit instruction word format. Bits 15 to 8 hold the opcode and dest fields; bits 7 to 0 hold the immediate field.]
Type 1 instructions: The first type of instruction uses both a source register and a
destination register. Each of these registers is specified using 2 bits of the instruction
word, since there are 4 registers including the accumulator. The source register is
specified in ins(11 downto 10) and the destination register in ins(9 downto 8).
The opcode of this type of instruction is only 4 bits long and is specified in ins(15 downto
12). Currently our processor has only one instruction of this type. The last 8 bits of
the instruction form the 'immediate' data field.
Type 2 instructions: The second type of instruction specifies only the destination
register, in ins(9 downto 8). The opcode in this type of instruction is 5 bits long and is
specified in ins(15 downto 11). The last 8 bits are again used for the immediate data.
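The two instruction formats can be sketched as a small VHDL block that slices the 16-bit instruction word into its fields. The bit ranges follow the description above; the entity name ins_fields and the output port names are assumptions made for illustration, and the sketch does not show how the control unit tells a Type 1 opcode from a Type 2 opcode, since that depends on the opcode assignments in the instruction set.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical field splitter for the 16-bit instruction word.
entity ins_fields is
    port (
        ins       : in  STD_LOGIC_VECTOR(15 downto 0);
        opcode4   : out STD_LOGIC_VECTOR(3 downto 0);  -- Type 1 opcode
        opcode5   : out STD_LOGIC_VECTOR(4 downto 0);  -- Type 2 opcode
        src       : out STD_LOGIC_VECTOR(1 downto 0);  -- Type 1 source register
        dest      : out STD_LOGIC_VECTOR(1 downto 0);  -- destination register
        immediate : out STD_LOGIC_VECTOR(7 downto 0)   -- immediate data field
    );
end ins_fields;

architecture Dataflow of ins_fields is
begin
    opcode4   <= ins(15 downto 12);
    opcode5   <= ins(15 downto 11);
    src       <= ins(11 downto 10);
    dest      <= ins(9 downto 8);
    immediate <= ins(7 downto 0);
end Dataflow;
```

Both opcode views are produced in parallel here; a real decoder would use only the one that matches the instruction type.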
TABLE 4.1 INSTRUCTION SET
CHAPTER 5: SIMULATION AND TESTING
1. Design Entry
The first step is to enter your design. This can be done by creating "Source" files.
Source files can be created in different formats such as a schematic, or a Hardware
Description Language (HDL) such as VHDL, Verilog or ABEL. A project design will
consist of a top-level source file and various lower-level source files. Any of these files
can be either a schematic or an HDL file.
2. Design Synthesis
The synthesis step creates netlist files from the various source files. The netlist
files can serve as input to the implementation module.
3. Design Verification (simulation)
This is an important step that should be done at various stages of the design. The
simulator is used to verify the functionality of a design (functional simulation), the
behavior and the timing (timing simulation) of your circuit. Timing simulation is run
after implementing your circuit in the FPGA since it needs to know the actual
placement and routing to find out the exact speed and timing of the circuit.
4. Design Implementation
After generating the netlist file (synthesis step), the implementation will convert
the logic design into a physical file that can be downloaded to the target device
(e.g. Virtex FPGA). This step involves three sub-steps: Translating the netlist,
Mapping, and Place & Route.
Fig 5.2 Xilinx 10.1 Graphical Environment
5.2 XILINX Isim simulator / waveform editor
ISE Simulator / Waveform Editor can be used to create and simulate test benches
and test fixtures within the Project Navigator framework. Waveform Editor can be used to
graphically enter stimuli and the expected response, then generate a VHDL test bench or
Verilog test fixture.
Creating a Test Bench Waveform Using the Waveform Editor:
To create a test bench with the ISE Simulator Waveform Editor:
1. Select time_cnt in the Sources tab.
2. Select Project > New Source.
3. In the New Source Wizard, select Test Bench Waveform as the source type.
4. Type time_cnt_tb.
5. Click Next.
In the Select dialog box, the time_cnt file is the default source file because it
is selected in the Sources tab (step 1).
6. Click Next.
7. Click Finish.
The Waveform Editor opens in ISE. The Initialize Timing dialog box
is displayed and lets you specify the timing parameters used during simulation.
The Clock Time High and Clock Time Low fields together define the clock
period for which the design must operate. The Input Setup Time field defines
when inputs must be valid. The Output Valid Delay field defines the time after
the active clock edge when the outputs must be valid.
8. In the Initialize Timing dialog box, fill in the fields as required.
Given below is an example:
♦ Clock Time High: 10
♦ Clock Time Low: 10
♦ Input Setup Time: 5
♦ Output Valid Delay: 5
9. Select the GSR (FPGA) from the Global Signals section.
10. Change the Initial Length of Test Bench to 3000.
11. Click Finish.
Applying Stimulus
In the Waveform Editor, in the blue cell, we can apply a transition (high/low).
The width of this cell is determined by the Input setup delay and the Output valid delay.
Enter the following input stimuli:
1. Click the CE cell at time 110 ns to set it high (CE is active high).
2. Click the CLR cell at time 150 ns to set it high.
3. Click the CLR cell at time 230 ns to set it low.
4. Click the Save icon in the toolbar.
The new test bench waveform source (time_cnt_tb.tbw) is automatically added to
the project.
5. Select time_cnt_tb.tbw in the Sources tab.
6. Double-click Generate Self-Checking Test Bench in the Process tab.
A test bench containing output data and self checking code is generated and added
to the project. The created test bench can be used to compare data from later simulation.
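The stimulus entered graphically above can also be written by hand as a VHDL test bench. The sketch below reproduces only the clock and the CE and CLR transitions; it does not instantiate time_cnt, whose port list is not given here. The entity name, the signal names clk, ce and clr, the done flag, the time unit of ns and the choice to start the clock high are all assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical stimulus-only test bench (no design under test instantiated).
entity time_cnt_stimulus_tb is
end time_cnt_stimulus_tb;

architecture sim of time_cnt_stimulus_tb is
    signal clk  : STD_LOGIC := '0';
    signal ce   : STD_LOGIC := '0';
    signal clr  : STD_LOGIC := '0';
    signal done : boolean   := false;
begin
    -- Clock Time High = 10 ns, Clock Time Low = 10 ns (20 ns period).
    clk_gen : process
    begin
        while not done loop
            clk <= '1';
            wait for 10 ns;
            clk <= '0';
            wait for 10 ns;
        end loop;
        wait;  -- stop toggling so the simulation can end
    end process;

    -- Same transitions as the waveform entered in the editor.
    stimulus : process
    begin
        wait for 110 ns;   -- t = 110 ns
        ce <= '1';         -- CE is active high
        wait for 40 ns;    -- t = 150 ns
        clr <= '1';
        wait for 80 ns;    -- t = 230 ns
        clr <= '0';
        wait for 2770 ns;  -- t = 3000 ns, the test bench length
        done <= true;
        wait;
    end process;
end sim;
```

To exercise the counter, the design under test would be instantiated in this architecture and connected to clk, ce and clr.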
2. Select Properties.
3. In the Project Properties dialog box, select ISE Simulator in the Simulator field.
test bench waveform (TBW) file and add the test bench to the project. You can also use
this process to update an existing self-checking test bench. The test bench generated by
this process contains output data and self-checking code that can be used to compare the
data from later simulation runs.
Specifying Simulation Properties
The behavioral simulation will be performed on the stopwatch design after setting
some process properties for simulation. ISE allows setting several ISE Simulator
properties in addition to the simulation netlist properties. To see the behavioral simulation
properties, and to modify the properties for this example:
1. In the Sources tab, select the test bench file (stopwatch_tb).
2. Click the + sign next to ISE Simulator to expand the hierarchy in the Processes
tab.
3. Right-click the Simulate Behavioral Model process.
4. Select Properties.
5. In the Process Properties dialog box, set the Property display level to
Advanced. This global setting lets you see all available properties.
6. Change the Simulation Run Time to 2000 ns.
7. Click Apply and click OK.
The process properties window is shown in Fig 5.5
Performing Simulation
Once the process properties have been set, the ISE Simulator can be run. To start
the behavioral simulation, double-click Simulate Behavioral Model. ISE Simulator
creates the work directory, compiles the source files, loads the design, and performs
simulation for the time specified. The majority of this design runs at 100 Hz and would
take a significant amount of time to simulate. The first outputs to transition after RESET
is released are SF_D and LCD_E at around 33 ms. This is why the counter may seem
like it is not working in a short simulation. For the purpose of this tutorial, only the DCM
Fig 5.5 Process properties for ISE simulator
Adding Signals
To view signals during the simulation, you must add them to the Waveform
window. ISE automatically adds all the top-level ports to the Waveform window.
Additional signals are displayed in the Sim Hierarchy window. The following procedure
explains how to add additional signals in the design hierarchy. For the purpose of this
tutorial, add the DCM signals to the waveform.
To add additional signals in the design hierarchy:
1. In the Sim Hierarchy window, click the + next to stopwatch_tb to expand the
hierarchy.
2. Click the + next to uut stopwatch to expand the hierarchy
3. Click the + next to dcm_inst in the Sim Instances tab.
4. Click and drag CLKIN_IN from the Sim Objects window to the Waveform window.
5. Select the following signals:
♦ RST_IN
♦ CLKFX_OUT
♦ CLK0_OUT
♦ LOCKED_OUT
To select multiple signals, hold down the Ctrl key.
6. Drag all the selected signals to the waveform. Alternatively, right click on a selected
signal and select Add To Waveform.
By default, ISE Simulator records data only for the signals that have been added
to the waveform window while the simulation is running. Therefore, when new signals
are added to the waveform window, we must rerun the simulation for the desired amount
of time.
Analyzing the Signals
Now the DCM signals can be analyzed to verify that they work as expected. The
CLK0_OUT should be 50 MHz and the CLKFX_OUT should be ~26 MHz. The DCM
outputs are valid only after the LOCKED_OUT signal is high; therefore, the DCM
signals are analyzed only after the LOCKED_OUT signal has gone high.
ISE Simulator can add markers to measure the distance between signals. To measure the
CLK0_OUT:
1. If necessary, zoom in on the waveform.
2. Click the Measure Marker icon.
3. Place the marker on the first rising edge transition on the CLK0_OUT signal after the
LOCKED_OUT signal has gone high.
4. Click and drag the other end of the marker to the next rising edge.
5. Look at the top of the waveform for the distance between the markers. The
measurement should read 20.0 ns. This converts to 50 MHz, which is the input
frequency from the test bench, which in turn is the DCM CLK0 output.
6. Measure CLKFX_OUT using the same steps as above. The measurement should read
38.5 ns. This equals approximately 26 MHz.
The behavioral simulation is now complete.
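The conversion from a measured period to a frequency used in the steps above is the reciprocal relation between the two. The symbols f and T are introduced here for illustration.

```latex
\[
f = \frac{1}{T}
\]
\[
f_{\mathrm{CLK0}} = \frac{1}{20.0\ \mathrm{ns}} = 50\ \mathrm{MHz},
\qquad
f_{\mathrm{CLKFX}} = \frac{1}{38.5\ \mathrm{ns}} \approx 25.97\ \mathrm{MHz}
\]
```

The second value rounds to the 26 MHz quoted for CLKFX_OUT.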
5.3 Simulation of components
The components of the processor were first simulated
separately to check for operational timing accuracy. The simulation used a
clock period of 100 ns, and the input and output setup times were set to 0 ns. A 0 ns
delay in output and input availability will prove to be a serious timing issue during the
implementation stage, but for the purposes of simulation and verification of working, we
have chosen this delay to be 0. The simulation waveforms of the components and
instructions are given in APPENDIX B.
CHAPTER 6: APPLICATIONS IN INDUSTRY
i. As NGCP
A space vehicle can have a very complex motion that may seem
difficult to describe. However, the motion of any rigid body can be considered a
combination of translational and rotational motion. In three-dimensional
space, a translation can be resolved into components along one or more of the three
axes, and a rotation can be resolved into components about one or more of the axes.
To control all these motions, a special processor called a Navigation Guidance Control
Processor (NGCP) is used. As the name suggests, it controls and guides the vehicle.
The rotational and translational motions of the space probe differ at different
altitudes; they are pre-programmed and controlled by the processor. There are certain
specifications for such control and guidance processors. An NGCP should have
well-controlled interrupts, and far fewer of them than commercial processors.
Even a single interrupt can lead to massive destruction. The criteria for an NGCP are:
1. All the processors should be of military standard MIL-STD-462
2. The interrupts have to be controlled
3. Enabling of all interrupts damages the machine
4. Only specified interrupts are allowed to work
5. Even though commercial processors have better speed and efficiency, they were
not considered since all the interrupts will be enabled in such processors.
One such NGCP, indigenously developed at the Vikram Sarabhai Space Centre, is the
"VIKRAM" processor, which is dedicated solely to use as an NGCP. Another such
processor is SAYEH (Simple Architecture Yet Enough Hardware), which can be used as
an NGCP or implemented on an FPGA or as an ASIC. Its primary use, however, is as an NGCP.
ii. As FPGA
Before the advent of programmable logic, custom logic circuits were built at the
board level using standard components or at the gate level in expensive application
specific (custom) integrated circuits. The FPGA is an integrated circuit that contains
many (64 to over 10000) identical logic cells that can be viewed as standard components.
Each logic cell can independently take on any one of a limited set of personalities. The
individual cells are interconnected by a matrix of wires and programmable switches. A
user's design is implemented by specifying the simple logic function for each cell and
selectively closing the switches in the interconnect matrix. The array of logic cells and
interconnect forms a fabric of basic building blocks for logic circuits. Complex designs are
created by combining these basic blocks to create the desired circuit.
Field Programmable Gate Arrays are two-dimensional arrays of logic blocks and
flip-flops with electrically programmable interconnections between the logic blocks. The
interconnections consist of electrically programmable switches, which is how an FPGA
differs from a custom IC: a custom IC is programmed using integrated circuit
fabrication technology to form metal interconnections between logic blocks. FPGAs can
be used to implement just about any hardware design. One common use is to prototype a
system that will eventually find its way into an ASIC.
An FPGA comprises an array of uncommitted circuit elements, called combinational
logic blocks, and interconnect resources; its configuration is performed through
programming by the end user. FPGAs have been responsible for a major shift in the way
digital circuits are designed.
There are two basic categories of FPGAs in the market today :
1. SRAM based FPGAs
2. Anti fuse based FPGAs
The SRAM based FPGAs are multi-programmable, whereas anti fuse based FPGAs are
one-time programmable.
Anti fuses are originally open circuits and take on low resistance only when
programmed. Anti fuses are suitable for FPGAs because they can be built using modified
CMOS technology. As an example, Actel's anti fuse structure, known as PLICE, is
depicted in the figure. Applications of FPGAs include digital signal processing (DSP),
software-defined radio, aerospace and defense systems, ASIC prototyping, medical
imaging, computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation and a growing range of other areas.
iii. As ASIC
An ASIC is an Application Specific Integrated Circuit. With the advent of VLSI
in the 1980s, engineers began to realize the advantage of designing an IC that was
customized or tailored to a particular system or application rather than using standard ICs
alone. Microelectronic system design then becomes a matter of defining the functions
that you can implement using standard ICs and then implementing the remaining logic
functions with one or more custom ICs. Types of ASICs are:
1. Full custom ASIC
2. Semi custom ASIC
CHAPTER 7: CONCLUSION
Microprocessors have evolved from the crude calculating machines
they were at the time of their genesis into highly capable and fast controllers that can be
programmed to meet almost all the needs of this age of automated production and
maintenance. The task of designing a processor has been simplified from drawing the
actual circuit by hand to the use of hardware description languages, and now, with
powerful simulation and design tools like those from Xilinx, making a processor has
become a much easier task.
We were able to successfully simulate all the arithmetic and logic as well as data
transfer instructions of the processor. In this process, the timing advantages offered by
the modified Harvard architecture over the original von Neumann architecture were
significant. All the instructions took at most two clock cycles to execute.
This project has been very intense and involved. Throughout the development, we
encountered and came to understand the various issues involved in designing
an actual processor, including the working intricacies of the various modules and the
timing issues and their solutions. The main difficulties we faced were on the
timing front, where making the whole processor unit work with one synchronizing clock
proved to be more complex and haphazard than we had anticipated. Although not all the
timing issues were solved, we were able to solve a few of the problems using our
own techniques. Whether these solutions will work in a real-world FPGA application
remains to be seen. Nevertheless, we learned to use the Xilinx tools for design,
simulation and implementation of a hardware model on an FPGA.
VLSI design is one of the leading industries in the semiconductor market today.
In this computer-controlled world, almost everything in the industrial
domain envisages the need for a processor for control. By designing this processor, we
were able to familiarize ourselves with the various stages involved in designing a
custom-made IC for a user-defined purpose.
APPENDICES
APPENDIX A: CPU CORE
APPENDIX B: SIMULATION WAVEFORMS
The value 95H is first written into A using MVI instruction, and then the contents of A
are moved into C using MOV C,A instruction.
2. LDA 0AH
The RAM address 0AH contains the value 12H. This value is loaded into the accumulator
using the instruction LDA 0AH.
3. STA 04H
The accumulator is first loaded with the value 2AH using the MVI instruction. The
contents of A are then written into the RAM address 04H using the STA instruction.
4. LXI C, 04H
The RAM address 04H contains 0CH. It is stored into the C register using the LXI
instruction.
5. ADD A, C
03H is written into C register, 04H into A register, and then the contents of C are added
with A. Finally the contents of A are moved into B register.
6. SUB A, C
03H is written into C register, 04H into A register, and then the contents of C are
subtracted from A. Finally the contents of A are moved into B register.
7. MUL A, C
03H is written into C register, 04H into A register, and then the contents of C are
multiplied with A. Finally the contents of A are moved into B register.
8. AND A, C
03H is written into C register, 04H into A register, and then the contents of C are ANDed
with A. Finally the contents of A are moved into B register.
9. XOR A, C
03H is written into C register, 04H into A register, and then the contents of C are XORed
with A. Finally the contents of A are moved into B register.
10. CMA
Here, the accumulator is loaded with the value 03H using MVI A,03H, and then the value
is complemented, giving the result FCH. The contents of A are then moved to B.
11. SLA
Here, the accumulator is loaded with the value 03H using MVI A,03H, and then the value
is shifted to the left by 1, giving the result 06H. The contents of A are then moved to B.
12. SRA
Here, the accumulator is loaded with the value 03H using MVI A,03H, and then the value
is shifted to the right by 1, giving the result 01H. The contents of A are then moved to B.
13. ORA C
03H is written into C register, 04H into A register, and then the contents of C are ORed
with A. Finally the contents of A are moved into B register.
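For reference, the values that the two-operand tests above should leave in A (and then in B) can be worked out by hand from the operands A = 04H and C = 03H. These are calculated values, not readings from the waveforms, and they assume 8-bit unsigned arithmetic with each result written back to A.

```latex
\begin{align*}
\text{ADD A, C} &: \quad \mathrm{04H} + \mathrm{03H} = \mathrm{07H} \\
\text{SUB A, C} &: \quad \mathrm{04H} - \mathrm{03H} = \mathrm{01H} \\
\text{MUL A, C} &: \quad \mathrm{04H} \times \mathrm{03H} = \mathrm{0CH} \\
\text{AND A, C} &: \quad 0000\,0100_2 \wedge 0000\,0011_2 = 0000\,0000_2 = \mathrm{00H} \\
\text{XOR A, C} &: \quad 0000\,0100_2 \oplus 0000\,0011_2 = 0000\,0111_2 = \mathrm{07H} \\
\text{ORA C}    &: \quad 0000\,0100_2 \vee 0000\,0011_2 = 0000\,0111_2 = \mathrm{07H}
\end{align*}
```

Because the two operands share no set bits, the ADD, XOR and OR results coincide at 07H, so these particular operands cannot distinguish those three operations from one another.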
REFERENCES