
CHAPTER 1

INTRODUCTION
1.1 Introduction to Embedded Systems


An embedded system is a special-purpose computer system designed to perform one or a few
dedicated functions, often with real-time computing constraints. It is usually embedded as part of a
complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such
as a personal computer, can do many different tasks depending upon programming. Embedded systems
control many of the common devices in use today. Since the embedded system is dedicated to specific
tasks, design engineers can optimize it by reducing the size and cost of the product, or increasing the
reliability and performance. An embedded system can also be defined as an engineering artefact
involving computation that is subject to physical constraints arising through interactions of
computational processes with the physical world. These physical constraints are divided into reaction
and execution constraints. Reaction constraints originate from the behavioral requirements and specify
the deadlines, throughput, and jitter, whereas the execution constraints originate from the implementation
requirements and put bounds on available processor speed, power, memory, and hardware failure
rates.
Some embedded systems are mass-produced, benefiting from economies of scale. In general,
"embedded system" is not an exactly defined term, as many systems have some element of
programmability. Physically, embedded systems range from portable devices such as digital watches and
MP4 players to large stationary installations like traffic lights, factory controllers, or the systems
controlling nuclear power plants/missiles/satellites. Complexity varies from low, with a single
microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large
chassis or enclosure.
Embedded systems range from no user interface at all, dedicated only to one task, to complex
graphical user interfaces that resemble modern computer desktop operating systems. Simple embedded
devices use buttons, LEDs, and small character- or digit-only displays, often with a simple menu system.
Embedded processors can be broken into two broad categories: ordinary microprocessors (µP) and
microcontrollers (µC), which have many more peripherals on chip, reducing cost and size.
A common configuration for very-high-volume embedded systems is the system on a chip
(SoC). An SoC is an integrated circuit that contains a complete system consisting of
multiple processors, multipliers, caches and interfaces on a single chip. SoCs can be implemented as an
application-specific integrated circuit (ASIC) or using a field-programmable gate array (FPGA).
Page 1 of 83

1.1.1 Embedded System Characteristics

Embedded systems are designed to do some specific task, rather than to be a general-purpose
computer for multiple tasks. Some have real-time performance constraints that must be
met, for reasons such as safety and usability; others may have low or no performance
requirements, allowing the system hardware to be simplified to reduce costs.

Embedded systems are not always stand-alone devices. Many embedded systems consist
of small, computerized parts within a larger device that serves a more general purpose.
For example, the Gibson Robot Guitar features an embedded system for tuning the
strings; the overall purpose of the guitar is, of course, to play music. Similarly, an
embedded system in an automobile provides a specific function as a subsystem of the car
itself.

The program instructions written for embedded systems are referred to as firmware and
are stored in read-only memory or flash memory chips. They run with limited computer
hardware resources: little memory and a small or non-existent keyboard and/or screen.

Embedded systems often reside in machines that are expected to run continuously for
years without errors and, in some cases, to recover by themselves if an error occurs. Therefore the
software is usually developed and tested more carefully than for personal computers, and
unreliable mechanical moving parts such as hard drives, switches or buttons are avoided.

1.2 Introduction to Electronic Warfare


The term Electronic Warfare (EW) refers to any action involving the use of the electromagnetic
spectrum (EMS) or directed energy (DE) to control the EMS or to attack the enemy. EW includes three
major subdivisions: Electronic Attack (EA), Electronic Protection (EP), and Electronic Warfare
Support (ES). The purpose of EW is to deny the opponent an advantage in the EMS and ensure friendly
unimpeded access to the EM spectrum portion of the information environment. EW can be applied from
air, sea, land, and space by manned and unmanned systems.

1.2.1 Description of EW

The term Electronic Attack (EA) refers to the use of electromagnetic energy, directed energy, or
anti-radiation weapons to attack personnel, facilities, or equipment with the intent of degrading,
neutralizing, or destroying enemy combat capability. In the case of EM energy, this action is referred to as
jamming and can be performed on communications systems or radar systems.
Electronic Protection (EP), or Electronic Protective Measures (EPM), involves actions taken to protect
personnel, facilities and equipment from any effects of friendly or enemy use of the electromagnetic
spectrum that degrade, neutralize or destroy friendly combat capability.
In military telecommunications, the terms Electronic Support (ES) or Electronic Support Measures
(ESM) describe the division of electronic warfare involving actions taken under direct control of an
operational commander to detect, intercept, identify, locate, record, and/or analyze sources of radiated
electromagnetic energy for the purposes of immediate threat recognition (such as warning that fire
control RADAR has locked on a combat vehicle, ship, or aircraft) or longer-term operational planning.
Thus, Electronic Support provides a source of information required for decisions involving Electronic
Protection (EP), Electronic Attack (EA), avoidance, targeting, and other tactical employment of forces.
Electronic Support data can be used to produce signals intelligence (SIGINT), communications
intelligence (COMINT) and electronics intelligence (ELINT).
Digital communication became important with the expansion of the use of computers and data
processing, and has continued to grow as a major industry providing the interconnection of computer
peripherals and transmission of data between distant sites. With the requirement for ever higher
speeds of data transmission, the emphasis on the development of digital communication techniques has
increased. The channel and its characteristics (bandwidth, frequency, noise, distortion, transmission
speed, type of coding, etc.) have also improved over time.
Electronic Support Measures gather intelligence through passive "listening" to electromagnetic
radiations of military interest. Electronic support measures can provide:
1. Initial detection or knowledge of foreign systems.
2. A library of technical and operational data on foreign systems.
3. Tactical combat information utilizing that library.

Desirable characteristics for electromagnetic surveillance and collection equipment include:
1. Wide-spectrum or bandwidth capability because foreign frequencies are initially unknown.
2. Wide dynamic range because signal strength is initially unknown.
3. Narrow band pass to discriminate the signal of interest from other electromagnetic radiation on
nearby frequencies.
4. Good angle-of-arrival measurement for bearings to locate the transmitter.

1.2.2 Electronic Counter Measures


Electronic Counter Measures (ECM) are a subsection of electronic warfare which includes any
sort of electrical or electronic device designed to trick or deceive Radar, Sonar, or other detection
systems like IR (infrared) and Laser. It may be used both offensively and defensively in any method to
deny targeting information to an enemy. The system may make many separate targets appear to the
enemy, or make the real target appear to disappear or move about randomly. It is used effectively to
protect aircraft from guided missiles. Most air forces use ECM to protect their aircraft from attack. The
same is true for military ships and, more recently, some advanced tanks, which use ECM to fool laser/IR guided missiles.
ECM is frequently coupled with stealth advances so that the ECM system has an easier job. Offensive ECM
often takes the form of jamming. Defensive ECM includes using blip enhancement and jamming of
missile terminal homers.

1.2.3 Electronic Counter-Counter Measures


Electronic Counter-Counter Measures (ECCM) describes a variety of practices which attempt to
reduce or eliminate the effect of Electronic Counter Measures (ECM) on electronic sensors aboard
vehicles, ships and aircraft and weapons such as missiles. ECCM is also known as Electronic Protective
Measures (EPM), chiefly in Europe. Electronic Protection (EP) involves actions taken to protect
personnel, facilities, and equipment from any effects of friendly or enemy use of the electromagnetic
spectrum that degrade, neutralize, or destroy friendly combat capability. While defensive EA actions and
EP both protect personnel, facilities, capabilities, and equipment, EP protects from the effects of EA
(friendly and/or adversary). Some examples of EPM are ECM detection, pulse compression by
"chirping" (linear frequency modulation), frequency hopping, side lobe cancellation, polarization and
radiation homing.

1.3 Field Programmable Gate Array


A Field Programmable Gate Array (FPGA) is a semiconductor device that can be configured by
the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are
programmed using a logic circuit diagram or source code in a hardware description language (HDL) to
specify how the chip will work. They can be used to implement any logical function that an
application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping
offers advantages for many applications. FPGAs contain programmable logic components called "logic
blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together",
somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic
blocks also include memory elements, which may be simple flip-flops or more complete blocks of
memory.
The cost of an FPGA design is much lower than that of an ASIC (although the ensuing ASIC
components are much cheaper in large production runs). At the same time, implementing design changes
is much easier in FPGAs, and the time-to-market for such designs is much faster. FPGAs are often used
to prototype ASIC designs or to provide a hardware platform on which to verify the physical
implementation of new algorithms. However, their low development cost and short time-to-market mean
that they are increasingly finding their way into final products (some of the major FPGA vendors
actually have devices that they specifically market as competing directly against ASICs).

Fig. 1.1: FPGA Introduction


In order to be programmable, we need some mechanism that allows us to configure (program) a
prebuilt silicon chip.
1.3.1 FPGA Origin

Around the beginning of the 1980s, it became apparent that there was a gap in the digital IC
continuum. At one end, there were programmable devices like SPLDs and CPLDs, which were highly
configurable and had fast design and modification times, but which couldn't support large or complex
functions. At the other end of the spectrum were ASICs. These could support extremely large and
complex functions, but they were painfully expensive and time-consuming to design. Furthermore, once
a design had been implemented as an ASIC it was effectively frozen in silicon.

[Figure: the digital IC continuum, with PLDs (SPLDs and CPLDs) at one end, ASICs (Gate Arrays, Structured ASICs, Standard Cell, Full Custom) at the other, and "The Gap" between them]
Fig. 1.2: The Gap between PLDs and ASICs

The early devices were based on the concept of a programmable logic block, which comprised a
3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along
with a few other elements that are of little interest here.

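[Figure: inputs a, b and c feed a 3-input LUT with output y; a mux selects between y and a separate input d to feed the flip-flop, which is driven by clock and produces output q]
Fig. 1.3: The key elements forming a simple programmable logic block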
Each FPGA contained a large number of these programmable logic blocks, as discussed below.
By means of appropriate SRAM programming cells, every logic block in the device could be configured
to perform a different function. Each register could be configured to initialize containing logic 0 or logic
1 and to act as a flip-flop (as shown in Fig. 1.3) or a latch. If the flip-flop option were selected, the
register could be configured to be triggered by a positive- or negative-going clock (the clock signal was
common to all of the logic blocks). The multiplexer feeding the flip-flop could be configured to accept
the output from the LUT or a separate input to the logic block, and the LUT could be configured to
represent any 3-input logical function.
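
The behaviour of the logic block in Fig. 1.3 can be sketched in software. The Python model below is only an illustration under stated assumptions, not vendor code: the class name `LogicBlock`, its parameter names and the bit ordering of the LUT contents are hypothetical, and the eight configuration bits stand in for the SRAM programming cells.

```python
class LogicBlock:
    """Toy model of a 3-input LUT, a configurable mux and a flip-flop."""

    def __init__(self, lut_bits, select_lut=True, init=0):
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        self.lut_bits = list(lut_bits)   # one output bit per input combination
        self.select_lut = select_lut     # mux setting: LUT output or input d
        self.q = init                    # register initialises to logic 0 or 1

    def lut(self, a, b, c):
        """Combinational output y: look up the bit addressed by (a, b, c)."""
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def clock_edge(self, a, b, c, d=0):
        """Model one active clock edge and return (y, q) after the edge."""
        y = self.lut(a, b, c)
        self.q = y if self.select_lut else d
        return y, self.q


# Configure the LUT as a 3-input XOR (output 1 for an odd number of 1 inputs).
xor3 = LogicBlock([0, 1, 1, 0, 1, 0, 0, 1])
# Same LUT contents, but the mux passes the separate input d to the flip-flop.
bypass = LogicBlock([0, 1, 1, 0, 1, 0, 0, 1], select_lut=False)
```

Changing only the eight bits passed to the constructor turns the same block into any other 3-input function, which is the sense in which the block is programmable.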
1.3.2 FPGA Architecture
The complete FPGA comprised a large number of programmable logic blocks, or "islands",
surrounded by a "sea" of programmable interconnects. The high-level illustration in Fig. 1.4 is merely an abstract
representation. All of the transistors and interconnects would be implemented on the same piece of
silicon using standard IC creation techniques. In addition to the local interconnect reflected in the figure,
there would also be global (high-speed) interconnection paths that could transport signals across the chip
without having to go through multiple local switching elements. The device would also include primary
I/O pins and pads. By means of its own SRAM cells, the interconnect could be programmed such that
the primary inputs to the device were connected to the inputs of one or more programmable logic blocks,
and the outputs from any logic block could be used to drive the inputs to any other logic block, the primary outputs from the
device, or both.

Fig. 1.4: Top-down view of simple, generic FPGA architecture


The end result was that FPGAs successfully bridged the gap between PLDs and ASICs. On the one
hand, they were highly configurable and had the fast design and modification times associated with PLDs. On
the other hand, they could be used to implement large and complex functions that had previously been
the domain only of ASICs. ASICs were still required for the really large, complex, high-performance
designs, but as FPGAs increased in sophistication they started to encroach further and further into ASIC
design space.

1.4 Xilinx Virtex-5 FPGA


Virtex-5 is the newest generation of FPGA from Xilinx. The Virtex-5 family contains five distinct
platforms, the most choice offered by any FPGA family. Each platform contains a different ratio of
features to address the needs of a wide variety of advanced logic designs. In addition to the most
advanced, high-performance logic fabric, Virtex-5 FPGAs contain many hard-IP system-level blocks,
including powerful 36-Kbit block RAM/FIFOs and second-generation 25 x 18 DSP slices. Virtex-5
targets high-performance logic designers, high-performance
DSP designers, and high-performance embedded systems designers with its logic, DSP,
hard/soft microprocessor and connectivity capabilities. The Virtex-5 LXT, SXT, TXT and FXT
platforms include high-speed serial connectivity and link/transaction layer capability.
The five platforms are:
Virtex-5 LX: High-performance general logic applications.
Virtex-5 LXT: High-performance logic with advanced serial connectivity.
Virtex-5 SXT: High-performance signal processing applications with advanced serial connectivity.
Virtex-5 TXT: High-performance systems with double-density advanced serial connectivity.
Virtex-5 FXT: High-performance embedded systems with advanced serial connectivity.
1.4.1 Architectural Description
Virtex-5 devices are user-programmable gate arrays with various configurable elements and
embedded cores optimized for high-density and high-performance system designs. Virtex-5
devices implement the following functionality:
I/O blocks provide the interface between package pins and the internal configurable logic.
Most popular and leading-edge I/O standards are supported by programmable I/O blocks
(IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced
source-synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both
input and output signals), data serializers or deserializers, clock dividers, and dedicated I/O
and local clocking resources.
Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide
combinatorial and synchronous logic as well as distributed memory and SRL32 shift register
capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and
provide superior capabilities and performance compared to previous generations of
programmable logic.
Block RAM modules provide flexible 36 Kbit true dual-port RAM that are cascadable to
form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional
programmable FIFO logic for increased device utilization. Each block RAM can also be
configured as two independent 18 Kbit true dual-port RAM blocks, providing memory
granularity for designs needing smaller RAM blocks.
Clock Management Tile (CMT) blocks provide the most flexible, highest-performance
clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks
(self-calibrating, fully digital), and one PLL block (self-calibrating, analog) for clock distribution
delay compensation, clock multiplication/division, coarse-/fine-grained clock phase shifting,
and input clock jitter filtering.

1.4.2 Virtex-5 FPGA Features


Input/Output Blocks (SelectIO)
IOBs are programmable and can be categorized as follows:
Programmable single-ended or differential (LVDS) operation
Input block with an optional single data rate (SDR) or double data rate (DDR) register
Output block with an optional SDR or DDR register
Bidirectional block
Per-bit deskew circuitry
Dedicated I/O and regional clocking resources
Built-in data serializer/deserializer
The IOB registers are either edge-triggered D-type flip-flops or level-sensitive latches.
The Digitally Controlled Impedance (DCI) I/O feature can be configured to provide on-chip
termination for each single-ended I/O standard and some differential I/O standards.

Data serializer/deserializer capability is added to every I/O to support source-synchronous
interfaces. A serial-to-parallel converter with associated clock divider is included in the input
path, and a parallel-to-serial converter in the output path.

Configurable Logic Blocks (CLBs)


A Virtex-5 FPGA CLB resource is made up of two slices. Each slice is equivalent and contains:
Four function generators
Four storage elements
Arithmetic logic gates
Large multiplexers
Fast carry look-ahead chain
The function generators are configurable as 6-input LUTs or dual-output 5-input LUTs. In addition, the
four storage elements can be configured as either edge-triggered D-type flip-flops or level-sensitive
latches. Each CLB has internal fast interconnect and connects to a switch matrix to access general
routing resources.

Block RAM
The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in
various depth and width configurations.
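
The depth and width range quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the two end points (32K x 1 and 512 x 72) come from the text, while the intermediate aspect ratios and the reading of widths that are multiples of 9 as one parity bit per data byte are assumptions that should be confirmed against the device user guide.

```python
# (depth, width) aspect ratios of one 36 Kbit block. Only the first and last
# entries are stated in the text; the entries in between are assumed.
CONFIGS = [(32768, 1), (16384, 2), (8192, 4), (4096, 9),
           (2048, 18), (1024, 36), (512, 72)]


def describe(depth, width):
    """Split a configuration into data and (assumed) parity bits."""
    parity_bits = width // 9          # assumption: 1 parity bit per 8 data bits
    data_bits = width - parity_bits
    return {
        "depth": depth,
        "width": width,
        "data_bits": data_bits,
        "parity_bits": parity_bits,
        "total_bits": depth * width,
    }


table = [describe(depth, width) for depth, width in CONFIGS]
for row in table:
    print(row)
```

Under these assumptions the narrow configurations expose 32 Kbit of storage, and only the widths that are multiples of 9 reach the full 36 Kbit.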

In addition, each 36-Kbit block can also be configured to operate as two independent 18-Kbit
dual-port RAM blocks. Each port is totally synchronous and independent, offering three
read-during-write modes.

Block RAM is cascadable to implement large embedded storage blocks. Additionally, back-end
pipeline registers, clock control circuitry, built-in FIFO support, ECC, and byte write enable
features are also provided as options.

Global Clocking
The CMTs and global-clock multiplexer buffers provide a complete solution for designing
high-speed clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can
be used independently or extensively cascaded. Up to six CMT blocks are available, providing
up to eighteen total clock generator elements. Each DCM provides familiar clock generation
capability.
To generate deskewed internal or external clocks, each DCM can be used to eliminate clock
distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the
output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment in fractions
of the clock period. Flexible frequency synthesis provides a clock output frequency
equal to a fractional or integer multiple of the input clock frequency.
To augment the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides
reference clock jitter filtering and further frequency synthesis options. Virtex-5 devices have 32
global-clock MUX buffers. The clock tree is designed to be differential. Differential clocking
helps reduce jitter and duty cycle distortion.
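
Two of the figures in this section are simple arithmetic, and a short sketch makes them concrete: the count of clock generator elements, and the time delay that corresponds to a coarse phase shift. The 200 MHz clock used in the example is a hypothetical value chosen only for illustration, and the function names are not part of any Xilinx tool.

```python
from fractions import Fraction

DCMS_PER_CMT = 2
PLLS_PER_CMT = 1


def clock_generators(cmt_count):
    """Total DCM and PLL elements available from a number of CMTs."""
    return cmt_count * (DCMS_PER_CMT + PLLS_PER_CMT)


def phase_delay_ps(clock_mhz, phase_deg):
    """Delay in picoseconds of a clock shifted by phase_deg degrees."""
    period_ps = Fraction(1_000_000, clock_mhz)   # period of the clock in ps
    return period_ps * Fraction(phase_deg, 360)


total = clock_generators(6)
# Hypothetical 200 MHz clock: the period is 5000 ps.
delays = {deg: phase_delay_ps(200, deg) for deg in (90, 180, 270)}
```

With six CMTs the total is eighteen elements, matching the text, and at the assumed 200 MHz the three coarse phase taps fall a quarter, a half and three quarters of the way through the period.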

DSP48E Slices
DSP48E slice resources contain a 25 x 18 two's complement multiplier and a 48-bit
adder/subtracter/accumulator. Each DSP48E slice also contains extensive cascade capability to efficiently
implement high-speed DSP algorithms.
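
As a rough behavioural sketch of the arithmetic described above, the following Python model multiplies a 25-bit by an 18-bit two's complement operand and accumulates into a 48-bit register that wraps on overflow. It is an assumption-laden illustration: the function names are hypothetical, and pipelining, cascading and the slice's other operating modes are not modelled.

```python
def to_signed(value, bits):
    """Wrap an integer into the signed two's complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b, kept to 48 bits."""
    a = to_signed(a, 25)              # 25-bit multiplier operand
    b = to_signed(b, 18)              # 18-bit multiplier operand
    return to_signed(acc + a * b, 48)


def dot(xs, ys):
    """Dot product built from repeated multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


result = dot([1, -2, 3], [4, 5, -6])   # 4 - 10 - 18
```

A dot product of this kind is the core of an FIR filter, which is one reason a multiplier followed by a wide accumulator is a common DSP building block.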

Routing Resources
All components in Virtex-5 devices use the same interconnect scheme and the same access to the global
routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set of connectivity
in as few hops as possible. Timing models are shared, greatly improving the predictability of the
performance for high speed designs.

Configuration
Virtex-5 devices are configured by loading the bitstream into internal configuration memory using one
of the following modes:
Slave-serial mode
Master-serial mode
Slave SelectMAP mode
Master SelectMAP mode
Boundary-Scan mode (IEEE 1532 and IEEE 1149)
SPI mode (Serial Peripheral Interface standard Flash)
BPI-up/BPI-down modes (Byte-wide Peripheral Interface standard x8 or x16 NOR Flash)

System Monitor
FPGAs are an important building block in high availability/reliability infrastructure. Therefore,
there is a need to better monitor the on-chip physical environment of the FPGA and its immediate
surroundings within the system.

For the first time, the Virtex-5 family System Monitor facilitates easier monitoring of the FPGA
and its external environment. Every member of the Virtex-5 family contains a System Monitor
block.

The System Monitor is built around a 10-bit 200kSPS ADC (Analog-to-Digital Converter). This
ADC is used to digitize a number of on-chip sensors to provide information about the physical
environment within the FPGA. On-chip sensors include a temperature sensor and power supply
sensors. Access to the external environment is provided via a number of external analog input
channels. These analog inputs are general purpose and can be used to digitize a wide variety of
voltage signal types.

Support for unipolar, bipolar, and true differential input schemes is provided. There is full access
to the on-chip sensors and external channels via the JTAG TAP, allowing the existing JTAG
infrastructure on the PC board to be used for analog test and advanced diagnostics during
development or after deployment in the field.

The System Monitor is fully operational after power up and before configuration of the FPGA.
System Monitor does not require an explicit instantiation in a design to gain access to its basic
functionality. This allows the System Monitor to be used even at a late stage in the design cycle.
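
The ADC figures quoted for the System Monitor translate directly into a number of output codes and a conversion period. The sketch below shows that arithmetic together with an idealised quantiser; the 1.0 V full-scale range is a hypothetical default used only for illustration, and the function names are not part of any Xilinx API.

```python
ADC_BITS = 10
SAMPLE_RATE_SPS = 200_000


def code_count():
    """Number of distinct output codes of a 10-bit converter."""
    return 1 << ADC_BITS


def sample_period_us():
    """Time between conversions at 200 kSPS, in microseconds."""
    return 1_000_000 / SAMPLE_RATE_SPS


def quantize(voltage, full_scale_v=1.0):
    """Ideal unipolar quantiser; full_scale_v is an assumed value."""
    clamped = min(max(voltage, 0.0), full_scale_v)
    code = int(clamped / full_scale_v * code_count())
    return min(code, code_count() - 1)
```

A 10-bit converter therefore resolves 1024 levels, and at 200 kSPS a new sample is available every 5 microseconds.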

1.4.3 Virtex-5 Ordering Information

Example part number: XC5VFX100T-1FFG1738
XC      Xilinx
5V      Virtex 5
FX      Embedded Power Processor
100     Logical capacity
-1      Speed
FF      Flip Chip
G       Lead free
1738    Pin count
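
The field breakdown of the example part number above can be expressed as a small parser. This is a sketch under assumptions: the regular expression is written only to fit the example XC5VFX100T-1FFG1738 and the labels given in this section, the `T` after the capacity is captured as an unlabelled `suffix` because its meaning is not given here, and the function name `decode` is hypothetical.

```python
import re

# Pattern fitted to the example part number; not an official Xilinx grammar.
PART_RE = re.compile(
    r"^(?P<vendor>XC)(?P<family>5V)(?P<platform>[A-Z]{2})(?P<capacity>\d+)"
    r"(?P<suffix>T?)-(?P<speed>\d)(?P<package>FF)(?P<lead_free>G?)(?P<pins>\d+)$"
)


def decode(part):
    """Split a Virtex-5 style part number into the labelled fields."""
    match = PART_RE.match(part)
    if match is None:
        raise ValueError(f"unrecognised part number: {part!r}")
    fields = match.groupdict()
    return {
        "vendor": "Xilinx",
        "family": "Virtex 5",
        "platform": fields["platform"],
        "logical_capacity": int(fields["capacity"]),
        "suffix": fields["suffix"],
        "speed": int(fields["speed"]),
        "package": "Flip Chip",
        "lead_free": fields["lead_free"] == "G",
        "pin_count": int(fields["pins"]),
    }


info = decode("XC5VFX100T-1FFG1738")
```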

CHAPTER 2
QDR-II STATIC RAM
2.1 Introduction to Memories
Computer data storage, often called storage or memory, refers to computer components, devices,
and recording media that retain digital data used for computing for some interval of time. Computer data
storage provides one of the core functions of the modern computer, that of information retention.
Memory is directly accessible to the CPU. The CPU continuously reads instructions stored there and
executes them as required. Any data actively operated on is also stored there in a uniform manner. This
memory is mainly of two types: RAM and ROM.

2.1.1 Random Access Memory


Random access memory (RAM) is a form of computer data storage. It takes the form of
integrated circuits that allow the stored data to be accessed in any order (i.e., at random). The word
random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its
physical location and whether or not it is related to the previous piece of data. This contrasts with
storage mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical
movement of the recording medium or a reading head. In these devices, the movement takes longer than
the data transfer, and the retrieval time varies depending on the physical location of the next item. The
word RAM is mostly associated with volatile types of memory, where the information is lost after the
power is switched off.
Modern types of writable RAM generally store a bit of data either in the state of a flip-flop, as in
SRAM (static RAM), or as a charge in a capacitor (or transistor gate), as in DRAM (dynamic RAM),
EPROM, EEPROM and Flash. Some types have circuitry to detect and/or correct random faults, called
memory errors, in the stored data, using parity bits or error correction codes. RAM of the read-only type, ROM, is described in Section 2.1.2.
As both SRAM and DRAM are volatile, other forms of computer storage, such as disks and magnetic
tapes, have been used as persistent storage in traditional computers.
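
The parity-bit scheme mentioned above can be illustrated in a few lines. The following is a minimal sketch of even parity over one stored word, with hypothetical function names; it detects any single-bit error but cannot locate or correct it, which is what fuller error correction codes add.

```python
def parity_bit(bits):
    """Even-parity bit: makes the total number of 1s (data + parity) even."""
    return sum(bits) % 2


def store(bits):
    """Append the parity bit to the data bits, as a memory would on write."""
    return list(bits) + [parity_bit(bits)]


def check(word):
    """Return True if the stored word still has even parity."""
    return sum(word) % 2 == 0


word = store([1, 0, 1, 1, 0, 0, 1, 0])

corrupted = word.copy()
corrupted[3] ^= 1            # flip one bit to simulate a memory error

double_error = word.copy()
double_error[0] ^= 1         # two flipped bits cancel out and go undetected
double_error[1] ^= 1
```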

2.1.2 Read Only Memory


Read-only memory (usually known by its acronym, ROM) is a class of storage media used in
computers and other electronic devices. Because data stored in ROM cannot be modified (at least not
very quickly or easily), it is mainly used to distribute firmware (software that is very closely tied to
specific hardware, and unlikely to require frequent updates). In the strictest sense, ROM is fabricated with the desired data
permanently stored in it, and thus can never be modified. However, more modern types such as EPROM
and flash EEPROM can be erased and re-programmed multiple times; they are still described as
"read-only memory" (ROM) because the reprogramming process is generally infrequent, comparatively slow,
and often does not permit random access writes to individual memory locations. There are different
types of ROM. Classic mask-programmed ROM chips are integrated circuits that physically encode the
data to be stored, and thus it is impossible to change their contents after fabrication. The other types are:
1. Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be
written to or programmed via a special device called a PROM programmer. Typically, this device uses
high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip.
Consequently, a PROM can only be programmed once.
2. Erasable programmable read-only memory (EPROM) can be erased by exposure to strong
ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again requires
application of higher than usual voltage. Repeated exposure to UV light will eventually wear out an
EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming.
EPROM chip packages can often be identified by the prominent quartz "window" which allows UV light
to enter. After programming, the window is typically covered with a label to prevent accidental erasure.
Some EPROM chips are factory erased before they are packaged, and include no window: these are
effectively PROM.
3. Electrically erasable programmable read-only memory (EEPROM) is based on a similar
semiconductor structure to EPROM but allows its entire contents (or selected banks) to be electrically
erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3
player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a
ROM or writing to a RAM (nanoseconds in both cases).
4. Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified
one bit at a time. Writing is a very slow process and again requires higher voltage (usually around 12V)
than is used for read access. EAROMs are intended for applications that require infrequent and only
partial rewriting. EAROM may be used as non-volatile storage for critical system setup information; in
many applications, EAROM has been supplanted by CMOS RAM supplied by mains power and
backed up with a lithium battery.
5. Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory
can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high
endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area,
resulting in individual ICs with a capacity as high as 16 GB as of 2007; this feature, along with its
endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications
(such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used
as a replacement for older ROM types, but not in applications that take advantage of its ability to be
modified quickly and frequently.

2.2 Introduction to QDRII SRAM


The QDR consortium (Cypress, Renesas, IDT, NEC, and Samsung) defined and developed the
Quad Data Rate (QDR) SRAM technology for high-performance communications applications. The
QDRII SRAM architecture provides dedicated input and output ports that independently operate at
double data rate (DDR). This results in four data transfers per clock cycle and overcomes bus contention
issues. QDR SRAM devices were developed in response to the demand for higher bandwidth memories
targeted at networking and telecommunications applications.
The basic QDR architecture has independent read and write data paths for simultaneous
operation. Both paths use Double Data Rate (DDR) transmission to deliver two words per clock cycle,
one word on the rising clock edge and another on the falling edge. The result is that four bus-widths of
data (two read and two write) are transferred during each clock period, hence the name quad data rate.
QDR memory devices are offered in both 2-word burst and 4-word burst architectures. The 2-word burst
devices transmit two words per read or write request. A DDR address bus is used to allow Read requests
during the first half of the clock period and Write requests during the second half of the clock period. In
contrast, 4-word burst devices transmit four words per Read or Write request, and hence only require a
Single Data Rate (SDR) address bus to maximize data bandwidth. Read and Write operations must be
requested on alternating clock cycles (i.e., non-overlapping), allowing the address bus to be shared.
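
The claim that both burst architectures deliver four words per clock cycle follows from the request rates described above, and can be checked with a short calculation. In this sketch the function names are hypothetical, and the 250 MHz clock and 36-bit bus width used in the example are illustrative values that do not come from the text.

```python
from fractions import Fraction


def words_per_clock(burst_length):
    """Average words moved per clock cycle on the read and write ports."""
    if burst_length == 2:
        # DDR address bus: one read and one write request every clock cycle.
        requests = {"read": Fraction(1), "write": Fraction(1)}
    elif burst_length == 4:
        # SDR address bus: reads and writes alternate, one of each per two cycles.
        requests = {"read": Fraction(1, 2), "write": Fraction(1, 2)}
    else:
        raise ValueError("QDR devices use 2-word or 4-word bursts")
    return {op: rate * burst_length for op, rate in requests.items()}


def bandwidth_bits_per_s(clock_hz, width_bits, burst_length):
    """Aggregate read + write bandwidth for a given clock and bus width."""
    total_words = sum(words_per_clock(burst_length).values())
    return int(total_words * width_bits * clock_hz)


# Hypothetical example: 250 MHz clock and 36-bit data buses.
example = bandwidth_bits_per_s(250_000_000, 36, 4)
```

Both burst lengths give two read words and two write words per cycle; the 4-word burst reaches the same data rate with half as many address transfers, which is why a single data rate address bus is sufficient.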
One of the unique features of the QDRII architecture is the echo-clock (CQ) output that is
frequency locked to the device input clock (K) but edge aligned to the data transmitted on the Read path
outputs (Q). The CQ clock output is retimed to align with the Q data outputs using a delay-locked loop
(DLL) circuit internal to the QDRII memory device. This clock forwarding, or source-synchronous,
method of interface allows greater timing margin. It also enables the simple and elegant direct-clocking
methodology used in this reference design, discussed in detail in this application note. The QDRII
reference design is composed of four main elements:
I. User Interface
II. Physical Interface
III. Read/Write State Machine
IV. Delay Calibration State Machine

The user interface uses a simple protocol based entirely on SDR signals to make Read/Write
requests. This module is constructed primarily from FIFO16 primitives and is used to store the address
and data values for Read/Write operations before and after execution.
The Read/Write state machine is responsible for monitoring the status of the first-in first-out
(FIFO) buffers within the user interface module, coordinating the flow of data between the user interface
and physical interface, and initiating the actual Read/Write commands to the external memory device. It
ensures that Read/Write operations execute concurrently with minimal latency, as required by the
QDR II memory specification.
The physical interface is responsible for generating the proper timing relationships and DDR
signaling to communicate with the external memory device in a manner that conforms to its command
protocol and timing requirements.
The delay calibration state machine is an integral component of the direct-clocking methodology
used to achieve maximum performance while greatly simplifying the task of read data capture inside the
FPGA. It uses the adjustable IDELAY input delay blocks of the FPGA to shift the timing of the
read data returning from the memory device so that it can be synchronized directly to the global FPGA
system clock without any complex local-clocking or data recapture techniques.
The block diagram of the QDR II reference design is shown in Fig. 2.1.

[Figure: block diagram with the blocks User Interface, Read/Write State Machine, Physical Interface,
Delay Calibration State Machine, and QDRII Memory Device.
User-side signals: USER_CLK0, USER_CLK270, USER_RESET, USER_W_n, USER_R_n, USER_QEN_n, USER_AD_WR,
USER_AD_RD, USER_BW_n, USER_DWL, USER_DWH, USER_QRL, USER_QRH, USER_WR_FULL, USER_RD_FULL,
USER_QR_EMPTY, CLK_DIV4.
Internal paths: FIFO Status, Read/Write Control, Address Path, Write Path, Read Path.
Memory-side signals: QDR_W_n, QDR_R_n, QDR_SA, QDR_BW_n, QDR_D, QDR_CD, QDR_K, QDR_K_n.]

Fig. 2.1: QDR II Reference Design

2.3 Implementation of QDRII SRAM with Virtex-4 PRO FPGA


The QDR II reference design was implemented to take advantage of the unique
capabilities of the Virtex-4 family. Advances in I/O, clocking, and storage element technology
enable the high-performance, turnkey operation of this design. The following sections describe
the design implementation in further detail.
2.3.1 User Interface
The user interface module utilizes six FIFO16 blocks to store the address and data values
for Read/Write operations. For Write commands, three FIFO16 blocks are used, one to store the
Write address (USER_AD_WR) and byte write enable (USER_BW_n) signals, and two to store
the Low (USER_DWL) and High (USER_DWH) 36-bit data words to be written to the memory.
Read commands also use three FIFO16 blocks, one to store the Read address (USER_AD_RD)
and two to store the Low (USER_QRL) and High (USER_QRH) 36-bit data words returning
from the memory as a result of the Read execution. The Read/Write state machine manages the
interleaving of Read and Write requests to the external memory device, relieving the user
interface of this responsibility.

2.3.2 Read/Write State Machine


This state machine is responsible for coordinating the flow of data between the user
interface and physical interface. It initiates the Read/Write commands to the external memory
device based on the requests stored in the user interface FIFOs.
A USER_RESET always returns the state machine to the INIT state, where memory
operations are suspended until the delay calibration state machine has completed adjusting the
delay on the IDELAY blocks for all of the QDR_Q inputs to center align the Read path data to
the FPGA system clock, USER_CLK0. Completion of the calibration operation is signaled by an
active-High DLY_CAL_DONE input that transitions the Read/Write state machine to the Idle
state to await Read/Write requests from the user interface. From the Idle state, Write commands
take precedence on the presumption that a Write to memory must always occur before there is
any valid Read data. When there are no Read or Write requests pending, the state machine loops
in the Idle state.
A Write request pending in the user interface FIFOs causes a transition to the Write state, where a
Write command is initiated via the internal WR_INIT_n strobe. This strobe pulls the Write address and
data values from the FIFO and results in the initiation of the external QDR_W_n Write control strobe to
the memory device. Assuming there is a pending Read request, the state machine then transitions to the
Read state where the internal RD_INIT_n strobe is activated. This strobe pulls the Read address from
the FIFOs and launches an external QDR_R_n strobe to the memory device. Capture of the return values
in the Read data FIFOs also occurs as a result of this process.

[Figure: state diagram with states INIT (START_CAL = 1), IDLE, WRITE (WR_INIT_n = 0), and READ
(RD_INIT_n = 0). USER_RESET forces INIT, DLY_CAL_DONE moves INIT to IDLE, and the remaining
transitions are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]

Figure 2.2: 4-Word Burst Read/Write State Machine

[Figure: state diagram with states INIT (START_CAL = 1), IDLE, and READ/WRITE (WR_INIT_n and
RD_INIT_n strobes). USER_RESET forces INIT, DLY_CAL_DONE moves INIT to IDLE, and the transitions
between IDLE and READ/WRITE are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]

Fig. 2.3: A 2-word burst read/write state machine

The Read/Write state machine continuously monitors the user interface FIFO status signals to
determine if there are any pending Read/Write requests. A continuous flow of concurrent Read/Write
requests causes the state machine to simply alternate between the Read and Write states, ensuring
properly interleaved requests to the external memory. A stream of Write requests results in alternating
Idle and Write states, while a stream of Read requests similarly alternates between Idle and Read states.
The operation of the 2-word burst state machine is quite similar to that of the 4-word burst state
machine, with the exception that a single READ_WRITE state manages the Read and Write requests to the
memory. All 2-word burst QDR II memory devices allow Read and Write requests to occur on the same
clock cycle, allowing these operations to be initiated from the same state.
The state diagrams for the 4-word burst and 2-word burst Read/Write state machines are shown in
Figure 2.2 and Fig. 2.3.
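The behaviour of the 4-word burst state machine described above can be sketched as a small software model. This is only an illustrative model, not the HDL of the reference design: the function names and the Boolean arguments are assumptions, and the rule that a Read is pending only when the Read address FIFO is not empty and the Read data FIFO is not full is inferred from the FIFO_RD_EMPTY and FIFO_QR_FULL labels in the state diagram.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    IDLE = "IDLE"
    WRITE = "WRITE"
    READ = "READ"


def next_state(state, user_reset, dly_cal_done, wr_empty, rd_empty, qr_full):
    """Return the next state of the 4-word burst Read/Write state machine."""
    write_pending = not wr_empty
    # Assumption: a Read is issued only if the return-data FIFO has room.
    read_pending = not rd_empty and not qr_full

    if user_reset:
        return State.INIT
    if state is State.INIT:
        # Memory operations stay suspended until calibration has finished.
        return State.IDLE if dly_cal_done else State.INIT
    if state is State.IDLE:
        # Write commands take precedence from the Idle state.
        if write_pending:
            return State.WRITE
        return State.READ if read_pending else State.IDLE
    if state is State.WRITE:
        return State.READ if read_pending else State.IDLE
    # state is State.READ
    return State.WRITE if write_pending else State.IDLE


def trace(cycles, wr_empty, rd_empty, qr_full=False):
    """Run from INIT with calibration done and constant FIFO status flags."""
    state, visited = State.INIT, []
    for _ in range(cycles):
        state = next_state(state, False, True, wr_empty, rd_empty, qr_full)
        visited.append(state.value)
    return visited


print(trace(5, wr_empty=False, rd_empty=False))  # concurrent Read/Write requests
print(trace(5, wr_empty=False, rd_empty=True))   # Write requests only
```

With both kinds of request pending the model alternates between WRITE and READ after leaving IDLE, and with only Write requests it alternates between WRITE and IDLE, matching the behaviour described in the text.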
2.3.3 Physical Interface
The Physical Interface of the QDRII reference design generates the actual I/O signaling and
timing relationships for communication of Read/Write commands to the external memory device,
including the DDR data signals. It provides the necessary timing margins and I/O signaling standards
required to meet the overall design performance specifications.

2.4 Functional Description of QDRII SRAM


The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are 1.8V
Synchronous Pipelined SRAMs equipped with QDRII architecture. QDRII architecture consists of two
separate ports to access the memory array. The Read port has dedicated Data Outputs to support Read
operations and the Write port has dedicated Data Inputs to support Write operations. Having separate
data inputs and data outputs completely eliminates the need to "turn around" the data bus, as is
required with common I/O devices. Access to each port is accomplished through a common address bus.
Read and Write addresses are latched on alternate rising edges of the input (K) clock.
Accesses to the QDRII Read and Write ports are completely independent of one another. In order
to maximize data throughput, both Read and Write ports are equipped with Double Data Rate (DDR)
interfaces. Each address location is associated with four 8-bit words (CY7C1511V18), 9-bit words
(CY7C1526V18), 18-bit words (CY7C1513V18), or 36-bit words (CY7C1515V18) that burst sequentially
into or out of the device. Since data can be transferred into and out of the device on every rising edge of
both input clocks (K and its complement K#) and both output clocks (C and its complement C#), memory
bandwidth is maximized while system design is simplified by eliminating bus "turn-arounds".

Fig. 2.4: Logic diagram of CY7C1515V18

Depth expansion is accomplished with Port Selects for each port. Port Selects allow each port to
operate independently. All synchronous inputs pass through input registers controlled by the K or K#
input clocks. All data outputs pass through output registers controlled by the C or C# (or K or K# in a
single clock domain) input clocks. Writes are conducted with on-chip synchronous self-timed write
circuitry.
2.4.1 Pin Definitions
Pin Name | I/O | Pin Description
D[x:0] | Input-Synchronous | Data input signals, sampled on the rising edge of the K and K# clocks during valid write operations. CY7C1511V18: D[7:0]; CY7C1526V18: D[8:0]; CY7C1513V18: D[17:0]; CY7C1515V18: D[35:0].
WPS | Input-Synchronous | Write Port Select, active LOW. Sampled on the rising edge of the K clock. When asserted active, a write operation is initiated. Deasserting will deselect the Write port. Deselecting the Write port will cause D[x:0] to be ignored.
NWS0, NWS1 | Input-Synchronous | Nibble Write Select 0, 1, active LOW (CY7C1511V18 only). Sampled on the rising edge of the K and K# clocks during write operations. Used to select which nibble is written into the device. NWS0 controls D[3:0] and NWS1 controls D[7:4]. All the Nibble Write Selects are sampled on the same edge as the data. Deselecting a Nibble Write Select will cause the corresponding nibble of data to be ignored and not written into the device.
BWS0, BWS1, BWS2, BWS3 | Input-Synchronous | Byte Write Select 0, 1, 2, and 3, active LOW. Sampled on the rising edge of the K and K# clocks during write operations. Used to select which byte is written into the device during the current portion of the write operation. Bytes not written remain unaltered. CY7C1526V18: BWS0 controls D[8:0]. CY7C1513V18: BWS0 controls D[8:0] and BWS1 controls D[17:9]. CY7C1515V18: BWS0 controls D[8:0], BWS1 controls D[17:9], BWS2 controls D[26:18], and BWS3 controls D[35:27].
A | Input-Synchronous | Address Inputs. Sampled on the rising edge of the K clock during active read and write operations. These address inputs are multiplexed for both Read and Write operations. Internally, the device is organized as 8M x 8 (4 arrays each of 2M x 8) for CY7C1511V18, 8M x 9 (4 arrays each of 2M x 9) for CY7C1526V18, 4M x 18 (4 arrays each of 1M x 18) for CY7C1513V18, and 2M x 36 (4 arrays each of 512K x 36) for CY7C1515V18. Therefore, only 21 address inputs are needed to access the entire memory array of CY7C1511V18 and CY7C1526V18, 20 address inputs for CY7C1513V18, and 19 address inputs for CY7C1515V18. These inputs are ignored when the appropriate port is deselected.
Q[x:0] | Outputs-Synchronous | Data Output signals. These pins drive out the requested data during a Read operation. Valid data is driven out on the rising edge of both the C and C# clocks during Read operations, or K and K# when in single clock mode. When the Read port is deselected, Q[x:0] are automatically tri-stated. CY7C1511V18: Q[7:0]; CY7C1526V18: Q[8:0]; CY7C1513V18: Q[17:0]; CY7C1515V18: Q[35:0].
RPS | Input-Synchronous | Read Port Select, active LOW. Sampled on the rising edge of the Positive Input Clock (K). When active, a Read operation is initiated. Deasserting will cause the Read port to be deselected. When deselected, the pending access is allowed to complete and the output drivers are automatically tri-stated following the next rising edge of the C clock. Each read access consists of a burst of four sequential transfers.
C | Input-Clock | Positive Input Clock for Output Data. C is used in conjunction with C# to clock out the Read data from the device. C and C# can be used together to deskew the flight times of various devices on the board back to the controller. See application example for further details.
C# | Input-Clock | Negative Input Clock for Output Data. C# is used in conjunction with C to clock out the Read data from the device. C and C# can be used together to deskew the flight times of various devices on the board back to the controller. See application example for further details.
K | Input-Clock | Positive Input Clock Input. The rising edge of K is used to capture synchronous inputs to the device and to drive out data through Q[x:0] when in single clock mode. All accesses are initiated on the rising edge of K.
K# | Input-Clock | Negative Input Clock Input. K# is used to capture synchronous inputs being presented to the device and to drive out data through Q[x:0] when in single clock mode.
CQ | Echo Clock | CQ is referenced with respect to C. This is a free running clock and is synchronized to the input clock for output data (C) of the QDR-II. In the single clock mode, CQ is generated with respect to K. The timings for the echo clocks are shown in the AC timing table.
CQ# | Echo Clock | CQ# is referenced with respect to C#. This is a free running clock and is synchronized to the input clock for output data (C#) of the QDR-II. In the single clock mode, CQ# is generated with respect to K#. The timings for the echo clocks are shown in the AC timing table.
ZQ | Input | Output Impedance Matching Input. This input is used to tune the device outputs to the system data bus impedance. CQ, CQ#, and Q[x:0] output impedance are set to 0.2 x RQ, where RQ is a resistor connected between ZQ and ground. Alternately, this pin can be connected directly to VDDQ, which enables the minimum impedance mode. This pin cannot be connected directly to GND or left unconnected.
DOFF | Input | DLL Turn Off, active LOW. Connecting this pin to ground will turn off the DLL inside the device. The timings in the DLL turned off operation will be different from those listed in this data sheet.
TDO | Output | TDO for JTAG.
TCK | Input | TCK pin for JTAG.
TDI | Input | TDI pin for JTAG.
TMS | Input | TMS pin for JTAG.
NC | N/A | Not connected to the die. Can be tied to any voltage level.
Vss/144M | Input | Address expansion for 144M. Can be tied to any voltage level.
Vss/288M | Input | Address expansion for 288M. Can be tied to any voltage level.
Vref | Input-Reference | Reference Voltage Input. Static input used to set the reference level for HSTL inputs and outputs as well as AC measurement points.
VDD | Power Supply | Power supply inputs to the core of the device.
Note: K#, C#, and CQ# denote the complements of the K, C, and CQ clocks.

Table 2.1: Pin definitions
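
The address input counts quoted in the Address Inputs row follow from the device organization: because one address selects a burst of four words, only the number of four-word locations has to be addressed. The sketch below reproduces that arithmetic; the dictionary and function names are illustrative assumptions, while the organizations are the ones listed in the table.

```python
# Total number of words in each device, from the Address Inputs description.
DEVICE_WORDS = {
    "CY7C1511V18": 8 * 2**20,   # 8M x 8
    "CY7C1526V18": 8 * 2**20,   # 8M x 9
    "CY7C1513V18": 4 * 2**20,   # 4M x 18
    "CY7C1515V18": 2 * 2**20,   # 2M x 36
}
BURST_LENGTH = 4  # each address location holds four sequential words


def address_inputs(total_words, burst_length=BURST_LENGTH):
    """Number of address pins needed when one address covers a whole burst."""
    locations = total_words // burst_length
    return (locations - 1).bit_length()


for device, words in DEVICE_WORDS.items():
    print(device, address_inputs(words))
```

Running the loop gives 21, 21, 20, and 19 address inputs for the four devices, in agreement with the pin description.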

2.5 Functioning Mechanism of QDRII SRAM


The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are synchronous pipelined
burst SRAMs equipped with both a Read port and a Write port. The Read port is dedicated to Read
operations and the Write port is dedicated to Write operations. Data flows into the SRAM through the
Write port and out through the Read port. These devices multiplex the address inputs in order to
minimize the number of address pins required. By having separate Read and Write ports, the QDRII
completely eliminates the need to "turn around" the data bus and avoids any possible data contention,
thereby simplifying system design. Each access consists of four 8-bit data transfers in the case of the
CY7C1511V18, four 9-bit data transfers in the case of the CY7C1526V18, four 18-bit data transfers in the
case of the CY7C1513V18, and four 36-bit data transfers in the case of the CY7C1515V18, in two clock cycles.
Accesses for both ports are initiated on the Positive Input Clock (K). All synchronous input
timing is referenced from the rising edge of the input clocks (K and its complement K#), and all output
timing is referenced to the output clocks (C and its complement C#, or K and K# when in single clock mode).
All synchronous data inputs (D[x:0]) pass through input registers controlled by the input
clocks (K and K#). All synchronous data outputs (Q[x:0]) pass through output registers controlled
by the rising edge of the output clocks (C and C#, or K and K# when in single-clock mode).
All synchronous control inputs (RPS, WPS, BWS[x:0]) pass through input registers controlled
by the rising edge of the input clocks (K and K#). The CY7C1513V18 is described in the following sections.
2.5.1 Read Operations
The CY7C1513V18 is organized internally as 4 arrays of 1M x 18. Accesses are completed in a
burst of four sequential 18-bit data words. Read operations are initiated by asserting RPS active at the
rising edge of the Positive Input Clock (K). The address presented to the Address inputs is stored in the
Read address register. Following the next K clock rise, the corresponding lowest order 18-bit word of
data is driven onto Q[17:0] using C as the output timing reference. On the subsequent rising edge of
the complementary output clock, the next 18-bit data word is driven onto Q[17:0]. This process continues
until all four 18-bit data words have been driven out onto Q[17:0]. The requested data will be valid
0.45 ns from the rising edge of the output clock (C or C#, or K or K# when in single-clock mode). In order
to maintain the internal logic, each Read access must be allowed to complete. Each Read access consists
of four 18-bit data words and takes 2 clock cycles to complete. Therefore, Read accesses to the device
cannot be initiated on two consecutive K clock rises; the internal logic of the device will ignore the
second Read request. Read accesses can be initiated on every other K clock rise. Doing so will pipeline
the data flow such that data is transferred out of the device on every rising edge of the output clocks
(C and C#, or K and K# when in single-clock mode).
When the Read port is deselected, the CY7C1513V18 will first complete the pending Read
transactions. Synchronous internal circuitry will automatically tri-state the outputs following the next
rising edge of the Positive Output Clock (C). This will allow for a seamless transition between devices
without the insertion of wait states in a depth expanded memory.
2.5.2 Write Operations
Write operations are initiated by asserting WPS active at the rising edge of the Positive Input
Clock (K). On the following K clock rise the data presented to D[17:0] is latched and stored into the
lower 18-bit Write Data register, provided BWS[1:0] are both asserted active. On the subsequent rising
edge of the Negative Input Clock (K#) the information presented to D[17:0] is also stored into the Write
Data register, provided BWS[1:0] are both asserted active. This process continues for one more cycle
until four 18-bit words (a total of 72 bits) of data are stored in the SRAM. The 72 bits of data are then
written into the memory array at the specified location. Therefore, Write accesses to the device cannot
be initiated on two consecutive K clock rises. The internal logic of the device will ignore the second
Write request. Write accesses can be initiated on every other rising edge of the Positive Input Clock (K).
Doing so will pipeline the data flow such that 18 bits of data can be transferred into the device on every
rising edge of the input clocks (K and K#).
When deselected, the Write port will ignore all inputs after the pending Write operations have
been completed.
2.5.3 Byte Write Operations
Byte Write operations are supported by the CY7C1513V18. A write operation is initiated as
described in the Write Operations section above. The bytes that are written are determined by BWS0 and
BWS1, which are sampled with each set of 18-bit data words. Asserting the appropriate Byte Write
Select input during the data portion of a write will allow the data being presented to be latched and
written into the device. Deasserting the Byte Write Select input during the data portion of a write will
allow the data stored in the device for that byte to remain unaltered. This feature can be used to simplify
Read/Modify/Write operations to a Byte Write operation. The CY7C1515V18 also supports byte writes,
which are controlled by BWS0, BWS1, BWS2, and BWS3.
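The effect of the Byte Write Selects on one data word can be illustrated with a short software model. This is a sketch under stated assumptions, not a description of the device internals: the function name and arguments are hypothetical, each select is taken to cover one 9-bit lane (BWS0 for D[8:0], BWS1 for D[17:9], and so on, as in the pin definitions), and the selects are modelled as active-LOW values.

```python
BYTE_LANE_BITS = 9                      # each Byte Write Select covers 9 data bits
LANE_MASK = (1 << BYTE_LANE_BITS) - 1


def byte_write(stored_word, new_word, bws_n):
    """Merge new_word into stored_word under active-LOW byte write selects.

    bws_n[i] == 0 writes lane i (bits 9*i+8 down to 9*i) from new_word;
    bws_n[i] == 1 leaves that lane of stored_word unaltered.
    """
    result = stored_word
    for lane, select_n in enumerate(bws_n):
        if select_n == 0:
            mask = LANE_MASK << (lane * BYTE_LANE_BITS)
            result = (result & ~mask) | (new_word & mask)
    return result


old = 0x3FFFF                                    # 18-bit word with every bit set
print(hex(byte_write(old, 0x00000, (0, 1))))     # only D[8:0] is written: 0x3fe00
```

Passing a four-element tuple models the 36-bit CY7C1515V18 with BWS0 to BWS3 in the same way.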
2.5.4 Single Clock Mode
The CY7C1513V18 can be used with a single clock that controls both the input and output
registers. In this mode the device will recognize only a single pair of input clocks (K and K#) that
controls both the input and output registers. This operation is identical to the operation if the device had
zero skew between the K/K# and C/C# clocks. All timing parameters remain the same in this mode. To use
this mode of operation, the user must tie C and C# HIGH at power on. This function is a strap option and
is not alterable during device operation.
2.5.5 Concurrent Transactions
The Read and Write ports on the CY7C1513V18 operate completely independently of one
another. Since each port latches the address inputs on different clock edges, the user can Read or Write
to any location, regardless of the transaction on the other port. If the ports access the same location when
a Read follows a Write in successive clock cycles, the SRAM will deliver the most recent information
associated with the specified address location. This includes forwarding data from a Write cycle that was
initiated on the previous K clock rise.
Read accesses and Write accesses must be scheduled such that only one transaction is initiated on any
clock cycle. If both ports are selected on the same K clock rise, the arbitration depends on the previous
state of the SRAM. If both ports were deselected, the Read port will take priority. If a Read was initiated
on the previous cycle, the Write port will assume priority (since Read operations cannot be initiated on
consecutive cycles). If a Write was initiated on the previous cycle, the Read port will assume priority
(since Write operations cannot be initiated on consecutive cycles). Therefore, asserting both port selects
active from a deselected state will result in alternating Read/Write operations being initiated, with the
first access being a Read.
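The arbitration rule above can be expressed as a small decision function. The following is a simplified sketch with hypothetical names; it tracks only which access, if any, was initiated on the preceding K clock rise and treats a cycle in which nothing was initiated like a deselected state, which is a modelling assumption.

```python
def arbitrate(previous, read_selected, write_selected):
    """Return the access initiated on this K clock rise: "R", "W", or None.

    previous is the access initiated on the preceding K clock rise
    ("R", "W", or None if nothing was initiated).
    """
    if read_selected and write_selected:
        # Read wins from a deselected state or after a Write; Write wins after a Read.
        return "W" if previous == "R" else "R"
    if read_selected:
        # A Read requested on the cycle right after a Read is ignored.
        return None if previous == "R" else "R"
    if write_selected:
        # A Write requested on the cycle right after a Write is ignored.
        return None if previous == "W" else "W"
    return None


def run(requests):
    """Apply arbitrate() to a sequence of (read_selected, write_selected) pairs."""
    previous, initiated = None, []
    for read_selected, write_selected in requests:
        previous = arbitrate(previous, read_selected, write_selected)
        initiated.append(previous)
    return initiated


print(run([(True, True)] * 4))    # ['R', 'W', 'R', 'W']
print(run([(True, False)] * 4))   # ['R', None, 'R', None]
```

The first call shows the alternating Read/Write pattern that starts with a Read, and the second shows that a port selected on every cycle is only serviced on every other K clock rise.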
2.5.6 Depth Expansion
The CY7C1513V18 has a Port Select input for each port. This allows for easy depth expansion.
Both Port Selects are sampled on the rising edge of the Positive Input Clock (K) only. Each port select
input can deselect the specified port. Deselecting a port will not affect the other port. All pending
transactions (Read and Write) will be completed prior to the device being deselected.
2.5.7 Programmable Impedance

An external resistor, RQ, must be connected between the ZQ pin on the SRAM and VSS to allow
the SRAM to adjust its output driver impedance. The value of RQ must be 5X the value of the intended
line impedance driven by the SRAM. The allowable range of RQ to guarantee impedance matching with
a tolerance of 15% is between 175 ohms and 350 ohms, with VDDQ = 1.5V. The output impedance is
adjusted every 1024 cycles upon power up to account for drifts in supply voltage and temperature.
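As a worked example of the sizing rule above, the sketch below computes RQ for a given line impedance and checks it against the allowable range. The function names and the 50 ohm trace are assumptions for illustration; the factor of 5, the 0.2 x RQ output impedance, and the 175 to 350 ohm range come from the text and the ZQ pin description.

```python
RQ_MIN_OHMS = 175.0
RQ_MAX_OHMS = 350.0


def rq_for_line(line_impedance_ohms):
    """RQ resistor value for a given line impedance (RQ = 5 x line impedance)."""
    rq = 5.0 * line_impedance_ohms
    if not RQ_MIN_OHMS <= rq <= RQ_MAX_OHMS:
        raise ValueError(f"RQ = {rq} ohms is outside the 175 to 350 ohm range")
    return rq


def output_impedance(rq_ohms):
    """Driver output impedance set by RQ (0.2 x RQ, i.e. RQ / 5)."""
    return rq_ohms / 5.0


rq = rq_for_line(50.0)            # hypothetical 50 ohm board trace
print(rq, output_impedance(rq))
```

The range check implies that line impedances from 35 ohms to 70 ohms can be matched with this scheme.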

2.5.8 Echo Clocks


Echo clocks are provided on the QDR-II to simplify data capture on high speed systems. Two
echo clocks are generated by the QDR-II. CQ is referenced with respect to C, and its complement CQ# is
referenced with respect to C#. These are free running clocks and are synchronized to the output clock of
the QDR-II. In the single clock mode, CQ is generated with respect to K and CQ# is generated with respect
to K#. The timings for the echo clocks are shown in the AC timing table.
2.5.9 Delay Lock Loop
These chips utilize a Delay Lock Loop (DLL) that is designed to function between 80 MHz and
the specified maximum clock frequency. During power-up, when DOFF is tied HIGH, the DLL gets
locked after 1024 cycles of stable clock. The DLL can also be reset by slowing or stopping the input
clocks K and K# for a minimum of 30 ns. However, it is not necessary for the DLL to be specifically reset
in order to lock the DLL to the desired frequency. The DLL will automatically lock 1024 clock cycles
after a stable clock is presented. The DLL may be disabled by applying ground to the DOFF pin. For more
information, refer to the application note "DLL Considerations in QDRII/DDRII/QDRII+/DDRII+".
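The 1024-cycle lock requirement can be converted into a waiting time for a given clock frequency. The sketch below shows the arithmetic; the function name and the 250 MHz example frequency are assumptions, while the 1024 cycles and the 80 MHz lower limit are taken from the text.

```python
DLL_LOCK_CYCLES = 1024      # stable clock cycles needed before the DLL is locked
DLL_MIN_CLOCK_HZ = 80e6     # lowest clock frequency at which the DLL functions


def dll_lock_time_s(clock_hz):
    """Time in seconds for the DLL to lock once a stable clock is presented."""
    if clock_hz < DLL_MIN_CLOCK_HZ:
        raise ValueError("clock frequency is below the DLL operating range")
    return DLL_LOCK_CYCLES / clock_hz


# Hypothetical example: a 250 MHz input clock.
print(dll_lock_time_s(250e6) * 1e6, "microseconds")
```

At the 80 MHz lower limit the same calculation gives the longest lock time, 12.8 microseconds.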

CHAPTER 3
XILINX AND MODEL SIM
3.1 Xilinx Overview
The Integrated Software Environment (ISE) is the Xilinx design software suite that allows
you to take your design from design entry through Xilinx device programming. The ISE Project
Navigator manages and processes your design through the steps of the ISE design flow, which are
described in Section 3.3.

3.2 Project Navigator Overview


Project Navigator organizes design files and runs processes to move the design from design entry
through implementation to programming the targeted Xilinx device. Project Navigator is the high-level
manager for Xilinx FPGA and CPLD designs, which allows you to do the following:
1. Add and create design source files, which appear in the Sources window
2. Modify your source files in the Workspace
3. Run processes on your source files in the Processes window
4. View output from the processes in the Transcript window
Optionally, processes can be run from a script or from a command line prompt.
However, it is recommended to first become familiar with the basic use of the Xilinx Integrated
Software Environment (ISE) software and with project management.
The Project Navigator main window consists of a toolbar and four sub-windows:
1. Tool bar
2. Sources window
3. Processes window
4. Workspace
5. Transcript window

In Fig 3.1, the Sources window at the top left hierarchically displays the
elements included in the project. Beneath the Sources window is the Processes window, which displays
available processes for the currently selected source. The third window, at the bottom of the Project
Navigator, is the Transcript window, which displays status messages, errors, and warnings and also
contains interactive tabs for Tcl scripting and the Find in Files function. The fourth window, to the right,
is a multi-document interface (MDI) window referred to as the Workspace. It enables you to view HTML
reports, ASCII text files, schematics, and simulation waveforms.
3.2.1 Project Navigator Main Window

Fig 3.1: Project Navigator Main Window

3.3 ISE Design flow


The ISE Project Navigator manages and processes the design through the following steps in the
ISE design flow.

3.3.1 Design Entry

Design entry is the first step in the ISE design flow. During design entry, the source
files are created based on the design objectives. The top-level design file can be created using a Hardware
Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic. Multiple
formats can be used for the lower-level source files in the design.
When working with a synthesized EDIF or NGC/NGO file, skip design entry and
synthesis and start with the implementation process.

3.3.2 Synthesis
After design entry and optional simulation, run synthesis. During this step, VHDL, Verilog, or
mixed language designs become netlist files that are accepted as input to the implementation step.

3.3.3 Implementation
After synthesis, run design implementation, which converts the logical design into a physical file
format that can be downloaded to the selected target device. From Project Navigator, run the
implementation process in one step, or run each of the implementation processes separately.
Implementation processes vary depending on whether we are targeting a Field Programmable Gate
Array (FPGA) or a Complex Programmable Logic Device (CPLD).

3.3.4 Verification
The functionality of the design can be verified at several points in the design flow. Simulator
software can be used to verify the functionality and timing of the design or a portion of the design. The
simulator interprets VHDL or Verilog code into circuit functionality and displays logical results of the
described HDL to determine correct circuit operation. Simulation allows complex functions to be created
and verified in a relatively small amount of time. In-circuit verification can also be run after
programming the device.
3.3.5 Device Configuration
After generating a programming file, configure the device. During configuration, generate
configuration files and download the programming files from a host computer to a Xilinx device.
IMPACT tool Overview
IMPACT, a tool featuring batch and graphical user interface (GUI) operations, allows you to
perform the following functions: Device Configuration and File Generation.
Device Configuration enables you to directly configure Xilinx FPGAs or program Xilinx
CPLDs and PROMs with the Xilinx cables (MultiPRO Desktop Tool, Parallel Cable IV, or Platform
Cable USB) in various modes. In the Boundary-Scan mode, Xilinx FPGAs, CPLDs, and PROMs can be
configured or programmed. In the Slave Serial or SelectMAP configuration modes, only FPGAs can be
configured directly. In the Desktop Configuration mode, Xilinx CPLDs or PROMs can be programmed.
In the Direct SPI Configuration mode, select SPI serial flash devices (STMicro: M25P, M25PE, M45PE or
Atmel: AT45DB) can be programmed.
File Generation enables you to create the following types of programming files: System ACE CF,
PROM, SVF, STAPL, and XSVF files.
IMPACT also enables us to do the following:
1. Read back and verify design configuration data
2. Debug configuration problems
3. Execute SVF and XSVF files

Fig 3.2: Hardware interconnection


3.3.6 FPGA Design flow
[Figure: flowchart. Main flow: Design Entry, Design Synthesis, Design Implementation, Xilinx Device
Programming. Design Verification steps: Behavioral Simulation, Functional Simulation, Static Timing
Analysis, Back Annotation, Timing Simulation, In-circuit Verification.]

Fig 3.3: FPGA Design Flow

3.4 Core Generator


The CORE Generator is a design tool that delivers parameterized Intellectual Property (IP)
optimized for Xilinx FPGAs.
The CORE Generator provides ready-made functions which include:
1. FIFOs and memories
2. Reed-Solomon Decoder and Encoder
3. FIR filters
4. FFTs
5. Standard bus interfaces such as PCI and PCI-X
6. Connectivity and networking interfaces (Ethernet, SPI-4.2, RapidIO, CAN and PCI Express)
3.4.1 Memory Interface Generator
The Memory Interface Generator (MIG) is a simple menu-driven tool to generate advanced
memory interfaces. DDR2 SDRAM, DDR SDRAM, DDRII SRAM, QDRII SRAM, and RLDRAM II
are supported. This tool generates HDL and pin placement constraints that help in designing the
application.
3.4.2 Interfacing QDRII SRAM with MIG
Fig. 3.4 shows a top-level block diagram of the QDRII memory controller. One side of
the QDRII memory controller connects to the user interface denoted as Block Application. The other
side of the controller interfaces to QDRII memory. The memory interface data width is selectable.

[Figure: Block Application connected to the QDR-II Memory Controller, which connects to the QDR-II Memory.]

Fig. 3.4: QDR-II Memory Controller

Data is double-pumped to the QDRII SRAM on both the positive and the negative clock edges. The
HSTL_18 Class I I/O standard is used for the data, address, and control signals. QDR-II SRAM interfaces
are source-synchronous and double data rate, like DDR SDRAM interfaces. The key advantage of QDR-II
devices is that they have separate data buses for reads and writes to the SRAM. Because reads and writes
never share a data bus, no bus turn-around is needed and data contention is avoided, which makes these
RAMs faster and less error-prone.
Interface model
The memory interface is layered to simplify the design and make it modular. Fig. 3.5
shows the layered memory interface in the QDRII memory controller. The three layers are the
application layer, the implementation layer, and the physical layer.
The application layer comprises the user interface, which initiates memory
writes and reads by writing data and memory addresses to the User Interface
FIFOs. The implementation layer comprises the infrastructure, datapath, and
control logic:
1. The infrastructure logic consists of the DCM and reset logic generation circuitry.
2. The datapath logic consists of the calibration logic by which the data from the
memory component is captured using the FPGA clock.
3. The control logic determines the type of data transfer, that is, read or write, with
the memory component, depending on the User Interface FIFO status signals.

[Figure: User Interface; Implementation Layer (Infrastructure, Data path, Control); Physical Layer]

Fig. 3.5: Interface layering model
The physical layer comprises the I/O elements of the FPGA. The controller
communicates with the memory component using this layer. The I/O elements
(such as IDDRs, ODDRs, and IDELAY elements) are associated with this layer.
Hierarchy
Fig. 3.8 shows the hierarchical structure of the QDRII SRAM design generated by MIG
with a test bench and a DCM. The modules are classified as follows:
1. Design modules
2. Test bench modules
3. Clocks and reset generation modules (parameters selected from MIG)

MIG can generate QDRII SRAM designs in four different ways:
1. With a test bench and a DCM
2. Without a test bench and with a DCM
3. With a test bench and without a DCM
4. Without a test bench and without a DCM

Design clocks and resets are generated in the infrastructure_top module. When the Use DCM
option is checked in MIG, a DCM primitive and the necessary clock buffers are instantiated in the
infrastructure_top module. The inputs to this module are the differential design clock and a 200 MHz
differential clock required for the IDELAYCTRL module. A user reset is also input to this module.
Using the input clocks and reset signals, the system clocks and the system resets used in the design are
generated in this module. When the Use DCM option is unchecked in MIG, the infrastructure_top
module does not have the DCM and the corresponding clock buffer instantiations; therefore, the system
operates on the user-provided clocks. The system reset is generated in the infrastructure_top module
using the DCM_LOCK signal and the ready signal of the IDELAYCTRL element.

Fig. 3.8: QDRII SRAM Controller Hierarchy

3.5 ChipScope Pro

After configuring the device, debug the FPGA design using the ChipScope Pro software. From the
Project Navigator Processes tab, double-click Analyze Design Using ChipScope to launch the ChipScope
Pro Analyzer. To use this process, the Xilinx® ChipScope Pro software must be purchased and the design
must be created with debug and verification in mind, as described in the following sections. ChipScope
Pro comprises the ChipScope Pro cores in the CORE Generator, the ChipScope Pro Core Inserter, and the
ChipScope Pro Analyzer.
We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex 5 FPGA, by
analyzing the user interface signals, which include the PPD interface and the PLB interface. These
signals are captured using the FIFOs implemented in the FPGA and sent to the display interface on the
PC over JTAG.
3.5.1 ChipScope Pro Design Flow Overview
To use the ChipScope Pro software to perform in-circuit verification, we do the following:
1. Insert ChipScope Pro cores in the design using the CORE Generator or Core Inserter.
2. Implement the design in Project Navigator and configure the device.
3. Analyze the design using the ChipScope Pro Analyzer.

3.5.2 ChipScope Pro Core Insertion

ChipScope Pro cores are inserted in the design with the ChipScope Pro tools using one of
the following methods:
1. During design entry, using the CORE Generator.
Using the CORE Generator software, we create the cores and instantiate them in the HDL
source file. This software can generate all of the cores available in the ChipScope Pro system. The
wizard creates NGC netlists with HDL instantiation templates for any of the supported
synthesis tools. The templates are then used to connect the ChipScope Pro cores to the design logic.
2. After the Synthesize process, using the ChipScope Pro Core Inserter.
The ChipScope Pro Core Inserter is used to create the ILA, ATC2, and ICON cores and insert
them in a post-synthesis netlist.
Projects saved in the Core Inserter hold all relevant information about source files, destination
files, core parameters, and core settings. This allows information about core insertion to be stored
and retrieved between sessions. The project file (.cdc extension) can also be used as an input to the
Analyzer to import signal names.

Fig. 3.9: Core Inserter as Launched from Project Navigator

3.5.3 ChipScope Pro Cores


ChipScope Pro allows the following cores to be embedded within a design to assist with
on-chip debugging: the integrated logic analyzer (ILA), integrated bus analyzer (IBA), and virtual
input/output (VIO) low-profile software cores. These cores allow internal signals and
nodes in the FPGA to be viewed, including the IBM CoreConnect™ processor local bus (PLB) that
supports the IBM PowerPC™ 405. The ChipScope Pro cores and their functions are as follows:
1. ICON
The Integrated Controller (ICON) core provides the communication between the
embedded ILA, IBA, and VIO cores and the computer running the ChipScope Pro
Analyzer software.
2. ILA
The ILA core is a customizable logic analyzer core that can be used to monitor the
internal signals in the design. Because the ILA core is synchronous to the design being
monitored, all design clock constraints applied to the design are also applied to the
components inside the ILA core.
3. ATC2
The Agilent Trace Core 2 (ATC2) is a customizable logic analyzer core. It is similar to
the ILA core but does not use on-chip Block RAM resources to store captured trace data. The
ATC2 core synchronizes ChipScope Pro to the Agilent FPGA dynamic probe technology,
delivering the first integrated application for FPGA debug with logic analyzers.
4. VIO
The virtual input/output (VIO) core is a customizable core that can both monitor and
drive internal FPGA signals in real time. Unlike the ILA and IBA cores, the VIO core
does not require on-chip RAM.

Fig. 3.10: ChipScope Pro Cores


3.5.4 ChipScope Pro Analyzer
The ChipScope Pro Analyzer tool interfaces directly to the ChipScope Pro cores. This
software is used to download designs, set trigger conditions, and display data. The data can be shown
as waveforms, lists, or graphs, and values can be tokenized.
3.6 ModelSim
ModelSim provides a comprehensive simulation and debug environment for complex ASIC and
FPGA designs. Support is provided for multiple languages including Verilog, SystemVerilog, VHDL
and SystemC. Xilinx ISE also provides an integrated flow with the Model Technology ModelSim
simulator, which enables simulation to be run from the Xilinx Project Navigator graphical user interface.
[Figure: Wave window showing the pathnames pane, values pane, waveform pane, cursor name pane,
cursor value pane and cursor pane]

Fig. 3.11: Panes of Wave Window

CHAPTER 4
HARDWARE BOARD DESCRIPTION
4.1 Board Overview
The hardware on which we are working is a subsystem on a single board, used for
processing intercepted signals. It consists of Xilinx™ FPGAs, optical transceivers, a cPCI interface,
memories (DDR SDRAM and QDRII SRAM) and an Ethernet interface.
4.1.1 Requirement of this Board
Before this board was designed, as many as 16 independent boards were used for the purpose.
Owing to advances in VLSI technology, all of these are now integrated onto a single board. This is
highly advantageous, as the resulting board is smaller and operates faster.
4.1.2 Board Block Diagram

[Figure: board block diagram. Blocks shown: 36MB QDR-II SRAM; Virtex-4 XC4LX100; two Virtex-II Pro
XC2VP7 devices; PPD main address, control and data; de-interleaved PDW; cPCI bridge; cPCI backplane;
128MB DDR SDRAM; 8MB flash memory; 10/100 Ethernet PHY]

Fig. 4.1: Board Block Diagram

4.2 Signal Interception


4.2.1 Block Diagram

[Figure: PPD Rear IO; Virtex-II Pro XC2VP7; Module]

Fig. 4.2: Signal Reception Block diagram


4.2.2 Signal Reception
An ESM system comprises a receiver which intercepts pulses from various sources of
emission in the environment and determines the pulse parameters, which mainly include Frequency,
Pulse Width, Direction of Arrival (DOA) and Amplitude. Using these parameters, it builds the Pulse
Descriptor (PD) Word, which is a digitized form of the pulse information. The PD Words are transmitted
by the receiver over optical fiber to the ESM Processor. The function of the ESM Processor is to receive
the interleaved PD Words, de-interleave them, build the emitter file and send the information to the
display. This de-interleaving is done at two levels, which operate independently.
The PD Words received over optical fiber in serial form are converted to parallel form by the
Multi-Gigabit Transceiver (MGT) core of the Virtex-II Pro FPGA. The speed of operation of this MGT core
is 3.125 Gbits/sec. The Virtex-II Pro FPGA then sends the parallel data to the Virtex-4 FPGA for 1st
level de-interleaving.

4.3 1st level De-interleaving


In first level de-interleaving, the PD Words are de-interleaved by the Virtex-4 FPGA based on
the intra-pulse parameters (Frequency, DOA, and Pulse Width) and stored in memory. The Virtex-4 used
here is a logic-intensive FPGA. The de-interleaved PD Words are stored in the dual-port memory (SRAM)
using the concept of Content Addressable Memory (CAM). This memory has independent write and read
ports, so read and write operations can be performed at the same time.
Here we use quad data rate SRAM, which performs four operations in one clock cycle: two
writes and two reads. For each port, one operation takes place on the rising edge of the clock and
the other on the falling edge.

A simple block diagram of the 1st level of de-interleaving is shown below.

[Figure: 36MB QDR-II SRAM; Virtex-4 XC4LX100; PDW; Virtex-II Pro XC2VP7]

Fig. 4.3: 1st level de-interleaving

4.4 Memory details


4.4.1 Content Addressable Memory
Content-addressable memory (CAM) is a special type of computer memory used in certain very
high speed searching applications. Unlike standard computer memory (RAM), in which the user supplies
a memory address and the RAM returns the data word stored at that address, a CAM is designed such
that the user supplies a data word and the CAM searches its entire memory to see if that data word is
stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses
where the word was found. If the word is not found, it is stored in a new location and the address of
that location is returned. Thus, a CAM is the hardware embodiment of what in software terms would be
called an associative array.
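The lookup-or-store behaviour described above can be illustrated with a small software model. This is only a sketch of the concept, not the hardware used on the board: the class name `ContentAddressableMemory`, its `depth` parameter and its method names are hypothetical, and a real CAM compares all cells in parallel rather than scanning them.

```python
class ContentAddressableMemory:
    """Toy model of a CAM: supply a data word, get back storage addresses."""

    def __init__(self, depth):
        self.depth = depth   # number of storage locations (assumed value)
        self.cells = []      # list index plays the role of the storage address

    def search(self, word):
        # Return every address whose stored word matches the supplied word.
        return [addr for addr, stored in enumerate(self.cells) if stored == word]

    def lookup_or_store(self, word):
        # Found: return the matching addresses.
        hits = self.search(word)
        if hits:
            return hits
        # Not found: store the word in a new location and return its address.
        if len(self.cells) >= self.depth:
            raise MemoryError("CAM is full")
        self.cells.append(word)
        return [len(self.cells) - 1]
```

For example, storing two different words and then presenting the first word again returns the address that was allocated the first time, instead of allocating a new one.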
4.4.2 Why is memory required?
Memory is used to solve the speed synchronization problems caused by a mismatch in the
access rates of the 1st and 2nd levels of de-interleaving. The 1st level of de-interleaving uses hardware,
which processes and stores one PD Word at a time; hence it is faster. The 2nd level of de-interleaving
uses software, which reads a group of de-interleaved PD Words at a time and processes them; hence it is
slower.
So, a memory is required to store the de-interleaved PD Words output by the 1st stage. To
store the PD Words de-interleaved in the 1st level, a dual-port memory is used, which must be
independently accessible by the two processes. This type of memory improves performance by
reducing the memory access conflicts between the two levels, and thus increases the speed of operation.
Because of the high-speed memory access requirements, Quad Data Rate SRAMs are used. This SRAM
ideally suits the requirement, as it has independent ports for writing and reading.

4.5 2nd level De-interleaving


In second level de-interleaving, the de-interleaved PD Words are read from memory and
processed to extract the emitter parameters, which mainly include Frequency, Pulse Width, DOA,
Amplitude, and Pulse Repetition Frequency (PRF). With the help of these parameters, the emitter file is
built and sent to the display via Ethernet.

[Figure: 36MB QDR-II SRAM; Virtex-4 XC4LX100; Virtex-II Pro XC2VP7]

Fig. 4.4: 2nd level de-interleaving

4.6 Scope of the Project


Our project involves the implementation of an algorithm in VHDL to control SRAM
memory access using a Virtex-4 FPGA. We develop the logic for interfacing the Virtex-4 FPGA
with QDR-II SRAM. For designing, simulating and testing the logic, we use
Xilinx™ ISE v10.1.

[Figure: Interleaved PD Data; Virtex-4 FPGA containing the PPD and the SRAM interfacing logic in
VHDL; SRAM]

Fig. 4.5: SRAM interfacing logic

The code is implemented in two phases:
1. Write Cycle: Interfacing QDRII with PPD.
2. Read Cycle: Interfacing QDRII with Emitter Processor Software.

[Figure: Virtex-4 containing the PPD, the S/W interface and the QDR-II Memory Controller; V2 PRO
containing the Emitter Processor PLB i/f; QDR-II Memory]

Fig. 4.6: Software interface developed in VHDL

CHAPTER 5
VHDL INTERFACE DESIGN AND STATE DIAGRAM
5.1 Write-Read State Machines

[Figure: PPD; S/W Interface; QDR-II Memory Controller; QDR-II Memory; Emitter Processor PLB i/f
(V2 PRO)]

Fig. 5.1: Read-Write Interfaces developed


The above diagram shows the read and write interfaces developed. The software (S/W) interface
is written in VHDL. The PPD interface is used for writing into the QDR-II memory through the QDR-II
memory controller, and the PLB interface is used for reading the data held in the QDR-II memory.
VHDL S/W interface:
PPD - Writing
PLB - Reading

The PowerPC 405 core accesses high-speed, high-performance system resources through
Processor Local Bus (PLB) interfaces on the instruction and data cache controllers. The PLB interfaces
provide separate 32-bit address and 64-bit data buses for the instruction and data sides.
The PLB supports read and write data transfers between master and slave devices equipped with
a PLB bus interface and connected through PLB signals. The bus architecture supports multiple master
and slave devices. Each PLB master is attached to the PLB through separate address, read-data, and
write-data buses. PLB slaves are attached to the PLB through shared, but decoupled, address, read-data,
and write-data buses and a plurality of transfer control and status signals for each data bus.

5.2 State Diagrams


[Figure: write cycle state diagram. States: INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3,
Invalid State. Transition labels: del_cal=1; hw_fifo_empty=0 and user_wr_full=0]

Fig. 5.2: Write cycle state diagram
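
As an aid to reading the write cycle state diagram, the following is a small behavioural model of the write state machine. It is a sketch that follows the next-state and output assignments in the Appendix A listing (states INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3); the Python function names are hypothetical, and clock-domain and FIFO timing details are not modelled.

```python
def write_next_state(state, dly_cal_done, user_wr_full, hw_fifo_empty):
    """Next-state logic of the write state machine (one call per clock edge)."""
    if state == "INIT_WR":
        return "IDLE_WR" if dly_cal_done else "INIT_WR"
    if state == "IDLE_WR":
        # Leave idle only when the controller FIFO has room and PPD data is waiting.
        if not user_wr_full and not hw_fifo_empty:
            return "WRFIFO_RD"
        return "IDLE_WR"
    if state == "WRFIFO_RD":
        return "LT_PDW_0_1"
    if state == "LT_PDW_0_1":
        return "LT_PDW_2_3"
    if state == "LT_PDW_2_3":
        return "IDLE_WR"
    return "INIT_WR"  # any invalid state falls back to initialisation


def write_outputs(state):
    """Outputs decoded from the current state (the *_n signals are active low)."""
    return {
        "hw_rd": 1 if state == "WRFIFO_RD" else 0,
        "test_w_n_i": 0 if state == "LT_PDW_0_1" else 1,
        "user_w_n_i": 0 if state == "LT_PDW_2_3" else 1,
    }


def trace_write_fsm(cycles, dly_cal_done=1, user_wr_full=0, hw_fifo_empty=0):
    """Return the state visited in each clock cycle, starting from reset."""
    state, trace = "INIT_WR", []
    for _ in range(cycles):
        trace.append(state)
        state = write_next_state(state, dly_cal_done, user_wr_full, hw_fifo_empty)
    return trace
```

With calibration done, data waiting and the write FIFO not full, `trace_write_fsm(6)` walks through INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3 and back to IDLE_WR.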

[Figure: read cycle state diagram. States: INIT_RD, IDLE_RD, LATCH_RD_ADDR, RD_ADDR_Wr,
WAIT_QR_EMPTY, LATCH_EPW_0_1, LATCH_EPW_2_3, ACK_W0_GEN, LT_W1, ACK_W1_GEN, LT_W2, ACK_W2_GEN,
LT_W3, ACK_W3_GEN. Transition labels: reset=1; dly_calc=1; proc_rd=1; proc_rd=0;
proc_addr(1 downto 0)=01, 10, 11; user_rd_full=0; test_w_n=1; user_qr_empty=0]

Fig. 5.3: Read Cycle state diagram
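
The read cycle state diagram can be followed with a similar behavioural sketch. The state names and transition conditions below are taken from the read state machine in the Appendix A listing, which names some states slightly differently from the figure (for example RDADDR_WR and WAIT_Q_EMPTY). The Python names and the default argument values are assumptions made for illustration only.

```python
# Entry state chosen in IDLE_RD from the two least significant address bits.
WORD_ENTRY = {0b00: "LATCH_RD_ADDR", 0b01: "LT_W1", 0b10: "LT_W2", 0b11: "LT_W3"}

# States that always advance on the next clock edge.
UNCONDITIONAL = {
    "RDADDR_WR": "WAIT_Q_EMPTY",
    "LT_EPW_0_1": "LT_EPW_2_3_W0",
    "LT_EPW_2_3_W0": "ACK_W0_GEN",
    "LT_W1": "ACK_W1_GEN",
    "LT_W2": "ACK_W2_GEN",
    "LT_W3": "ACK_W3_GEN",
}


def read_next_state(state, dly_cal_done=1, proc_rd=0, proc_addr=0,
                    test_w_n=1, user_rd_full=0, user_qr_empty=0):
    """Next-state logic of the read state machine (one call per clock edge)."""
    if state == "INIT_RD":
        return "IDLE_RD" if dly_cal_done else state
    if state == "IDLE_RD":
        return WORD_ENTRY[proc_addr & 0b11] if proc_rd else state
    if state == "LATCH_RD_ADDR":
        # Wait until no write is being issued and the read FIFO has room.
        return "RDADDR_WR" if (test_w_n and not user_rd_full) else state
    if state == "WAIT_Q_EMPTY":
        return "LT_EPW_0_1" if not user_qr_empty else state
    if state in UNCONDITIONAL:
        return UNCONDITIONAL[state]
    if state in ("ACK_W0_GEN", "ACK_W1_GEN", "ACK_W2_GEN", "ACK_W3_GEN"):
        # Hold the acknowledge until the processor removes its read request.
        return "IDLE_RD" if not proc_rd else state
    return "INIT_RD"  # any invalid state falls back to initialisation


def trace_read(proc_addr, cycles):
    """States visited from IDLE_RD while proc_rd is held high."""
    state, trace = "IDLE_RD", []
    for _ in range(cycles):
        trace.append(state)
        state = read_next_state(state, proc_rd=1, proc_addr=proc_addr)
    return trace
```

A request with address bits 00 triggers a memory read and passes through LATCH_RD_ADDR, RDADDR_WR, WAIT_Q_EMPTY, LT_EPW_0_1 and LT_EPW_2_3_W0 before ACK_W0_GEN, whereas requests for the other three words go straight to the corresponding LT_Wn and ACK_Wn_GEN states.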

CHAPTER 6
TEST RESULTS, CONCLUSION AND FUTURE SCOPE OF
WORK
6.1 Simulation Results in ModelSim
6.1.1 Write Cycle
The PPD logic and the processor PLB interface operate with an external clock as reference, whereas
the QDRII SRAM memory controller operates at 166 MHz, which is the operating frequency of the QDRII
SRAM device. The reset signal used is synchronous with respect to the QDRII SRAM reference clock.
The dly_cal_done signal indicates when the QDRII SRAM device calibration is complete and the
device is ready for access.
The logic uses a FIFO interface to buffer the processed PD Words, which arrive at a minimum
interval of 200 ns, before they are written into the QDRII SRAM device. We simulated this
requirement by generating a wr_pulse signal every 200 ns. We employed a counter to generate the 128-bit
PD Word, and the hardware address and hardware data are written into 5 FIFOs: 1 for the address and
4 for the data (32 bits each).
The QDRII SRAM operates at a clock rate of 166 MHz, so we set user_clk to
166 MHz. The write state machine remains idle until dly_cal_done = 1. Once
data is written into the FIFOs, the hw_fifo_empty signal goes low, signifying that data is present
in the PPD FIFO interface. At the next rising edge of the user clock, the state machine
moves into the wrfifo_rd state and the hardware read signal, hw_rd, goes high. The data and address are
now read from the FIFOs into qdr_wrdata and hw_addr_out respectively. qdr_wrdata, the data
output of the FIFOs, is a 128-bit data line. The state machine next moves into the lt_pdw_0_1 state and
subsequently into the lt_pdw_2_3 state. user_w_n_i is an active-low signal used to latch the 128-bit PD
Word. lt_pdw is a 2-bit vector which is 01 for the lower 64 bits of data and 10 for the higher 64 bits.
user_dwl and user_dwh are two 32-bit data lines. Of each 64 bits of data, the lower 32 bits are latched
to user_dwl and the higher 32 bits to user_dwh. test_w_n_i is the active-low signal used to inhibit
generation of the active-low read signal user_r_n_i while user_w_n_i is being generated.
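
The way a 128-bit PD Word is handed to the memory controller in two steps can be sketched as follows. This is an illustrative model of the slicing performed in the lt_pdw_0_1 and lt_pdw_2_3 states, based on the bit ranges used in the Appendix A listing; the function name `split_pd_word` is hypothetical, and the zero padding bits that the listing prepends to each 32-bit word are omitted because they do not change the integer value.

```python
WORD_MASK = 0xFFFFFFFF  # one 32-bit word


def split_pd_word(pd_word):
    """Split a 128-bit PD Word into the two (user_dwl, user_dwh) pairs.

    The first pair corresponds to the lt_pdw_0_1 state (bits 63..0),
    the second to the lt_pdw_2_3 state (bits 127..64).
    """
    words = [(pd_word >> (32 * i)) & WORD_MASK for i in range(4)]
    lower_64 = (words[0], words[1])   # user_dwl = bits 31..0,  user_dwh = bits 63..32
    upper_64 = (words[2], words[3])   # user_dwl = bits 95..64, user_dwh = bits 127..96
    return [lower_64, upper_64]
```

For a PD Word whose four 32-bit words are 0x00000000, 0x11111111, 0x22222222 and 0x33333333 (lowest first), the first step presents 0x00000000 on user_dwl and 0x11111111 on user_dwh, and the second step presents 0x22222222 and 0x33333333.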

Fig. 6.1: Write Cycle Simulation Results (1)

Fig. 6.2: Write Cycle Simulation Results (2)

6.1.2 Read Cycle

The read cycle is initiated over the Processor Local Bus (PLB), which has a 32-bit data bus. A
read signal is generated every time a read operation is initiated by the embedded PowerPC processor of
the Virtex-II Pro FPGA. These read requests are simulated using VHDL and implemented in the Virtex-4
FPGA.
The read requests are generated every 2 microseconds. The proc_rd signal goes high along with
the address proc_addr, indicating that the PLB is requesting data from the QDR-II SRAM. Once the user
interface receives the address from the PLB, it starts reading the data from the specified location onto
its bus (user interface). Once the data is present on the user interface bus, it is latched onto the PLB
data bus when the fifo_empty signal goes low. An acknowledgement signal is then generated to
indicate that the data has been latched onto the PLB bus. Since the PLB has a 32-bit data bus, unlike
in the write cycle, only one word at a time is latched onto it. As there are four words (W0 to W3) to be
read for each PD Word (PDW), it takes 4 read cycles to read them. When the user_qr_empty signal goes
low, the first two words (W0 and W1) are already present on user_qrl and user_qrh respectively. This
condition is known as first word fall through. So, the words W0 and W1 are latched onto the 128-bit
qdr_rddata bus when the user_qr_empty signal goes low. In the next clock cycle, W2 and W3 are latched
onto the qdr_rddata bus, and W0 is latched onto the PLB data bus. Words W1, W2 and W3 are then latched
from the qdr_rddata bus onto the PLB data bus in the following read request cycles. An acknowledgement
signal, rd_ack, is generated by the read state machine every time data is latched onto the PLB bus.
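
The read path described above can be summarised with a short sketch: two beats from the memory controller are assembled into the 128-bit qdr_rddata register, and the two least significant bits of proc_addr then select which 32-bit word is returned on the PLB data bus. The bit positions follow the Appendix A listing; the function names are hypothetical and timing is not modelled.

```python
WORD_MASK = 0xFFFFFFFF  # one 32-bit word


def assemble_qdr_rddata(beat_0_1, beat_2_3):
    """Build the 128-bit read register from two (user_qrl, user_qrh) beats."""
    qrl0, qrh0 = beat_0_1   # W0 on user_qrl, W1 on user_qrh
    qrl1, qrh1 = beat_2_3   # W2 on user_qrl, W3 on user_qrh
    lower_64 = ((qrh0 & WORD_MASK) << 32) | (qrl0 & WORD_MASK)
    upper_64 = ((qrh1 & WORD_MASK) << 32) | (qrl1 & WORD_MASK)
    return (upper_64 << 64) | lower_64


def select_plb_word(qdr_rddata, proc_addr):
    """Return the 32-bit word addressed by proc_addr(1 downto 0)."""
    word_index = proc_addr & 0b11
    return (qdr_rddata >> (32 * word_index)) & WORD_MASK
```

Four successive read requests with address bits 00, 01, 10 and 11 therefore return W0, W1, W2 and W3 of the same PD Word.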

Fig. 6.3: Read Cycle Simulation Results (1)

Fig. 6.4: Read Cycle Simulation Results (2)

6.2 Hardware Verification using ChipScope Pro

We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex 5 FPGA, by
analyzing the user interface signals, which include the PPD interface and the PLB interface. These
signals are captured using the FIFOs implemented in the FPGA and sent to the display interface on the
PC over JTAG. The PPD interface includes the signals hw_data and hw_addr. The PLB interface includes
the signals proc_data, proc_addr and rd_ack. These signals are debugged using the ChipScope Pro Core
Inserter by creating a definition and connection file (.cdc) for the synthesized VHDL code.
The in-circuit verification of these signals is done using the ChipScope Pro Analyzer (refer to
Fig. 6.5). The main window area can display multiple child windows (such as trigger, waveform, listing
and plot windows) at the same time. Each window can be maximized, minimized, resized and moved as
needed. The signals attached to the ChipScope Pro core are shown in the signal browser. The trigger
setup window is used to specify the conditions for triggering and storing the data. The waveform window
displays all the signals, which are sampled with respect to the system clock of the FPGA. This window is
useful for analyzing the timing of the interface signals. Another window is the listing window (refer to
Figs. 6.6 and 6.7), which displays the interface buses that are stored using the storage qualification
in the trigger setup.
6.2.1 Write Interface
To capture the data in the listing window for the PPD interface, we added the signals hw_data1,
hw_data2, hw_data3 and hw_data4. The storage condition used is the falling edge of hw_wr. The trigger
condition is an initial value of hw_data, which is common to both the PPD and PLB interfaces. Using
these conditions, we captured the data and exported it to an Excel sheet for future reference.
6.2.2 Read Interface
To capture the data in the listing window for the PLB interface, we added the signals proc_data and
proc_addr. The storage condition used is the falling edge of rd_ack. The trigger is an initial value of
hw_data, which is common to both the PPD and PLB interfaces. Using these conditions, we captured the
data and exported it to an Excel sheet for future reference.
From the two listings, we infer that the data written into the QDR-II memory and the data read from
the QDR-II memory match. Hence, the interface we designed fulfills the requirement of
the project.

Fig. 6.5: ChipScope Pro Verification Waveform

Fig. 6.6: ChipScope Pro Verification Listing of Write cycle

Fig. 6.7: ChipScope Pro Verification Listing of Read cycle

6.3 Conclusion
The VHDL code for the interface that controls SRAM memory access using the Virtex-4
FPGA has been written. It has been verified using ModelSim simulation waveforms and ChipScope Pro
hardware verification in Xilinx™ ISE 10.1. The results have been studied and verified. This interfacing
logic enables us to access the SRAM at the highest possible speed and supports writing a continuous
input data stream at a rate of 640 Mbps. The interface logic can also be used for interfacing upcoming
generations of QDR memory devices with improved technology. The interface can be attached with ease to
the PLB interface of the embedded PowerPC processor of Virtex family FPGAs.
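
The 640 Mbps figure is consistent with the simulation setup of Section 6.1.1, assuming that one 128-bit PD Word is written every 200 ns. The short calculation below makes this arithmetic explicit; the function name `stream_rate_mbps` is hypothetical and the calculation is only a sanity check, not a measurement.

```python
def stream_rate_mbps(word_bits, interval_ns):
    """Sustained input rate in Mbit/s for one word of word_bits every interval_ns."""
    # bits per nanosecond equals Gbit/s; multiply by 1000 to obtain Mbit/s.
    return word_bits * 1000 / interval_ns


rate = stream_rate_mbps(word_bits=128, interval_ns=200)  # 640.0
```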

6.4 Future Scope of Work


The future scope of work for this project includes the development of the read and write interface
between the QDR-II memory controller and the QDR-II memory. Future projects involve the implementation
of an ESM processor that integrates the PPD logic and the EP software on a single chip. This helps in
the system-on-chip implementation of the ESM processor subsystem using a single FPGA (Virtex-5 and
above).

APPENDIX-A: Program Code


1.1 Software interface code in VHDL
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use work.QDR2_SRAM_parameters_0.all;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
library UNISIM;
use UNISIM.VComponents.all;

entity qdr_dpif is
  port(
    user_clk0     : in  std_logic;
    user_reset    : in  std_logic;
    dly_cal_done  : in  std_logic;
    -- PPD IF
    clk_100       : in  std_logic;
    hw_wr         : in  std_logic;
    hw_data       : in  std_logic_vector( 127 downto 0 );
    hw_addr       : in  std_logic_vector( 20 downto 0 );
    al_full       : out std_logic;
    -- Processor (PLB) read interface
    proc_rd       : in  std_logic;
    proc_addr     : in  std_logic_vector( 22 downto 0 );
    proc_data     : out std_logic_vector( 31 downto 0 );
    rd_ack        : out std_logic;
    -- QDR-II memory controller user interface
    user_w_n      : out std_logic;
    user_r_n      : out std_logic;
    user_ad_wr    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_bwl_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_bwh_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_dwl      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_dwh      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_ad_rd    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_qen_n    : out std_logic;
    compare_error : out std_logic;
    user_wr_full  : in  std_logic;
    user_rd_full  : in  std_logic;
    user_qrl      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qrh      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qr_empty : in  std_logic
  );
end qdr_dpif;
architecture Behavioral of qdr_dpif is

  component synchro
    port(
      reset   : in  std_logic;
      clock   : in  std_logic;
      sig_in  : in  std_logic;
      sig_out : out std_logic
    );
  end component;

  signal reset_r : std_logic;

  constant unused : std_logic_vector(BW_WIDTH-1 downto 0) := (others => '0');

  -- PPD HWDATA & HWADDR FIFO SIGNALS
  signal qdr_wrdata    : std_logic_vector( 127 downto 0 );
  signal data_al_full  : std_logic_vector(3 downto 0);
  signal addr_al_full  : std_logic;
  signal hw_data_empty : std_logic_vector(3 downto 0);
  signal hw_addr_empty : std_logic;
  signal hw_fifo_empty : std_logic;
  signal hw_addr_in    : std_logic_vector( 31 downto 0 );
  signal hw_addr_out   : std_logic_vector( 31 downto 0 );

  -- Write state machine
  TYPE write_state_type is(
    INIT_WR,
    IDLE_WR,
    WRFIFO_RD,
    LT_PDW_0_1,
    LT_PDW_2_3
  );
  signal write_cs : write_state_type;
  signal write_ns : write_state_type;

  signal hw_rd      : std_logic;
  signal test_w_n_i : std_logic;
  signal user_w_n_i : std_logic;
  signal lt_hwdata  : std_logic_vector(1 downto 0);

  -- Read state machine
  TYPE read_state_type is(
    INIT_RD,
    IDLE_RD,
    LATCH_RD_ADDR,
    RDADDR_WR,
    WAIT_Q_EMPTY,
    LT_EPW_0_1,
    LT_EPW_2_3_W0,
    ACK_W0_GEN,
    LT_W1,
    ACK_W1_GEN,
    LT_W2,
    ACK_W2_GEN,
    LT_W3,
    ACK_W3_GEN
  );
  signal read_cs : read_state_type;
  signal read_ns : read_state_type;

  signal lt_rd_ad     : std_logic;
  signal user_r_n_i   : std_logic;
  signal user_qen_n_i : std_logic;
  signal lt_q_0_1     : std_logic;
  signal lt_q_2_3     : std_logic;
  signal lt_word      : std_logic_vector( 3 downto 0 );

  signal proc_rd_sync   : std_logic;
  signal proc_addr_sync : std_logic_vector( 22 downto 0 );

  signal proc_data_i : std_logic_vector( 31 downto 0 );
  signal rd_ack_i    : std_logic;

  signal qdr_rddata : std_logic_vector(127 downto 0 );

  signal byte_enb     : std_logic_vector(7 downto 0);
  signal user_ad_rd_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
  signal user_ad_wr_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);

begin
  compare_error <= '0';
  user_w_n      <= user_w_n_i;
  user_r_n      <= user_r_n_i;
  user_qen_n    <= user_qen_n_i;

  -- Register the processor-side outputs
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      proc_data <= proc_data_i;
      rd_ack    <= rd_ack_i;
    end if;
  end process;

  byte_enb   <= "00000000";
  user_bwl_n <= byte_enb((BW_WIDTH-1) downto 0);
  user_bwh_n <= byte_enb((BW_WIDTH-1) downto 0);

  -- Register the user reset
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      reset_r <= user_reset;
    end if;
  end process;

  -------------------------------- WR_SM --------------------------------
  -- Write state register
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        write_cs <= INIT_WR;
      else
        write_cs <= write_ns;
      end if;
    end if;
  end process;

  -- Write next-state logic
  process (write_cs, dly_cal_done, user_wr_full, hw_fifo_empty )
  begin
    write_ns <= write_cs;
    case write_cs is
      when INIT_WR =>
        if(dly_cal_done = '1') then
          write_ns <= IDLE_WR;
        end if;
      when IDLE_WR =>
        if( user_wr_full = '0' and hw_fifo_empty = '0' ) then
          write_ns <= WRFIFO_RD;
        end if;
      when WRFIFO_RD =>
        write_ns <= LT_PDW_0_1;
      when LT_PDW_0_1 =>
        write_ns <= LT_PDW_2_3;
      when LT_PDW_2_3 =>
        write_ns <= IDLE_WR;
      when others =>
        write_ns <= INIT_WR;
    end case;
  end process;

  -- Write state machine outputs
  with write_cs select
    hw_rd <= '1' when WRFIFO_RD,
             '0' when others;

  with write_cs select
    test_w_n_i <= '0' when LT_PDW_0_1,
                  '1' when others;

  with write_cs select
    user_w_n_i <= '0' when LT_PDW_2_3,
                  '1' when others;

  with write_cs select
    lt_hwdata <= "01" when LT_PDW_0_1,
                 "10" when LT_PDW_2_3,
                 "00" when others;

  -- Latch the 128-bit PD Word onto user_dwl/user_dwh, 64 bits at a time
  process(user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        user_dwl <= (others => '0');
        user_dwh <= (others => '0');
      else
        case lt_hwdata is
          when "01" =>
            user_dwl <= X"0" & qdr_wrdata( 31 downto 0 );
            user_dwh <= X"0" & qdr_wrdata( 63 downto 32 );
          when "10" =>
            user_dwl <= X"0" & qdr_wrdata( 95 downto 64 );
            user_dwh <= X"0" & qdr_wrdata( 127 downto 96 );
          when others =>
            null;
        end case;
      end if;
    end if;
  end process;
  -------------------------------- RD_SM --------------------------------
  -- Synchronize the processor read request and address to user_clk0
  PROC_RD_SYNC_INST:
  synchro port map(
    reset   => reset_r,
    clock   => user_clk0,
    sig_in  => proc_rd,
    sig_out => proc_rd_sync
  );

  PROC_ADDR_SYNC_GEN: for i in 22 downto 0 generate
    PROC_ADDR_SYNC_INST:
    synchro port map(
      reset   => reset_r,
      clock   => user_clk0,
      sig_in  => proc_addr(i),
      sig_out => proc_addr_sync(i)
    );
  end generate PROC_ADDR_SYNC_GEN;

  -- Read state register
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        read_cs <= INIT_RD;
      else
        read_cs <= read_ns;
      end if;
    end if;
  end process;

  -- Read next-state logic
  process ( read_cs, dly_cal_done, proc_rd_sync, proc_addr_sync, test_w_n_i, user_rd_full,
            user_qr_empty )
  begin
    read_ns <= read_cs;
    case read_cs is
      when INIT_RD =>
        if(dly_cal_done = '1') then
          read_ns <= IDLE_RD;
        end if;
      when IDLE_RD =>
        if proc_rd_sync = '1' then
          case proc_addr_sync(1 downto 0) is
            when "00" =>
              read_ns <= LATCH_RD_ADDR;
            when "01" =>
              read_ns <= LT_W1;
            when "10" =>
              read_ns <= LT_W2;
            when "11" =>
              read_ns <= LT_W3;
            when others =>
              null;
          end case;
        end if;
      when LATCH_RD_ADDR =>
        if test_w_n_i = '1' and user_rd_full = '0' then
          read_ns <= RDADDR_WR;
        end if;
      when RDADDR_WR =>
        read_ns <= WAIT_Q_EMPTY;
      when WAIT_Q_EMPTY =>
        if(user_qr_empty = '0') then
          read_ns <= LT_EPW_0_1;
        end if;
      when LT_EPW_0_1 =>
        read_ns <= LT_EPW_2_3_W0;
      when LT_EPW_2_3_W0 =>
        read_ns <= ACK_W0_GEN;
      when ACK_W0_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;
      when LT_W1 =>
        read_ns <= ACK_W1_GEN;
      when ACK_W1_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;
      when LT_W2 =>
        read_ns <= ACK_W2_GEN;
      when ACK_W2_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;
      when LT_W3 =>
        read_ns <= ACK_W3_GEN;
      when ACK_W3_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;
      when others =>
        read_ns <= INIT_RD;
    end case;
  end process;

  -- Read state machine outputs
  with read_cs select
    lt_rd_ad <= '1' when LATCH_RD_ADDR,
                '0' when others;

  with read_cs select
    user_r_n_i <= '0' when RDADDR_WR,
                  '1' when others;

  with read_cs select
    user_qen_n_i <= '0' when LT_EPW_0_1 | LT_EPW_2_3_W0,
                    '1' when others;

  with read_cs select
    lt_q_0_1 <= '1' when LT_EPW_0_1,
                '0' when others;

  with read_cs select
    lt_q_2_3 <= '1' when LT_EPW_2_3_W0,
                '0' when others;

  with read_cs select
    lt_word <= "0001" when LT_EPW_2_3_W0,
               "0010" when LT_W1,
               "0100" when LT_W2,
               "1000" when LT_W3,
               "0000" when others;

  with read_cs select
    rd_ack_i <= '1' when ACK_W0_GEN | ACK_W1_GEN | ACK_W2_GEN | ACK_W3_GEN,
                '0' when others;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
qdr_rddata <= (others => '0');
else
if lt_q_0_1 = '1' then
qdr_rddata( 63 downto 0)
<= user_qrh(31 downto 0) &
user_qrl(31 downto 0);
end if;
if lt_q_2_3 = '1' then
qdr_rddata(127 downto 64) <= user_qrh(31 downto 0) & user_qrl(31 downto 0);
end if;
end if;
end if;
end process;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
proc_data_i <= (others => '0');
else
case lt_word is
when "0001" =>
proc_data_i <= qdr_rddata( 31 downto 0);
when "0010" =>
proc_data_i <= qdr_rddata( 63 downto 32);
when "0100" =>
proc_data_i <= qdr_rddata( 95 downto 64);
when "1000" =>
proc_data_i <= qdr_rddata(127 downto 96);
when others =>
null;
end case;
end if;
end if;
end process;
------------------------------------------------------------------------------------------------------------------------ADDR_GEN0-----------------------------------------------------------------------------------user_ad_rd
user_ad_wr

<= user_ad_rd_i;
<= user_ad_wr_i;
Page 64 of 83

process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
user_ad_wr_i <= (others => '0');
elsif( test_w_n_i = '0' ) then
user_ad_wr_i <= hw_addr_out((ADDR_WIDTH_4D-1) downto 0);
end if;
end if;
end process;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
user_ad_rd_i <= (others => '0');
elsif lt_rd_ad = '1' then
user_ad_rd_i <= proc_addr_sync((ADDR_WIDTH_4D+1) downto 2);
end if;
end if;
end process;
------------------------------------------------------------
------------------------PPD_FIFO_IF-------------------------
------------------------------------------------------------
PPD_DATA : for I in 3 downto 0 generate
begin
DATA_FIFO : FIFO16
generic map
(
FIRST_WORD_FALL_THROUGH => false,
ALMOST_FULL_OFFSET => X"00F",
DATA_WIDTH => 36
)
port map (
DI => hw_data(I*32+31 downto I*32),
DIP => byte_enb(3 downto 0),
RDCLK => user_clk0,
RDEN => hw_rd,
RST => reset_r,
WRCLK => clk_100,
WREN => hw_wr,
ALMOSTEMPTY => open,
ALMOSTFULL => data_al_full(I),
DO => qdr_wrdata(I*32+31 downto I*32),
DOP => open,
EMPTY => hw_data_empty(I),
FULL => open,
RDCOUNT => open,
RDERR => open,
WRCOUNT => open,
WRERR => open
);
end generate PPD_DATA;


hw_addr_in(20 downto 0) <= hw_addr;
hw_addr_in(31 downto 21) <= (others => '0');

PPD_ADDR_FIFO : FIFO16
generic map
(
FIRST_WORD_FALL_THROUGH => false,
ALMOST_FULL_OFFSET => X"00F",
DATA_WIDTH => 36
)
port map (
DI => hw_addr_in,
DIP => byte_enb(3 downto 0),
RDCLK => user_clk0,
RDEN => hw_rd,
RST => reset_r,
WRCLK => clk_100,
WREN => hw_wr,
ALMOSTEMPTY => open,
ALMOSTFULL => addr_al_full,
DO => hw_addr_out,
DOP => open,
EMPTY => hw_addr_empty,
FULL => open,
RDCOUNT => open,
RDERR => open,
WRCOUNT => open,
WRERR => open
);
al_full <= data_al_full(0) or data_al_full(1) or data_al_full(2) or data_al_full(3) or addr_al_full;
hw_fifo_empty <= hw_data_empty(0) or hw_data_empty(1) or hw_data_empty(2) or
                 hw_data_empty(3) or hw_addr_empty;
end Behavioral;

1.2 VHDL code to generate input for testing


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity hwdata_sim is
Port (
clk_100 : in std_logic;
reset : in std_logic;
dly_cal_done : in std_logic;
--debug
hw_test_led : out std_logic;
hw_wr : out std_logic;
hw_data : out std_logic_vector( 127 downto 0 );
hw_addr : out std_logic_vector( 20 downto 0 );
b0_full : in std_logic
);
end hwdata_sim;
architecture Behavioral of hwdata_sim is
signal reset_r : std_logic;
signal dly_cal_done_r : std_logic;
constant INIT : std_logic_vector( 5 downto 0 ) := "000001";
constant IDLE : std_logic_vector( 5 downto 0 ) := "000010";
constant WR_GEN : std_logic_vector( 5 downto 0 ) := "000100";
constant DUMMY_ST : std_logic_vector( 5 downto 0 ) := "010000";
constant INC_ADDR : std_logic_vector( 5 downto 0 ) := "100000";
signal current_state : std_logic_vector( 5 downto 0 );
signal next_state : std_logic_vector( 5 downto 0 );
signal counter : std_logic_vector( 29 downto 0 );
signal wr_count : std_logic_vector( 7 downto 0 );
signal wr_pulse : std_logic;
signal hw_data1 : std_logic_vector( 31 downto 0 );
signal hw_data2 : std_logic_vector( 31 downto 0 );
signal hw_data3 : std_logic_vector( 31 downto 0 );
signal hw_data4 : std_logic_vector( 31 downto 0 );
signal hw_data_i : std_logic_vector(127 downto 0 );
signal hw_addr_i : std_logic_vector( 20 downto 0 );
signal hw_wr_i : std_logic;
--debug
signal counter_dbg : std_logic_vector( 29 downto 0 );
signal hw_wr_dbg : std_logic;
signal del_1 : std_logic;
constant zeroes_23 : std_logic_vector( 22 downto 0 ) := (others => '0');
constant zeroes_30 : std_logic_vector( 29 downto 0 ) := (others => '0');
constant zeroes_32 : std_logic_vector( 31 downto 0 ) := (others => '0');

begin
hw_data <= hw_data_i;
hw_addr <= hw_addr_i( 20 downto 0 );
hw_wr <= hw_wr_i;

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
reset_r <= reset;
dly_cal_done_r <= dly_cal_done;
end if;
end process;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if reset_r = '1' or dly_cal_done_r = '0' or wr_pulse = '1' then
wr_count <= (others => '0');
else
wr_count <= wr_count + '1';
end if;
end if;
end process;
wr_pulse <= '1' when wr_count = X"13" else
            '0';

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
current_state <= INIT;
else
current_state <= next_state;
end if;
end if;
end process;
process (current_state, dly_cal_done, b0_full, wr_pulse )
begin
next_state <= current_state;
case current_state is
when INIT =>
if(dly_cal_done = '1') then
next_state <= IDLE;
end if;
when IDLE =>
if wr_pulse = '1' and b0_full = '0' then
next_state <= WR_GEN;
end if;
when WR_GEN =>
next_state <= DUMMY_ST;
when DUMMY_ST =>
next_state <= INC_ADDR;
when INC_ADDR =>
next_state <= IDLE;
when others =>
next_state <= INIT;
end case;
end process;
hw_wr_i <= current_state(2);

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
counter <= (others => '0');
elsif( current_state(5) = '1' ) then
counter <= counter + '1';
end if;
end if;
end process;
hw_addr_i <= counter( 20 downto 0 );
--hw_addr_i <= counter(2 downto 0) & counter( 21 downto 3 );
hw_data1 <= counter & "00";
hw_data2 <= counter & "01";
hw_data3 <= counter & "10";
hw_data4 <= counter & "11";
hw_data_i <= hw_data4 & hw_data3 & hw_data2 & hw_data1;

--debug----------
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
counter_dbg <= counter;
hw_wr_dbg <= hw_wr_i;
end if;
end process;
del_1 <= '1' when counter_dbg = zeroes_30 else
         '0';
hw_test_led <= del_1 or hw_wr_dbg;

end Behavioral;
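
The write pacing in hwdata_sim can be checked in isolation. The sketch below is a hypothetical, self-checking simulation (the entity name wr_pulse_period_tb is not part of the original design) that copies only the wr_count / wr_pulse logic and confirms that one write pulse is produced every 20 clock cycles. It assumes a 10 ns clock period, which the signal name clk_100 suggests but the source does not state, it omits the reset and dly_cal_done terms, and it uses ieee.numeric_std instead of the STD_LOGIC_ARITH packages used in the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test bench: models only the wr_count / wr_pulse pacing logic.
entity wr_pulse_period_tb is
end entity wr_pulse_period_tb;

architecture sim of wr_pulse_period_tb is
  signal clk      : std_logic := '0';
  signal wr_count : unsigned(7 downto 0) := (others => '0');
  signal wr_pulse : std_logic;
  signal done     : boolean := false;
begin
  -- Assumed 100 MHz clock; stops toggling once the check has finished.
  clk <= not clk after 5 ns when not done else '0';

  -- Same comparison as in hwdata_sim.
  wr_pulse <= '1' when wr_count = x"13" else '0';

  -- Free-running counter, cleared on wr_pulse (reset terms omitted here).
  count : process (clk)
  begin
    if rising_edge(clk) then
      if wr_pulse = '1' then
        wr_count <= (others => '0');
      else
        wr_count <= wr_count + 1;
      end if;
    end if;
  end process count;

  -- Measure the number of clock edges between two consecutive pulses.
  check : process
    variable cycles : natural := 0;
  begin
    wait until rising_edge(clk) and wr_pulse = '1';
    loop
      wait until rising_edge(clk);
      cycles := cycles + 1;
      exit when wr_pulse = '1';
    end loop;
    assert cycles = 20
      report "unexpected wr_pulse period" severity failure;
    report "wr_pulse repeats every 20 clock cycles";
    done <= true;
    wait;
  end process check;
end architecture sim;
```

Under the 10 ns assumption, this corresponds to at most one write request every 200 ns, provided b0_full is low when the pulse arrives.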

1.3 VHDL code to receive and view Output


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity procdata_sim is
Port (
clk_100 : in std_logic;
reset : in std_logic;
dly_cal_done : in std_logic;
proc_rd : out std_logic;
proc_addr : out std_logic_vector( 22 downto 0 );
proc_data : in std_logic_vector( 31 downto 0 );
test_done : out std_logic;
rd_ack : in std_logic
);
end procdata_sim;
architecture Behavioral of procdata_sim is
component synchro
port(
reset : in std_logic;
clock : in std_logic;
sig_in : in std_logic;
sig_out : out std_logic
);
end component;
signal reset_r : std_logic;
signal dly_cal_done_r : std_logic;
signal rd_count : std_logic_vector(12 downto 0);
signal rd_pulse : std_logic;
constant INIT : std_logic_vector(7 downto 0) := "00000001";
constant WRP_WAIT : std_logic_vector(7 downto 0) := "00000010";
constant IDLE : std_logic_vector(7 downto 0) := "00000100";
constant RD_GEN : std_logic_vector(7 downto 0) := "00001000";
constant WAIT_RDACK : std_logic_vector(7 downto 0) := "00010000";
constant CNT_CHK : std_logic_vector(7 downto 0) := "00100000";
constant INC_ADDR : std_logic_vector(7 downto 0) := "01000000";
constant RST_BRST : std_logic_vector(7 downto 0) := "10000000";
signal current_state : std_logic_vector(7 downto 0);
signal next_state : std_logic_vector(7 downto 0);
signal word_count : std_logic_vector(4 downto 0);
signal counter : std_logic_vector(22 downto 0);
signal proc_addr_i : std_logic_vector(22 downto 0);
signal proc_rd_i : std_logic;
signal proc_data_sync : std_logic_vector(31 downto 0);
signal rd_ack_sync : std_logic;
signal proc_data_i : std_logic_vector( 31 downto 0 );
TYPE ackgen_state_type is(
IDLE_ACK,
WAIT_FOR_ACK,
ACK_GEN
);
signal ackgen_cs : ackgen_state_type;
signal ackgen_ns : ackgen_state_type;
signal qdr_mem_ack : std_logic;
--debug
signal proc_rd_dbg : std_logic;
signal proc_addr_dbg : std_logic_vector(22 downto 0);
signal rd_ack_sync_dbg : std_logic;
signal proc_data_dbg : std_logic_vector(31 downto 0);
signal del_1 : std_logic;
signal del_2 : std_logic;
constant zeroes_23 : std_logic_vector( 22 downto 0 ) := (others => '0');
constant zeroes_32 : std_logic_vector( 31 downto 0 ) := (others => '0');

begin
proc_addr <= proc_addr_i( 22 downto 0 );
proc_rd <= proc_rd_i;

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
reset_r <= reset;
dly_cal_done_r <= dly_cal_done;

end if;
end process;
RD_ACK_SYNC_INST:
synchro port map(
reset => reset_r,
clock => clk_100,
sig_in => rd_ack,
sig_out => rd_ack_sync
);
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1' or dly_cal_done_r = '0') then
ackgen_cs <= IDLE_ACK;
else
ackgen_cs <= ackgen_ns;
end if;
end if;
end process;
process (ackgen_cs, proc_rd_i, rd_ack_sync )
begin
ackgen_ns <= ackgen_cs;
case ackgen_cs is
when IDLE_ACK =>
if proc_rd_i = '1' AND rd_ack_sync = '0' then
ackgen_ns <= WAIT_FOR_ACK;
end if;
when WAIT_FOR_ACK =>
if rd_ack_sync = '1' then
ackgen_ns <= ACK_GEN;
end if;
when ACK_GEN =>
if proc_rd_i = '0' then
ackgen_ns <= IDLE_ACK;
end if;
when others =>
ackgen_ns <= IDLE_ACK;
end case;
end process;
with ackgen_cs select
qdr_mem_ack <= '1' when ACK_GEN,
'0' when others;
PROC_DATA_SYNC_GEN: for i in 31 downto 0 generate
PROC_DATA_SYNC_INST:
synchro port map(
reset => reset_r,
clock => clk_100,
sig_in => proc_data(i),
sig_out => proc_data_sync(i)
);
end generate PROC_DATA_SYNC_GEN;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if reset_r = '1' or dly_cal_done_r = '0' or rd_pulse = '1' then
rd_count <= (others => '0');
else
rd_count <= rd_count + '1';
end if;
end if;
end process;
rd_pulse <= '1' when rd_count = X"14D" else
            '0';

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
current_state <= INIT;
else
current_state <= next_state;
end if;
end if;
end process;
process (current_state, dly_cal_done_r, rd_pulse, qdr_mem_ack, word_count )
begin
next_state <= current_state;
case current_state is
when INIT =>
if(dly_cal_done_r = '1') then
next_state <= IDLE;
end if;
when IDLE =>
if rd_pulse = '1' then
next_state <= RD_GEN;
end if;
when RD_GEN =>
next_state <= WAIT_RDACK;
when WAIT_RDACK =>
if qdr_mem_ack = '1' then
next_state <= CNT_CHK;
end if;
when CNT_CHK =>
if word_count(4) = '1' then
next_state <= RST_BRST;
else
next_state <= INC_ADDR;
end if;
when INC_ADDR =>
next_state <= RD_GEN;
when RST_BRST =>
next_state <= IDLE;
when others =>
next_state <= INIT;
end case;
end process;
proc_rd_i <= current_state(3) or current_state(4);

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if( reset_r = '1' or current_state(7) = '1' ) then
word_count <= "00001";
elsif( current_state(6) = '1' ) then
word_count <= word_count + '1';
end if;
end if;
end process;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1' ) then
counter <= (others => '0');
elsif( current_state(6) = '1' or current_state(7) = '1' ) then
counter <= counter + '1';
end if;
end if;
end process;
proc_addr_i <= counter( 22 downto 0 );
--proc_addr_i <= counter(3 downto 2) & counter( 22 downto 4 ) & counter(1 downto 0);

process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if reset_r = '1' then
proc_data_i <= (others => '0');
elsif proc_rd_i = '1' then
proc_data_i <= proc_data_sync;

end if;
end if;
end process;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
proc_rd_dbg <= proc_rd_i;
proc_addr_dbg <= proc_addr_i;
rd_ack_sync_dbg <= qdr_mem_ack;
proc_data_dbg <= proc_data_i;
end if;
end process;
del_1 <= '1' when proc_addr_dbg = zeroes_23 else
         '0';
del_2 <= '1' when proc_data_dbg = zeroes_32 else
         '0';
test_done <= proc_rd_dbg or proc_rd_dbg or del_1 or rd_ack_sync_dbg or del_2;

end Behavioral;
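
The burst length produced by procdata_sim follows from the word_count logic: the counter is preset to "00001", incremented in INC_ADDR, and the burst ends once bit 4 is set in CNT_CHK. The sketch below is a hypothetical, self-checking model of that loop only (the entity name rd_burst_len_check is not part of the original design, and the read acknowledge handshake is left out). It uses ieee.numeric_std rather than the STD_LOGIC_ARITH packages of the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of the word_count loop in procdata_sim.
entity rd_burst_len_check is
end entity rd_burst_len_check;

architecture sim of rd_burst_len_check is
begin
  check : process
    -- Preset value used after reset and in RST_BRST.
    variable word_count : unsigned(4 downto 0) := "00001";
    variable reads      : natural := 0;
  begin
    loop
      reads := reads + 1;             -- RD_GEN / WAIT_RDACK: one read request
      exit when word_count(4) = '1';  -- CNT_CHK: burst done, go to RST_BRST
      word_count := word_count + 1;   -- INC_ADDR: next word of the burst
    end loop;
    assert reads = 16
      report "unexpected burst length" severity failure;
    report "one burst issues 16 read requests";
    wait;
  end process check;
end architecture sim;
```

Because the address counter advances in both INC_ADDR and RST_BRST, it also moves forward by 16 per burst, so consecutive bursts read consecutive addresses.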

APPENDIX B: NAMES OF DIFFERENT MODULES AND THEIR PURPOSE IN QDR-II SRAM CONTROLLER
1) Filename: qdr_sram.vhd
Purpose: This is the main module of the controller. It contains the clock forwarding logic, the delay
calibration logic that captures data and synchronizes it to the FPGA clock, and the interface controller logic.
2) Filename: mig_23_idelay_ctrl.vhd
Purpose: This module implements the delay generation for Calibration circuit.
3) Filename: qdr_sram_infrastructure_top.vhd
Purpose: This module incorporates the clock generation module and the reset logic.

4) Filename: qdr_sram_main_0.vhd
Purpose: Top level example design incorporating QDRII Memory Controller module, an example
Clock generation module, and Reset logic.
5) Filename: qdr_sram_top_0.vhd
Purpose: Top level module for QDR-II memory controller design. This is the main module that
should be instantiated into a new FPGA design (along with all sub-modules) to implement a QDRII
interface.
6) Filename: qdr_sram_user_interface_0.vhd
Purpose: Responsible for storing the Read/Write requests made by the user design. Instantiates the
FIFOs for Read and Write address, data, and control storage.
7) Filename: qdr_sram_rd_user_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address, data, and control storage.
8) Filename: qdr_sram_rd_addr_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address and control storage.
9) Filename: qdr_sram_rd_data_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read data storage.
10) Filename: qdr_sram_data_fifo_mem_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
11) Filename: qdr_sram_wr_user_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for Write address, data, and control storage.
12) Filename: qdr_sram_wr_addr_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for Write address and control storage.
13) Filename: qdr_sram_wr_data_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for write data storage.
14) Filename: qdr_sram_data_fifo_18_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
15) Filename: qdr_sram_data_bw_fifo_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
16) Filename: qdr_sram_qdr_mem_sm_0.vhd
Purpose: Monitors Read/Write queue status from the User Interface FIFOs and generates strobe
signals to launch Read/Write requests to the QDR II device.
17) Filename: qdr_sram_iobs_0.vhd
Purpose: This module implements the physical interface for the Write data path. Generates the write
path (QDR-II) from the WRITE data FIFOs to the OBUFs.
18) Filename: qdr_sram_clock_forward_0.vhd
Purpose: This module implements the physical interface for the clock path. Generates the forwarded
clocks (K and K#) for the QDR-II SRAM memory device. This scheme is used to match the
Clock-to-Out delays of the data path.
19) Filename: qdr_sram_ctrl_iobs_0.vhd
Purpose: This module implements the physical interface for the memory control signals.
20) Filename: qdr_sram_address_burst_0.vhd
Purpose: This module is a part of the physical interface. It describes the way the FFs and OBUFTs
need to be instantiated in order to present the address to the external memory.
21) Filename: qdr_sram_qdr_rd_enable.vhd
Purpose: This module generates QDR_R_n (Read Enable) and QDR_W_n (Write Enable)
for the QDR memory.
22) Filename: qdr_sram_bw_burst_0.vhd
Purpose: This module implements the physical interface for the Byte Write enable path. Generates
the byte write path (BW_n) from the WRITE address FIFO to the OBUFs.
23) Filename: qdr_sram_data_path_iobs_0.vhd
Purpose: This module implements the physical interface for the Write data, read data path.
24) Filename: qdr_sram_qdr_d_iob_0.vhd
Purpose: This module transfers the data from memory to the FIFOs.
25) Filename: qdr_sram_qdr_cq_iob_0.vhd
Purpose: This module implements the delaying of echo clock CQ.
26) Filename: qdr_sram_qdr_q_iob_0.vhd
Purpose: This module captures data from memory.
27) Filename: qdr_sram_data_path_0.vhd
Purpose: This module acts as an interface between the users and IOBs.
28) Filename: qdr_sram_read_ctrl_0.vhd
Purpose: This module generates QDR_R_n (Read Enable for QDR memory) and strobe for READ
FIFO.
29) Filename: qdr_sram_tap_logic_0.vhd
Purpose: This module implements the tap generation for the Read path (QDR_Q).
30) Filename: qdr_sram_dly_cal_sm.vhd
Purpose: Calibrates the IDELAY tap values for the QDR_Q inputs to allow direct capture of the
read data into the system clock domain.
31) Filename: qdr_sram_data_tap_inc.vhd
Purpose: This module implements the tap selection controller for data bits associated with a strobe.
32) Filename: qdr_sram_write_burst_0.vhd
Purpose: This module implements the physical interface for the Write path. Generates the write
path (QDR_D) from the WRITE data FIFOs to the OBUFs.
33) Filename: qdr_sram_test_bench_0.vhd
Purpose: This module implements a hardware test bench that will issue interleaved Read and Write
requests to the QDR II memory device.
34) Filename: qdr_sram_wr_rd_sm_0.vhd
Purpose: This module implements a state machine for issuing Read/Write requests to the QDR II
memory device.
35) Filename: qdr_sram_q_sm_0.vhd
Purpose: This module implements a state machine for reading back values from the read data FIFOs
and comparing them with the values generated in the test bench. It also serves as an error detection
module to make sure that the data returning from the memory is the same as the data written to it.
36) Filename: qdr_sram_data_gen_0.vhd
Purpose: This module implements a data generator that generates data for Read and Write requests
to the QDR II memory device.
37) Filename: qdr_sram_addr_gen_0.vhd
Purpose: This module is a part of the internal test bench. It generates addresses for both read and
write.

APPENDIX C: CHIP SCOPE PRO LISTING OF WRITE CYCLE

Sample in Window   hw_addr_23_2   hw_data1   hw_data2   hw_data3   hw_data4
 1                 000000         00000000   00000001   00000002   00000003
 2                 000001         00000004   00000005   00000006   00000007
 3                 000002         00000008   00000009   0000000A   0000000B
 4                 000003         0000000C   0000000D   0000000E   0000000F
 5                 000004         00000010   00000011   00000012   00000013
 6                 000005         00000014   00000015   00000016   00000017
 7                 000006         00000018   00000019   0000001A   0000001B
 8                 000007         0000001C   00000000   0000001E   0000001F
 9                 000008         00000020   00000021   00000022   00000023
10                 000009         00000020   00000025   00000026   00000027
11                 00000A         00000028   00000029   0000002A   0000002B
12                 00000B         0000002C   0000002D   0000002E   0000002F
13                 00000C         00000030   00000031   00000032   00000033
14                 00000D         00000034   00000035   00000036   00000037
15                 00000E         00000038   00000039   0000003A   0000003B
16                 00000F         0000003C   0000003D   0000003E   0000003F
17                 000010         00000040   00000041   00000042   00000043
18                 000011         00000044   00000045   00000046   00000047
19                 000012         00000048   00000049   0000004A   0000004B
20                 000013         0000004C   0000004D   0000004E   0000004F
21                 000014         00000050   00000051   00000052   00000053
22                 000015         00000054   00000055   00000056   00000057
23                 000016         00000058   00000059   0000005A   0000005B
24                 000017         0000005C   0000005D   0000005E   0000005F
25                 000018         00000060   00000061   00000062   00000063
26                 000019         00000064   00000065   00000066   00000067
27                 00001A         00000068   00000069   0000006A   0000006B
28                 00001B         0000006C   0000006D   0000006E   0000006F
29                 00001C         00000070   00000071   00000072   00000073
30                 00001D         00000074   00000075   00000076   00000077
31                 00001E         00000078   00000079   0000007A   0000007B
32                 00001F         0000007C   0000007D   0000007E   0000007F
33                 000020         00000080   00000081   00000082   00000083
34                 000021         00000084   00000085   00000086   00000087
35                 000022         00000088   00000089   0000008A   0000008B
36                 000023         0000008C   0000008D   0000008E   0000008F
37                 000024         00000090   00000091   00000092   00000093
38                 000025         00000094   00000095   00000096   00000097
39                 000026         00000098   00000099   0000009A   0000009B
40                 000027         0000009C   0000009D   0000009E   0000009F
41                 000028         000000A0   000000A1   000000A2   000000A3
42                 000029         000000A4   000000A5   000000A6   000000A7
43                 00002A         000000A8   000000A9   000000AA   000000AB
44                 00002B         000000AC   000000AD   000000AE   000000AF
45                 00002C         000000B0   000000B1   000000B2   000000B3
46                 00002D         000000B4   000000B5   000000B6   000000B7
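
The data pattern in this listing comes from the concatenations in hwdata_sim (section 1.2), where each 32-bit word is the 30-bit counter followed by a 2-bit word index, so word k written at address n should carry the value 4n + k. The sketch below is a hypothetical, self-checking model of that relation for the 46 captured addresses. The entity name wr_pattern_check is not part of the original design, and ieee.numeric_std is used in place of the STD_LOGIC_ARITH packages of the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of the write data pattern generated by hwdata_sim.
entity wr_pattern_check is
end entity wr_pattern_check;

architecture sim of wr_pattern_check is
begin
  check : process
    variable counter   : unsigned(29 downto 0);
    variable hw_addr   : unsigned(20 downto 0);
    variable hw_data1  : unsigned(31 downto 0);
    variable hw_data2  : unsigned(31 downto 0);
    variable hw_data3  : unsigned(31 downto 0);
    variable hw_data4  : unsigned(31 downto 0);
    variable hw_data_i : unsigned(127 downto 0);
  begin
    for n in 0 to 45 loop
      counter := to_unsigned(n, 30);
      -- Same assignments as in hwdata_sim.
      hw_addr   := counter(20 downto 0);
      hw_data1  := counter & "00";
      hw_data2  := counter & "01";
      hw_data3  := counter & "10";
      hw_data4  := counter & "11";
      hw_data_i := hw_data4 & hw_data3 & hw_data2 & hw_data1;
      -- Word k of the 128-bit write data should equal 4*n + k.
      for k in 0 to 3 loop
        assert hw_data_i(32 * k + 31 downto 32 * k) = to_unsigned(4 * n + k, 32)
          report "unexpected data word" severity failure;
      end loop;
      assert hw_addr = to_unsigned(n, 21)
        report "unexpected address" severity failure;
    end loop;
    report "write pattern check finished";
    wait;
  end process check;
end architecture sim;
```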

APPENDIX D: CHIP SCOPE PRO LISTING OF READ CYCLE

Sample in Window   proc_addr   proc_data
 1                 000000      00000000
 2                 000001      00000001
 3                 000002      00000002
 4                 000003      00000003
 5                 000004      00000004
 6                 000005      00000005
 7                 000006      00000006
 8                 000007      00000007
 9                 000008      00000008
10                 000009      00000009
11                 00000A      0000000A
12                 00000B      0000000B
13                 00000C      0000000C
14                 00000D      0000000D
15                 00000E      0000000E
16                 00000F      0000000F
17                 000010      00000010
18                 000011      00000011
19                 000012      00000012
20                 000013      00000013
21                 000014      00000014
22                 000015      00000015
23                 000016      00000016
24                 000017      00000017
25                 000018      00000018
26                 000019      00000019
27                 00001A      0000001A
28                 00001B      0000001B
29                 00001C      0000001C
30                 00001D      0000001D
31                 00001E      0000001E
32                 00001F      0000001F
33                 000020      00000020
34                 000021      00000021
35                 000022      00000022
36                 000023      00000023
37                 000024      00000024
38                 000025      00000025
39                 000026      00000026
40                 000027      00000027
41                 000028      00000028
42                 000029      00000029
43                 00002A      0000002A
44                 00002B      0000002B
45                 00002C      0000002C
46                 00002D      0000002D
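
In this listing every proc_data value equals its proc_addr, which is consistent with the write pattern of hwdata_sim if the upper bits of proc_addr select the 128-bit memory line and the two low bits select the 32-bit word within it. The line selection matches the proc_addr_sync((ADDR_WIDTH_4D+1) downto 2) slice in the controller interface, but the use of the two low bits as word select is an assumption made for this sketch. The entity name rd_pattern_check is hypothetical, and ieee.numeric_std is used in place of the STD_LOGIC_ARITH packages of the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model: reads back the pattern written by hwdata_sim.
entity rd_pattern_check is
end entity rd_pattern_check;

architecture sim of rd_pattern_check is
begin
  check : process
    variable proc_addr : unsigned(22 downto 0);
    variable line_addr : unsigned(20 downto 0);
    variable word_sel  : natural range 0 to 3;
    variable mem_line  : unsigned(127 downto 0);
    variable proc_data : unsigned(31 downto 0);
  begin
    for n in 0 to 45 loop
      proc_addr := to_unsigned(n, 23);
      line_addr := proc_addr(22 downto 2);               -- 128-bit line address
      word_sel  := to_integer(proc_addr(1 downto 0));    -- assumed word select
      -- Line contents as written by hwdata_sim: word k = counter & k,
      -- with the counter equal to the line address.
      for k in 0 to 3 loop
        mem_line(32 * k + 31 downto 32 * k) :=
          resize(line_addr, 30) & to_unsigned(k, 2);
      end loop;
      proc_data := mem_line(32 * word_sel + 31 downto 32 * word_sel);
      assert proc_data = resize(proc_addr, 32)
        report "read data does not match address" severity failure;
    end loop;
    report "read pattern check finished";
    wait;
  end process check;
end architecture sim;
```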

