INTRODUCTION
1.1
dedicated functions, often with real-time computing constraints. It is usually embedded as part of a
complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such
as a personal computer, can do many different tasks depending upon programming. Embedded systems
control many of the common devices in use today. Since the embedded system is dedicated to specific
tasks, design engineers can optimize it by reducing the size and cost of the product, or increasing the
reliability and performance. An embedded system can also be defined as an engineering artefact
involving computation that is subject to physical constraints arising through interactions of
computational processes with the physical world. These physical constraints are divided into reaction
and execution constraints. Reaction constraints originate from the behavioral requirements and specify
the deadlines, throughput and jitter whereas the execution constraints originate from the implementation
requirements and put bounds on available processor speed, power, memory, and hardware failure
rates.
Some embedded systems are mass-produced, benefiting from economies of scale. In general,
"embedded system" is not an exactly defined term, as many systems have some element of
programmability. Physically, embedded systems range from portable devices such as digital watches and
MP4 players to large stationary installations like traffic lights, factory controllers, or the systems
controlling nuclear power plants/missiles/satellites. Complexity varies from low, with a single
microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large
chassis or enclosure.
Embedded systems range from no user interface at all, dedicated only to one task, to complex
graphical user interfaces that resemble modern computer desktop operating systems. Simple embedded
devices use buttons, LEDs, and small character- or digit-only displays, often with a simple menu system.
Embedded processors can be broken into two broad categories: ordinary microprocessors (µP) and
microcontrollers (µC), which have many more peripherals on chip, reducing cost and size.
A common configuration for very-high-volume embedded systems is the system on a chip
(SOC). A system on chip is an integrated circuit which contains a complete system consisting of
multiple processors, multipliers, caches and interfaces on a single chip. SOCs can be implemented as an
application-specific integrated circuit (ASIC) or using a field-programmable gate array (FPGA).
Page 1 of 83
1.1.1
Embedded systems are designed to do some specific task, rather than to be a general-purpose
computer for multiple tasks. Some have real-time performance constraints that must be
met, for reasons such as safety and usability; others may have low or no performance
requirements, allowing the system hardware to be simplified to reduce costs.
Embedded systems are not always stand-alone devices. Many embedded systems consist
of small, computerized parts within a larger device that serves a more general purpose.
For example, the Gibson Robot Guitar features an embedded system for tuning the
strings; the overall purpose of the guitar is, of course, to play music. Similarly, an
embedded system in an automobile provides a specific function as a subsystem of the car
itself.
The program instructions written for embedded systems are referred to as firmware, and
are stored in read-only memory or flash memory chips. They run with limited computer
hardware resources: little memory and a small or non-existent keyboard and/or screen.
Embedded systems often reside in machines that are expected to run continuously for
years without errors, and in some cases to recover by themselves if an error occurs. Therefore the
software is usually developed and tested more carefully than for personal computers, and
unreliable mechanical moving parts such as hard drives, switches or buttons are avoided.
1.2.1 Description of EW
The term Electronic attack (EA) refers to the usage of electromagnetic energy, directed energy, or
anti-radiation weapons to attack personnel, facilities, or equipment with the intent of degrading,
neutralizing, or destroying enemy combat capability. In case of EM energy, this action is referred to as
jamming and can be performed on communications systems or radar systems.
Electronic protection (EP), or electronic protective measures (EPM), involves actions taken to protect
personnel, facilities and equipment from any effects of friendly or enemy use of the electromagnetic
spectrum that degrade, neutralize or destroy friendly combat capability.
In military telecommunications, the terms Electronic Support (ES) or Electronic Support Measures
(ESM) describe the division of electronic warfare involving actions taken under direct control of an
operational commander to detect, intercept, identify, locate, record, and/or analyze sources of radiated
electromagnetic energy for the purposes of immediate threat recognition (such as warning that fire
control RADAR has locked on a combat vehicle, ship, or aircraft) or longer-term operational planning.
Thus, Electronic Support provides a source of information required for decisions involving Electronic
Protection (EP), Electronic Attack (EA), avoidance, targeting, and other tactical employment of forces.
Electronic Support data can be used to produce signals intelligence (SIGINT), communications
intelligence (COMINT) and electronics intelligence (ELINT).
Digital communication became important with the expansion of the use of computers and data
processing, and has continued to grow as a major industry providing the interconnection of computer
peripherals and the transmission of data between distant sites. With the requirement for ever higher
speeds of data transmission, the stress on the development of digital communication techniques has
increased. Also, the channel and its characteristics (bandwidth, frequency, noise, distortion, transmission
speed, type of coding, etc.) have improved over time.
Electronic Support Measures gather intelligence through passive "listening" to electromagnetic
radiations of military interest. Electronic support measures can provide.
1.
2.
3.
1. Wide-spectrum or bandwidth capability because foreign frequencies are initially unknown.
2. Wide dynamic range because signal strength is initially unknown.
3. Narrow band pass to discriminate the signal of interest from other electromagnetic radiation on
nearby frequencies.
1.3
A Field Programmable Gate Array (FPGA) is a semiconductor device that can be configured by
the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are
programmed using a logic circuit diagram or source code in a hardware description language (HDL) to
specify how the chip will work. They can be used to implement any logical function that an
application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping
offers advantages for many applications. FPGAs contain programmable logic components called "logic
blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together",
somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic
blocks also include memory elements, which may be simple flip-flops or more complete blocks of
memory.
The cost of an FPGA design is much lower than that of an ASIC (although the ensuing ASIC
components are much cheaper in large production runs). At the same time, implementing design changes
is much easier in FPGAs, and the time-to-market for such designs is much faster. FPGAs are often used
to prototype ASIC designs or to provide a hardware platform on which to verify the physical
implementation of new algorithms. However, their low development cost and short time-to-market mean
that they are increasingly finding their way into final products (some of the major FPGA vendors
actually have devices that they specifically market as competing directly against ASICs).
FPGA Origin
Around the beginning of the 1980s, it became apparent that there was a gap in the digital IC
continuum. At one end, there were programmable devices like SPLDs and CPLDs, which were highly
configurable and had fast design and modification times, but which couldn't support large or complex
functions. At the other end of the spectrum were ASICs. These could support extremely large and
complex functions, but they were painfully expensive and time-consuming to design. Furthermore, once
a design had been implemented as an ASIC it was effectively frozen in silicon.
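[Figure: the digital IC continuum, with PLDs (SPLDs and CPLDs) at one end, ASICs (gate arrays, structured ASICs, standard cell, and full custom) at the other, and "the gap" between them.]
Fig. 1.2: The Gap between PLDs and ASICs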
The early devices were based on the concept of a programmable logic block, which comprised a
3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along
with a few other elements that are of little interest here.
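[Figure: a programmable logic block in which inputs a, b, and c feed a 3-input LUT with output y; a multiplexer and a flip-flop, with inputs d and clock and output q, complete the block.]
Fig. 1.3: The key elements forming a simple programmable logic block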
Each FPGA contained a large number of these programmable logic blocks, as discussed below.
By means of appropriate SRAM programming cells, every logic block in the device could be configured
to perform a different function. Each register could be configured to initialize containing logic 0 or logic
1 and to act as a flip-flop (as shown in Fig. 1.3) or a latch. If the flip-flop option were selected, the
register could be configured to be triggered by a positive- or negative-going clock (the clock signal was
common to all of the logic blocks). The multiplexer feeding the flip-flop could be configured to accept
the output from the LUT or a separate input to the logic block, and the LUT could be configured to
represent any 3-input logical function.
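The behaviour of such a logic block can be illustrated with a small software model. The following is a minimal sketch, not a description of any real device: the class name `LogicBlock`, the helper `lut_bits_for`, and the choice of input a as the most significant index bit are all assumptions made for illustration. The eight LUT bits stand in for the SRAM configuration cells, and `use_lut` stands in for the multiplexer setting.

```python
class LogicBlock:
    """Toy model of a 3-input LUT, a multiplexer and a flip-flop."""

    def __init__(self, lut_bits, use_lut=True, init=0):
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        self.lut_bits = list(lut_bits)  # one stored output bit per input combination
        self.use_lut = use_lut          # mux setting: LUT output or separate input d
        self.q = init                   # register initialised to logic 0 or logic 1

    def lut(self, a, b, c):
        # The three inputs form an index into the table (a is the MSB here).
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c, d=0):
        """Apply one active clock edge and return the new register output q."""
        y = self.lut(a, b, c)
        self.q = y if self.use_lut else d
        return self.q


def lut_bits_for(func):
    """Build the 8 LUT bits for any Boolean function of three inputs."""
    return [func((i >> 2) & 1, (i >> 1) & 1, i & 1) & 1 for i in range(8)]


# Configure one block as (a AND b) XOR c.
block = LogicBlock(lut_bits_for(lambda a, b, c: (a & b) ^ c))
```

Because the table simply stores one output bit per input combination, loading a different set of eight bits turns the same block into any other 3-input function.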
1.3.2 FPGA Architecture
The complete FPGA comprised a large number of programmable logic blocks, called "islands",
surrounded by a "sea" of programmable interconnects. Such a high-level illustration is merely an abstract
representation. All of the transistors and interconnects would be implemented on the same piece of
silicon using standard IC creation techniques. In addition to the local interconnect reflected in the figure,
there would also be global (high-speed) interconnection paths that could transport signals across the chip
without having to go through multiple local switching elements. The device would also include primary
I/O pins and pads. By means of its own SRAM cells, the interconnect could be programmed such that
the primary inputs to the device were connected to the inputs of one or more programmable logic blocks,
and the outputs from any logic block could be used to drive the inputs to any other logic block, the
primary outputs from the device, or both.
1.4
Virtex-5 is the newest generation of FPGA from Xilinx. The Virtex-5 family contains five distinct
platforms, the most choice offered by any FPGA family. Each platform contains a different ratio of
features to address the needs of a wide variety of advanced logic designs. In addition to the most
advanced, high-performance logic fabric, Virtex-5 FPGAs contain many hard-IP system-level blocks,
including powerful 36-Kbit block RAM/FIFOs and second-generation 25 x 18 DSP slices. Virtex-5 also
offers the best solution for addressing the needs of high-performance logic designers, high-performance
DSP designers, and high-performance embedded systems designers with unprecedented logic, DSP,
hard/soft microprocessor and connectivity capabilities. The Virtex-5 LXT, SXT, TXT and FXT
platforms include high-speed serial connectivity and link/transaction layer capability.
The 5 platforms are:
Virtex-5 LX:
Architectural Description
Virtex-5 devices are user-programmable gate arrays with various configurable elements and
embedded cores optimized for high-density and high-performance system designs. Virtex-5
devices implement the following functionality:
I/O blocks provide the interface between package pins and the internal configurable logic.
Most popular and leading-edge I/O standards are supported by programmable I/O blocks
(IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced source-synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both
input and output signals), data serializers or deserializers, clock dividers, and dedicated I/O
and local clocking resources.
Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide
combinatorial and synchronous logic as well as distributed memory and SRL32 shift register
capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and
provide superior capabilities and performance compared to previous generations of
programmable logic.
Block RAM modules provide flexible 36 Kbit true dual port RAM that are cascadable to
form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional
programmable FIFO logic for increased device utilization. Each block RAM can also be
configured as two independent 18 Kbit true dual-port RAM blocks, providing memory
granularity for designs needing smaller RAM blocks.
Clock Management Tile (CMT) blocks provide the most flexible, highest-performance
clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-calibrating, fully digital), and one PLL block (self-calibrating, analog) for clock distribution
delay compensation, clock multiplication/division, coarse- /fine-grained clock phase shifting,
and input clock jitter filtering.
latches. Each CLB has internal fast interconnect and connects to a switch matrix to access general
routing resources.
Block RAM
The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in
various depth and width configurations.
In addition, each 36-Kbit block can also be configured to operate as two independent 18-Kbit
dual-port RAM blocks. Each port is totally synchronous and independent, offering three read-during-write modes.
Block RAM is cascadable to implement large embedded storage blocks. Additionally, back-end
pipeline registers, clock control circuitry, built-in FIFO support, ECC, and byte write enable
features are also provided as options.
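The two end points quoted for the block RAM, 32K x 1 and 512 x 72, do not hold the same number of bits: 32K x 1 is 32 Kbit, while 512 x 72 is the full 36 Kbit. The sketch below works through that arithmetic under one assumption that is not stated in the text above: that the widths divisible by 9 carry one extra (parity) bit per 8 data bits, so that every configuration stores 32 Kbit of data. The function name `depth_for_width` and the intermediate widths are illustrative.

```python
BLOCK_BITS = 36 * 1024   # total block size: 36 Kbit
DATA_BITS = 32 * 1024    # assumed data portion; the remaining 4 Kbit are parity


def depth_for_width(width):
    """Return the number of addressable words for a given port width.

    Assumption: widths of 9, 18, 36 and 72 include 1 parity bit per byte,
    while widths of 1, 2 and 4 use data bits only.
    """
    data_width = width - width // 9
    return DATA_BITS // data_width


# Depth for each assumed port width, from 32K x 1 down to 512 x 72.
configurations = {w: depth_for_width(w) for w in (1, 2, 4, 9, 18, 36, 72)}
```

Under this assumption the widest configuration uses every bit of the block (512 x 72 = 36,864 bits), whereas the narrowest leaves the parity bits unused.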
Global Clocking
The CMTs and global-clock multiplexer buffers provide a complete solution for designing high-speed clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can
be used independently or extensively cascaded. Up to six CMT blocks are available, providing
up to eighteen total clock generator elements. Each DCM provides familiar clock generation
capability.
To generate deskewed internal or external clocks, each DCM can be used to eliminate clock
distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the
output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment in increments of a
fraction of the clock period. Flexible frequency synthesis provides a clock output frequency
equal to a fractional or integer multiple of the input clock frequency.
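The frequency synthesis and phase-shift relationships described above reduce to simple arithmetic. The following sketch uses exact fractions to avoid rounding; the function names and the `multiply` and `divide` parameters are hypothetical, and no device-specific limits on their ranges are modelled.

```python
from fractions import Fraction


def synthesized_frequency(f_in_hz, multiply, divide):
    """Output frequency as a fractional or integer multiple of the input."""
    if multiply < 1 or divide < 1:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(f_in_hz) * Fraction(multiply, divide)


def phase_shift_seconds(f_hz, degrees):
    """Time offset corresponding to a phase shift at the given frequency."""
    return Fraction(degrees, 360) / Fraction(f_hz)


# Example: a 100 MHz input multiplied by 5/2 gives 250 MHz,
# and a 90 degree shift at 250 MHz is a quarter of the 4 ns period.
f_out = synthesized_frequency(100_000_000, 5, 2)
quarter_period = phase_shift_seconds(f_out, 90)
```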
To augment the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides
reference clock jitter filtering and further frequency synthesis options. Virtex-5 devices have 32
global-clock MUX buffers. The clock tree is designed to be differential. Differential clocking
helps reduce jitter and duty cycle distortion.
DSP48E Slices
DSP48E slice resources contain a 25 x 18 two's complement multiplier and a 48-bit
adder/subtractor/accumulator. Each DSP48E slice also contains extensive cascade capability to efficiently
implement high-speed DSP algorithms.
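A multiply-accumulate step with these operand widths can be modelled in a few lines. This is a behavioural sketch only, assuming the accumulator wraps around in two's complement on overflow; the function names `to_signed` and `mac` are illustrative and do not correspond to any vendor primitive.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b.

    a is treated as a 25-bit and b as an 18-bit two's complement operand;
    the result is wrapped to 48 bits (assumed overflow behaviour).
    """
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


# Accumulate a short dot product, as a FIR filter tap loop would.
samples = [3, -2, 7]
coefficients = [5, 4, -1]
total = 0
for s, c in zip(samples, coefficients):
    total = mac(total, s, c)
```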
Routing Resources
All components in Virtex-5 devices use the same interconnect scheme and the same access to the global
routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set of connectivity
in as few hops as possible. Timing models are shared, greatly improving the predictability of the
performance for high speed designs.
Configuration
Virtex-5 devices are configured by loading the bit stream into internal configuration memory using one
of the following modes:
Slave-serial mode
Master-serial mode
Slave SelectMAP mode
Master SelectMAP mode
Boundary-Scan mode (IEEE-1532 and -1149)
SPI mode (Serial Peripheral Interface standard Flash)
BPI-up/BPI-down modes (Byte-wide Peripheral interface standard x8 or x16 NOR Flash)
System Monitor
FPGAs are an important building block in high availability/reliability infrastructure. Therefore,
there is need to better monitor the on-chip physical environment of the FPGA and its immediate
surroundings within the system.
For the first time, the Virtex-5 family System Monitor facilitates easier monitoring of the FPGA
and its external environment. Every member of the Virtex-5 family contains a System Monitor
block.
The System Monitor is built around a 10-bit 200kSPS ADC (Analog-to-Digital Converter). This
ADC is used to digitize a number of on-chip sensors to provide information about the physical
environment within the FPGA. On-chip sensors include a temperature sensor and power supply
sensors. Access to the external environment is provided via a number of external analog input
channels. These analog inputs are general purpose and can be used to digitize a wide variety of
voltage signal types.
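As a rough illustration of what a 10-bit, 200 kSPS converter implies, the sketch below converts an output code to a voltage and computes the time between samples. The 1.0 V full-scale value is an assumed default chosen for the example, not a figure taken from the text, and the function names are hypothetical.

```python
ADC_BITS = 10
SAMPLE_RATE_SPS = 200_000


def code_to_volts(code, full_scale_volts=1.0):
    """Convert a unipolar ADC code to volts (full-scale value is an assumption)."""
    if not 0 <= code < 2 ** ADC_BITS:
        raise ValueError("code out of range for a 10-bit converter")
    return code * full_scale_volts / 2 ** ADC_BITS


def sample_period_us():
    """Time between consecutive conversions, in microseconds."""
    return 1_000_000 / SAMPLE_RATE_SPS


mid_scale = code_to_volts(512)   # half of the assumed full-scale range
```

With 1024 codes, each step represents a little under 0.1% of the full-scale range.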
Support for unipolar, bipolar, and true differential input schemes is provided. There is full access
to the on-chip sensors and external channels via the JTAG TAP, allowing the existing JTAG
infrastructure on the PC board to be used for analog test and advanced diagnostics during
development or after deployment in the field.
The System Monitor is fully operational after power up and before configuration of the FPGA.
System Monitor does not require an explicit instantiation in a design to gain access to its basic
functionality. This allows the System Monitor to be used even at a late stage in the design cycle.
The part number XC5VFX100T-1FFG1738 decodes as follows:
XC: Xilinx
5V: Virtex-5
FX: embedded PowerPC processor
100T: logic capacity
-1: speed grade
FF: flip chip package
G: lead free
1738: pin count
CHAPTER 2
QDR-II STATIC RAM
2.1 Introduction to Memories
Computer data storage, often called storage or memory, refers to computer components, devices,
and recording media that retain digital data used for computing for some interval of time. Computer data
storage provides one of the core functions of the modern computer, that of information retention.
Memory is directly accessible to the CPU. The CPU continuously reads instructions stored there and
executes them as required. Any data actively operated on is also stored there in a uniform manner. This
memory is mainly of two types: RAM and ROM.
specific hardware, and unlikely to require frequent updates). ROM is fabricated with the desired data
permanently stored in it, and thus can never be modified. However, more modern types such as EPROM
and flash EEPROM can be erased and re-programmed multiple times; they are still described as
"read-only memory" (ROM) because the reprogramming process is generally infrequent, comparatively slow,
and often does not permit random access writes to individual memory locations. There are different
types of ROM. Classic mask-programmed ROM chips are integrated circuits that physically encode the
data to be stored, and thus it is impossible to change their contents after fabrication.
1.
written to or programmed via a special device called a PROM programmer. Typically, this device uses
high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip.
Consequently, a PROM can only be programmed once.
2.
ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again requires
application of higher than usual voltage. Repeated exposure to UV light will eventually wear out an
EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming.
EPROM chip packages can often be identified by the prominent quartz "window" which allows UV light
to enter. After programming, the window is typically covered with a label to prevent accidental erasure.
Some EPROM chips are factory erased before they are packaged, and include no window: these are
effectively PROM.
3.
semiconductor structure to EPROM but allows its entire contents (or selected banks) to be electrically
erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3
player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a
ROM or writing to a RAM (nanoseconds in both cases).
4.
Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified
one bit at a time. Writing is a very slow process and again requires higher voltage (usually around 12 V)
than is used for read access. EAROMs are intended for applications that require infrequent and only
partial rewriting. EAROM may be used as non-volatile storage for critical system setup information; in
many applications, EAROM has been supplanted by CMOS RAM supplied by mains power and backed
up with a lithium battery.
5.
Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory
can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high
endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area,
resulting in individual ICs with a capacity as high as 16 GB as of 2007; this feature, along with its
endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications
(such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used
as a replacement for older ROM types, but not in applications that take advantage of its ability to be
modified quickly and frequently.
User Interface
Physical Interface
III.
IV.
The user interface uses a simple protocol based entirely on SDR signals to make Read/Write
requests. This module is constructed primarily from FIFO16 primitives and is used to store the address
and data values for Read/Write operations before and after execution.
The Read/Write state machine is responsible for monitoring the status of the first-in, first-out
(FIFO) buffers within the user interface module, coordinating the flow of data between the user interface and
physical interface, and initiating the actual Read/Write commands to the external memory device. It
ensures execution of Read/Write operations with minimal latency in a concurrent manner as per the
requirements of the QDR II memory specification.
The physical interface is responsible for generating the proper timing relationships and DDR
signaling to communicate with the external memory device in a manner that conforms to its command
protocol and timing requirements.
The delay calibration state machine is an integral component of the direct-clocking methodology
used to achieve maximum performance while greatly simplifying the task of read data capture inside the
FPGA. The delay calibration state machine leverages this unique capability to adjust the timing of the
read data returning from the memory device so that it can be synchronized directly to the global FPGA
system clock without any complex local-clocking or data recapture techniques.
The reference block diagram of the QDR-II design is shown below.
[Figure: block diagram of the QDR II reference design. Blocks: User Interface, Read/Write State Machine, Physical Interface, Delay Calibration State Machine, and the QDR II memory device. User-side signals: USER_CLK0, USER_CLK270, USER_RESET, USER_W_n, USER_R_n, USER_QEN_n, USER_AD_WR, USER_AD_RD, USER_BW_n, USER_DWL, USER_DWH, USER_QRL, USER_QRH, USER_WR_FULL, USER_RD_FULL, USER_QR_EMPTY. Internal connections: FIFO Status, Read/Write Control, Address Path, Write Path, Read Path, CLK_DIV4. Memory-side signals: QDR_W_n, QDR_R_n, QDR_SA, QDR_BW_n, QDR_D, QDR_CD, QDR_K, QDR_K_n.]
the memory device. Assuming there is a pending Read request, the state machine then transitions to the
Read state where the internal RD_INIT_n strobe is activated. This strobe pulls the Read address from
the FIFOs and launches an external QDR_R_n strobe to the memory device. Capture of the return values
in the Read data FIFOs also occurs as a result of this process.
[Figure: state diagram of the 4-word burst Read/Write state machine. States: INIT (START_CAL = 1), IDLE, WRITE (WR_INIT_n = 0), and READ (RD_INIT_n = 0). USER_RESET leads to INIT and DLY_CAL_DONE leads from INIT to IDLE; the remaining transitions are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]
[Figure: state diagram of the 2-word burst Read/Write state machine. States: INIT (START_CAL = 1), IDLE, and READ/WRITE (WR_INIT_n = 0, RD_INIT_n = 0). USER_RESET leads to INIT and DLY_CAL_DONE leads from INIT to IDLE; the remaining transitions are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]
The Read/Write state machine continuously monitors the user interface FIFO status signals to
determine if there are any pending Read/Write requests. A continuous flow of concurrent Read/Write
requests causes the state machine to simply alternate between the Read and Write states, ensuring
properly interleaved requests to the external memory. A stream of Write requests results in alternating
Idle and Write states, while a stream of Read requests similarly alternates between Idle and Read states.
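The alternation behaviour of the 4-word burst state machine can be reproduced with a short behavioural model. This is a sketch under stated assumptions rather than the reference design's logic: a Write is taken to be pending when the write FIFO is not empty, a Read when the read-address FIFO is not empty and the read-data FIFO is not full, and Writes are assumed to take priority when leaving Idle. The names `State`, `next_state`, and `trace` are illustrative.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    IDLE = "IDLE"
    WRITE = "WRITE"
    READ = "READ"


def next_state(state, cal_done, wr_empty, rd_empty, qr_full):
    """Return the next state of the modelled 4-word burst state machine."""
    write_pending = not wr_empty
    # A Read is only issued if there is room to capture the returned data.
    read_pending = not rd_empty and not qr_full

    if state is State.INIT:
        return State.IDLE if cal_done else State.INIT
    if state is State.IDLE:
        if write_pending:  # assumption: Writes win when leaving Idle
            return State.WRITE
        return State.READ if read_pending else State.IDLE
    if state is State.WRITE:
        return State.READ if read_pending else State.IDLE
    # state is State.READ
    return State.WRITE if write_pending else State.IDLE


def trace(cycles, wr_empty, rd_empty, qr_full=False):
    """List the states visited after calibration, with constant FIFO flags."""
    state, visited = State.INIT, []
    for _ in range(cycles):
        state = next_state(state, True, wr_empty, rd_empty, qr_full)
        visited.append(state.name)
    return visited
```

Because a Write or Read state is never followed by the same state, the model never issues the same kind of request on two consecutive cycles.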
The operation of the 2-word burst state machine is quite similar to that of the 4-word burst state machine,
with the exception that a single READ_WRITE state manages the Read and Write requests to the
memory. All 2-word burst QDR II memory devices allow Read and Write requests to occur on the same
clock cycle, allowing these operations to be initiated from the same state.
The state diagrams for the 4-word burst and 2-word burst Read/Write state machines are shown in the accompanying figures.
2.3.3 Physical Interface
The Physical Interface of the QDRII reference design generates the actual I/O signaling and
timing relationships for communication of Read/Write commands to the external memory device,
including the DDR data signals. It provides the necessary timing margins and I/O signaling standards
required to meet the overall design performance specifications.
interfaces. Each address location is associated with four 8-bit words (CY7C1511V18), 9-bit words
(CY7C1526V18), 18-bit words (CY7C1513V18), or 36-bit words (CY7C1515V18) that burst sequentially
into or out of the device. Since data can be transferred into and out of the device on every rising edge of
both input clocks (K and K̄, and C and C̄), memory bandwidth is maximized while simplifying system
design by eliminating bus "turn-around".
Fig. 2.4: Logic diagram of CY7C1515V18
Depth expansion is accomplished with Port Selects for each port. Port Selects allow each port to
operate independently. All synchronous inputs pass through input registers controlled by the K or K̄
input clocks. All data outputs pass through output registers controlled by the C or C̄ (or K or K̄ in a
single clock domain) input clocks. Writes are conducted with on-chip synchronous self-timed write
circuitry.
2.4.1 Pin Definitions
Pin Name        I/O                    Pin Description
D[x:0]          Input-Synchronous      Data input signals, sampled on the rising edge of the K and K̄ clocks during valid write operations. CY7C1511V18: D[7:0]; CY7C1526V18: D[8:0]; CY7C1513V18: D[17:0]; CY7C1515V18: D[35:0].
WPS             Input-Synchronous      Write Port Select, active LOW. Sampled on the rising edge of the K clock. When asserted active, a write operation is initiated. Deasserting will deselect the Write port. Deselecting the Write port will cause D[x:0] to be ignored.
NWS0, NWS1      Input-Synchronous      Nibble Write Select 0, 1, active LOW (CY7C1511V18 only). Sampled on the rising edge of the K and K̄ clocks.
BWS0 to BWS3    Input-Synchronous      CY7C1526V18: BWS0 controls D[8:0]. CY7C1513V18: BWS0 controls D[8:0], BWS1 controls D[17:9]. CY7C1515V18: BWS0 controls D[8:0], BWS1 controls D[17:9], BWS2 controls D[26:18], BWS3 controls D[35:27].
A               Input-Synchronous
Q[x:0]          Outputs-Synchronous
RPS             Input-Synchronous
C, C̄, K, K̄      Input-Clock
CQ, CQ̄          Echo Clock
ZQ              Input
DOFF            Input
TDO             Output
TCK             Input
TDI             Input
TMS             Input
NC              N/A
Vss/144M        Input
Vss/288M        Input
Vref            Input-Reference
VDD             Power Supply
Read address register. Following the next K clock rise, the corresponding lowest-order 18-bit word of
data is driven onto Q[17:0] using C as the output timing reference. On the subsequent rising edge of
C̄ the next 18-bit data word is driven onto Q[17:0]. This process continues until all four 18-bit data
words have been driven out onto Q[17:0]. The requested data will be valid 0.45 ns from the rising edge
of the output clock (C or C̄, or K or K̄ when in single-clock mode). In order to maintain the internal
logic, each read access must be allowed to complete. Each Read access consists of four 18-bit data
words and takes 2 clock cycles to complete. Therefore, Read accesses to the device cannot be initiated
on two consecutive K clock rises. The internal logic of the device will ignore the second Read request.
Read accesses can be initiated on every other K clock rise. Doing so will pipeline the data flow such that
data is transferred out of the device on every rising edge of the output clocks (C and C̄, or K and K̄ when
in single-clock mode).
When the read port is deselected, the CY7C1513V18 will first complete the pending read
transactions. Synchronous internal circuitry will automatically tri-state the outputs following the next
rising edge of the Positive Output Clock (C). This will allow for a seamless transition between devices
without the insertion of wait states in a depth-expanded memory.
2.5.2 Write Operations
Write operations are initiated by asserting WPS active at the rising edge of the Positive Input
Clock (K). On the following K clock rise the data presented to D[17:0] is latched and stored into the
lower 18-bit Write Data register, provided BWS[1:0] are both asserted active. On the subsequent rising
edge of the Negative Input Clock (K̄) the information presented to D[17:0] is also stored into the Write
Data register, provided BWS[1:0] are both asserted active. This process continues for one more cycle
until four 18-bit words (a total of 72 bits) of data are stored in the SRAM. The 72 bits of data are then
written into the memory array at the specified location. Therefore, Write accesses to the device cannot
be initiated on two consecutive K clock rises. The internal logic of the device will ignore the second
Write request. Write accesses can be initiated on every other rising edge of the Positive Input Clock (K).
Doing so will pipeline the data flow such that 18 bits of data can be transferred into the device on every
rising edge of the input clocks (K and K̄).
When deselected, the write port will ignore all inputs after the pending Write operations have
been completed.
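The throughput implied by this pipelining follows directly from the figures above: one 18-bit word on each rising edge of the two complementary input clocks, and a four-word burst spread over two clock cycles. The sketch below works through the arithmetic; the 250 MHz clock frequency is an assumed example value, not a specification of the device, and the function names are illustrative.

```python
WORD_BITS = 18     # data bus width of the x18 device
BURST_WORDS = 4    # words per Read or Write access


def port_bandwidth_bits_per_s(clock_hz, word_bits=WORD_BITS):
    """Sustained bandwidth of one port: one word per rising edge of the
    clock and of its complement, i.e. two words per clock cycle."""
    return 2 * word_bits * clock_hz


def burst_bits(word_bits=WORD_BITS, burst_words=BURST_WORDS):
    """Bits moved by a single access."""
    return word_bits * burst_words


def burst_cycles(burst_words=BURST_WORDS):
    """Clock cycles occupied by one access (two words per cycle)."""
    return burst_words // 2


ASSUMED_CLOCK_HZ = 250_000_000  # hypothetical operating frequency
write_bw = port_bandwidth_bits_per_s(ASSUMED_CLOCK_HZ)
total_bw = 2 * write_bw         # Read and Write ports operate independently
```

Since each access occupies two cycles, starting a new access on every other K clock rise is exactly what keeps the data bus busy on every edge.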
2.5.3 Byte Write Operations
Byte Write operations are supported by the CY7C1513V18. A write operation is initiated as
described in the Write Operations section above. The bytes that are written are determined by BWS0 and
BWS1, which are sampled with each set of 18-bit data words. Asserting the appropriate Byte Write
Select input during the data portion of a write will allow the data being presented to be latched and
written into the device. Deasserting the Byte Write Select input during the data portion of a write will
allow the data stored in the device for that byte to remain unaltered. This feature can be used to simplify
Read/Modify/Write operations to a Byte Write operation. The CY7C1515V18 also supports Byte Write
operations, which are determined by BWS0, BWS1, BWS2, and BWS3.
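The effect of the Byte Write Selects on a single 18-bit word can be sketched as a masked merge. The model below assumes the x18 mapping in which BWS0 covers D[8:0] and BWS1 covers D[17:9], with both selects active LOW; the function name `byte_write` and the tuple argument are illustrative choices, not part of any device interface.

```python
# Bit masks for the two 9-bit bytes of an 18-bit word.
BYTE_MASKS = (0x001FF, 0x3FE00)  # BWS0 -> D[8:0], BWS1 -> D[17:9]
WORD_MASK = 0x3FFFF


def byte_write(stored, incoming, bws_n):
    """Merge `incoming` into `stored` under active-LOW byte write selects.

    bws_n is a tuple (bws0_n, bws1_n); a value of 0 means the select is
    asserted and the corresponding byte is overwritten.
    """
    result = stored & WORD_MASK
    for select_n, mask in zip(bws_n, BYTE_MASKS):
        if select_n == 0:
            result = (result & ~mask) | (incoming & mask)
    return result


# Overwrite only the lower byte of a word that was all ones.
merged = byte_write(0x3FFFF, 0x00000, (0, 1))
```

This is why a partial update does not need a separate read: the unselected byte simply keeps its stored value.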
2.5.4 Single Clock Mode
The CY7C1513V18 can be used with a single clock that controls both the input and output
registers. In this mode the device will recognize only a single pair of input clocks (K and K̄) that
controls both the input and output registers. This operation is identical to the operation if the device had
zero skew between the K/K̄ and C/C̄ clocks. All timing parameters remain the same in this mode. To use
this mode of operation, the user must tie C and C̄ HIGH at power on. This function is a strap option and
not alterable during device operation.
2.5.5 Concurrent Transactions
The Read and Write ports on the CY7C1513V18 operate completely independently of one
another. Since each port latches the address inputs on different clock edges, the user can Read or Write
to any location, regardless of the transaction on the other port. If the ports access the same location when
a read follows a write in successive clock cycles, the SRAM will deliver the most recent information
associated with the specified address location. This includes forwarding data from a Write cycle that was
initiated on the previous K clock rise.
Read accesses and Write accesses must be scheduled such that one transaction is initiated on any
clock cycle. If both ports are selected on the same K clock rise, the arbitration depends on the previous
state of the SRAM. If both ports were deselected, the Read port will take priority. If a Read was initiated
on the previous cycle, the Write port will assume priority (since Read operations cannot be initiated on
consecutive cycles). If a Write was initiated on the previous cycle, the Read port will assume priority
(since Write operations cannot be initiated on consecutive cycles). Therefore, asserting both port selects
active from a deselected state will result in alternating Read/Write operations being initiated, with the
first access being a Read.
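The arbitration rules can be written down as a short behavioral model. The sketch below is an illustration in Python under stated assumptions: the function name is hypothetical, and a request that is ignored is treated as leaving the port idle for that cycle.

```python
def arbitrate(selects):
    """Return the operation started on each K rising edge.

    selects: list of (read_selected, write_selected) flags, one pair per edge.
    Result entries are 'R', 'W', or None when nothing is initiated.
    """
    started = []
    previous = None  # operation initiated on the previous cycle
    for read_sel, write_sel in selects:
        if read_sel and write_sel:
            # Read wins unless a Read was started on the previous cycle.
            op = 'W' if previous == 'R' else 'R'
        elif read_sel:
            # Reads cannot be initiated on consecutive cycles.
            op = None if previous == 'R' else 'R'
        elif write_sel:
            # Writes cannot be initiated on consecutive cycles.
            op = None if previous == 'W' else 'W'
        else:
            op = None
        started.append(op)
        previous = op
    return started
```

Asserting both selects from a deselected state gives the alternating sequence R, W, R, W that the text describes, starting with a Read.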
2.5.6 Depth Expansion
The CY7C1513V18 has a Port Select input for each port. This allows for easy depth expansion.
Both Port Selects are sampled on the rising edge of the Positive Input Clock only (K). Each port select
input can deselect the specified port. Deselecting a port will not affect the other port. All pending
transactions (Read and Write) will be completed prior to the device being deselected.
2.5.7 Programmable Impedance
An external resistor, RQ, must be connected between the ZQ pin on the SRAM and VSS to allow
the SRAM to adjust its output driver impedance. The value of RQ must be 5X the value of the intended
line impedance driven by the SRAM. The allowable range of RQ to guarantee impedance matching with
a tolerance of ±15% is between 175Ω and 350Ω, with VDDQ = 1.5V. The output impedance is
adjusted every 1024 cycles upon power up to account for drifts in supply voltage and temperature.
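As a worked example of the 5X rule, the resistor value for a given line impedance can be computed and checked against the allowed range. This is a minimal Python sketch; the function name is hypothetical and the range limits are the 175 and 350 ohm figures quoted above.

```python
RQ_MIN_OHMS = 175.0
RQ_MAX_OHMS = 350.0


def rq_for_line_impedance(z0_ohms):
    """Return RQ = 5 x line impedance, or raise if outside the allowed range."""
    rq = 5.0 * z0_ohms
    if not RQ_MIN_OHMS <= rq <= RQ_MAX_OHMS:
        raise ValueError("RQ outside the range that guarantees matching")
    return rq
```

For a 50 ohm line this gives RQ = 250 ohms, and the allowed RQ range corresponds to line impedances from 35 to 70 ohms.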
CHAPTER 3
XILINX AND MODEL SIM
3.1 Xilinx Overview
The Integrated Software Environment (ISE™) is the Xilinx design software suite that allows
you to take your design from design entry through Xilinx device programming. The ISE Project
Navigator manages and processes your design through the following steps in the ISE design flow.
[Figure: Project Navigator main window, with callouts for the tool bar, Sources window, Processes window, Workspace, and Transcript window]
In the figure, the Sources window at the top left hierarchically displays the
elements included in the project. Beneath the Sources window is the Processes window, which displays
available processes for the currently selected source. The third window, at the bottom of the Project
Navigator, is the Transcript window, which displays status messages, errors, and warnings and also
contains interactive tabs for Tcl scripting and the Find in Files function. The fourth window, to the right,
is a multi-document interface (MDI) window referred to as the Workspace. It enables you to view HTML
reports, ASCII text files, schematics, and simulation waveforms.
3.2.1 Project Navigator Main Window
Design entry is the first step in the ISE design flow. During design entry, we create the source
files based on the design objectives. We can create the top-level design file using a Hardware
Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic, and use multiple
formats for the lower-level source files in the design.
If we are working with a synthesized EDIF or NGC/NGO file, we skip design entry and
synthesis and start with the implementation process.
3.3.2 Synthesis
After design entry and optional simulation, run synthesis. During this step, VHDL, Verilog, or
mixed language designs become netlist files that are accepted as input to the implementation step.
3.3.3 Implementation
After synthesis, run design implementation, which converts the logical design into a physical file
format that can be downloaded to the selected target device. From Project Navigator, run the
implementation process in one step, or run each of the implementation processes separately.
Implementation processes vary depending on whether we are targeting a Field Programmable Gate
Array (FPGA) or a Complex Programmable Logic Device (CPLD).
3.3.4 Verification
Verification checks the functionality of the design at several points in the design flow. We can use
simulator software to verify the functionality and timing of the design or a portion of the design. The
simulator interprets VHDL or Verilog code into circuit functionality and displays logical results of the
described HDL to determine correct circuit operation. Simulation allows complex functions to be created
and verified in a relatively small amount of time. We can also run in-circuit verification after programming
the device.
3.3.5 Device Configuration
After generating a programming file, configure the device. During configuration, generate
configuration files and download the programming files from a host computer to a Xilinx device.
iMPACT Tool Overview
iMPACT, a tool featuring batch and graphical user interface (GUI) operations, allows you to
perform the following functions: Device Configuration and File Generation.
Device Configuration enables you to directly configure Xilinx FPGAs or program Xilinx
CPLDs and PROMs with the Xilinx cables (MultiPRO Desktop Tool, Parallel Cable IV, or Platform
Cable USB) in various modes. In the Boundary-Scan mode, Xilinx FPGAs, CPLDs, and PROMs can be
configured or programmed. In the Slave Serial or SelectMAP configuration modes only FPGAs can be
configured directly. In the Desktop Configuration mode Xilinx CPLDs or PROMs can be programmed.
In the Direct SPI Configuration mode select SPI serial flash (STMicro: M25P, M25PE, M45PE or
Atmel: AT45DB) can be programmed.
File Generation enables you to create the following types of programming files: System ACE CF,
PROM, SVF, STAPL, and XSVF files.
iMPACT also enables us to do the following:
1. Read back and verify design configuration data
2. Debug configuration problems
3. Execute SVF and XSVF files
[Figure: ISE design flow, showing design synthesis, design implementation, back annotation, and Xilinx device programming alongside the design verification steps (behavioral simulation, functional simulation, static timing analysis, timing simulation, and in-circuit verification)]
2.
3. FIR filters
4. FFTs
5. Connectivity and networking interfaces (Ethernet, SPI-4.2, RapidIO, CAN and PCI Express).
3.4.1 Memory Interface Generator
The Memory Interface Generator (MIG) is a simple menu-driven tool to generate advanced
memory interfaces. DDR2 SDRAM, DDR SDRAM, DDRII SRAM, QDRII SRAM, and RLDRAM II
are supported. This tool generates HDL and pin placement constraints that will help us design our
application.
3.4.2 Interfacing QDRII SRAM with MIG
The Figure below shows a top-level block diagram of the QDRII memory controller. One side of
the QDRII memory controller connects to the user interface denoted as Block Application. The other
side of the controller interfaces to QDRII memory. The memory interface data width is selectable.
[Figure: Top-level block diagram showing the Block Application, the QDR-II Memory Controller, and the QDR-II Memory]
Data is double-pumped to QDRII SRAM on both the positive and the negative clock edges. The
HSTL_18 Class I/O standard is used for the data, address, and control signals. QDR-II SRAM interfaces
are source-synchronous and double data rate like DDR SDRAM interfaces. The key advantage of QDRII
devices is that they have separate data buses for reads and writes to SRAM. These RAMs are faster and
better protected from errors and faults.
Interface model
The memory interface is layered to simplify the design and make the design modular. The figure
below shows the layered memory interface in the QDRII memory controller. The three layers are the
application layer, the implementation layer, and the physical layer.
The application layer comprises the user interface, which initiates memory
writes and reads by writing data and memory addresses to the User Interface
FIFOs. The implementation layer comprises the infrastructure, datapath, and
control logic.
1. The infrastructure logic consists of the DCM and reset logic generation circuitry.
2. The datapath logic consists of the calibration logic by which the data from the
memory component is captured using the FPGA clock.
3. The control logic determines the type of data transfer (that is, read or write) with
the memory component, depending on the status signals of the User Interface FIFOs.
Fig. 3.5: Interface layering model (User Interface; Implementation Layer comprising Infrastructure, Data path, and Control; Physical Layer)
The physical layer comprises the I/O elements of the FPGA. The controller
communicates with the memory component using this layer. The I/O elements
(such as IDDRs, ODDRs, and IDELAY elements) are associated with this layer.
Hierarchy
The above figure shows the hierarchical structure of the QDRII SRAM design generated by MIG
with a test bench and a DCM. The modules are classified as follows:
Design modules
1. Test bench modules
2. Clocks and reset generation modules parameters selected from MIG.
Design clocks and resets are generated in the infrastructure_top module. When the Use DCM
option is checked in MIG, a DCM primitive and the necessary clock buffers are instantiated in the
infrastructure_top module. The inputs to this module are the differential design clock and a 200 MHz
differential clock required for the IDELAYCTRL module. A user reset is also input to this module.
Using the input clocks and reset signals, the system clocks and the system resets used in the design are
generated in this module. When the Use DCM option is unchecked in MIG, the infrastructure_top
module does not have the DCM and the corresponding clock buffer instantiations; therefore, the system
operates on the user-provided clocks. The system reset is generated in the infrastructure_top module
using the DCM_LOCK signal and the ready signal of the IDELAYCTRL element.
1. Insert ChipScope Pro cores in the design using the CORE Generator or Core Inserter.
2.
3.
2.5.3
on-chip debugging: integrated logic analyzer (ILA), integrated bus analyzer (IBA), and virtual
input/output (VIO) low-profile software cores. These cores allow viewing internal signals and
nodes in the FPGA, including the IBM CoreConnect™ processor local bus (PLB) that supports the
IBM PowerPC™ 405. Following are the ChipScope Pro cores and their functions:
1. ICON
The Integrated Controller (ICON) core provides the communication between the
embedded ILA, IBA, and VIO cores and the computer running the ChipScope Pro
Analyzer software.
2. ILA
The ILA core is a customizable logic analyzer core that can be used to monitor the
internal signals in the design. Because the ILA core is synchronous to the design being
monitored, all design clock constraints applied to the design are also applied to the
components inside the ILA core.
3. ATC2
The Agilent Trace Core 2 (ATC2) is a customizable logic analyzer core. It is similar to
the ILA core but does not use on-chip Block RAM resources to store captured trace data. The
ATC2 core synchronizes ChipScope Pro to the Agilent FPGA dynamic probe technology,
delivering the first integrated application for FPGA debug with logic analyzers.
4. VIO
The virtual input/output (VIO) core is a customizable core that can both monitor and
drive internal FPGA signals in real time. Unlike the ILA and IBA cores, the VIO core
does not require on-chip RAM.
[Figures: waveform pane; cursor pane]
CHAPTER 4
HARDWARE BOARD DESCRIPTION
4.1 Board Overview
The hardware on which we are working is a single-board subsystem used in the
processing of intercepted signals. It consists of Xilinx™ FPGAs, optical transceivers, a cPCI interface,
memories (DDR SDRAM and QDRII SRAM) and an Ethernet interface.
4.1.1 Requirement of this Board
Before this board was designed, as many as 16 independent boards were used for the purpose.
Owing to advances in VLSI technology, all of these are now integrated onto a single board. This is highly
advantageous, as the resulting board is smaller and operates faster.
4.1.2 Board Block Diagram
[Figure: Board block diagram. Blocks: 36MB QDR-II SRAM; Virtex-4 XC4LX100 FPGAs; Virtex-II Pro XC2VP7 FPGAs; 128MB DDR SDRAM; 8MB Flash memory; 10/100 Ethernet PHY; cPCI bridge and cPCI backplane; PPD main module and PPD rear I/O module. Signal labels: address, control and data; de-interleaved PDW]
So, a memory is required to store the de-interleaved PD Words output by the 1st stage. To
store the PD Words de-interleaved in the 1st level, a dual-port memory is used, which must
be independently accessible by the two processes. This type of memory improves performance by
reducing the memory access conflicts between the two levels, and thus increases the speed of operation.
Because of the high-speed memory access requirements, Quad Data Rate SRAMs are used; their
independent read and write ports suit this requirement well.
[Figure: SRAM interfacing overview. Virtex-4 XC4LX100: interleaved PD data, PPD, SRAM interfacing logic in VHDL (S/W interface), QDR-II memory controller, QDR-II memory. Virtex-II Pro XC2VP7 (V2 PRO): emitter processor, PLB i/f]
CHAPTER 5
VHDL INTERFACE DESIGN AND STATE DIAGRAM
5.1
[Figure: Interface block diagram. Blocks: PPD, S/W interface, QDR-II memory controller, QDR-II memory, emitter processor, PLB i/f, V2 PRO. Path labels: Writing, PLB, Reading]
The PowerPC 405 core accesses high speed and high performance system resources through
Processor Local Bus (PLB) interfaces on the instruction and data cache controllers. The PLB interfaces
provide separate 32-bit address and 64-bit data buses for the instruction and data sides.
The PLB supports read and write data transfers between master and slave devices equipped with
a PLB bus interface and connected through PLB signals. The bus architecture supports multiple master and
slave devices. Each PLB master is attached to the PLB through separate address, read-data, and write-data
buses. PLB slaves are attached to the PLB through shared, but decoupled, address, read-data, and
write-data buses and a plurality of transfer control and status signals for each data bus.
Fig. 5.2: Write cycle state diagram (states: IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3, and an invalid state; transition labels: del_cal=1; hw_fifo_empty=0 and user_wr_full=0)
[Figure: Read cycle state diagram (states: INIT_RD, IDLE_RD, LATCH_RD_ADDR, RD_ADDR_Wr, WAIT_QR_EMPTY, LATCH_EPW_0_1, LATCH_EPW_2_3, LT_W2, ACK_W0_GEN, ACK_W1_GEN, ACK_W2_GEN, ACK_W3_GEN; transition labels: reset = 1, dly_calc=1, proc_rd=0, proc_rd=1, proc_addr(1 downto)=10, user_rd_full=0, user_qr_empty=0, test_w_n =1)]
CHAPTER 6
TEST RESULTS, CONCLUSION AND FUTURE SCOPE OF
WORK
6.1 Simulation Results in ModelSim
6.1.1 Write Cycle
The PPD logic and the processor PLB interface operate with an external clock as reference, whereas
the QDRII SRAM Memory Controller operates at 166MHz, which is the operating frequency of the QDRII
SRAM device. The reset signal used is synchronous with respect to the QDRII SRAM reference clock.
The signal dly_cal_done indicates when the QDRII
SRAM device calibration is completed and the device is ready for access.
The logic uses a FIFO interface to store the processed PD Words, which arrive at a
minimum interval of 200ns and are to be written into the QDRII SRAM device. We simulated this
requirement by generating a signal wr_pulse every 200ns. We employed a counter to generate the 128-bit
PD Word. The hardware address and hardware data are written into 5 FIFOs, 1 for the address and 4 for
the data (32 bits each).
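The input stream rate implied by these figures can be checked with a short calculation: one 128-bit PD Word every 200ns is 640 Mbit/s, which agrees with the rate quoted in the conclusion. The Python below is a minimal sketch with hypothetical function names; the 100 MHz value used for the write-side clock is an assumption taken from the signal name clk_100, not a figure stated in the text.

```python
def stream_rate_mbps(bits_per_word, period_ns):
    """Sustained rate in Mbit/s for one word every period_ns nanoseconds."""
    return bits_per_word * 1000 / period_ns


def clock_cycles_per_period(period_ns, clock_mhz):
    """Number of clock cycles that fit in one period."""
    return period_ns * clock_mhz / 1000


pd_word_rate = stream_rate_mbps(128, 200)            # 640.0 Mbit/s
cycles_per_pulse = clock_cycles_per_period(200, 100)  # 20.0, assuming clk_100 = 100 MHz
```

Under that clock assumption, a wr_pulse every 200ns corresponds to one pulse per 20 write-clock cycles.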
The QDRII SRAM operates at a clock rate of 166 MHz, so we take user_clk equal to
166MHz. The write state machine remains idle until dly_cal_done = 1. Once
data is written into the FIFOs, the hw_fifo_empty signal goes low, signifying that there is data present
in the PPD FIFO interface. At the next rising edge of the user clock after it goes low, the state machine
moves into the wrfifo_rd state and the hardware read signal, hw_rd, becomes high. The data and address are now
read from the FIFOs into qdr_wrdata and hw_addr_out respectively. qdr_wrdata, the data
output of the FIFOs, is a 128-bit data line. The state machine next moves into the lt_pdw_0_1 and subsequently
the lt_pdw_2_3 states. user_w_n_i is an active low signal to latch the 128-bit PD Word. lt_pdw is a 2-bit
vector which is 01 for the lower 64 bits of data and 10 for the higher 64 bits. user_dwl and user_dwh
are two 32-bit data lines. Of each 64 bits of data, the lower 32 bits are latched to user_dwl and the higher 32 bits
are latched to user_dwh. test_w_n_i is the active low signal used to inhibit generation of the user_r_n_i
active low signal for a read operation while the user_w_n_i signal is being generated.
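The write state sequence can be traced with a small software model. The Python below is a sketch that mirrors the transitions of the write state machine in the VHDL listing of the appendix (INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3); the Python function names are hypothetical and the model covers only the state transitions and the two decoded outputs hw_rd and test_w_n_i.

```python
def write_next_state(state, dly_cal_done, user_wr_full, hw_fifo_empty):
    """Next-state function of the write state machine."""
    if state == "INIT_WR":
        return "IDLE_WR" if dly_cal_done else "INIT_WR"
    if state == "IDLE_WR":
        if not user_wr_full and not hw_fifo_empty:
            return "WRFIFO_RD"
        return "IDLE_WR"
    if state == "WRFIFO_RD":
        return "LT_PDW_0_1"
    if state == "LT_PDW_0_1":
        return "LT_PDW_2_3"
    if state == "LT_PDW_2_3":
        return "IDLE_WR"
    return "INIT_WR"  # any other state falls back to INIT_WR


def write_outputs(state):
    """Decoded outputs: hw_rd is high in WRFIFO_RD, test_w_n_i is low in LT_PDW_0_1."""
    hw_rd = 1 if state == "WRFIFO_RD" else 0
    test_w_n_i = 0 if state == "LT_PDW_0_1" else 1
    return hw_rd, test_w_n_i


def run_write_fsm(inputs, state="INIT_WR"):
    """Apply one (dly_cal_done, user_wr_full, hw_fifo_empty) tuple per clock edge
    and return the state reached after each edge."""
    trace = []
    for dly_cal_done, user_wr_full, hw_fifo_empty in inputs:
        state = write_next_state(state, dly_cal_done, user_wr_full, hw_fifo_empty)
        trace.append(state)
    return trace
```

Feeding the model a calibration-done input followed by a non-empty FIFO reproduces the sequence described above: idle, one FIFO read cycle, then the two latch states before returning to idle.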
The read cycle is initiated by the Processor Local Bus (PLB). This bus is a 32-bit data bus. A
read signal is generated every time a read operation is initiated by the embedded PowerPC processor of the
Virtex-II FPGA. These read requests are simulated using VHDL and implemented in the Virtex-4 FPGA.
The read requests are generated every 2 microseconds. The proc_rd signal goes high along with the
address proc_addr, indicating that the PLB is requesting data from the QDR-II SRAM. Once the user
interface receives the address from the PLB, it starts reading the data from the specified location onto its bus
(user interface). Once the data is present on the user interface bus, it is latched onto the PLB data bus when
the fifo_empty signal goes low. Then an acknowledgement signal is generated by the SRAM interface logic,
indicating that the data has been latched onto the PLB bus. Since the PLB bus is a 32-bit data bus, unlike
in the write cycle, only one word at a time is latched onto the PLB bus. As there are four PD Words (PDW)
to be read, it takes 4 read cycles to read them. When the user_qr_empty signal goes low, the first two words
(W0 and W1) are already present on user_qrl and user_qrh respectively. This condition is known as
first word fall through. So, the words W0 and W1 are latched onto the 128-bit qdr_rddata bus when the
user_qr_empty signal goes low. In the next clock cycle, W2 and W3 are latched onto the qdr_rddata bus,
and W0 is latched onto the PLB data bus. PD Words W1, W2 and W3 are then latched in the next read
request cycles onto the PLB data bus from the qdr_rddata bus. An acknowledgement signal, rd_ack, is
generated by the read state machine every time data is latched onto the PLB bus.
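The way a 32-bit PLB read picks one word out of the 128-bit qdr_rddata bus can be sketched as follows. This Python model is illustrative only: the function names are hypothetical, the split of proc_addr into a word index (bits 1 downto 0) and a memory address (the remaining upper bits) follows the address slicing in the appendix listing, and the placement of W0 in the least significant 32 bits of qdr_rddata is an assumption.

```python
def split_proc_addr(proc_addr):
    """Split a processor address into (memory_address, word_index).

    The two least significant bits select W0..W3; the upper bits address
    one 128-bit location.
    """
    return proc_addr >> 2, proc_addr & 0x3


def select_word(qdr_rddata, word_index):
    """Return the 32-bit word W<word_index> from the 128-bit read data.

    Assumption: W0 occupies bits 31..0, W1 bits 63..32, and so on.
    """
    if not 0 <= word_index <= 3:
        raise ValueError("word index must be 0..3")
    return (qdr_rddata >> (32 * word_index)) & 0xFFFFFFFF
```

Four consecutive processor addresses therefore map to the same memory location and return W0 to W3 in turn, which matches the four read cycles per PD Word described above.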
6.3 Conclusion
The VHDL code is written for the interface to control the SRAM memory access using a Virtex-4
FPGA. It has been verified using ModelSim simulation waveforms and ChipScope Pro hardware
simulation in Xilinx™ ISE 10.1. The results have been studied and verified. This interfacing logic enables
us to access the SRAM with the highest possible speed and supports writing of a continuous data input
stream at a rate of 640 Mbps. This interface logic can be utilized for interfacing QDR memory devices of
upcoming generations with improved technology. The interface enables us to attach to the PLB interface
of the embedded PowerPC processor of Virtex family FPGAs with ease.
: in std_logic;
: in std_logic;
: in std_logic;
--PPD IF--------------------------
clk_100 : in std_logic;
hw_wr : in std_logic;
hw_data : in std_logic_vector( 127 downto 0 );
hw_addr : in std_logic_vector( 20 downto 0 );
al_full : out std_logic;
proc_rd : in std_logic;
proc_addr : in std_logic_vector( 22 downto 0 );
proc_data : out std_logic_vector( 31 downto 0 );
rd_ack : out std_logic;
user_w_n : out std_logic;
user_r_n : out std_logic;
user_ad_wr : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
user_bwl_n : out std_logic_vector((BW_WIDTH-1) downto 0);
user_bwh_n : out std_logic_vector((BW_WIDTH-1) downto 0);
user_dwl : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
user_dwh : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
user_ad_rd : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
user_qen_n : out std_logic;
compare_error : out std_logic;
user_wr_full : in std_logic;
user_rd_full : in std_logic;
user_qrl : in std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
user_qrh : in std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
user_qr_empty : in std_logic
);
end qdr_dpif;
architecture Behavioral of qdr_dpif is
component synchro
port(
reset : in std_logic;
clock : in std_logic;
sig_in : in std_logic;
sig_out : out std_logic
);
end component;
signal reset_r : std_logic;
: std_logic_vector( 31 downto 0 );
: std_logic_vector( 31 downto 0 );
: write_state_type;
: write_state_type;
signal hw_rd : std_logic;
signal test_w_n_i : std_logic;
signal user_w_n_i : std_logic;
signal lt_hwdata : std_logic_vector(1 downto 0);
ACK_W2_GEN,
LT_W3,
ACK_W3_GEN
);
signal read_cs : read_state_type;
signal read_ns : read_state_type;
signal lt_rd_ad : std_logic;
signal user_r_n_i : std_logic;
signal user_qen_n_i : std_logic;
signal lt_q_0_1 : std_logic;
signal lt_q_2_3 : std_logic;
signal lt_word : std_logic_vector( 3 downto 0 );
signal proc_rd_sync : std_logic;
signal proc_addr_sync : std_logic_vector( 22 downto 0 );
signal proc_data_i : std_logic_vector( 31 downto 0 );
signal rd_ack_i : std_logic;
signal qdr_rddata : std_logic_vector(127 downto 0 );
signal byte_enb : std_logic_vector(7 downto 0);
signal user_ad_rd_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
signal user_ad_wr_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
begin
compare_error <= '0';
user_w_n <= user_w_n_i;
user_r_n <= user_r_n_i;
user_qen_n <= user_qen_n_i;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
proc_data <= proc_data_i;
rd_ack <= rd_ack_i;
end if;
end process;
byte_enb <= "00000000";
user_bwl_n <= byte_enb((BW_WIDTH-1) downto 0);
user_bwh_n <= byte_enb((BW_WIDTH-1) downto 0);
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
reset_r <= user_reset;
end if;
end process;
--------------WR_SM--------------
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
write_cs <= INIT_WR;
else
write_cs <= write_ns;
end if;
end if;
end process;
process (write_cs, dly_cal_done, user_wr_full, hw_fifo_empty )
begin
write_ns <= write_cs;
case write_cs is
when INIT_WR =>
if(dly_cal_done = '1') then
write_ns <= IDLE_WR;
end if;
when IDLE_WR =>
if( user_wr_full = '0' and hw_fifo_empty = '0' ) then
write_ns <= WRFIFO_RD;
end if;
when WRFIFO_RD =>
write_ns <= LT_PDW_0_1;
when LT_PDW_0_1 =>
write_ns <= LT_PDW_2_3;
when LT_PDW_2_3 =>
write_ns <= IDLE_WR;
when others =>
write_ns <= INIT_WR;
end case;
end process;
with write_cs select
hw_rd <= '1' when WRFIFO_RD,
'0' when others;
with write_cs select
test_w_n_i <= '0' when LT_PDW_0_1,
'1' when others;
end if;
when LT_W1 =>
read_ns <= ACK_W1_GEN;
when ACK_W1_GEN =>
if proc_rd_sync = '0' then
read_ns <= IDLE_RD;
end if;
when LT_W2 =>
read_ns <= ACK_W2_GEN;
when ACK_W2_GEN =>
if proc_rd_sync = '0' then
read_ns <= IDLE_RD;
end if;
when LT_W3 =>
read_ns <= ACK_W3_GEN;
when ACK_W3_GEN =>
if proc_rd_sync = '0' then
read_ns <= IDLE_RD;
end if;
when others =>
read_ns <= INIT_RD;
end case;
end process;
with read_cs select
lt_rd_ad
with read_cs select
user_r_n_i
when LT_EPW_2_3_W0,
"0010" when LT_W1,
"0100" when LT_W2,
"1000" when LT_W3,
"0000" when others;
<= user_ad_rd_i;
<= user_ad_wr_i;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
user_ad_wr_i <= (others => '0');
elsif( test_w_n_i = '0' ) then
user_ad_wr_i <= hw_addr_out((ADDR_WIDTH_4D-1) downto 0);
end if;
end if;
end process;
process (user_clk0)
begin
if(user_clk0'event and user_clk0 = '1') then
if(reset_r = '1') then
user_ad_rd_i <= (others => '0');
elsif lt_rd_ad = '1' then
user_ad_rd_i <= proc_addr_sync((ADDR_WIDTH_4D+1) downto 2);
end if;
end if;
end process;
--------------PPD_FIFO_IF--------------
PPD_DATA : for I in 3 downto 0 generate
begin
DATA_FIFO : FIFO16
generic map
(
FIRST_WORD_FALL_THROUGH => false,
ALMOST_FULL_OFFSET => X"00F",
DATA_WIDTH => 36
)
port map (
DI => hw_data(I*32+31 downto I*32),
DIP => byte_enb(3 downto 0),
RDCLK => user_clk0,
RDEN => hw_rd,
RST => reset_r,
WRCLK => clk_100,
WREN => hw_wr,
ALMOSTEMPTY => open,
ALMOSTFULL => data_al_full(I),
DO => qdr_wrdata(I*32+31 downto I*32),
DOP => open,
EMPTY => hw_data_empty(I),
FULL => open,
RDCOUNT => open,
RDERR => open,
WRCOUNT => open,
WRERR => open
);
<=
<=
hw_addr;
(others => '0');
PPD_ADDR_FIFO : FIFO16
generic map
(
FIRST_WORD_FALL_THROUGH => false,
ALMOST_FULL_OFFSET => X"00F",
DATA_WIDTH => 36
)
port map (
DI => hw_addr_in,
DIP => byte_enb(3 downto 0),
RDCLK => user_clk0,
RDEN => hw_rd,
RST => reset_r,
WRCLK => clk_100,
WREN => hw_wr,
ALMOSTEMPTY => open,
ALMOSTFULL => addr_al_full,
DO => hw_addr_out,
DOP => open,
EMPTY => hw_addr_empty,
FULL => open,
RDCOUNT => open,
RDERR => open,
WRCOUNT => open,
WRERR => open
);
al_full <=
entity hwdata_sim is
Port (
clk_100 : in std_logic;
reset : in std_logic;
dly_cal_done : in std_logic;
--debug
hw_test_led : out std_logic;
hw_wr : out std_logic;
hw_data : out std_logic_vector( 127 downto 0 );
hw_addr : out std_logic_vector( 20 downto 0 );
b0_full : in std_logic
);
end hwdata_sim;
architecture Behavioral of hwdata_sim is
signal reset_r : std_logic;
signal dly_cal_done_r : std_logic;
constant INIT : std_logic_vector( 5 downto 0 ) := "000001";
constant IDLE : std_logic_vector( 5 downto 0 ) := "000010";
constant WR_GEN : std_logic_vector( 5 downto 0 ) := "000100";
constant DUMMY_ST : std_logic_vector( 5 downto 0 ) := "010000";
constant INC_ADDR : std_logic_vector( 5 downto 0 ) := "100000";
signal current_state : std_logic_vector( 5 downto 0 );
signal next_state : std_logic_vector( 5 downto 0 );
signal counter : std_logic_vector( 29 downto 0 );
signal wr_count : std_logic_vector( 7 downto 0 );
signal wr_pulse : std_logic;
signal hw_data1 : std_logic_vector( 31 downto 0 );
signal hw_data2 : std_logic_vector( 31 downto 0 );
signal hw_data3 : std_logic_vector( 31 downto 0 );
signal hw_data4 : std_logic_vector( 31 downto 0 );
signal hw_data_i : std_logic_vector(127 downto 0 );
signal hw_addr_i : std_logic_vector( 20 downto 0 );
signal hw_wr_i : std_logic;
--debug
signal counter_dbg : std_logic_vector( 29 downto 0 );
signal hw_wr_dbg : std_logic;
signal del_1 : std_logic;
constant zeroes_23 : std_logic_vector( 22 downto 0 ) := (others => '0');
constant zeroes_30 : std_logic_vector( 29 downto 0 ) := (others => '0');
constant zeroes_32
begin
hw_data <= hw_data_i;
hw_addr <= hw_addr_i( 20 downto 0 );
hw_wr <= hw_wr_i;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
reset_r <= reset;
dly_cal_done_r <= dly_cal_done;
end if;
end process;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if reset_r = '1' or dly_cal_done_r = '0' or wr_pulse = '1' then
wr_count <= (others => '0');
else
wr_count <= wr_count + '1';
end if;
end if;
end process;
wr_pulse
<=
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
current_state <= INIT;
else
current_state <= next_state;
end if;
end if;
end process;
process (current_state, dly_cal_done, b0_full, wr_pulse )
begin
next_state <= current_state;
case current_state is
when INIT =>
if(dly_cal_done = '1') then
next_state <= IDLE;
end if;
when IDLE =>
if wr_pulse = '1' and b0_full = '0' then
<=
current_state(2);
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
counter <= (others => '0');
elsif( current_state(5) = '1' ) then
counter <= counter + '1';
end if;
end if;
end process;
hw_addr_i <= counter( 20 downto 0 );
--hw_addr_i <= counter(2 downto 0) & counter( 21 downto 3 );
hw_data1 <=
hw_data2 <=
hw_data3 <=
hw_data4 <=
hw_data_i <=
--debug----------
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
counter_dbg <= counter;
hw_wr_dbg <= hw_wr_i;
end if;
end process;
del_1
<=
'1'
else
hw_test_led <= del_1 or hw_wr_dbg;
end Behavioral;
: in std_logic;
: in std_logic;
: in std_logic;
proc_rd : out std_logic;
proc_addr : out std_logic_vector( 22 downto 0 );
proc_data : in std_logic_vector( 31 downto 0 );
test_done : out std_logic;
rd_ack : in std_logic
);
end procdata_sim;
architecture Behavioral of procdata_sim is
component synchro
port(
reset : in std_logic;
clock : in std_logic;
sig_in : in std_logic;
sig_out : out std_logic
);
end component;
signal reset_r : std_logic;
signal dly_cal_done_r : std_logic;
signal rd_count
signal rd_pulse
constant INIT
constant WRP_WAIT
constant IDLE
constant RD_GEN
constant WAIT_RDACK
constant CNT_CHK
constant INC_ADDR
constant RST_BRST
signal current_state
signal next_state
signal word_count
signal counter
signal proc_addr_i
signal proc_rd_i
signal proc_data_sync
signal rd_ack_sync
signal proc_data_i
: std_logic_vector( 31 downto 0 );
: ackgen_state_type;
: ackgen_state_type;
signal qdr_mem_ack : std_logic;
--debug
signal proc_rd_dbg : std_logic;
signal proc_addr_dbg : std_logic_vector(22 downto 0);
signal rd_ack_sync_dbg : std_logic;
signal proc_data_dbg : std_logic_vector(31 downto 0);
signal del_1 : std_logic;
signal del_2 : std_logic;
constant zeroes_23
constant zeroes_32
begin
proc_addr <= proc_addr_i( 22 downto 0 );
proc_rd <= proc_rd_i;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
reset_r <= reset;
dly_cal_done_r <= dly_cal_done;
end if;
end process;
RD_ACK_SYNC_INST:
synchro port map(
reset => reset_r,
clock => clk_100,
sig_in => rd_ack,
sig_out => rd_ack_sync
);
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1' or dly_cal_done_r = '0') then
ackgen_cs <= IDLE_ACK;
else
ackgen_cs <= ackgen_ns;
end if;
end if;
end process;
process (ackgen_cs, proc_rd_i, rd_ack_sync )
begin
ackgen_ns <= ackgen_cs;
case ackgen_cs is
when IDLE_ACK =>
if proc_rd_i = '1' AND rd_ack_sync = '0' then
ackgen_ns <= WAIT_FOR_ACK;
end if;
when WAIT_FOR_ACK =>
if rd_ack_sync = '1' then
ackgen_ns <= ACK_GEN;
end if;
when ACK_GEN =>
if proc_rd_i = '0' then
ackgen_ns <= IDLE_ACK;
end if;
when others =>
ackgen_ns <= IDLE_ACK;
end case;
end process;
with ackgen_cs select
qdr_mem_ack <= '1' when ACK_GEN,
'0' when others;
PROC_DATA_SYNC_GEN: for i in 31 downto 0 generate
PROC_DATA_SYNC_INST:
synchro port map(
reset => reset_r,
clock => clk_100,
sig_in => proc_data(i),
sig_out => proc_data_sync(i)
);
end generate PROC_DATA_SYNC_GEN;
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if reset_r = '1' or dly_cal_done_r = '0' or rd_pulse = '1' then
rd_count <= (others => '0');
else
rd_count <= rd_count + '1';
end if;
end if;
end process;
rd_pulse
<=
process (clk_100)
begin
if(clk_100'event and clk_100 = '1') then
if(reset_r = '1') then
current_state <= INIT;
else
current_state <= next_state;
end if;
end if;
end process;
process (current_state, dly_cal_done_r, rd_pulse, qdr_mem_ack, word_count)
begin
  next_state <= current_state;
  case current_state is
    when INIT =>
      if (dly_cal_done_r = '1') then
        next_state <= IDLE;
      end if;
    when IDLE =>
      if rd_pulse = '1' then
        next_state <= RD_GEN;
      end if;
    when RD_GEN =>
      next_state <= WAIT_RDACK;
    when WAIT_RDACK =>
<=
current_state(3) or current_state(4);
process (clk_100)
begin
  if (clk_100'event and clk_100 = '1') then
    if (reset_r = '1' or current_state(7) = '1') then
      word_count <= "00001";
    elsif (current_state(6) = '1') then
      word_count <= word_count + '1';
    end if;
  end if;
end process;
process (clk_100)
begin
  if (clk_100'event and clk_100 = '1') then
    if (reset_r = '1') then
      counter <= (others => '0');
    elsif (current_state(6) = '1' or current_state(7) = '1') then
      counter <= counter + '1';
    end if;
  end if;
end process;
proc_addr_i <= counter(22 downto 0);
--proc_addr_i <= counter(3 downto 2) & counter(22 downto 4) & counter(1 downto 0);
process (clk_100)
begin
  if (clk_100'event and clk_100 = '1') then
    if reset_r = '1' then
      proc_data_i
    elsif proc_rd_i = '1' then
      proc_data_i <= proc_data_sync;
    end if;
  end if;
end process;
process (clk_100)
begin
  if (clk_100'event and clk_100 = '1') then
    proc_rd_dbg <= proc_rd_i;
    proc_addr_dbg <= proc_addr_i;
    rd_ack_sync_dbg <= qdr_mem_ack;
    proc_data_dbg <= proc_data_i;
  end if;
end process;
del_1
<=
'1'
del_2
<=
'1'
test_done
<=
end Behavioral;
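
The listing instantiates a component named synchro for rd_ack and for each bit of proc_data, but its body is not reproduced here. The following is a minimal sketch of what such a block could look like, assuming it is a conventional two-flip-flop synchronizer. Only the port names (reset, clock, sig_in, sig_out) come from the listing; the architecture, the internal signal names meta_r and sync_r, and the synchronous active-high reset are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flop synchronizer. Only the port names are taken
-- from the port maps in the listing; everything else is assumed.
entity synchro is
  port (
    reset   : in  std_logic;  -- synchronous, active high (assumption)
    clock   : in  std_logic;  -- destination clock domain
    sig_in  : in  std_logic;  -- signal from the other clock domain
    sig_out : out std_logic   -- sig_in re-timed to clock
  );
end entity synchro;

architecture rtl of synchro is
  signal meta_r : std_logic := '0';  -- first stage, may go metastable
  signal sync_r : std_logic := '0';  -- second stage, used by the logic
begin
  process (clock)
  begin
    if rising_edge(clock) then
      if reset = '1' then
        meta_r <= '0';
        sync_r <= '0';
      else
        meta_r <= sig_in;   -- sample the asynchronous input
        sync_r <= meta_r;   -- give the first stage one cycle to settle
      end if;
    end if;
  end process;

  sig_out <= sync_r;
end architecture rtl;
```

A per-bit synchronizer of this kind does not by itself keep a multi-bit word coherent. In the listing, proc_data_sync is only captured into proc_data_i while the read handshake is active, which suggests the design relies on the data being stable by the time rd_ack_sync is seen.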
4) Filename: qdr_sram_main_0.vhd
Purpose: Top-level example design incorporating the QDRII memory controller module, an example
clock generation module, and reset logic.
5) Filename: qdr_sram_top_0.vhd
Purpose: Top-level module for the QDR-II memory controller design. This is the main module that
should be instantiated into a new FPGA design (along with all sub-modules) to implement a QDRII
interface.
6) Filename: qdr_sram_user_interface_0.vhd
Purpose: Responsible for storing the Read/Write requests made by the user design.
Instantiates the FIFOs for Read and Write address, data, and control storage.
7) Filename: qdr_sram_rd_user_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address, data, and control storage.
8) Filename: qdr_sram_rd_addr_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address and control storage.
9) Filename: qdr_sram_rd_data_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read data storage.
Monitors Read/Write queue status from User Interface FIFOs and generates strobe
This module implements a state machine that reads back values from the read data FIFOs
and compares them with the values generated in the test bench. It also serves as an error detection
module, making sure that the data returned from the memory is the same as the data written to it.
36) Filename: qdr_sram_data_gen_0.vhd
Purpose: This module implements a data generator that generates data for Read and Write requests
to the QDR II memory device.
37) Filename: qdr_sram_addr_gen_0.vhd
Purpose: This module is part of the internal test bench. It generates addresses for both reads and
writes.
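
The read-back comparison described above can be illustrated with a small sketch. This is not the generated test bench module: the entity name rd_compare and all of its ports are hypothetical, and the 32-bit width is chosen only to match the proc_data bus in the earlier listing. It assumes the expected value is presented in the same cycle as the read data and that an error, once seen, should stay latched until reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical read-data checker; names and widths are assumptions.
entity rd_compare is
  port (
    clk        : in  std_logic;
    reset      : in  std_logic;                      -- synchronous, active high
    rd_valid   : in  std_logic;                      -- rd_data is valid this cycle
    rd_data    : in  std_logic_vector(31 downto 0);  -- data read back from memory
    exp_data   : in  std_logic_vector(31 downto 0);  -- data the generator wrote
    error_flag : out std_logic                       -- sticky mismatch indicator
  );
end entity rd_compare;

architecture rtl of rd_compare is
  signal error_r : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        error_r <= '0';
      elsif rd_valid = '1' and rd_data /= exp_data then
        error_r <= '1';  -- latch the first mismatch until reset
      end if;
    end if;
  end process;

  error_flag <= error_r;
end architecture rtl;
```

Keeping the flag sticky means a single corrupted word is not lost if the following words compare correctly, which is usually what a memory test wants to report.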
No.  hw_addr_23_2  hw_data1  hw_data2  hw_data3  hw_data4
1    000000        00000000  00000001  00000002  00000003
2    000001        00000004  00000005  00000006  00000007
3    000002        00000008  00000009  0000000A  0000000B
4    000003        0000000C  0000000D  0000000E  0000000F
5    000004        00000010  00000011  00000012  00000013
6    000005        00000014  00000015  00000016  00000017
7    000006        00000018  00000019  0000001A  0000001B
8    000007        0000001C  00000000  0000001E  0000001F
9    000008        00000020  00000021  00000022  00000023
10   000009        00000020  00000025  00000026  00000027
11   00000A        00000028  00000029  0000002A  0000002B
12   00000B        0000002C  0000002D  0000002E  0000002F
13   00000C        00000030  00000031  00000032  00000033
14   00000D        00000034  00000035  00000036  00000037
15   00000E        00000038  00000039  0000003A  0000003B
16   00000F        0000003C  0000003D  0000003E  0000003F
17   000010        00000040  00000041  00000042  00000043
18   000011        00000044  00000045  00000046  00000047
19   000012        00000048  00000049  0000004A  0000004B
20   000013        0000004C  0000004D  0000004E  0000004F
21   000014        00000050  00000051  00000052  00000053
22   000015        00000054  00000055  00000056  00000057
23   000016        00000058  00000059  0000005A  0000005B
24   000017        0000005C  0000005D  0000005E  0000005F
25   000018        00000060  00000061  00000062  00000063
26   000019        00000064  00000065  00000066  00000067
27   00001A        00000068  00000069  0000006A  0000006B
28   00001B        0000006C  0000006D  0000006E  0000006F
29   00001C        00000070  00000071  00000072  00000073
30   00001D        00000074  00000075  00000076  00000077
31   00001E        00000078  00000079  0000007A  0000007B
32   00001F        0000007C  0000007D  0000007E  0000007F
33   000020        00000080  00000081  00000082  00000083
34   000021        00000084  00000085  00000086  00000087
35   000022        00000088  00000089  0000008A  0000008B
36   000023        0000008C  0000008D  0000008E  0000008F
37   000024        00000090  00000091  00000092  00000093
38   000025        00000094  00000095  00000096  00000097
39   000026        00000098  00000099  0000009A  0000009B
40   000027        0000009C  0000009D  0000009E  0000009F
41   000028        000000A0  000000A1  000000A2  000000A3
42   000029        000000A4  000000A5  000000A6  000000A7
43   00002A        000000A8
44   00002B
45   00002C
46   00002D
No.  proc_addr  proc_data
1    000000     00000000
2    000001     00000001
3    000002     00000002
4    000003     00000003
5    000004     00000004
6    000005     00000005
7    000006     00000006
8    000007     00000007
9    000008     00000008
10   000009     00000009
11   00000A     0000000A
12   00000B     0000000B
13   00000C     0000000C
14   00000D     0000000D
15   00000E     0000000E
16   00000F     0000000F
17   000010     00000010
18   000011     00000011
19   000012     00000012
20   000013     00000013
21   000014     00000014
22   000015     00000015
23   000016     00000016
24   000017     00000017
25   000018     00000018
26   000019     00000019
27   00001A     0000001A
28   00001B     0000001B
29   00001C     0000001C
30   00001D     0000001D
31   00001E     0000001E
32   00001F     0000001F
33   000020     00000020
34   000021     00000021
35   000022     00000022
36   000023     00000023
37   000024     00000024
38   000025     00000025
39   000026     00000026
40   000027     00000027
41   000028     00000028
42   000029     00000029
43   00002A     0000002A
44   00002B     0000002B
45   00002C     0000002C
46   00002D     0000002D
REFERENCES:
[1] Clive Maxfield, The Design Warrior's Guide to FPGAs
[2] Will R. Moore, Wayne Luk, Field-Programmable Logic and Applications
[3] Marian Adamski, Marek Wegrzyn, Design of Embedded Control Systems
[4] Sunggu Lee, Advanced Digital Logic Design
[5] Pong P. Chu, RTL Hardware Design Using VHDL
[6] http://www.fpga4fun.com
[7] http://www.fpgasummit.com
[8] http://www.fpga.com
[9] http://video.google.com/videoplay?docid=-5776J46032722J35072
[10] http://www.xilinx.com/support/documentation/virtex-4_userguides.htm
[11] http://www.actel.com/documents/modelsim_tutorial_ug.pdf
[12] http://www.xilinx.com/ise/optional_prod/cspro.htm
[13] http://japan.xilinx.com/products/ipcenter/DO-CSP-PRO.htm