
Part III

Logic Emulation
What is a Logic Emulation System?
1. Programmable hardware built from programmable logic (FPGAs) and
programmable interconnect devices (PIDs).

2. Software that automatically programs the hardware according to the
circuit under design.

3. Control hardware and software that let the emulated design operate
as a hardware component in real time.
Typical Logic Emulation Environment
[Figure: block diagram. Labels: Workstation; Logic Emulator (Logic Module, Probe Module); In-circuit Interface; Target System; Compiler, runtime software; Stimulus generator, logic analyzer.]
Why do we need Logic Emulation?
Design verification issues.

Real-time operation.

System-level testing.

Rapid prototyping.
Design Verification Issues
Simulation-based verification methods run out of steam
as chip complexity grows.

Emulation is a verification technology that
grows along with design size.
Real-Time Operation
Simulation requires test vector development
which is costly and difficult.
Verification depends on test vector correctness.

Certain applications must be verified in real time, for example
those judged by human perception: audio and video.

Emulation connected to actual hardware can run:
real diagnostic code,
operating systems, and
applications.
System-Level Testing
Often the chip meets its specifications but it fails
in the system.

We have to verify the system-level interactions
between the chip and other components. They
are hard to formalize.

Internal probing is impossible once the chip is
fabricated and placed in a system,
but it is possible with emulation.
Rapid Prototyping
Once the emulated design is debugged, it is
immediately available to software
developers for software debugging.

The emulated design is also available for demos and
for architecture experiments on real
applications and data.

Programmable Hardware includes programmable interconnect
[Figure: programmable interconnect at the center, connected to two logic elements, a memory element, a VLSI core, and an interface.]
Considerations for
programmable interconnect
The capacity of logic and interconnection depends on
package constraints.

This forces a hierarchical system.
Chips => boards => boxes => system

The interconnect structure must:
1. Provide successful connectivity,
2. Maximize FPGA utilization, and
3. Minimize delay and skew.

Rent's rule can be applied to predict the interconnect needs.
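Rent's rule is commonly written as T = t * g^p, where T is the number of external terminals of a block, g is the number of gates in it, t is the average number of terminals per gate, and p is the Rent exponent. The following is a minimal sketch of how it could be used to size partitions; the default values for t and p are illustrative assumptions, not figures from these slides.

```python
def rent_pins(gates: float, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate external terminals T = t * g**p for a block of `gates` gates.

    t and p are hypothetical example values; real designs must be measured.
    """
    return t * gates ** p


def max_gates_for_pins(pins: float, t: float = 2.5, p: float = 0.6) -> float:
    """Invert Rent's rule: largest block whose pin demand still fits `pins`."""
    return (pins / t) ** (1.0 / p)


if __name__ == "__main__":
    for g in (1_000, 10_000, 100_000):
        print(f"{g:>7} gates need about {rent_pins(g):.0f} pins")
    print(f"200 pins support about {max_gates_for_pins(200):.0f} gates")
```

Because p is less than 1, pin demand grows more slowly than gate count, yet package pin counts grow more slowly still, which is why FPGA utilization in a multi-chip system tends to be pin-limited.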
Structures of Multi-FPGA
Systems
Topologies:
- Mesh - nearest neighbor.
- Crossbar - full and partial.

Interconnect scheme:
- Circuit switched.
- Time multiplexed.
Nearest Neighbor
Interconnection
[Figure: 3 x 3 array of FPGAs, each connected to its nearest neighbors.]
Advantages and Disadvantages of
Nearest Neighbor Interconnection
Advantages:
Uniform: all chips the same.
Easy to lay out on PCB.

Disadvantages:
Routing is easily blocked.
Pins used for pass-through routing limit the logic utilization of the FPGAs.
Long and unpredictable delays.
No natural hierarchical extension.
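The cost of pass-through routing in a mesh can be illustrated with a small model. This is a sketch under simplifying assumptions: signals are routed X-first then Y, and every FPGA a signal passes through spends two pins (one in, one out). The function names are hypothetical.

```python
def mesh_route(src, dst):
    """Return the (row, col) FPGAs visited on an X-then-Y route from src to dst."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    while c != c2:                      # move along the row first
        c += 1 if c2 > c else -1
        path.append((r, c))
    while r != r2:                      # then along the column
        r += 1 if r2 > r else -1
        path.append((r, c))
    return path


def through_pins(src, dst):
    """Pins consumed on intermediate FPGAs: two per FPGA passed through."""
    intermediates = max(len(mesh_route(src, dst)) - 2, 0)
    return 2 * intermediates


if __name__ == "__main__":
    print(mesh_route((0, 0), (2, 2)))
    print(through_pins((0, 0), (2, 2)))
```

Neighboring FPGAs cost no through pins, while a corner-to-corner signal in a 3 x 3 mesh passes through three other chips. The delay also depends on the number of hops, which is one reason mesh delays are long and hard to predict.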
Nearest Neighbor Extensions
[Figure: 3 x 3 array of FPGAs with two extensions: add more neighbors, and connect to non-neighbors.]
Advantages and Disadvantages of
nearest-neighbor extended architectures
Advantages:
Diagonal lines and skip lines give the
router more choices.

Disadvantages:
More complex PCB.
More complex routing software.
Partial Crossbar Interconnect
[Figure: logic blocks whose pins are divided into subsets A, B, C, and D; the A pins, B pins, C pins, and D pins each connect to their own crossbar. Second-level crossbars are also shown.]
Partial Crossbar Interconnect
A partial crossbar consists of a set of small full
crossbars,
connected to logic blocks
but not to each other.

I/O pins of each FPGA are divided into subsets.
Each subset is connected by a full crossbar circuit
switch.

Partial crossbar is a potentially blocking network.
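Why a partial crossbar can block is easiest to see in a toy router. The sketch below assumes each crossbar has a fixed number of pins to every FPGA (one pin subset each) and that a net must be routed through a single crossbar that has a free pin on every FPGA the net touches. The first-fit policy and all names are illustrative assumptions, not the algorithm of any particular emulator.

```python
def route_partial_crossbar(nets, num_fpgas, num_crossbars, pins_per_subset):
    """First-fit assignment of nets to crossbars.

    nets: dict mapping net name -> list of FPGA indices the net connects.
    Returns dict mapping net name -> crossbar index, or None if blocked.
    """
    # free[x][f] = free pins between crossbar x and FPGA f
    free = [[pins_per_subset] * num_fpgas for _ in range(num_crossbars)]
    assignment = {}
    for name, fpgas in nets.items():
        assignment[name] = None
        for x in range(num_crossbars):
            if all(free[x][f] > 0 for f in fpgas):
                for f in fpgas:
                    free[x][f] -= 1
                assignment[name] = x
                break
    return assignment


if __name__ == "__main__":
    nets = {"a": [0, 1], "b": [1, 2], "c": [0, 2]}
    print(route_partial_crossbar(nets, 3, 2, 1))
```

In this example every FPGA has two pins and needs exactly two, so pin capacity is sufficient. Net "c" is still blocked, because the free pin on FPGA 0 and the free pin on FPGA 2 belong to different crossbars.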
Characteristics of Partial
Crossbar Architecture
Partial crossbar size is proportional to the
number of FPGA pins.

Every interconnection goes through exactly one crossbar
chip in a one-level partial crossbar, or three in a
two-level partial crossbar, so
delays are uniform and bounded.
Mixed Full and Partial
Crossbar
[Figure: a row of FPGAs, local FPICs, and global FPICs with external connections. Labels: partial crossbar, full crossbar.]
Circuit Switched versus Time
Multiplexed Interconnect Schemes
Trade-offs between the operating speed and the
hardware cost.
Time-multiplexing method:
can greatly expand available interconnect.
allows lower cost IC package and PCB.
makes partitioning easier.

BUT
System power increases due to frequent signal
switching (higher hardware cost).
Complex scheduling software.
Slow operating speed.
Virtual Wires
[Figure: two FPGAs joined by physical wires. A mux on one FPGA places several logical outputs onto a physical wire; a demux on the other FPGA recovers the logical inputs.]
Idea: trade space for time.
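The time-multiplexing trade-off can be sketched as a simple slot assignment. This is a minimal sketch, assuming logical signals are assigned round-robin with no regard for their dependencies (real virtual-wire schedulers must respect signal dependencies); the function names and the clock figures are hypothetical.

```python
import math


def schedule_virtual_wires(logical_signals, physical_wires):
    """Map each logical signal to a (time_slot, wire) pair, round-robin."""
    if physical_wires < 1:
        raise ValueError("need at least one physical wire")
    slots = math.ceil(len(logical_signals) / physical_wires)
    schedule = {
        sig: divmod(i, physical_wires)   # (slot, wire)
        for i, sig in enumerate(logical_signals)
    }
    return slots, schedule


def emulation_clock_mhz(pipeline_clock_mhz, slots):
    """Each emulated cycle needs `slots` cycles of the fast pipeline clock."""
    return pipeline_clock_mhz / max(slots, 1)


if __name__ == "__main__":
    signals = [f"s{i}" for i in range(10)]
    slots, schedule = schedule_virtual_wires(signals, physical_wires=4)
    print(slots, schedule["s5"], emulation_clock_mhz(30.0, slots))
```

Ten logical signals fit on four physical wires at the cost of three time slots per emulated cycle, so the emulated design runs at roughly one third of the pipeline clock in this model.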
Logic Emulation Systems and their
interconnection schemes
Systems with mesh topology - Quickturn's RPM and
Virtual Machine Works (IKOS).

Systems with partial crossbar - Quickturn's Enterprise,
Mars, and System Realizer.

System with mixed full and partial crossbar - Aptix
Prototyping System.

Systems using time-multiplexed interconnect - Virtual
Machine Works (IKOS), CoBALT and Arkos (Quickturn).
Memory Solutions in Emulators and
future devices/systems
Goal: programmable memories with
different width/depth/port combinations.

FPGA-based memories:
use logic resources inefficiently.
timing correctness is difficult to ensure.
large or highly multi-ported memories must be
partitioned across several FPGAs.

SRAMs with dedicated or programmable
controllers.
Logic Emulation Design Flow
[Figure: design flow in three phases: pre-configuration preparation, full-chip configuration, and in-circuit emulation. Steps shown: HDL synthesis, synthesis, partitioning, system mapping, P & R, design downloading, emulators.]
Logic Emulation Design
Compiler and its components
A logic emulation design compiler is a large and complex
EDA tool that includes:

Front-end design importer.

HDL-based synthesizer.

Clock and timing analyzer.

Partitioner.

System-level placer and router.

FPGA-based placer and router.
Objectives of logic emulation
compiler
Fast compilation time.

Fast emulation clock.

Timing correctness.

Easy ECO (Engineering Change Order).

Minimize circuit size.
Design Considerations for Logic
Emulators
HDL synthesis:
Trade-off between run time and quality.
CLB-based vs. gate-based designs.

Clock and timing analysis:
Timing correctness, hold-time violation free.
Clock skew minimization.

Partitioning:
Run time.
Timing and area.
System placement and routing:
Timing.
Completeness of routing.

FPGA-based placement and routing:
Fast run time.
Parallel compilation.
Design Considerations for Logic
Emulators
Remember: the logic you emulate
is not the same as the
logic of your design.
Hold-Time Violation
A hold-time violation occurs
when routing delay > LUT delay.
[Figure: two flip-flops (D, Q, CK) in CLBs with a LUT between them; a routing delay is marked.]
This is a clock distribution problem (skew).
Timing Correctness
[Figure: two flip-flops (D, Q, CK) in CLBs with a LUT, a routing delay, and an added delay element.]
Delay insertion
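The amount of delay to insert can be estimated from the usual hold check. This is a minimal sketch, assuming the standard condition that data launched by the first flip-flop must reach the second no earlier than the clock skew plus the hold time; the function names and delay values are hypothetical, in arbitrary time units.

```python
def hold_slack(clk_to_q, data_path, clock_skew, t_hold):
    """Hold slack of a flip-flop to flip-flop path; negative means a violation.

    clk_to_q:   clock-to-output delay of the launching flip-flop
    data_path:  LUT plus data routing delay between the flip-flops
    clock_skew: extra clock arrival delay at the capturing flip-flop
    t_hold:     hold time of the capturing flip-flop
    """
    return (clk_to_q + data_path) - (clock_skew + t_hold)


def delay_to_insert(clk_to_q, data_path, clock_skew, t_hold):
    """Smallest extra data-path delay that removes the hold violation."""
    return max(0.0, -hold_slack(clk_to_q, data_path, clock_skew, t_hold))


if __name__ == "__main__":
    # Clock routing (skew) dominates the short data path: violation.
    print(hold_slack(1.0, 2.0, 4.5, 0.5))
    print(delay_to_insert(1.0, 2.0, 4.5, 0.5))
```

Inserted delay fixes hold violations but lengthens the data path, so it also reduces the achievable emulation clock rate.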
Timing Correctness
[Figure: two flip-flops (D, Q, CK, CE) in CLBs with a LUT; the clock path carries the primary clock on a low-skew net.]
Use clock enables for gated clocks
Methodology and components of Logic
Emulator System
Pre-configuration preparation - prepare netlists
and control files for configuration.

Testbed preparation - prepare emulation-based
operation environment.

Full-chip configuration - download design to the
emulator.

In-circuit emulation - test the design.
Pre-Configuration in Emulator
System
Translate the leaf-cell libraries into emulation
primitives.

Translated libraries must be verified for functional
equivalence to original.

Modify and redesign some components to attain
compatibility with emulation techniques, such as
precharge logic circuits.

Assemble all the gate-level netlists for the entire
design.
Testbed in Logic Emulator
Design and implement the target ICE board
combining the emulated design with real
hardware.

Slow down the testbed to emulation speed.

Assemble the testbed and emulation
equipment.
Full-Chip Configuration & In-Circuit Emulation
Full-chip configuration:
Prepare control files.
Partition the design to fit into the emulation
system.
Download design into the system.
Verify that the emulation model faithfully
implements the design as specified by RTL.
In-circuit emulation
Part IV
Reconfigurable
Computing and
Systems
General-Purpose Computing
vs. Custom Computing
General-purpose computing - running
applications on a general-purpose computer.

Custom computing - running applications
on custom-made, application-specific
hardware.

Field-programmable devices make this a
reality.
Goals of Reconfigurable
Computing
Tailor the architecture to the application.

Minimize or eliminate instruction interpretation.

Exploit fine-grained parallelism.

Map software to hardware.
Applications of reconfigurable
computing
Database search and analysis.
Image processing and machine vision.
Data compression.
Signal processing.
Neural networks.
Biology computing.
Medical computing.
Design Automation (PSU)
Many more.
Application 1
Multi-Mode Systems map
various applications to a reconfigurable
system
[Figure: a ROM and a reconfigurable system.]
Different configurations for read & write
operations of a tape driver (Honeywell).

Different configurations for different
printer controllers (Tektronix).
Application 2
Run-Time Reconfiguration in
military image recognition system
[Figure: image data enters through I/O and is tested against candidate targets: Jeep? Tank? Truck?]
Break single computation into multiple pieces.

Page in components as needed (virtual hardware),
e.g., automatic target recognition.
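Paging hardware components in on demand can be modeled the same way as a page cache. The sketch below is an assumption-laden illustration: it treats the device as having a fixed number of configuration slots and evicts the least recently used configuration. The class name, the LRU policy, and the configuration names are all hypothetical.

```python
from collections import OrderedDict


class VirtualHardware:
    """Toy model of run-time reconfiguration with LRU replacement."""

    def __init__(self, slots):
        self.slots = slots              # configurations that fit at once
        self.loaded = OrderedDict()     # least recently used first
        self.reconfigurations = 0

    def run(self, config):
        """Use `config`, loading it first (and evicting) if necessary."""
        if config in self.loaded:
            self.loaded.move_to_end(config)        # hit: mark as recent
        else:
            if len(self.loaded) >= self.slots:
                self.loaded.popitem(last=False)    # evict the LRU entry
            self.loaded[config] = True
            self.reconfigurations += 1             # miss: pay reload cost


if __name__ == "__main__":
    hw = VirtualHardware(slots=2)
    for stage in ["jeep", "tank", "jeep", "truck", "tank"]:
        hw.run(stage)
    print(hw.reconfigurations, list(hw.loaded))
```

Each miss stands for a full or partial reconfiguration, so the order in which the pieces of a computation are visited determines how much time is lost to reloading.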
Custom Computing
Application-specific systems.

Numerous applications for similar reconfigurable
systems.

Offers hardware performance with the flexibility to handle
numerous algorithms.

Multi-FPGA systems can be viewed as hardware
supercomputers.
Example: DEC PeRLe.
Reconfigurable Co-processors
[Figure: a processor with a coprocessor; Program 1 uses Inst1 and Program 2 uses Inst2.]
- Provide custom instructions
on a per-application basis.
Types of Reprogrammable
Systems
[Figure: three ways to attach custom computing units to a CPU with memory caches and an I/O interface: coprocessor, attached processing unit, and standalone PU. PU = processing unit.]
Types of Reprogrammable
Systems
Attached and standalone processing units are
reprogrammable systems on computer add-on
cards and separate reprogrammable cabinets.

Considerations: large communication overhead may
overshadow the speed gain.

Application-specific coprocessors can achieve
significant improvement over a wide range of
applications.
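The effect of communication overhead on attached and standalone units can be put into a one-line model. This is a minimal sketch, assuming the accelerated run time is simply hardware compute time plus data transfer time with no overlap; the function name and the timing values are hypothetical.

```python
def effective_speedup(t_software, t_hardware, t_transfer):
    """Speedup of an accelerated run over a pure software run.

    t_software: run time on the host CPU alone
    t_hardware: compute time on the reprogrammable unit
    t_transfer: time spent moving data to and from the unit
    """
    return t_software / (t_hardware + t_transfer)


if __name__ == "__main__":
    # Same 100x raw hardware gain, increasing transfer cost.
    for t_transfer in (0.0, 9.0, 99.0):
        print(t_transfer, effective_speedup(100.0, 1.0, t_transfer))
```

With a raw hardware gain of 100x, a transfer time equal to 99 compute units removes the benefit entirely, which is the argument for placing the reprogrammable logic closer to the processor.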

Types of Reprogrammable
Systems
Integrate the reprogrammable logic into
the processor itself.

A reprogrammable functional unit can be
configured on a per-algorithm basis.

This provides special-purpose instructions
tailored to the needs of a given application.
Architectures of Multi-FPGA
(Reconfigurable) Systems
The most commonly used topologies:
Mesh: 1D (linear array), 2D, and 3D.

Crossbar: full, partial, mixed, and
hierarchical.

Hybrid between mesh and crossbar.

Application-specific architecture.
Hybrid Topology of a reconfigurable
system
Splash 2: augments a linear array of FPGAs with
a crossbar switch.
Goal: Supporting systolic circuits.
[Figure: Splash 2 board with 16 FPGAs, each paired with a RAM, plus a further FPGA and two external interfaces.]
Hybrid Topology
[Figure: a linear array of FPGAs with RAMs and a host interface.]
Anyboard: A linear array of FPGAs augmented
by global buses.
Hybrid Topology
[Figure: a 4 X 4 mesh of FPGAs with four RAMs and a host interface.]
DECPeRLe-1: a 4 X 4 mesh of FPGAs augmented
with shared global buses.
Application-Specific Topology of
MARC-1, one subsystem
[Figure: the Marc-1, subsystem 1: nine FPGAs, an FPU, and memory. Numbered links (1 to 5) mark connections to other FPGAs.]
Application-Specific
Topology of Marc-1, cont.
[Figure: the Marc-1: subsystems joined by links numbered 1 to 5.]
Application: circuit simulation, where the
program to be executed can be optimized on a
per-run basis.

The optimization exploits values that are
constant within a run but may vary
from dataset to dataset.
Application-Specific Topology
[Figure: nine FPGAs, each paired with a RAM.]
The RM-nc system: neural network.
Architecture for Computer
Prototyping
[Figure: FPGAs, cache memory, register file, ALU, FPU, and a VME bus.]
The Mushroom processor
prototyping system.
Expandable Topologies
Hierarchical crossbar topology: can be
expanded by adding an extra level.
- Quickturn systems.

Expandable mesh topology: can be
expanded by connecting individual boards
to form a large mesh.
- The Virtual Wires Emulation System (IKOS).
Topology for Adapting Other
Components
Many multi-FPGA systems include non-FPGA
resources to provide more general-purpose
solutions.

The MORRPH system - sockets next to the
FPGAs allow arbitrary devices to be added
to the array.

The G800 board - contains two FPGAs and
four sockets.
Topology for Adapting Other
Components
The COBRA system
Contains:
base modules (expandable into a 2D mesh),
RAM modules,
I/O modules,
and bus modules.

The Springbok system
Uses pre-made daughter boards, each able to
hold an arbitrary device (on the top) and an
FPGA (on the bottom).
Daughter boards are mounted on a baseplate.
Topology for Adapting Other
Components
The Quickturn systems - external
component adapters.

The Aptix FPCB - a reprogrammable PCB.
Design Methodology for
general-purpose configurable
systems
[Figure: applications are mapped onto a host computer and a reprogrammable system.]
Typical Software Methodology for
general-purpose configurable systems
[Figure: flow with boxes: application spec., analysis, system-level synthesis, software spec., code generation, object code, hardware spec., hardware synthesis.]
Typical Software Methodology for
general-purpose configurable systems
[Figure: flow: hardware spec. => synthesis => partitioning & placement => pin assignment & routing => FPGA P & R => bit-stream files.]
Considerations for such
complex software systems
Architecture-specific design tasks.

Design automation process.

The mapping time dominates the setup
time for operating the system.

Run-time reconfigurability.
Design Specification and Languages for
reconfigurable software systems
Standard software programming languages,
e.g., C, C++, FORTRAN, and assembly language, vs.
HDLs.

Standard software programming languages - a
sequential execution model.

HDLs - a parallel execution model.

Who will use the system, and which kind of language is more
suitable for system description?
Compilation Issues
Translate code from software languages
into hardware without losing the inherent
concurrency of hardware.

Compiler techniques for parallelizing code.

Straight-line code, control flow, and loops.

Transmogrifier C compiler.
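For straight-line code, the concurrency a hardware mapping can exploit follows from the data dependencies alone. The sketch below is a generic as-soon-as-possible (ASAP) levelization, offered as an illustration only; it is not the method of the Transmogrifier C compiler, and the function name and the example operations are hypothetical.

```python
def asap_levels(ops):
    """Assign each operation the earliest step at which it can execute.

    ops: dict mapping a result name to the list of operand names it reads.
    Names that are not keys of `ops` are primary inputs (available at step 0).
    Operations with the same level can run in parallel in hardware.
    """
    levels = {}

    def level(name):
        if name not in ops:
            return 0                                   # primary input
        if name not in levels:
            levels[name] = 1 + max((level(d) for d in ops[name]), default=0)
        return levels[name]

    for name in ops:
        level(name)
    return levels


if __name__ == "__main__":
    # t1 = a + b; t2 = c * d; y = t1 - t2
    code = {"t1": ["a", "b"], "t2": ["c", "d"], "y": ["t1", "t2"]}
    print(asap_levels(code))
```

A sequential processor executes these three statements in three steps, while the dependency levels show that hardware needs only two, because t1 and t2 are independent.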
System-level and High-
level Synthesis
System-level design evaluation and analysis.

Design estimation.

Hardware-software partitioning.

Interface synthesis.

RTL synthesis.

Logic synthesis and technology mapping.
Partitioning and
Placement
Topology-aware partitioning methods.

Partitioning onto a multi-FPGA system is
equivalent to a placement problem.

Logic utilization and timing.
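The link between partitioning and pin usage can be made concrete by counting cut nets. This is a minimal sketch, assuming each net that spans more than one FPGA costs one pin on every FPGA it touches; the function name, the cell names, and the placements are hypothetical.

```python
def pin_demand(nets, placement):
    """Count the I/O pins each FPGA needs for nets that cross its boundary.

    nets:      list of nets, each a list of cell names
    placement: dict mapping cell name -> FPGA index
    """
    demand = {}
    for net in nets:
        fpgas = {placement[cell] for cell in net}
        if len(fpgas) > 1:                      # net is cut by the partition
            for f in fpgas:
                demand[f] = demand.get(f, 0) + 1
    return demand


if __name__ == "__main__":
    nets = [["a", "b"], ["b", "c"], ["c", "d"]]
    good = {"a": 0, "b": 0, "c": 1, "d": 1}
    bad = {"a": 0, "b": 1, "c": 0, "d": 1}
    print(pin_demand(nets, good))
    print(pin_demand(nets, bad))
```

Both placements put two cells in each FPGA, so logic utilization is identical, but the second one triples the pin demand. A partitioner for a multi-FPGA system therefore has to weigh pin limits alongside logic capacity and timing.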
Pin Assignment and
Routing
Pin assignment - the process of determining
which I/O pins to use for each inter-FPGA
signal.

Pin assignment for a pre-fabricated multi-FPGA
system is equivalent to the global routing
problem.

Pin assignment greatly affects FPGA logic
utilization and routability.
Run-Time Reconfigurability
Virtual hardware <=> virtual memory. How are they
related? Artificial intelligence, robotics, vision.

Hardware on demand.

What is the initial unconfigured structure?
What are the reconfiguration methods?

Software supporting time-varying mapping.

Many open problems need to be solved in the
coming years.
This is a new issue in system design: how much of the processor is
virtual, and when should it be reconfigured?
Applications: Splash 2
Stream-oriented systolic and SIMD applications.

Scalable linear array of 16 to 256 processing
elements (each one XC4010 with 1/2 Mbyte).

VHDL based.

Sequence comparison - 2300M vs. 0.75M cell
updates/sec (Splash 2 vs. Sparc 10).

Edge detection - 10M vs. 242K pixels/sec (Splash
2 vs. Sparc 10).
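The speedups implied by these throughput figures follow from simple division. The short calculation below assumes M means 10^6 and K means 10^3; the function and variable names are illustrative.

```python
def speedup(splash2_rate, sparc10_rate):
    """Ratio of Splash 2 throughput to Sparc 10 throughput."""
    return splash2_rate / sparc10_rate


M, K = 1e6, 1e3  # assumed meaning of the M and K suffixes

sequence_comparison = speedup(2300 * M, 0.75 * M)   # cell updates/sec
edge_detection = speedup(10 * M, 242 * K)           # pixels/sec

if __name__ == "__main__":
    print(f"Sequence comparison: about {sequence_comparison:.0f}x")
    print(f"Edge detection:      about {edge_detection:.1f}x")
```

Under that reading, sequence comparison runs roughly 3,000 times faster on Splash 2 and edge detection roughly 40 times faster.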
Applications: PAM (DEC)
Programmable Active Memory (PAM).

C++ based, with mesh arrays of XC3090 FPGAs
(DECPeRLe-1).

Applications:
Multiple precision arithmetic.
RSA encryption.
Video compression (JPEG, MPEG, DCT).
High energy physics.
Telecommunications.

Sources of some slides
Peter Alfke
Xilinx, Inc
peter.alfke@xilinx.com
