
Combining Simulators and FPGAs

“An Out-of-Body Experience”


Eric S. Chung, Brian Gold,
James C. Hoe, Babak Falsafi
{echung, bgold, jhoe, babak}@ece.cmu.edu

SIMFLEX/PROTOFLEX
Computer Architecture Lab at
The RAMP full-system challenge
• RAMP vision for studying systems w/ FPGAs
– functional & cycle-accurate simulation 
– scalability, speed, & flexibility on FPGAs 
– full-system (run unmodified binaries & OS)

[Figure: full-system target with CPUs, MMU, memory, DMA controller, IRQ controller, I/O controller, and terminal; a PCI bus connects a graphics card, an Ethernet controller, and a SCSI controller with two disks]

‘Full-sys’ RAMP will incur large effort, yet not all behaviors are frequently used (e.g., I/O)
June 22, 2006 Eric S. Chung / RAMP 2006 Summer Retreat
Combining simulators & FPGAs
• Simulators already provide full-system support
  → why not simulate infrequent behaviors (e.g., I/O devices)?

[Figure: the target (CPUs, memory, SCSI with disk, Ethernet) shown on two hosts, FPGA and simulator]

• Advantages
– avoid implementing infrequent behaviors → lowers full-system FPGA development effort
– low impact on scalability & performance on FPGA

Outline

• Motivation
• Migration
• Implementation status
• Conclusion

Migration
[Figure: a target design made of “target objects” (e.g., a functional or timing CPU) mapped onto the FPGA and simulator hosts; the three mappings are labeled 1, 2, and 3]

• 3 ways to map a target object to a host
– (1) FPGA-only
– (2) Simulation-only
– (3) Migratable

• Migratable objects
– switch modes between FPGA & simulator hosts
– target behavior need not be 100% implemented in FPGA mode
  e.g., implement 80% of target behavior in FPGA, 100% in simulator
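The three mappings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Mapping`, `TargetObject`, `fpga_behaviors`, and `host_for` are hypothetical and do not come from the slides. A migratable object runs a behavior on the FPGA when its FPGA implementation covers it and switches to the simulator otherwise.

```python
from dataclasses import dataclass
from enum import Enum


class Mapping(Enum):
    FPGA_ONLY = 1
    SIMULATION_ONLY = 2
    MIGRATABLE = 3


@dataclass
class TargetObject:
    name: str
    mapping: Mapping
    # Behaviors covered by the (hypothetical) FPGA implementation.
    # The simulator is assumed to cover 100% of behaviors.
    fpga_behaviors: frozenset = frozenset()
    host: str = "fpga"

    def __post_init__(self):
        if self.mapping is Mapping.SIMULATION_ONLY:
            self.host = "simulator"

    def host_for(self, behavior: str) -> str:
        """Return the host that runs `behavior`, switching modes if migratable."""
        if self.mapping is Mapping.FPGA_ONLY:
            if behavior not in self.fpga_behaviors:
                raise NotImplementedError(f"{self.name}: {behavior} not on FPGA")
            return "fpga"
        if self.mapping is Mapping.SIMULATION_ONLY:
            return "simulator"
        # Migratable: stay on the FPGA when possible, else fall back.
        self.host = "fpga" if behavior in self.fpga_behaviors else "simulator"
        return self.host


cpu = TargetObject("cpu", Mapping.MIGRATABLE, frozenset({"load", "add"}))
scsi = TargetObject("scsi", Mapping.SIMULATION_ONLY)
```

Here `cpu.host_for("add")` stays on the FPGA, while an uncovered behavior such as `cpu.host_for("io")` moves the object to the simulator.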
Migration example
Target-to-host mappings:
• CPU = migratable
• Memory = FPGA-only
• Devices = SW-only

Example instruction stream (in time order): load, add, multiply, I/O, add, sub, ...

[Figure: the CPU executes on the FPGA against FPGA memory (e.g., the load); at the I/O instruction, CPU state is transferred to the simulator, where the CPU issues the SCSI cmd to the simulated SCSI disk]
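The example can be traced with a small sketch. It assumes, purely for illustration, that the FPGA CPU covers every instruction in the stream except I/O and that the CPU migrates back to the FPGA as soon as the next instruction is covered; `FPGA_SUPPORTED` and `run` are hypothetical names, not part of the presented system.

```python
# Assumed FPGA coverage; the real BlueSPARC coverage is not modeled here.
FPGA_SUPPORTED = {"load", "add", "sub", "multiply"}


def run(stream):
    """Return (instruction, host) pairs and the number of CPU state transfers."""
    host, transfers, trace = "fpga", 0, []
    for instr in stream:
        wanted = "fpga" if instr in FPGA_SUPPORTED else "simulator"
        if wanted != host:
            transfers += 1  # CPU state is copied across hosts
            host = wanted
        trace.append((instr, host))
    return trace, transfers


trace, transfers = run(["load", "add", "multiply", "io", "add", "sub"])
```

In this stream only the I/O instruction runs in the simulator, so the CPU state crosses hosts twice: once into the simulator and once back.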

Advantages
• Lowers development effort
– avoid bring-up of infrequent behaviors
– migrate & validate ref. models from simulator
– tailor impl. to workload (avoid rarely used instrs, good for CISC x86)
• Fast & scalable
– perf-critical objects on FPGA (e.g., CPU, memory)
– scalable for MPs → add migratable CPUs

[Figure: multiprocessor target with three CPUs, memory, and a SCSI disk, shown on both the FPGA and simulator hosts]

Subtleties
• Objects placed on different hosts (simulator vs. FPGA) still interact
– examples: interrupts, DMA
– handle by forwarding messages between FPGA/simulator
– FPGA-only & SW-only mapped objects easy to locate
– migrated objects require tracking

[Figure: DMA forwarded between the simulator and the FPGA; CPUs, memory, and SCSI appear on both hosts, with the disk attached to SCSI]
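Forwarding and tracking can be pictured as a small routing table. The sketch below is an assumption-laden illustration, not the presented implementation: `Router`, `place`, `migrate`, and `send` are hypothetical names. Statically mapped objects are placed once, while a migrated object's entry is updated on every migration so that messages such as DMA writes or interrupts reach its current host.

```python
class Router:
    """Tracks which host each target object lives on and forwards messages."""

    def __init__(self):
        self.location = {}   # object name -> current host
        self.delivered = []  # log of (destination host, target, message)

    def place(self, name, host):
        self.location[name] = host

    def migrate(self, name, new_host):
        # Migrated objects must be tracked, unlike statically mapped ones.
        self.location[name] = new_host

    def send(self, src_host, target, message):
        """Deliver `message`; return True if it had to cross hosts."""
        dst_host = self.location[target]
        self.delivered.append((dst_host, target, message))
        return dst_host != src_host


router = Router()
router.place("memory", "fpga")      # FPGA-only
router.place("scsi", "simulator")   # SW-only
router.place("cpu0", "fpga")        # migratable, currently on the FPGA

dma_crossed = router.send("simulator", "memory", "dma_write")
irq_crossed = router.send("simulator", "cpu0", "interrupt")
router.migrate("cpu0", "simulator")
irq_after_migration = router.send("simulator", "cpu0", "interrupt")
```

Once `cpu0` has migrated to the simulator, the same interrupt no longer crosses hosts.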

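• Interrupt example
– Option 1: forwarded interrupt
– Option 2: forced migration

[Figure: an interrupt crossing between the simulator and the CPU on the FPGA; memory and SCSI appear on both hosts, with the disk attached to SCSI]

Cross-host interactions are rare → low impact on FPGA performance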
Subtleties cont.
• Migration cost
– migrating object requires state copy
e.g., migratable CPU has registers & TLBs
– FPGA-to-simulator latency & simulation time limit the affordable number of migrations per instruction
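A back-of-the-envelope model shows why the migration rate matters. The sketch assumes a fixed cost per migration (state copy plus time spent in the simulator) that is simply added to the FPGA's time per instruction. The 12 MIPS figure is the worst-case FPGA rate quoted later in this talk; the 1 ms cost per migration and the function name `effective_mips` are assumptions chosen only for illustration.

```python
def effective_mips(fpga_mips, migrations_per_instr, migration_cost_us):
    """Average throughput when each migration adds a fixed cost (microseconds)."""
    fpga_us_per_instr = 1.0 / fpga_mips  # 1 MIPS = 1 instruction per microsecond
    avg_us = fpga_us_per_instr + migrations_per_instr * migration_cost_us
    return 1.0 / avg_us


# Assumed cost: 1000 us (1 ms) per migration round trip.
no_migration = effective_mips(12, 0.0, 1000)   # 12 MIPS
rare = effective_mips(12, 1e-6, 1000)          # one migration per million instrs
frequent = effective_mips(12, 1e-4, 1000)      # one migration per 10,000 instrs
```

Under these assumed numbers, one migration per million instructions costs about 1% of throughput, while one per 10,000 instructions more than halves it.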

• FPGA & simulator asynchrony
– simulated time “ticks” at different rates in FPGA & simulator
– must synchronize for deterministic replay & accurate device timing
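The slides do not say how the two hosts are synchronized. One common technique, swapped in here purely as an illustration, is a quantum barrier: each host advances simulated time at its own pace but never past the end of the current quantum, and both meet at the boundary before continuing. The function `run_quanta` and its parameters are hypothetical.

```python
def run_quanta(total_cycles, quantum, fpga_step, sim_step):
    """Advance two hosts in bounded quanta; return their times at each barrier."""
    fpga_time = sim_time = 0
    barriers = []
    while fpga_time < total_cycles:
        limit = min(fpga_time + quantum, total_cycles)
        # Each host ticks at its own rate but may not pass the quantum boundary.
        while fpga_time < limit:
            fpga_time = min(fpga_time + fpga_step, limit)
        while sim_time < limit:
            sim_time = min(sim_time + sim_step, limit)
        barriers.append((fpga_time, sim_time))
    return barriers


barriers = run_quanta(total_cycles=250, quantum=100, fpga_step=40, sim_step=7)
```

Because both hosts agree on simulated time at every barrier, cross-host events such as device interrupts can be delivered at a reproducible simulated time regardless of how fast either host runs.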

Outline

• Motivation
• Migration
• Implementation status
• Conclusion

Implementation status
• Target system
– Sun Fire™ 3800 Server (up to 24-way)
– UltraSPARC III ISA
– Solaris 8

• Proof-of-concept software-to-software migration
– run 2 instances of Virtutech Simics
– migration designed & tested in 2 weeks
– can trigger migration on arbitrary behavior (e.g., an ADD instruction)

BlueSPARC core (in progress)
• In-order SPARC V9 core
– supports 144 out of 170 integer instruction behaviors
– supports partial MMU w/ I- & D-TLBs
– goal: 99.999% of instrs & behaviors in target workloads
  • SPEC (mostly user-level), OLTP/DB2 (high TLB misses, 40% of time in priv-mode)
– CPI ranges from 5 to 7
– synthesis: 15k LUTs on Virtex-II Pro 30, 85 MHz, 12 MIPS (worst-case)
– developed in Bluespec HDL, 6000 lines in 6 weeks
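The quoted throughput follows from the clock rate and CPI. As a quick check, using only the numbers on this slide (the helper name `mips` is just for illustration):

```python
def mips(clock_mhz, cpi):
    """Millions of instructions per second at a given clock and cycles/instr."""
    return clock_mhz / cpi


worst_case = mips(85, 7)  # CPI of 7 at 85 MHz
best_case = mips(85, 5)   # CPI of 5 at 85 MHz
```

A CPI of 7 at 85 MHz gives roughly 12 MIPS, matching the worst-case figure, and a CPI of 5 gives 17 MIPS.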

• Core validation
– run RTL in lockstep w/ Simics’s UltraSPARC simulation model
– workload validation w/ SPEC, OLTP/DB2, OpenSPARC verif. suite
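Lockstep validation can be illustrated with a toy model. The sketch below is not the BlueSPARC or Simics interface; `ToyCpu` and `lockstep` are hypothetical stand-ins. Both models execute the same instruction, their architectural state is compared after every step, and the first mismatch is reported.

```python
class ToyCpu:
    """Stand-in for a CPU model with a single accumulator as its state."""

    def __init__(self, buggy_sub=False):
        self.acc = 0
        self.buggy_sub = buggy_sub  # lets the toy model a faulty design

    def step(self, instr):
        op, value = instr
        if op == "add":
            self.acc += value
        elif op == "sub":
            self.acc -= value + (1 if self.buggy_sub else 0)
        return self.acc  # architectural state after this instruction


def lockstep(dut, ref, program):
    """Return the index of the first diverging instruction, or None."""
    for i, instr in enumerate(program):
        if dut.step(instr) != ref.step(instr):
            return i
    return None


program = [("add", 5), ("sub", 2), ("add", 1)]
clean = lockstep(ToyCpu(), ToyCpu(), program)
faulty = lockstep(ToyCpu(buggy_sub=True), ToyCpu(), program)
```

Comparing after every instruction pinpoints the first divergent instruction rather than a corrupted result many cycles later.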

Migration on FPGA (in progress)
[Figure: a Xilinx XUP Virtex-II Pro 30 board (BlueSPARC, PowerPC, DDR memory, Ethernet) connected through a migration & message interface to Virtutech Simics (Simics UltraSPARC model and simulated target devices)]

• PowerPC functions
– core & memory initialization from Simics checkpoints
– facilitates migration for BlueSPARC
– connects simulated devices to memory (e.g., SCSI DMA)

Conclusion
• Contributions
– virtualizes infrequent behaviors using simulation
– simplifies full-system FPGA emulator, still fast/scalable
– incremental validation from reference system
• Future work
– support migration in RDL?
– adding cores + scaling across multiple FPGAs
• We are ready for BEE2
• Thanks! Questions? echung@ece.cmu.edu
• PROTOFLEX/SIMFLEX (http://www.ece.cmu.edu/~simflex)
