
Design Flows and Tools

Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)

Part II - Agenda
Design Flows
  Design via decomposition
  Modeling design using SystemVerilog
Design Automation: The Proteus-A flow
  Legacy RTL
  Added SystemVerilog CSP front-end
  Asynchronous optimizations
Final Flow Considerations
  Analog Verification
  Design for Test and Debug

Design via Process Decomposition

Collection of Processes linked by Channels


Channels pass messages with guaranteed delivery
Processes synchronize
Processes can be decomposed into smaller processes

Modeling Asynchronous Design via SystemVerilogCSP (SVC)
SystemVerilog interface abstracts channel wires as well as communication protocol
Send/Receive
Blocking tasks (flow control)
Abstract communication

Sender
module Sender (interface R);
  parameter WIDTH = 8;
  logic [WIDTH-1:0] data;
  always
  begin
    // produce data
    R.Send(data);
  end
endmodule

SVC Interface

Receiver

module Receiver (interface L);
  parameter WIDTH = 8;
  logic [WIDTH-1:0] data;
  always
  begin
    L.Receive(data);
    // consume data
  end
endmodule
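
The blocking Send/Receive behavior can also be sketched outside SystemVerilog. The following is a minimal Python model, not the SVC implementation: the Channel class and the run helper are hypothetical names, and the model assumes exactly one sender and one receiver per channel. Send does not return until the matching Receive has taken the token, which is the flow-control property the interface provides.

```python
import queue
import threading


class Channel:
    """Hypothetical point-to-point channel: one sender, one receiver."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def send(self, value):
        self._slot.put(value)   # hand the token to the channel
        self._slot.join()       # block until the receiver has taken it

    def receive(self):
        value = self._slot.get()  # block until a token is available
        self._slot.task_done()    # release the blocked sender
        return value


def sender(ch, items):
    for item in items:            # "produce data"
        ch.send(item)


def receiver(ch, count, out):
    for _ in range(count):
        out.append(ch.receive())  # "consume data"


def run(items):
    """Run one sender and one receiver to completion; return what was received."""
    ch, out = Channel(), []
    threads = [
        threading.Thread(target=sender, args=(ch, items)),
        threading.Thread(target=receiver, args=(ch, len(items), out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run([3, 1, 4, 1, 5]))
```

Every token arrives exactly once and in order, because the sender cannot start its next Send until the previous one has been received.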

SVC - Waveform view

// Sender (DataGen)
always
begin
  #Delay;
  R.Send(data);
end

// Receiver
always
begin
  L.Receive(data);
  #FL;
  R.Send(data);
  #BL;
end

Waveform annotations:
Receiver pending on Receive
Sender performs Send, communication happens
No one is Sending or Receiving
Sender pending on Send
Receiver performs Receive, communication happens
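
The waveform states can be reproduced with a small timing model. This is a rough Python sketch under stated assumptions: the function name simulate is hypothetical, delay, fl, and bl stand for the #Delay, #FL, and #BL values in the code above, and the receiver's own R.Send is assumed never to block (its downstream neighbor is always ready).

```python
def simulate(delay, fl, bl, tokens):
    """Return (time, who_waited) for each communication on the channel."""
    sender_ready = delay    # sender finishes "#Delay" before its first Send
    receiver_ready = 0      # receiver starts out pending on Receive
    events = []
    for _ in range(tokens):
        t = max(sender_ready, receiver_ready)  # both sides must be ready
        if sender_ready < receiver_ready:
            waited = "sender pending on Send"
        elif receiver_ready < sender_ready:
            waited = "receiver pending on Receive"
        else:
            waited = "no one waits"
        events.append((t, waited))
        sender_ready = t + delay        # next "#Delay; R.Send(data)"
        receiver_ready = t + fl + bl    # "#FL; ... #BL" before the next Receive
    return events


for event in simulate(delay=2, fl=3, bl=2, tokens=3):
    print(event)
```

With these values the receiver waits for the first token, after which the sender is the one pending; in this model the steady-state period is max(delay, fl + bl).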


The Proteus-A Flow: Legacy RTL

Key Features
Re-uses synchronous EDA tools
Seamless integration into existing flows
Back-end design style agnostic
Up to 2X higher performance

Tool Status
Commercialized version in production for 2+ years
Uses proprietary QDI library
Academic version (Proteus-A) enhanced significantly at USC

Recent Advances

[Figure: Proteus-A flow diagram. Synth. RTL and Design Goals -> Synthesis -> Image Netlist -> ClockFree -> Async Netlist -> Physical Design -> Final Layout. Inputs: Proteus/Sync Library, Constraints. The diagram also labels Clock Gating and Clock Tree Synthesis stages.]

Flow Demo: Legacy RTL to Physical Design

[Figure: demo stages. Legacy RTL Specification (Synth. RTL) -> Synthesis -> Synthesized Image Netlist -> ClockFree -> Asynchronous Gate-level Netlist -> Physical Design -> Final Layout]

Amber23 Proteus-A Case Study

Download from http://opencores.com/project,amber
ARM-compatible 32-bit RISC processor
3 stages: FETCH, DECODE and EXECUTE

[Figure: Amber23 block diagram. Blocks: Cache, Bus interface, Decode state machine, Register bank, Barrel shifter, ALU, Multiplexer. Signals: instruction, control, read data, address, write data.]

Zhang, USC Summer Research, 2012

Amber23 Performance Comparison

Zhang, USC Summer Research, 2012

The Proteus-A Flow: SVC RTL

Key New Features
Supports SystemVerilog CSP front-end
Enables user-defined conditional communication
Saves power at architectural level

Tool Status
Proprietary version starting from CAST developed at Fulcrum
SystemVerilog version subsequently developed at USC
Used in current research at USC and Technion and 40+ person async class

[Figure: Proteus-A flow diagram with SVC front-end. SystemVerilog and Design Goals -> SVC2RTL -> Synth. RTL -> Synthesis -> Image Netlist -> ClockFree -> Async Netlist -> Physical Design -> Final Layout. Inputs: Proteus/Sync Library, Constraints. The diagram also labels Clock Gating and Clock Tree Synthesis stages.]

Key to Low-Power: Conditional Communication

[Figure: operands A,B and opcode op enter a DEMUX that steers them to either Mult or Add/Sub; a MUX merges the results.]

Conditional communication reduces token flow, saving power
Traditionally - manually introduced via user-created decomposition
Recent research - automatically introduced via Operand Isolation

Saifhashemi, PATMOS 2012
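
How conditional communication reduces token flow can be illustrated by counting operand tokens per unit. The sketch below is a toy Python model, not taken from the Proteus-A tools: the function name run_alu and the opcode strings are assumptions made for the illustration. In the unconditional case every operand pair reaches both units and the MUX discards one result; with a DEMUX only the selected unit sees a token.

```python
def run_alu(ops, conditional):
    """Count operand tokens delivered to each unit for a stream of opcodes."""
    tokens = {"Mult": 0, "Add/Sub": 0}
    for op in ops:
        if conditional:
            # DEMUX: steer the operands only to the unit selected by op
            unit = "Mult" if op == "mul" else "Add/Sub"
            tokens[unit] += 1
        else:
            # unconditional: both units receive operands and compute
            tokens["Mult"] += 1
            tokens["Add/Sub"] += 1
    return tokens


ops = ["add", "mul", "sub", "add"]
print(run_alu(ops, conditional=False))  # {'Mult': 4, 'Add/Sub': 4}
print(run_alu(ops, conditional=True))   # {'Mult': 1, 'Add/Sub': 3}
```

Each token that a unit does not receive is computation, and therefore switching activity, that never happens.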

SVC2RTL Enables User-Defined Conditional Communication

[Figure annotations: Dummy value; Not received; Not sent]


Power Optimization Overview

Conditioning
  Automatically add conditional communication

Reconditioning
  Optimize the existing conditionality

Power Saving - The Opportunity

Unnecessary calculation

Our Solution - Adding Isolation Cells

All inputs/outputs are unconditional
Operand Isolation
  AND-based isolation cells
  Generated by synchronous RTL synthesizer
  Does not prevent switching in asynchronous circuits

Isolation cells are not effective in asynchronous circuits
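
One way to see why AND-based isolation does not help is to count wire transitions. The sketch below is a simplified Python model under an assumption the slides do not state: channels use four-phase dual-rail encoding (common in QDI design), so every token, including an all-zero one, raises one rail per bit and then returns it to neutral. The function names are hypothetical.

```python
def single_rail_transitions(values, width):
    """Bit toggles on a clocked single-rail bus carrying the value sequence."""
    mask = (1 << width) - 1
    toggles, prev = 0, 0
    for v in values:
        toggles += bin((prev ^ v) & mask).count("1")  # bits that change
        prev = v
    return toggles


def dual_rail_transitions(values, width):
    """Rail transitions for four-phase dual-rail tokens (assumed encoding)."""
    # each bit raises exactly one of its two rails, then lowers it again,
    # regardless of whether the bit is 0 or 1
    return 2 * width * len(values)


isolated = [0, 0, 0, 0]  # operands forced to zero by AND-based isolation
print(single_rail_transitions(isolated, width=8))  # 0
print(dual_rail_transitions(isolated, width=8))    # 64
```

In the single-rail case a constant zero causes no toggling, which is what operand isolation relies on. In the dual-rail model the zero tokens cost as much as any other data, so only withholding the token saves the activity.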

Our Solution - Conditioning

[Figure: datapath with an AND (&) block and two adders (+); the unselected branch is annotated "No Activity".]

Power Optimization Results

Case study: 32-bit ALU, placed and routed
Switching activity back-annotated using a VCD file

Results:
Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
53% power reduction when only isolating MUL (r f=0.25)
Area cost of isolating MUL is about 4%, with no performance penalty

Saifhashemi, PATMOS 2012

Power Savings: The Opportunity

[Figure: datapath with select values 0 and 1; two regions annotated "Unnecessary activity".]

Conditional communication is explicit and only at primary IO

The Reconditioning Problem

Definition (The Reconditioning Problem): Rearrange the location of RECEIVE and SEND cells to minimize power consumption while preserving functional behavior.

Power Results

[Figure: three charts titled "Power Comparison: 32 bit", each plotting Power against Operational factor (0.25, 0.5, 0.75) for Original, Greedy0, and MILP.
RECON1: Dual-mode arithmetic unit
RECON2: Conditional multiplier
ALU-OI: ALU after operand isolation]

Saifhashemi, PhD Thesis, 2012

Mode Based Conditional Slack Matching

[Figure: operands A,B and opcode op enter a DEMUX that steers them to either Add/Sub or Mult; a MUX merges the results.]

Conditional Slack Matching Advantage: conditional behavior yields fewer stalls, so fewer pipeline buffers are needed
Previously ignored: conservatively modeled as unconditional

Najibii, 2012

Conditional Slack Matching - Results

33% fewer buffers on average
Najibii, 2012

Design Flow Demo

[Figure: demo flow. SystemVerilog and Design Goals -> SVC2RTL -> Synth. RTL -> Synthesis -> Image Netlist -> ClockFree -> Async Netlist -> Physical Design -> Final Layout. Inputs: Proteus/Sync Library, Constraints.]


Final Flow Considerations

Static Timing Analysis
  Verifying timing constraints and performance is a must
  Trick traditional tools into working with asynchronous circuits

Analog Verification
  Domino logic used in QDI flows is sensitive to charge sharing
  Asynchronous channels cannot tolerate cross-talk glitches
  Special SPICE-based tools developed

Asynchronous Scan
  Asynchronous scan is a must, but doable

Design for Silicon Debug
  Chip deadlock is still difficult to debug

Conclusions

The Asynchronous Design Flow/CAD Landscape
  Synchronous design rigidity continues to hamper quality design
  Asynchronous design offers solutions but has many design flow challenges

Design Flow Requirements
  Design flows must easily integrate into synchronous designs
  Circuit quality must compete very well to warrant switching design styles

Our approach
  Proteus provides a good design framework for automation of both legacy RTL and SystemVerilog CSP
  Final considerations of analog and timing verification, scan, and debug should not be overlooked

Acknowledgements

http://ee.usc.edu/async2013
