Jan M. Rabaey
Chapter 6
Chapter Outline
Clock distribution
ITRS Projections
Parameters tabulated per calendar year: interconnect one-half pitch, MOSFET physical gate length, number of interconnect levels, on-chip local clock, chip-to-board clock, number of high-performance ASIC signal I/O pads, number of high-performance ASIC power/ground pads, supply voltage, supply current
[Figure: breakdown of power among interconnect, clocks, I/O, memory and logic for a microprocessor, an FPGA and a signal processor; surviving percentage labels: 9%, 65%]
Low Power Design Essentials 2008 6.5
Parameter    Relation   Local wire   Constant length   Global wire
W, H, t                 1/S          1/S               1/S
L                       1/S          1                 1/S_C
C            LW/t       1/S          1                 1/S_C
R            L/WH       S            S^2               S^2/S_C
tp ~ CR      L^2/Ht     1            S^2               S^2/S_C^2
E            CV^2       1/(S U^2)    1/U^2             1/(S_C U^2)
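The entries of the scaling table follow mechanically from the four relations in its second column. The sketch below recomputes them, as a sanity check rather than a definitive model. The function name `wire_scaling` and the numeric factors S = 2, S_C = 0.5 and U = 2 are illustrative assumptions, not values from the slides.

```python
def wire_scaling(length_factor, S, U=1.0):
    """Scale factors for a wire whose cross-section (W, H, t) shrinks by 1/S.

    length_factor is the factor applied to the wire length L:
    1/S for a local wire, 1 for a constant-length wire, 1/S_C for a global wire.
    U is the voltage scaling factor (V scales as 1/U).
    """
    W = H = t = 1.0 / S
    L = length_factor
    C = L * W / t              # C ~ LW/t
    R = L / (W * H)            # R ~ L/(WH)
    tp = C * R                 # tp ~ CR = L^2/(Ht)
    E = C * (1.0 / U) ** 2     # E ~ CV^2
    return {"C": C, "R": R, "tp": tp, "E": E}

# Hypothetical factors chosen only to make the numbers easy to read.
S, S_C, U = 2.0, 0.5, 2.0
local_wire = wire_scaling(1.0 / S, S, U)
constant_wire = wire_scaling(1.0, S, U)
global_wire = wire_scaling(1.0 / S_C, S, U)

for name, row in [("local", local_wire), ("constant", constant_wire), ("global", global_wire)]:
    print(name, row)
```

With these factors the local-wire delay stays constant while the global-wire delay grows as S^2/S_C^2, which is the trend the table is meant to highlight.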
IEEE 1998
Technology Innovations
Reduce resistivity (e.g. copper)
Reduce dielectric permittivity (e.g. aerogels or air)
IEEE 1998
Logic Scaling
[Figure: power P [W] versus delay tp [s] on logarithmic axes, with the power-delay product scaling as P tp ~ 1/S^3; surviving annotation: 10^-18 J]
[Ref: J. Davis, Proc01]
Interconnect Scaling
[Figure: logarithmic plot with axes Delay [s], τ and (Length) [cm], L; contours of constant L^-2 τ (for example L^-2 τ = 10^-5 s cm^-2) and the trend L^-2 τ ~ S^2; surviving annotations: (F = 0.1), (1), (10), (100), (1000)]
[Ref: J. Davis, Proc01]
Claude Shannon
Helps to organize along abstraction layers, well understood in the networking world: the OSI protocol stack
Presentation/Application
Session
Transport
Network
Data Link
Physical
No requirement to implement all layers of the stack
Layered structure need not be maintained in the final implementation
[Ref: M. Sgroi, DAC01]
Signal waveform: discrete levels, pulses, modulated sinusoids
Voltages: reduced swing
Timing, synchronization
So far, on-chip communication has been almost exclusively level-based
Repeater Insertion
$t_p \propto L \sqrt{(R_d C_d)(r_w c_w)}$
with $R_d C_d$ and $r_w c_w$ the intrinsic delays of the inverter and the wire, respectively
But: at a major energy cost!
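A small numeric sketch can make the linear-versus-quadratic behavior concrete. It assumes a simple lumped model that is not spelled out on the slide: each repeater stage has delay 0.69 (R_d/s)(s C_d (1 + gamma) + c_w l) + 0.69 r_w l s C_d + 0.38 r_w c_w l^2, with segment length l and repeater size s both chosen optimally. The coefficients 0.69 and 0.38, the self-loading ratio `gamma`, and all component values below are assumptions for illustration only.

```python
import math

def unrepeated_delay(L, r_w, c_w):
    """Distributed-RC delay of a bare wire; grows with the square of L."""
    return 0.38 * r_w * c_w * L ** 2

def repeated_delay(L, R_d, C_d, r_w, c_w, gamma=1.0):
    """Delay of the same wire with optimally sized and spaced repeaters.

    Minimizing the assumed stage-delay model over segment length and
    repeater size gives a closed form proportional to
    L * sqrt((R_d * C_d) * (r_w * c_w)).
    """
    k = 2.0 * math.sqrt(0.69 * 0.38 * (1.0 + gamma)) + 2.0 * 0.69
    return k * L * math.sqrt(R_d * C_d * r_w * c_w)

# Hypothetical values: 10 kOhm / 1 fF minimum inverter,
# 100 Ohm/mm and 0.2 pF/mm wire, lengths in mm.
R_d, C_d, r_w, c_w = 1e4, 1e-15, 100.0, 2e-13
for L in (5.0, 10.0, 20.0):
    print(L, unrepeated_delay(L, r_w, c_w), repeated_delay(L, R_d, C_d, r_w, c_w))
```

The model covers delay only; the energy cost noted above comes from the added repeater capacitance, which this sketch does not account for.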
Repeater overhead
[Figure: normalized energy (eNorm) versus normalized delay (dNorm)]
Multi-dimensional Optimization
Design parameters: voltage, number of stages, buffer sizes
Voltage scaling has the largest impact, followed by the choice of the number of repeaters
Transistor sizing is secondary
[Figure: VDD (V) and number of stages versus normalized delay (dNorm)]
Reduced Swing
[Figure: transmitter (TX) driving wire capacitance CL from reduced supply VDDL, followed by receiver (RX)]
Requires two discrete voltage levels
Asynchronous level conversion adds extra delay
[Ref: H. Zhang, TVLSI00]
[Figure: circuit schematic (transistors P1-P3 and N1-N3, load CL, inputs in and in2, output out) with VTC and transient plots]
[Figure: clocked low-swing circuit; labels: CL, clk, out]
Allows for very low swings (200 mV)
Robust
Quadratic energy savings
But: doubling the wiring, extra clock signal, complexity
[Ref: T. Burd, UCB01]
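The "quadratic energy savings" can be checked with the standard charging-energy relation E = C · V_supply · V_swing: when the swing is drawn from a dedicated low supply equal to the swing, the energy scales with the swing squared, whereas a reduced swing drawn from the full supply only saves linearly. The sketch below compares the cases. The 1 pF load and 1.2 V supply are assumed values (only the 200 mV swing comes from the slide), and `charging_energy` is a hypothetical helper name.

```python
def charging_energy(C, V_supply, V_swing):
    """Energy drawn from V_supply to raise capacitance C by V_swing."""
    return C * V_supply * V_swing

C_L = 1e-12      # assumed wire load: 1 pF
VDD = 1.2        # assumed full supply
V_SWING = 0.2    # 200 mV swing, as quoted on the slide

full_swing = charging_energy(C_L, VDD, VDD)
# Reduced swing taken from the full supply: linear savings.
linear_case = charging_energy(C_L, VDD, V_SWING)
# Reduced swing taken from a low supply equal to the swing: quadratic savings.
quadratic_case = charging_energy(C_L, V_SWING, V_SWING)

print("linear savings factor:", full_swing / linear_case)
print("quadratic savings factor:", full_swing / quadratic_case)
```

The sketch ignores the energy spent in the receiver, the extra clock and the doubled wiring, which reduce the net benefit in practice.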
Swings as low as 200 mV have been reported [Ref: Burd00]; 100 mV is definitely possible
Further reduction requires crosstalk suppression
[Figure: crosstalk suppression by shielding (GND wires next to signal wires) and by folding]
Quasi-Adiabatic Charging
Uses stepwise approximation of adiabatic (dis)charging
Capacitors act as charge reservoirs
Energy drawn from supply reduced by a factor N
[Figure: output rising from 0 to VDD in N steps of VDD/N over time t]
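The factor-N reduction can be illustrated with a charge-bookkeeping sketch. It assumes ideal intermediate voltage levels at multiples of VDD/N (in a real driver these are provided by the reservoir capacitors) and that the load settles fully at each step. The function name and component values are illustrative assumptions.

```python
def stepwise_charging_loss(C, VDD, N):
    """Energy dissipated while charging C from 0 to VDD in N equal steps."""
    dV = VDD / N
    loss = 0.0
    v_cap = 0.0
    for k in range(1, N + 1):
        v_src = k * dV                                  # k-th intermediate level
        delivered = C * (v_src - v_cap) * v_src         # charge moved times source voltage
        stored = 0.5 * C * (v_src ** 2 - v_cap ** 2)    # increase in energy held on C
        loss += delivered - stored                      # remainder is dissipated as heat
        v_cap = v_src
    return loss

C_LOAD, VDD = 1e-12, 1.0   # assumed 1 pF load and 1 V supply
conventional = stepwise_charging_loss(C_LOAD, VDD, 1)
for n in (1, 2, 4, 8):
    print(n, stepwise_charging_loss(C_LOAD, VDD, n) / conventional)
```

Each step dissipates C (VDD/N)^2 / 2, so the total over N steps is 1/N of the single-step loss.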
[Figure: stacked differential line pairs B1 and B0 with receivers RX1 and RX0; voltage levels VDD, 3VDD/4, VDD/2, VDD/4 and GND over Precharge, Eval and Precharge phases, shown for B1 = 1 and B0 = 0]
Charge recycled from top to bottom
Precharge phase equalizes differential lines
Energy/bit = $2C(V_{DD}/N)^2$
Challenges: receiver design, noise margins
Offers some compelling advantages:
Reduced swing: swing is VDD/(n+1) without extra supply
Reduced load: allows for smaller driver
Reduced delay: capacitor pre-emphasizes edges
Signaling Protocols
Globally asynchronous: self-timed handshaking protocol
[Figure: network interface signals din, reqin, ackin, dout, reqout, ackout, with Din and REQ in waveforms]
Globally asynchronous, locally synchronous
[Figure: network interface signals din, reqin, ackin, dout, reqout, ackout around a locally clocked block; additional signals: Clk, done]
Coding
[Figure: encoder, TX, link with N+k wires, RX, decoder]
Adding redundancy (extra bits) to the communication link in order to:
Reduce transitions (activity encoding)
Reduce energy/bit (error-correcting coding)
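[Figure: encoder and decoder around an N-bit bus carrying data word D]
Data word D is inverted if its Hamming distance from the previous word is larger than N/2.
D     00101010  00111011  11010100  00001101  01110110
#T              2         7         5         6
p     0         0         1         0         1
(#T: number of bit transitions relative to the previous word; p: invert bit)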
Bus-Invert Coding
[Figure: encoder (blocks P and Reg) producing Denc and invert bit p from D, bus, decoder]
Gain: 25% (at best, for random data)
Overhead: extra wire (and activity), encoder, decoder
Not effective for correlated data
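The inversion rule can be sketched in a few lines. The version below is a minimal reading of the slide, not necessarily the exact published scheme: it compares each new word with the word last driven onto the bus (data lines only), sends the first word unmodified, and counts the invert line p as one extra wire when tallying transitions. All function names are illustrative.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, n_bits):
    """Return (Denc, p) pairs; a word is inverted when more than N/2 lines would toggle."""
    mask = (1 << n_bits) - 1
    encoded = []
    bus = None                      # word currently on the bus (assumed unknown at start)
    for d in words:
        invert = bus is not None and 2 * hamming_distance(d, bus) > n_bits
        denc = d ^ mask if invert else d
        encoded.append((denc, int(invert)))
        bus = denc
    return encoded

def bus_invert_decode(encoded, n_bits):
    """Undo the inversion using the invert bit p."""
    mask = (1 << n_bits) - 1
    return [denc ^ mask if p else denc for denc, p in encoded]

def count_transitions(encoded):
    """Total toggles on the data lines plus the invert line."""
    total = 0
    for (d0, p0), (d1, p1) in zip(encoded, encoded[1:]):
        total += hamming_distance(d0, d1) + (p0 ^ p1)
    return total

# Data words from the example on the slide.
words = [0b00101010, 0b00111011, 0b11010100, 0b00001101, 0b01110110]
encoded = bus_invert_encode(words, 8)
raw_transitions = count_transitions([(w, 0) for w in words])
coded_transitions = count_transitions(encoded)
print([p for _, p in encoded], raw_transitions, coded_transitions)
```

For this particular sequence the invert bits reproduce the p row of the example, and the total transition count drops from 20 to 11 even with the extra wire included. On random data the average gain is smaller, consistent with the 25% best-case figure quoted above.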
[Figure: transition patterns on adjacent bus wires (bit k-1, bit k, bit k+1); annotation: 1 + 4r]
Error-Correcting Codes
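[Figure: encoder maps N-bit word D to (N+k)-bit Denc; decoder recovers D]
Example: codeword P1 P2 B3 P4 B5 B6 B7, with (modulo-2 sums)
P1 + B3 + B5 + B7 = 0
P2 + B3 + B6 + B7 = 0
P4 + B5 + B6 + B7 = 0
If the three sums evaluate to 1, 1, 0, the failing checks point to position 1 + 2 = 3: B3 is wrong
Adding redundancy allows for more aggressive scaling of signal swings and/or timing
Simpler codes such as Hamming prove most effective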
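The parity equations above translate directly into code. The sketch below encodes four data bits into the seven-bit word P1 P2 B3 P4 B5 B6 B7, flips B3 to mimic a swing- or timing-induced error, and uses the three checks to locate and repair it. Function names are illustrative, and the sketch handles single-bit errors only.

```python
def hamming_encode(b3, b5, b6, b7):
    """Build the codeword [P1, P2, B3, P4, B5, B6, B7] so that all checks sum to 0."""
    p1 = b3 ^ b5 ^ b7
    p2 = b3 ^ b6 ^ b7
    p4 = b5 ^ b6 ^ b7
    return [p1, p2, b3, p4, b5, b6, b7]

def syndrome(code):
    """Evaluate the three parity checks; the result is the position of a single error."""
    p1, p2, b3, p4, b5, b6, b7 = code
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    return s1 + 2 * s2 + 4 * s4      # 0 means no error detected

def hamming_correct(code):
    """Flip the bit indicated by the syndrome, if any."""
    fixed = list(code)
    position = syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return fixed

sent = hamming_encode(1, 0, 1, 1)
received = list(sent)
received[2] ^= 1                     # corrupt B3 (position 3)
print(syndrome(received), hamming_correct(received) == sent)
```

The three redundant wires are the price for tolerating one wrong bit per word, which is what permits the more aggressive swing or timing scaling mentioned above.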
Media Access
Sharing of physical media over multiple data streams increases capacitance and activity (see Chapter 5), but reduces area
Many multi-access schemes are known from communications
Time domain: Time-Division Multiple Access (TDMA)
Frequency domain: narrow band, code-division multiplexing
[Figure labels: Arbitration, Command, Current Slot]
Becoming more important in today's complex multi-processor designs
The Network-on-a-Chip (NoC)
[Ref: G. De Micheli, Morgan-Kaufman06]
Network-on-a-Chip (NoC)
Dedicated networks with reserved links are preferable for high-traffic channels, but: limited connectivity, area overhead
Flexibility is an increasing requirement in multi- (many-) core chip implementations
Network-on-a-Chip
[Figure: network with routers and dedicated wiring]
[Courtesy: B. Dally, Stanford]
Networking Topology
Homogeneous: crossbar, butterfly, torus, mesh, tree, ...
Heterogeneous: hierarchy
[Figure: crossbar, tree and mesh (FPGA) topologies]
[Figure: energy x delay comparison of mesh and binary tree networks]
Circuit-switched approach attractive for high-data-rate quasi-static links
Hierarchical combination often the preferred choice
Hierarchical circuit- and packet-switched networks for longer connections
[Figure: nodes (C) attached to local buses, with the buses joined through routers (R)]
[Figure: reconfigurable architecture with microprocessor (µP), configuration bus, arithmetic modules, dedicated arithmetic, network interface, configuration and configurable interconnect]
[Figure: clusters connected through universal and hierarchical switchboxes]
Structured approach reduces interconnect energy by a factor of 7 over a straightforward crossbar
Example: establish, maintain and rip up connections in dynamically reconfigurable Systems-on-a-Chip
Important in power management
Opportunities
Reduced swing
Alternative clock distribution schemes
Avoiding a global clock altogether
IEEE 2006
Summary
Interconnect is an important component of overall power dissipation
Structured approach with exploration at different abstraction layers is most effective
A lot can be learned from the communications and networking community, yet techniques must be applied judiciously
Cost relationship between active and passive components is different
Some exciting possibilities for the future: 3D integration, novel interconnect materials, optical or wireless I/O
References
Books and Book Chapters
T. Burd, "Energy-Efficient Processor System Design," http://bwrc.eecs.berkeley.edu/Publications/2001/THESES/energ_eff_process-sys_des/index.htm, UCB, 2001.
G. De Micheli and L. Benini, "Networks on Chips: Technology and Tools," Morgan Kaufmann, 2006.
V. George and J. Rabaey, "Low-energy FPGAs: Architecture and Design," Springer, 2001.
J. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits: A Design Perspective," 2nd ed., Prentice Hall, 2003.
C. Svensson, "Low-Power and Low-Voltage Communication for SoCs," in C. Piguet, Low-Power Electronics Design, Ch. 14, CRC Press, 2005.
L. Svensson, "Adiabatic and Clock-Powered Circuits," in C. Piguet, Low-Power Electronics Design, Ch. 15, CRC Press, 2005.
G. Yeap, "Special Techniques," in Practical Low Power Digital VLSI Design, Ch. 6, Kluwer Academic Publishers, 1998.
Articles
L. Benini et al., "Address bus encoding techniques for system-level power optimization," Proceedings DATE 98, pp. 861-867, Paris, February 1998.
T. Burd et al., "A Dynamic Voltage Scaled Microprocessor System," IEEE ISSCC Digest of Technical Papers, pp. 294-295, February 2000.
M. Chang et al., "CMP Network-on-Chip Overlaid with Multi-Band RF Interconnect," International Symposium on High-Performance Computer Architecture, February 2008.
D.M. Chapiro, "Globally Asynchronous Locally Synchronous Systems," PhD thesis, Stanford University, 1984.
W. Dally, "Route Packets, Not Wires: On-Chip Interconnect Networks," Proceedings DAC 2001, pp. 684-689, Las Vegas, June 2001.
J. Davis and J. Meindl, "Is Interconnect the Weak Link?," IEEE Circuits and Systems Magazine, pp. 30-36, March 1998.
J. Davis et al., "Interconnect Limits on Gigascale Integration (GSI) in the 21st Century," Proceedings of the IEEE, vol. 89, no. 3, pp. 305-324, March 2001.
D. Hopkins et al., "Circuit techniques to enable 430Gb/s/mm2 proximity communication," IEEE International Solid-State Circuits Conference, vol. XL, pp. 368-369, February 2007.
H. Kojima et al., "Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 432-435, April 1995.
E. Kusse and J. Rabaey, "Low-energy embedded FPGA structures," Proceedings ISLPED 98, pp. 155-160, Monterey, August 1998.
V. Prodanov and M. Banu, "GHz Serial Passive Clock Distribution in VLSI using Bidirectional Signaling," Proceedings CICC 06.
S. Ramprasad et al., "A coding framework for low-power address and data busses," IEEE Transactions on VLSI Systems, vol. 7, no. 2, pp. 212-221, June 1999.
M. Sgroi et al., "Addressing the System-on-a-Chip Woes Through Communication-Based Design," Proceedings DAC 2001, pp. 678-683, Las Vegas, June 2001.
P. Sotiriadis and A. Chandrakasan, "Reducing Bus Delay in Submicron Technology Using Coding," Proceedings ASPDAC Conference, Yokohama, January 2001.
M. Stan and W. Burleson, "Bus-Invert Coding for Low-Power I/O," IEEE Transactions on VLSI Systems, pp. 48-58, March 1995.
M. Stan and W. Burleson, "Low-Power Encodings for Global Communication in CMOS VLSI," IEEE Transactions on VLSI Systems, pp. 444-455, December 1997.
V. Sathe, J.-Y. Chueh and M. C. Papaefthymiou, "Energy-Efficient GHz-Class Charge-Recovery Logic," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 38-47, January 2007.
L. Svensson et al., "A sub-CV^2 Pad Driver with 10 ns Transition Time," Proceedings ISLPED 96, Monterey, August 12-14, 1996.
D. Wingard, "Micronetwork-Based Integration for SOCs," Proceedings DAC 01, pp. 673-677, Las Vegas, June 2001.
H. Yamauchi et al., "An Asymptotically Zero Power Charge Recycling Bus," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 423-431, April 1995.
H. Zhang, V. George and J. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," IEEE Transactions on VLSI Systems, vol. 8, no. 3, pp. 264-272, June 2000.
H. Zhang et al., "A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications," IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1697-1704, November 2000.