

Use of Nano-Mechanical Relays for FPGA Power Reduction


Rikky Muller (rikky@eecs.berkeley.edu) and Chintan Thakkar (cthakkar@eecs.berkeley.edu) Department of EECS, University of California, Berkeley
Abstract: Field Programmable Gate Arrays (FPGAs) have traditionally been several times slower and higher in total power consumption than ASIC implementations, but continue to be used because of their flexibility and low non-recurring engineering costs. Certain techniques such as power gating and interconnect sizing optimization have been shown to mitigate some of these drawbacks. Introducing nano-electromechanical (NEM) relays into standard CMOS processes can lead to advances in power reduction and performance enhancement, since these devices have significantly lower parasitic elements. This work explores the integration of NEM relays with CMOS FPGAs, and proposes a new device that merges seamlessly into the FPGA fabric. Using NEM relays as switches in FPGAs is found to improve interconnect performance by 5X and to cut the performance penalty for power gating by an order of magnitude.

I. INTRODUCTION

Electrostatically actuated nano-electromechanical (NEM) relay devices have been demonstrated by [1-4] to achieve very low resistance connections in the on-state and no current leakage in the off-state. Such devices would be the holy grail of digital electronics if they could be actuated at very high speeds; however, mechanical devices have high delay [3-4], as shown in Fig. 1. Although implementing high-speed logic using only these devices is not feasible, NEM relays can be used as very efficient, low-resistance switches.

Power gating is a common technique used to compensate for high leakage currents in deep sub-micron CMOS [5-8,14]. A NEM relay may be used as a power gating device to reduce the overall power consumption of a digital block when it is not in active operation, and to reduce the performance penalty by an order of magnitude relative to CMOS. In addition, NEM relays may be used as interconnect switches, which reduce the parasitic delay and energy compared with their CMOS counterparts.

An FPGA consists of an array of logic blocks that can be programmably interconnected to realize different designs. However, up to 90% of the area in an FPGA is taken up by programmable switches and routing resources [12], which make an FPGA more flexible, but significantly less efficient than an ASIC implementation in terms of logic density, performance and power consumption. Since NEM relays have smaller parasitics than CMOS devices, introducing NEM relays into FPGAs as programmable switches could significantly alter some of these fundamental tradeoffs.

The rest of this report is organized as follows. The NEM relay is introduced in Section II, where both the state of the art and a new proposed device are discussed. Power gating using the NEM relay is discussed in Section III, and trade-off curves are shown comparing the relay to CMOS. Section IV gives an FPGA overview and shows the analysis of using the NEM relays in FPGAs, both for leakage reduction and for interconnect switching. Finally, we draw conclusions on the reported work.

II. NANO-ELECTROMECHANICAL (NEM) RELAYS

Fig. 1 [15]: Electrostatically actuated mechanical NEM relay

Recently there has been a great deal of development in the area of mechanical NEM relays for low-power digital applications [1-4]. Researchers at the University of California at Berkeley have developed one such electrostatically actuated NEM relay, similar to the one shown in Fig. 1 (a). Analogous to CMOS, the device is actuated between its gate and source: when the gate-source voltage is driven beyond the pull-in voltage, the channel is pulled toward the gate, allowing the contact dimple to make contact with the drain. In the off state, when the contact dimple and the drain are separated, the device has infinite resistance. In the on state, the contact resistance ranges from 10 ohms for Au to 1 kohm for W [16]. The pull-in voltage is designed to be:
$$V_{pi} = \sqrt{\frac{8\,k_{eff}\,g^{3}}{27\,\varepsilon_{0}\,w_{A}\,l_{A}}} \propto \sqrt{\frac{E\,H^{3}\,g^{3}}{\varepsilon_{0}\,L^{4}}}$$

where g is the gap thickness, w_A is the width of the actuation pad and l_A is the length of the actuation pad. The pull-in voltage scales with minimum feature size as shown in Fig. 2 and is inversely proportional to the square of the beam length. This dependence makes it very difficult to shorten the beam and to scale the device as a whole; therefore, the NEM relay is used as a unit block, and multiple relays are simply placed in parallel to reduce on-resistance.
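As a rough illustration of this scaling, the sketch below evaluates the standard parallel-plate pull-in expression, V_pi = sqrt(8 k g^3 / (27 eps0 w_A l_A)), and the resulting 1/L^2 dependence. It assumes that the effective spring constant scales as 1/L^3 and that the actuation pad length scales with the beam length; the function names and these scaling assumptions are illustrative and are not taken from the device model used in this work.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def pull_in_voltage(k_eff, gap, pad_width, pad_length):
    """Parallel-plate pull-in voltage in volts (SI units throughout).

    k_eff      effective spring constant of the beam, N/m
    gap        actuation gap thickness g, m
    pad_width  actuation pad width w_A, m
    pad_length actuation pad length l_A, m
    """
    area = pad_width * pad_length
    return math.sqrt(8.0 * k_eff * gap ** 3 / (27.0 * EPS0 * area))


def scaled_pull_in(v_ref, l_ref, l_new):
    """Rescale a known pull-in voltage to a new beam length.

    Assumes V_pi is proportional to 1/L^2 with all other dimensions fixed.
    """
    return v_ref * (l_ref / l_new) ** 2
```

Under these assumptions, halving the beam length multiplies the spring constant by eight and halves the pad length, so the pull-in voltage rises by a factor of four.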

A Verilog-A device model was provided by the researchers at the Berkeley Sensor and Actuator Center (BSAC) for a 90nm-equivalent device with the structure shown in Fig. 1 (a). The beam is fabricated from polysilicon and the channel is fabricated from a metal such as gold (Au) or tungsten (W). The materials were chosen for ease of fabrication in the Berkeley Microlab rather than optimized for any type of integration. The pull-in voltage and mechanical delay were designed according to the curves shown in Fig. 2. The device is sized as long as possible to optimize for speed, but is limited by stiction at W=90nm and L=2.3um. This particular device has twice the footprint area of its CMOS counterpart.

Fig. 2 [4]: The upper curve shows NEM relay pull-in voltage for several standard technology nodes. The lower curve shows the mechanical delay vs. supply voltage.

We propose a new NEM relay device for CMOS integration, as shown in Fig. 1 (b). When integrated with CMOS, the NEM relays can be used as non-timing-critical switches, such as those that are programmed rarely or go into standby infrequently. Since the mechanical delay is not critical in such an implementation, the device can be shortened. In addition, a copper (Cu) structure would allow the NEM relay to be fabricated in the metal layers above the CMOS devices, so that the relays incur less area overhead than their CMOS counterparts and would therefore add minimal cost penalty. The new device is fabricated from a single copper beam with one contact, which reduces contact resistance and reduces the pitch to match that of Met3 and Met4 wires in 90nm CMOS. The layout rules of the device are such that the full width is 4-lambda of the process.

We modified the geometry of the NEM relay in the Verilog-A model, as well as the model parameters for copper beam actuation. The original model did not include parameters for contact resistance, which are critical to the operation of the device since the on-resistance determines the device performance. The contact resistance for Cu can vary from 10 ohms in the typical case to 200 ohms in the worst case [16]. An air gap encapsulation is fabricated in the oxide to allow the beam to move; however, an encapsulating layer such as silicon nitride is required. The linear beam is collinear with the routing wire and either connects or disconnects the wire according to the actuation voltage on the gate.

The beam length was shortened to 1um for a scaled pull-in voltage of 0.9V. A high-voltage supply is required to actuate the device. When 0.6V is placed at the gate, the device never pulls in; however, when 2.4V is placed on the gate, it always pulls in, since any voltage on the beam (0-1.2V) will impose an attractive force on the beam, causing it to pull in. Charge pumps and high-voltage multiplexing are required to route the high voltage to actuate the sleep transistor. Such area overhead is also required of CMOS; however, the switch itself can sit in the metal layers above the active devices. If the device is passing a fixed voltage, as in the case of power gating, it can be actuated with the standard 0-1.2V supply. When the back-gate is at 1.2V and the relay is fixed at 0V, the beam will pull in, whereas with a back-gate voltage of 0V the switch will not pull in.

III. POWER GATING

Fig. 3: CMOS power gating using NMOS sleep transistors (right) and using the NEM relay (left)

Using sleep transistors to reduce leakage power dissipation was first proposed by [7] in multi-threshold CMOS processes. The power-gating concept is shown in Fig. 3, where PMOS and NMOS transistors are used to disconnect the leaky logic from the power supplies when not in use. This technique is very effective in reducing leakage power, particularly in low duty cycle applications. When the sleep transistor is in its off state, the leakage current (dominated by subthreshold leakage) is determined by the stack effect [15]. In its on state, the sleep transistor adds resistance to ground and Vdd. Using sleep transistors to gate logic blocks therefore presents a fundamental tradeoff between on-resistance, leakage power and area. The NMOS gate was chosen since it is the most area efficient of the three power gating techniques illustrated in Fig. 3.

Fig. 4 (a) shows the leakage power for a 90nm NMOS sleep transistor normalized to the leakage power of non-gated logic. The leakage is reduced to approximately 3%. The curve is plotted against the percentage delay increase for a FO4 inverter, again normalized to the delay of non-gated logic. The red marks at the bottom of the graph show the delay penalties for the typical and worst-case Cu relays. In the case of the relays, the leakage power is reduced to nearly 0, and the delay penalty is as low as 0.5%.

Fig. 4 (b) compares the trade-off between performance penalty and normalized area for 90nm CMOS (blue) and the Cu relay (red) in the typical (lower) and worst case (upper). The area is normalized to the area of a Cu relay. It is assumed that for large logic blocks, the area is dominated by device dimensions rather than routing. In the typical case, the relay performs an order of magnitude better in FO4 delay for the same area as CMOS. The simulations for Fig. 4 were performed with BSIM 90nm predictive models and the modified NEM relay Verilog-A model. The advantages of power gating in FPGAs will be discussed in a later section.

Fig. 4: (a) Leakage power normalized to leakage without gating vs. delay for 90nm CMOS (blue) and the NEM relay (red) normalized to FO4. (b) Normalized delay vs. area for 90nm CMOS (blue) and the NEM relay for typical (red) and worst-case (purple) normalized to the area of a single relay.

IV. FIELD PROGRAMMABLE GATE ARRAY (FPGA)

FPGAs are estimated [12] to be more than ten times less efficient in logic density, three times slower in performance, and three times higher in total power consumption than ASIC implementations. Although CMOS technology scaling has greatly improved the overall performance of FPGAs, the performance gap between them and ASICs has remained very wide, and is becoming even greater in sub-100-nm technologies. The system frequency has scaled at a slower pace, and power consumption has risen to unacceptable levels. An FPGA is hence an excellent application for analyzing the use of NEM relays, in terms of leakage and dynamic power savings as well as improvement in performance. In the sub-sections that follow, we first briefly describe the FPGA architecture, followed by leakage power dissipation of FPGAs and its mitigation using NEM relay power gating. We then motivate and propose the integration of the NEM relay into the underlying FPGA fabric, and finally describe a detailed analysis of the obtained power savings, performance enhancement and area overhead.

A. Architecture

Xilinx FPGAs [11] use island-based architectures with an array of Configurable Logic Blocks (CLBs) in a sea of programmable interconnect, which consists of a set of general-purpose vertical and horizontal wiring channels used to connect logic islands located throughout the array, as shown in Fig. 5(a). The CLBs access the routing channels through a connection box, while a switch box provides connections between the routing channels. The programmable interconnect is realized using NMOS pass transistors with SRAM, which activates or deactivates control signals at the gate. Fig. 5(b) shows a representative structure of a 6-pass-transistor switch box.

Fig. 5: (a) Standard FPGA architecture (top), and (b) an FPGA 6-transistor switch box

B. Leakage & Power Gating


In FPGAs, leakage power accounts for 20-25% of the total power dissipation in 90nm CMOS [5], and increasingly more at smaller technology nodes. Switches account for 65% of the total leakage [6]. Power gating is therefore critical for reducing leakage power in FPGAs, and power gating techniques have already been shown to significantly improve FPGA power consumption. As reported by Xilinx, a typical FPGA design utilizes 75% of the configurable logic blocks (CLBs), which means that 25% of the blocks contribute only leakage power dissipation. FPGAs leak 4.2uW per CLB [17]. A large FPGA such as the Virtex XCV1000 has a 64x96 array of CLBs and therefore dissipates 26mW of leakage power alone. Each CLB in a Xilinx FPGA contains two logic slices, which include look-up tables and latches, input/output multiplexers, and interconnect multiplexers (MUXs). CLBs have high utilization, as shown in Table 1. If a CLB is in use, the logic slices contained within it are also utilized. I/O and routing MUXs have lower utilization and therefore account for 35% of total leakage power.
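The array-level leakage figure can be checked with a short calculation. The sketch below assumes the per-CLB leakage is 4.2 microwatts, which is the value consistent with the 26mW total quoted for a 64x96 array. The estimate of leakage in unused CLBs further assumes that an unused CLB leaks as much as a used one; that is a simplification made here for illustration, not a result from [17]. The function names are hypothetical.

```python
def array_leakage_w(rows, cols, leak_per_clb_w):
    """Total leakage of a rows x cols CLB array, in watts."""
    return rows * cols * leak_per_clb_w


def idle_leakage_w(total_leak_w, utilization):
    """Leakage dissipated in unused CLBs.

    Assumes every CLB leaks equally whether or not it is used.
    """
    return (1.0 - utilization) * total_leak_w


total = array_leakage_w(64, 96, 4.2e-6)  # 6144 CLBs at 4.2 uW each
idle = idle_leakage_w(total, 0.75)       # 25% of the CLBs are unused

print(f"total leakage: {total * 1e3:.1f} mW")
print(f"leakage in unused CLBs: {idle * 1e3:.2f} mW")
```

The total comes to about 25.8mW, which rounds to the 26mW quoted in the text; under the equal-leakage assumption, roughly a quarter of that is spent in CLBs that do no useful work.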

Fig. 6: FPGA logic resource utilization [17] (left) and area overhead for power gating (right).

CLB-level power gating has already been shown to reduce leakage power in FPGAs by 45% assuming 75% utilization [17], and has minimal routing complexity. Transistors are sized for 10% performance degradation and incur an 8% area overhead for each CLB. A typical CLB is 60um on a side, so the power-gating area alone is 288um2 (8% of 3600um2). By calculation, the performance degradation for the silicon relay with a similar area overhead would be 2% in the typical case. The new Cu NEM relay would reduce the performance degradation to 0.5% in the typical case, and to as much as 7% in the worst case, for the same area footprint. It is important to note that the Cu device is integrated in the metal layers above the silicon and only requires a 0-1.2V actuation signal, which already exists for the CMOS counterpart. Therefore, in comparing area with CMOS, the true area is reduced since the device itself does not take up active area. We estimate that there will be a 50% area savings in addition to the performance enhancement.

Using fine-grain power gating for blocks with low utilization can further reduce the active leakage power. For example, having one sleep transistor per MUX can reduce the leakage power. Because MUXs account for 35% of the leakage power with only 20% on-chip utilization, the total power consumption can be reduced by 10%. However, the area tradeoff for routing the scan chain and stand-by signals to each MUX would be unreasonable, since there are dozens of MUXs in each CLB.

C. Integration of NEM Relays into FPGA Fabric

Programmable routing in FPGAs, which uses NMOS pass transistor switches and driving buffers, contributes more than 60% [9, 13] of the already high dynamic power consumption of FPGAs, a problem that has recently become a significant impediment to their adoption in many applications. FPGA architectures fundamentally trade off flexibility for speed and dynamic power consumption [11], since enhanced flexibility requires switches that are larger than minimum size and in turn contribute extra capacitive loading. The other drawback of NMOS pass transistors is that they cause leakage current in driving buffers when passing a logic-high voltage. To mitigate this effect, gate boosting is employed [10], using charge pumps to maintain the NMOS switch terminals at the nominal logic-high voltage.

The superior performance of NEM relays as switches makes them an excellent candidate to replace CMOS switches to improve propagation delay and power in FPGA connection and switch boxes. NEM relays have sizably lower parasitic capacitance, substantially lower resistance and better driving strength, which can change the fundamental tradeoff between flexibility and speed and dynamic power consumption in FPGA architectures. The integration of a NEM relay into the FPGA fabric of CLBs and connection and switch boxes is therefore explored in this project.

Apart from switches in programmable interconnects, NEM relays were also considered as replacements for the pass-transistor-based Look-Up Tables (LUTs) and multiplexers in CLBs, which are the basic building blocks of programmable logic in an FPGA. However, since these logic switches do not drive large capacitive loads, we found that the layout complexity of integrating NEM relays into LUTs does not justify a complete overhaul of the CLB structure.

Fig. 7: Integration of the proposed NEM relay into a connection box.

Fig. 7 shows the structure of a single CLB along with the adjoining connection boxes on each of its four sides, and four diagonally placed switch boxes in its vicinity. Since FPGA density is dominated by its interconnect, integrating NEM relays into the connection and switch boxes places added emphasis on device area. We have considered an FPGA fabricated in 90nm technology as a reference for the proposed integration. Although the width of metal interconnects and their spacing in commercial 90nm FPGAs cannot be precisely known, for optimum performance [10] we assume routing in M3, minimum-width metal wires and a spacing of two times the minimum. The proposed integration should be in accordance with this wire pitch.

The original NEM relay fabricated at UCB, as shown in Fig. 1(a), is bulky in layout and not easy to integrate, especially since it needs two control voltages (one each for the back-gate and the electric gate). Therefore, we have developed a new device, as explained in Section II, with a Cu beam and reduced length, that fits seamlessly into the interconnect metal layers, as shown in Fig. 7 (inset). Since the electric gate has been removed, only one control voltage suffices. We assume that baseline FPGA connection and switch box structures already include gate boosting or level shifting blocks for the purpose of leakage current mitigation, which may now be used instead to actuate the NEM relay at a voltage higher than Vdd. The width and length of this device are comparable to those of the optimal NMOS pass transistor counterparts [10], and hence there is no area penalty. In fact, the SRAM control bit can now be fabricated in the active layers below the switch (Fig. 7).

D. Performance Enhancement and Energy Savings


It is essential to quantify the potential gains in FPGA performance and energy from using the NEM relay. For this purpose, we performed a detailed analysis of the programmable interconnect. The parasitic contributions from the switches and metal traces constitute the total resistive and capacitive components of the interconnect. In order to compare CMOS and NEM switches, it is important to establish a benchmark circuit element that can be realized using these switches.

In order to quantify the interconnect delay, the critical path needs to be identified to perform a worst-case analysis. An FPGA, which is characterized by reconfigurable connections, can have multiple critical paths depending on the desired logic connections. Thus, for a fair comparison, we compute Elmore delays for a diversity of interconnect paths. Fig BLAH shows an example of one CLB routed to another through programmable interconnects, which is a cascade of alternating pass switches and metal traces. For the metal traces, the baseline routing architecture was assumed to have length-1, length-2 and length-4 wires, where the suffix represents the length of the wire in terms of the number of CLB tiles spanned. An FPGA uses wires of varying lengths [11] in order to bypass programmable switches for longer connections. The CLB tile was scaled down from a 0.18um benchmark [10] and estimated to be a square of 60um x 60um.

When long connections are required, pass transistors are unsuitable due to quadratic delay increases. Instead, the linear delay growth of buffered routing switches makes them essential for use in large FPGAs. Unfortunately, buffers are slower for short connections and require 24 times more area than pass transistors. The advantages of both switch types can be gained by alternating between a buffer and N pass transistors. Sizing of these pass transistors and buffers, and determination of the optimum N, has been done in [10]. Based on their results, we used NMOS switches of size-10 (10 times larger than the minimum width). The proposed NEM relay, with a beam length of 1um, is identically sized to ensure a valid comparison. The buffer used was of size-5, which is essentially a 1x sense inverter cascaded with a 5x inverter driver.

Fig. 8: Baseline setup to estimate interconnect delay and energy, annotated with switch and wire parasitics

Fig. 9: (a) Simulated variation of interconnect delay (top) and (b) interconnect energy with number of segments

In view of the given sizings, the interconnect path setup is as shown in Fig. 8. The path is essentially a route from one CLB to another, passing through k wire segments and an equal number of switch boxes. A buffer at the start of the chain drives these k segments. The interconnect also has one connection box switch at each end, corresponding to the source and terminating CLBs respectively. Fig. 9 (a) shows the interconnect delay when k is varied from 1 to 8, for metal wire traces of length-4. Plots of simulations for other lengths have been excluded for brevity. Using the same setup, total interconnect energy is also plotted for k varying from 1 to 8 for length-1 wires, and shown in Fig. 9 (b). All the above simulations for energy and delay were carried out for the typical and worst case of the NEM relay resistances.

For wire traces of all lengths, the interconnect incorporating the NEM relay has a 5X reduction in delay when passing through 8 segments of routing in the typical case (Fig. 9(a)). The gains are lower for shorter segments (2X for length-2), since the delay is dominated by the buffer driver. While MOS interconnects experience a sharp quadratic increase in delay as the number of routing segments increases, the delay in NEM interconnects increases gradually. NEM interconnects are therefore amenable to longer routing; they greatly reduce the need for buffer drivers and provide crucial area savings. More importantly, for the same footprint area, NEM relays provide a significant improvement in interconnect delay, which could ease a severe bottleneck for commercial state-of-the-art FPGAs.

The savings in dynamic power using NEM interconnects vis-a-vis MOS interconnects was found to be 25% (Fig. 9 (b)). The gains are not as significant as the performance enhancement, primarily because the interconnect capacitance is dominated by the long metal wire traces.
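The trend described above can be reproduced qualitatively with a first-order Elmore delay calculation on an RC ladder. The sketch below is not the simulation setup used for Fig. 9; the function names are hypothetical, and the wire, MOS switch and driver parasitics are placeholder values chosen only for illustration. Only the 10 ohm typical relay contact resistance comes from the text.

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder, in seconds.

    stages is a list of (R, C) pairs ordered from driver to load.
    Each R is a series resistance in ohms and each C is the
    capacitance to ground, in farads, at the node that follows it.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in stages:
        upstream_r += r          # resistance between the driver and this node
        delay += upstream_r * c  # each node capacitance charges through it
    return delay


def path(k, r_switch, c_switch, r_wire, c_wire, r_driver):
    """Driver followed by k alternating (switch, wire) segments."""
    stages = [(r_driver, 0.0)]
    for _ in range(k):
        stages.append((r_switch, c_switch))
        stages.append((r_wire, c_wire))
    return stages


# Placeholder parasitics (assumptions, not values from this work).
R_DRIVER, R_WIRE, C_WIRE = 500.0, 50.0, 20e-15
MOS = dict(r_switch=1000.0, c_switch=2e-15)
RELAY = dict(r_switch=10.0, c_switch=0.2e-15)  # 10 ohm: typical Cu contact

for k in (1, 2, 4, 8):
    t_mos = elmore_delay(path(k, r_wire=R_WIRE, c_wire=C_WIRE,
                              r_driver=R_DRIVER, **MOS))
    t_nem = elmore_delay(path(k, r_wire=R_WIRE, c_wire=C_WIRE,
                              r_driver=R_DRIVER, **RELAY))
    print(f"k={k}: MOS {t_mos * 1e12:.1f} ps, relay {t_nem * 1e12:.1f} ps")
```

Because every node capacitance charges through all of the upstream resistance, a uniform ladder of k stages has a delay of RC*k(k+1)/2. Lowering the per-switch resistance shrinks the coefficient of the quadratic term, which is why the relay path grows much more slowly with k than the pass-transistor path.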

V. CONCLUSION
We have proposed the introduction of an electrostatically actuated mechanical nano-relay device in standard CMOS processes, in particular for use in FPGAs. We have shown the use of these devices for two main purposes: to reduce leakage power through power gating, and to replace MOS switches in FPGAs. Although the intention in using this device was to reduce leakage power, the true benefit lies in the performance enhancement. The proposed NEM relay measures 1um by 0.18um and, for the same footprint, lowers the performance penalty in power gating by an order of magnitude. In addition, using the nano-relay in FPGA switch boxes reduces interconnect delay by a factor of 5. The area is the same when compared with gate-boosted switches. If seamlessly integrated into standard IC processes, the NEM relay proves to be an excellent candidate for performance enhancement in circuits with infrequent reconfiguration.

VI. ACKNOWLEDGEMENTS
The authors would like to thank Borivoje Nikolic, Elad Alon and Hei Kam for their time and invaluable discussions.

REFERENCES
[1] H. Kam, D.T. Lee, R.T. Howe and T.J. King-Liu, "A new nano-electro-mechanical field effect transistor (NEMFET) design for low-power electronics," in IEDM Tech. Dig., 2005, pp. 463-466.
[2] A.M. Ionescu, V. Pott, R. Fritschi, K. Banerjee, M.J. Declercq, P. Renaud, C. Hibert, P. Fluckiger and G.A. Racine, "Modeling and design of a low-voltage SOI suspended-gate MOSFET (SGMOSFET) with a metal-over-gate architecture," in ISQED Proc., 2002, pp. 496-501.
[3] H. Kam, T.J. King-Liu, "A General Performance Analysis and Scaling Theory of Electro-Mechanical Switches" (UCB internal, unpublished work).
[4] H. Kam, J. Lai, E. Alon, T.J. King-Liu, "MOSFET-Inspired Micro-Relay Design and Scaling for Ultra-Low-Power Digital ICs" (UCB internal, unpublished work).
[5] A. Rahman, S. Das, T. Tuan, S. Trimberger, "Determination of Power Gating Granularity for FPGA Fabric," CICC 2006, pp. 9-12.
[6] S. Srinivasan, A. Gayasen, N. Vijaykrishnan, T. Tuan, "Leakage Control in FPGA Routing Fabric," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), vol. 1, 18-21 Jan. 2005, pp. 331-664.
[7] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, J. Yamada, "1-V Power Supply High-Speed Digital Circuit Technology With Multithreshold-Voltage CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, Aug. 1995, pp. 847-854.
[8] J. Kao, S. Narendra, A. Chandrakasan, "Subthreshold Leakage Modeling and Reduction Techniques," ICCAD, Nov. 2002, pp. 141-148.
[9] M. Lin, A.E. Gamal, Y.C. Lu, S. Wong, "Performance Benefits of Monolithically Stacked 3-D FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.
[10] G. Lemieux, D. Lewis, "Circuit Design of Routing Switches," in Proc. ACM/SIGDA 10th Int. Symp. Field-Programmable Gate Arrays, Feb. 2002, pp. 19-28.
[11] J. Rose, A.E. Gamal, A. Sangiovanni-Vincentelli, "Architecture of Field-Programmable Gate Arrays," Proc. of the IEEE, vol. 81, no. 7, July 1993.
[12] V. George, H. Zhang, J. Rabaey, "The Design of a Low Energy FPGA," ISLPED 1999, pp. 188-193.
[13] L. Shang, A.S. Kaviani, and K. Bathala, "Dynamic power consumption in Virtex-II FPGA Family," in Proc. ACM/SIGDA 10th Int. Symp. Field-Programmable Gate Arrays, 2002, pp. 157-164.
[14] B. Calhoun, F. Honore, A. Chandrakasan, "A Leakage Reduction Methodology for Distributed MTCMOS," IEEE JSSC, vol. 39, no. 5, May 2004.
[15] E. Alon, F. Chen, H. Kam, T.J. King, D. Markovic, V. Stojanovic, "Integrated Circuit Design with NEM Relays," ICCAD 2008.
[16] K. Akarvardar et al., "Design Considerations for Complementary Nanoelectromechanical Logic Gates," IEEE International Electron Devices Meeting, Dec. 2007.
[17] T. Tuan, B. Lai, "Leakage Power Analysis of a 90nm FPGA," CICC 2003.
