I. INTRODUCTION
Electrostatically actuated nano-electromechanical (NEM) relay devices have been demonstrated [1-4] to achieve very low on-state resistance and no current leakage in the off-state. Such devices would be the holy grail of digital electronics if they could be actuated at very high speeds; however, mechanical devices have a high mechanical delay [3-4], as shown in Fig. 1. Although implementing high-speed logic using only these devices is not feasible, NEM relays can serve as very efficient, low-resistance switches. Power gating is a common technique used to compensate for high leakage currents in deep sub-micron CMOS [5-8,14]. A NEM relay may be used as a power-gating device to reduce the power consumption of a digital block when it is not in active operation, with a performance penalty an order of magnitude smaller than that of CMOS. In addition, NEM relays may be used as interconnect switches, which reduce the parasitic delay and energy compared with their CMOS counterparts. An FPGA consists of an array of logic blocks that can be programmably interconnected to realize different designs. However, up to 90% of the area in an FPGA is dominated by programmable switches and routing resources [12], which make an FPGA more flexible but significantly less efficient than an ASIC implementation in terms of logic density, performance and power consumption. Since NEM relays have smaller parasitics than CMOS devices, introducing NEM relays into FPGAs as programmable switches could significantly alter some of these fundamental trade-offs. The rest of this report is organized as follows. The NEM relay is introduced in Section II, where both the state of the art and a new proposed device are discussed. Power gating using the NEM relay is discussed in Section III, and trade-off curves are shown comparing the relay to CMOS. Section IV gives an FPGA
Fig. 1 [15]: Electrostatically actuated mechanical NEM relay
Recently there has been a great deal of development in the area of mechanical NEM relays for low-power digital applications [1-4]. Researchers at the University of California at Berkeley have developed one such electrostatically actuated NEM relay, similar to the one shown in Fig. 1 (a). As in CMOS, the device is actuated by the voltage between its gate and source: when this voltage exceeds the pull-in voltage, the channel is pulled toward the gate, allowing the contact dimple to make contact with the drain. In the off state, when the contact dimple and the drain are separated, the device has infinite resistance. In the on state, the contact resistance ranges from 10 Ω for Au to 1 kΩ for W [16]. The pull-in voltage is designed to be:
$$V_{pi} = \sqrt{\frac{8 k g^3}{27 \varepsilon_0 w_A l_A}} \propto \sqrt{\frac{E_{eff} H^3 g^3}{\varepsilon_0 L^4}}$$
where g is the gap thickness, w_A is the width of the actuation pad, l_A is the length of the actuation pad, k is the effective spring constant of the beam, and E_eff, H and L are the effective Young's modulus, thickness and length of the beam. The pull-in voltage scales with minimum feature size as shown in Fig. 2 and is inversely proportional to the square of the beam length. This dependence makes it very difficult to shorten the beam and to scale the device altogether; therefore, the NEM relay will be used as a unit block, with multiple relays simply placed in parallel to reduce the on-resistance.
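As a rough check of the 1/L² scaling, the sketch below evaluates the first form of the pull-in equation. It models the beam stiffness as a simple rectangular cantilever, k = E_eff w H³/(4L³), and assumes the actuation pad spans the whole beam (w_A = w, l_A = L). The cantilever model and every numeric value (Young's modulus, thickness, gap) are hypothetical placeholders, not parameters of the BSAC device. Only the ratio between the two lengths is meant to be read.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def pull_in_voltage(k: float, g: float, w_a: float, l_a: float) -> float:
    """Parallel-plate pull-in voltage: V_pi = sqrt(8 k g^3 / (27 eps0 w_A l_A))."""
    return math.sqrt(8.0 * k * g**3 / (27.0 * EPS0 * w_a * l_a))


def cantilever_stiffness(e_eff: float, w: float, h: float, length: float) -> float:
    """Assumed beam model: tip stiffness of a rectangular cantilever, E w h^3 / (4 L^3)."""
    return e_eff * w * h**3 / (4.0 * length**3)


def beam_pull_in(length: float, e_eff: float = 160e9, w: float = 90e-9,
                 h: float = 100e-9, g: float = 20e-9) -> float:
    """Pull-in voltage of a hypothetical beam whose actuation pad spans the whole beam."""
    k = cantilever_stiffness(e_eff, w, h, length)
    return pull_in_voltage(k, g, w_a=w, l_a=length)


v_long = beam_pull_in(2.3e-6)
v_short = beam_pull_in(1.15e-6)
# Halving the length raises V_pi by 4x, i.e. V_pi scales as 1/L^2.
print(f"ratio V_pi(L/2) / V_pi(L) = {v_short / v_long:.2f}")
```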
A Verilog-A device model was provided by the researchers at the Berkeley Sensor and Actuator Center (BSAC) for a 90nm-equivalent device with the structure shown in Fig. 1 (a). The beam is fabricated from polysilicon and the channel from a metal such as gold (Au) or tungsten (W). The materials were chosen for ease of fabrication in the Berkeley Microlab rather than optimized for any type of integration. The pull-in voltage and mechanical delay were designed according to the curves shown in Fig. 2. The device is made as long as possible to optimize for speed, but stiction limits it to W = 90 nm and L = 2.3 µm. This particular device occupies twice the footprint area of its CMOS counterpart. However, when 2.4 V is placed on the gate, it always pulls in, since any signal voltage on the beam (0-1.2 V) will impose an attractive force on the beam. Charge pumps and high-voltage multiplexing are required to route the high voltage that actuates the sleep transistor. Such overhead is also incurred in CMOS; however, the relay itself can sit in the metal layers above the active devices. If the device is passing a fixed voltage, as in the case of power gating, it can be actuated with the standard 0-1.2 V supply: when the back-gate is at 1.2 V and the relay is held at 0 V, the beam pulls in, whereas with a back-gate voltage of 0 V the switch does not pull in.
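The gate-voltage behaviour described above can be sketched with a first-order rule. Because the electrostatic force scales with the square of the voltage difference, the beam closes whenever |V_gate - V_beam| reaches the pull-in voltage, whatever the sign. This rule is a simplification, not the BSAC Verilog-A model, and the function names are hypothetical. The 0.9 V pull-in voltage is the value quoted for the proposed 1 µm Cu beam, and the beam is assumed to carry any signal between 0 and 1.2 V.

```python
V_PULL_IN = 0.9  # volts; value quoted for the proposed 1 um Cu beam


def pulls_in(v_gate: float, v_beam: float, v_pi: float = V_PULL_IN) -> bool:
    """First-order rule: force ~ (V_gate - V_beam)^2, so only the magnitude matters."""
    return abs(v_gate - v_beam) >= v_pi


def behaviour(v_gate: float, v_min: float = 0.0, v_max: float = 1.2, steps: int = 121) -> str:
    """Classify a gate voltage over all signal levels the beam may carry."""
    levels = [v_min + (v_max - v_min) * i / (steps - 1) for i in range(steps)]
    closed = [pulls_in(v_gate, v) for v in levels]
    if all(closed):
        return "always on"
    if not any(closed):
        return "always off"
    return "data dependent"


for vg in (0.6, 1.2, 2.4):
    print(f"gate = {vg:.1f} V -> {behaviour(vg)}")
```

A gate held at an intermediate voltage such as 1.2 V gives data-dependent switching. This is why the gate must be driven either well inside or well outside the signal range.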
Fig. 3: CMOS power gating using NMOS sleep transistors (right) and using the NEM relay (left)
Using sleep transistors to reduce leakage power dissipation was first proposed in [7] for multi-threshold CMOS processes. The power-gating concept is shown in Fig. 3, where PMOS and NMOS transistors are used to disconnect the leaky logic from the power supplies when it is not in use. This technique is very effective in reducing leakage power, particularly in low-duty-cycle applications. When the sleep transistor is in its off state, the leakage current (dominated by subthreshold leakage) is determined by the stack effect [15]. In its on state, the sleep transistor adds resistance to ground and Vdd. Using sleep transistors to gate logic blocks therefore presents a fundamental trade-off between on-resistance, leakage power and area. The NMOS sleep transistor was chosen since it is the most area-efficient of the three power-gating techniques illustrated in Fig. 3. Fig. 4 (a) shows the leakage power for a 90nm NMOS sleep transistor normalized to the leakage power of non-gated logic; the leakage is reduced to approximately 3%. The curve is plotted against the percentage delay increase for an FO4 inverter, again normalized to the delay of non-gated logic. The red marks at the bottom of the graph show the delay penalties for the typical and worst-case Cu relays. For the relays, the leakage power is reduced to nearly zero, and the delay penalty is as low as 0.5%. Fig. 4 (b) compares the trade-off between performance penalty and normalized area for 90nm CMOS (blue) and the Cu relay (red) in the typical (lower) and worst (upper) cases, with area normalized to the area of one Cu relay. It is assumed that for large logic blocks the area is dominated by device dimensions rather than routing. In the typical case, the relay achieves an FO4 delay penalty an order of magnitude lower than CMOS for the same area. The simulations for Fig. 4 were performed with 90nm BSIM predictive models and the modified NEM relay Verilog-A model. The advantages of power gating in FPGAs will be discussed in a later section.
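Since relays are placed in parallel rather than scaled, the sleep switch's on-resistance is set by how many unit relays are used. The sketch below counts the relays required in the typical and worst cases. It assumes identical relays whose on-resistance is purely the Cu contact resistance, ignoring beam and wiring resistance, and uses a hypothetical 5 Ω resistance budget. With area normalized to one relay as in Fig. 4 (b), the relay count is also the normalized area.

```python
import math

# Cu contact-resistance range quoted in the text (ohms).
CU_CONTACT_OHMS = {"typical": 10.0, "worst": 200.0}


def parallel_resistance(n_relays: int, r_contact: float) -> float:
    """On-resistance of n identical relays in parallel (contact resistance only)."""
    if n_relays < 1:
        raise ValueError("need at least one relay")
    return r_contact / n_relays


def relays_for_target(r_target: float, r_contact: float) -> int:
    """Smallest relay count whose parallel resistance does not exceed r_target.

    With area normalized to one relay, this count is also the normalized area.
    """
    return max(1, math.ceil(r_contact / r_target))


R_TARGET = 5.0  # hypothetical sleep-switch resistance budget, ohms
for case, r_c in CU_CONTACT_OHMS.items():
    n = relays_for_target(R_TARGET, r_c)
    print(f"{case}: {n} relays -> {parallel_resistance(n, r_c):.2f} ohm")
```

The 20x spread between typical and worst-case contact resistance translates directly into a 20x spread in required area. This is consistent with the separation between the typical and worst-case relay curves in Fig. 4 (b).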
Fig. 2 [4]: The upper curve shows NEM relay pull-in voltage for several standard technology nodes. The lower curve shows the mechanical delay vs. supply voltage.
We propose a new NEM relay device for CMOS integration, as shown in Fig. 1 (b). When integrated with CMOS, NEM relays can be used as non-timing-critical switches, such as those that are programmed rarely or enter standby infrequently. Since the mechanical delay is not critical in such an implementation, the device can be shortened. In addition, a copper (Cu) structure would allow the NEM relay to be fabricated in the metal layers above the CMOS devices, so that it incurs less area overhead than its CMOS counterpart and adds minimal cost. The new device is fabricated from a single copper beam with one contact, which reduces contact resistance and reduces the pitch to match that of the Met3 and Met4 wires in 90nm CMOS. The layout rules of the device are such that its full width is 4λ in the process. We modified the geometry of the NEM relay in the Verilog-A model, as well as the model parameters, for copper beam actuation. The original model did not include parameters for contact resistance, which are critical to the operation of the device since the on-resistance determines the device performance. The contact resistance for Cu can vary from 10 Ω in the typical case to 200 Ω in the worst case [16]. An air-gap encapsulation is fabricated in the oxide to allow the beam to move; however, an encapsulating layer such as silicon nitride is required. The linear beam is collinear with the routing wire and either connects or disconnects the wire according to the actuation voltage on the gate. The beam length was shortened to 1 µm for a scaled pull-in voltage of 0.9 V. A high-voltage supply is required to actuate the device. When 0.6 V is placed at the gate, the device never pulls in,
A. Architecture
Xilinx FPGAs [11] use island-based architectures with an array of Configurable Logic Blocks (CLBs) in a sea of programmable interconnect, consisting of a set of general-purpose vertical and horizontal wiring channels used to connect logic islands located throughout the array, as shown in Fig. 5(a). The CLBs access the routing channels through a connection box, while a switch box provides connections between routing channels. The programmable interconnect is realized using NMOS pass transistors whose gates are driven by SRAM cells that turn each switch on or off. Fig. 5(b) shows a representative structure of a 6-pass-transistor switch box.
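Assume the common arrangement in which a 6-pass-transistor switch box joins every pair of the four wire ends (N, E, S, W) that meet at a switch point. The count of six then follows from C(4,2) = 6. The sketch below models that configuration space with one SRAM bit per switch; all names are hypothetical. Replacing each NMOS pass transistor with a NEM relay, as proposed here, leaves this logical model unchanged and alters only the switch's resistance and capacitance.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")  # wire ends meeting at one switch point


def switch_pairs(sides=SIDES):
    """One pass switch per unordered pair of sides: C(4, 2) = 6 switches."""
    return list(combinations(sides, 2))


def new_config(sides=SIDES):
    """Hypothetical configuration: one SRAM bit per switch, all cleared."""
    return {pair: False for pair in switch_pairs(sides)}


def _key(a, b, sides=SIDES):
    # Order the pair the same way combinations() does, so lookups are symmetric.
    return tuple(sorted((a, b), key=sides.index))


def program(config, a, b):
    """Set the SRAM bit that closes the switch between sides a and b."""
    config[_key(a, b)] = True


def connected(config, a, b):
    """True if the switch directly between sides a and b is closed."""
    return config[_key(a, b)]


cfg = new_config()
program(cfg, "S", "N")  # route a vertical track straight through
print(len(switch_pairs()), connected(cfg, "N", "S"), connected(cfg, "E", "W"))
```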
Fig. 4: (a) Leakage power normalized to leakage without gating vs. delay for 90nm CMOS (blue) and the NEM relay (red) normalized to FO4. (b) Normalized delay vs. area for 90nm CMOS (blue) and the NEM relay for typical (red) and worst-case (purple) normalized to the area of a single relay.
Fig. 5: (a) Standard FPGA Architecture (top), and (b) an FPGA 6-transistor Switch Box
Integration of a NEM relay into the FPGA fabric of CLBs and connection and switch boxes is therefore explored in this project. Apart from switches in programmable interconnects, NEM relays were also considered as replacements for the pass-transistor-based lookup tables (LUTs) and multiplexers in CLBs, which are the basic building blocks of programmable logic in an FPGA. However, since these logic switches do not drive large capacitive loads, we found that integrating NEM relays into LUTs does not justify the layout complexity of a complete overhaul of the CLB structure.
Fig. 6: FPGA logic resource utilization [17] (left) and area overhead for power gating (right)
CLB-level power gating has already been shown to reduce leakage power in FPGAs by 45%, assuming 75% utilization [17], and has minimal routing complexity. The sleep transistors are sized for 10% performance degradation and incur an 8% area overhead for each CLB. A typical CLB is 60 µm on a side; therefore, the area of power gating alone is 288 µm². By calculation, the performance degradation of the silicon relay for a similar area overhead would be 2% in the typical case. The new Cu NEM relay would decrease the performance degradation to 0.5% in the typical case and at most 7% in the worst case for the same area footprint. It is important to note that the Cu device is integrated in the metal layers above the silicon and only requires a 0-1.2 V actuation signal, which already exists for the CMOS counterpart. Therefore, compared with CMOS, the true area is reduced, since the device itself does not occupy active area. We estimate a 50% area saving in addition to the performance enhancement. Using fine-grain power gating for blocks with low utilization can further reduce the active leakage power. For example, having one sleep transistor per mux can reduce the leakage power.
Because multiplexers account for 35% of the leakage power while only 20% of them are utilized on chip, the total power consumption could be reduced by 10%. However, the area cost of routing the scan-chain and standby signals to each MUX would be unreasonable, since there are dozens of MUXs in each CLB.
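The CLB-level power-gating area quoted above follows from simple arithmetic, sketched here. The last step is a hypothetical extension. It estimates how many 1 µm × 0.18 µm Cu relays (the footprint given in the conclusion) would fit in the same 288 µm², ignoring spacing, encapsulation and routing, to illustrate the "same area footprint" comparison.

```python
CLB_SIDE_UM = 60.0        # CLB tile edge from the text, um
GATING_OVERHEAD = 0.08    # 8% per-CLB area overhead of CMOS sleep transistors

clb_area_um2 = CLB_SIDE_UM ** 2                       # 3600 um^2
gating_area_um2 = GATING_OVERHEAD * clb_area_um2      # 288 um^2

# Hypothetical: how many 1 um x 0.18 um Cu relays fit in the same footprint,
# ignoring spacing, encapsulation and routing (integer nm^2 avoids rounding).
RELAY_FOOTPRINT_NM2 = 1000 * 180
relays_in_budget = round(gating_area_um2 * 1_000_000) // RELAY_FOOTPRINT_NM2

print(f"CLB {clb_area_um2:.0f} um^2, gating {gating_area_um2:.0f} um^2, "
      f"~{relays_in_budget} relays")
```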
Fig. 7: Integration of the proposed NEM relay into a connection box
Fig. 7 shows the structure of a single CLB along with the adjoining connection boxes on each of its four sides and the four diagonally placed switch boxes in its vicinity. Since FPGA density is dominated by the interconnect, device area is a primary concern when integrating NEM relays into the connection and switch boxes. We consider an FPGA fabricated in 90nm technology as the reference for the proposed integration. The width and spacing of metal interconnects in commercial 90nm FPGAs cannot be known precisely. For optimum performance [10], we therefore assume routing in M3 with minimum-width wires and a spacing of twice the minimum. The proposed integration must conform to this wire pitch. The original NEM relay fabricated at UCB, shown in Fig. 1(a), has a bulky layout and is not easily integrated, especially since it needs two control voltages (one each for the back-gate and the electric gate). Therefore, we have developed a new device, as explained in Section II, with a Cu beam and reduced length that fits seamlessly into the interconnect metal layers, as shown in Fig. 7 (inset). Since the electric gate has been removed, a single control voltage suffices. We assume that the baseline FPGA connection and switch box structures already include gate-boosting or level-shifting blocks for leakage current mitigation. These blocks may now be used instead to actuate the NEM relay at a voltage higher than Vdd. The width and length of this device are comparable to those of the optimally sized NMOS pass transistors [10], so there is no area penalty. In fact, the SRAM control bit can now be fabricated in the active layers below the switch (Fig. 7).
realized using these switches. To quantify the interconnect delay, the critical path must be identified for a worst-case analysis. An FPGA, being characterized by reconfigurable connections, can have multiple critical paths depending on the desired logic connections. Thus, for a fair comparison, we compute Elmore delays for a diversity of interconnect paths. Fig BLAH shows an example of one CLB routed to another through programmable interconnect, which is a cascade of alternating pass switches and metal traces. For the metal traces, the baseline routing architecture was assumed to have length-1, length-2 and length-4 wires, where the suffix gives the number of CLB tiles the wire spans. An FPGA uses wires of varying lengths [11] in order to bypass programmable switches for longer connections. The CLB tile was scaled down from a 0.18 µm benchmark [10] and estimated to be a 60 µm × 60 µm square. When long connections are required, pass transistors are unsuitable because their delay grows quadratically. Instead, the linear delay growth of buffered routing switches makes them essential in large FPGAs. Unfortunately, buffers are slower for short connections and require 24 times more area than pass transistors. The advantages of both switch types can be gained by alternating between a buffer and N pass transistors. The sizing of these pass transistors and buffers, and the optimum N, were determined in [10]. Based on their results, we used NMOS switches of size 10 (10 times the minimum width). The proposed NEM relay, with a beam length of 1 µm, is sized identically to ensure a valid comparison. The buffer is of size 5, which is essentially a 1x sense inverter cascaded with a 5x inverter driver.
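The Elmore computation can be sketched for the alternating switch/wire chain described above. This minimal version lumps each segment into one series resistance (switch plus wire) feeding one node capacitance. It omits the two connection-box switches and uses hypothetical element values, except for the 10 Ω typical Cu contact resistance quoted earlier. The closed form, C[k·R_driver + (R_switch + R_wire)·k(k+1)/2], shows why the quadratic term grows much more slowly when the switch resistance is small.

```python
def elmore_delay(series_r, node_c):
    """Elmore delay of an RC ladder: sum over nodes of C_i times all resistance upstream of i."""
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(series_r, node_c):
        upstream_r += r
        delay += upstream_r * c
    return delay


def path_delay(k, r_switch, r_driver=1e3, r_wire=50.0, c_node=20e-15):
    """Buffer driving k segments, each lumped as (switch + wire) resistance into one node.

    Default element values are hypothetical placeholders, not extracted 90nm parasitics;
    connection-box switches at the two ends are omitted for simplicity.
    """
    if k < 1:
        raise ValueError("need at least one segment")
    series_r = [r_driver + r_switch + r_wire] + [r_switch + r_wire] * (k - 1)
    node_c = [c_node] * k
    return elmore_delay(series_r, node_c)


R_MOS = 1e3   # hypothetical on-resistance of a size-10 NMOS pass transistor, ohms
R_NEM = 10.0  # typical Cu contact resistance quoted earlier, ohms

for k in (1, 2, 4, 8):
    t_mos, t_nem = path_delay(k, R_MOS), path_delay(k, R_NEM)
    print(f"k={k}: MOS {t_mos * 1e12:6.1f} ps, NEM {t_nem * 1e12:6.1f} ps, "
          f"ratio {t_mos / t_nem:.1f}")
```

The printed delays only illustrate the trend under these placeholder values. They are not the simulated results of Fig. 9.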
Fig.8: Baseline setup to estimate interconnect delay and energy, annotated with switch and wire parasitics
Fig. 9: (a) Simulated interconnect delay (top) and (b) interconnect energy vs. number of segments
Given these sizings, the interconnect path setup is shown in Fig. 8. The path is essentially a route from one CLB to another, passing through k wire segments and an equal number of switch boxes. A buffer at the start of the chain drives these k segments. The interconnect also has one connection-box switch at each end, corresponding to the source and destination CLBs. Fig. 9 (a) shows the interconnect delay as k is varied from 1 to 8 for length-4 metal wire traces; plots for other lengths have been omitted for brevity. Using the same setup, the total interconnect energy is plotted for k from 1 to 8 for length-1 wires in Fig. 9 (b). All of the above energy and delay simulations were carried out for both the typical and worst-case NEM relay resistances. The interconnect incorporating the NEM relay reduces delay for wire traces of all lengths; in the typical case, the reduction reaches 5X when passing through 8 segments of routing (Fig. 9(a)). The gains are lower for shorter segments (2X for length-2), since the delay is then dominated by the buffer driver. While MOS interconnects experience a sharp quadratic increase in delay with the number of routing segments, the delay of NEM interconnects increases gradually. NEM interconnects are therefore amenable to longer routing and greatly reduce the need for buffer drivers, providing crucial area savings. More importantly, for the same footprint area, NEM relays provide a significant improvement in interconnect delay, which could ease a severe bottleneck for commercial state-of-the-art FPGAs. The savings in dynamic power using NEM interconnects vis-à-vis MOS interconnects were found to be 25% (Fig. 9 (b)). The gains are not as significant as the performance enhancement, primarily because the interconnect capacitance is dominated by the long metal wire traces.
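The limited energy benefit can be illustrated with the rail-to-rail switching energy E = C_total·Vdd², where Vdd cancels in the ratio. In this sketch the capacitances are hypothetical, and the threshold drop of NMOS pass transistors is ignored. The point is that, when the wire dominates, even removing all switch capacitance caps the saving at C_switch/(C_wire + C_switch).

```python
VDD = 1.2  # supply voltage used throughout the text, volts


def transition_energy(c_total, vdd=VDD):
    """Energy drawn from the supply to charge c_total from 0 to vdd: E = C * Vdd^2."""
    return c_total * vdd ** 2


def energy_saving(c_wire, c_switch_mos, c_switch_nem):
    """Fractional energy saving when only the switch capacitance changes."""
    e_mos = transition_energy(c_wire + c_switch_mos)
    e_nem = transition_energy(c_wire + c_switch_nem)
    return 1.0 - e_nem / e_mos


# Hypothetical capacitances (farads) for a wire-dominated route.
C_WIRE, C_SW_MOS, C_SW_NEM = 400e-15, 80e-15, 10e-15
saving = energy_saving(C_WIRE, C_SW_MOS, C_SW_NEM)
ceiling = C_SW_MOS / (C_WIRE + C_SW_MOS)  # saving if switch capacitance vanished entirely
print(f"saving {saving:.1%}, upper bound {ceiling:.1%}")
```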
V. CONCLUSION
We have proposed the introduction of an electrostatically actuated mechanical nano-relay device into standard CMOS processes, in particular for use in FPGAs. We have shown the use of these devices for two main purposes: to reduce leakage power through power gating, and to replace MOS switches in FPGAs. Although the intention in using this device was to reduce leakage power, its true benefit lies in performance enhancement. The proposed NEM relay measures 1 µm by 0.18 µm and, for the same footprint, lowers the performance penalty of power gating by an order of magnitude. In addition, using the nano-relay in FPGA switch boxes reduces interconnect delay by up to a factor of 5, with the same area as gate-boosted switches. If it can be seamlessly integrated into standard IC processes, the NEM relay is an excellent candidate for performance enhancement in circuits with infrequent reconfiguration.
VI. ACKNOWLEDGEMENTS
The authors would like to thank Borivoje Nikolic, Elad Alon and Hei Kam for their time and invaluable discussions.
REFERENCES
[1] H. Kam, D. T. Lee, R. T. Howe and T.-J. King-Liu, "A new nano-electro-mechanical field effect transistor (NEMFET) design for low-power electronics," in IEDM Tech. Dig., 2005, pp. 463-466.
[2] A. M. Ionescu, V. Pott, R. Fritschi, K. Banerjee, M. J. Declercq, P. Renaud, C. Hibert, P. Fluckiger and G. A. Racine, "Modeling and design of a low-voltage SOI suspended-gate MOSFET (SG-MOSFET) with a metal-over-gate architecture," in Proc. ISQED, 2002, pp. 496-501.
[3] H. Kam and T.-J. King-Liu, "A general performance analysis and scaling theory of electro-mechanical switches," UCB internal, unpublished work.
[4] H. Kam, J. Lai, E. Alon and T.-J. King-Liu, "MOSFET-inspired microrelay design and scaling for ultra-low-power digital ICs," UCB internal, unpublished work.
[5] A. Rahman, S. Das, T. Tuan and S. Trimberger, "Determination of power gating granularity for FPGA fabric," in Proc. CICC, 2006, pp. 9-12.
[6] S. Srinivasan, A. Gayasen, N. Vijaykrishnan and T. Tuan, "Leakage control in FPGA routing fabric," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), vol. 1, Jan. 2005, pp. 331-664.
[7] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu and J. Yamada, "1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, Aug. 1995, pp. 847-854.
[8] J. Kao, S. Narendra and A. Chandrakasan, "Subthreshold leakage modeling and reduction techniques," in Proc. ICCAD, Nov. 2002, pp. 141-148.
[9] M. Lin, A. El Gamal, Y.-C. Lu and S. Wong, "Performance benefits of monolithically stacked 3-D FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.
[10] G. Lemieux and D. Lewis, "Circuit design of routing switches," in Proc. ACM/SIGDA 10th Int. Symp. Field-Programmable Gate Arrays, Feb. 2002, pp. 19-28.
[11] J. Rose, A. El Gamal and A. Sangiovanni-Vincentelli, "Architecture of field-programmable gate arrays," Proceedings of the IEEE, vol. 81, no. 7, July 1993.
[12] V. George, H. Zhang and J. Rabaey, "The design of a low energy FPGA," in Proc. ISLPED, 1999, pp. 188-193.
[13] L. Shang, A. S. Kaviani and K. Bathala, "Dynamic power consumption in Virtex-II FPGA family," in Proc. ACM/SIGDA 10th Int. Symp. Field-Programmable Gate Arrays, 2002, pp. 157-164.
[14] B. Calhoun, F. Honore and A. Chandrakasan, "A leakage reduction methodology for distributed MTCMOS," IEEE Journal of Solid-State Circuits, vol. 39, no. 5, May 2004.
[15] E. Alon, F. Chen, H. Kam, T.-J. King, D. Markovic and V. Stojanovic, "Integrated circuit design with NEM relays," in Proc. ICCAD, 2008.
[16] K. Akarvardar et al., "Design considerations for complementary nanoelectromechanical logic gates," in IEDM Tech. Dig., Dec. 2007.
[17] T. Tuan and B. Lai, "Leakage power analysis of a 90nm FPGA," in Proc. CICC, 2003.