You are on page 1of 6

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

Minimization of Power Dissipation in Digital Circuits at Various Levels of Design Hierarchy


Khaja Mujeebuddin Quadry Professor, Dept. of ECE, Royal Institute of Technology & Science, Chevella, R. R. Dist. A. P. India Email: mujeebqd@yahoo.com Dr. Syed Abdul Sattar Professor & Dean of Academics, Royal Institute of Technology & Science, Chevella, R. R. Dist. A. P. India. Email:syedabdulsattar1965@gmail.com Abstract As the demand for portable battery operated equipment that is sufficiently small in size and weight and long in operating life is increasing day by day,power dissipation has become one of the most critical factor in the continues development of microelectroics technology. Low Power dissipation increases the reliebility of the circuit and reduces the cost of the product as cooling overhead is reduced. Hence research activities focusing on low power/low voltage design techniques are underway. A review of low-power techniques applied at many levels of the design hierarchy is presented in this paper. I. INTRODUCTION The most recent need is for ever-increasing packing density in order to further enhance the speed of high performance systems, which imposes severe restrictions on power dissipation density and broadest need is for conservation of power in desk-top and desk-side systems where cost-to-performance ratio for a competitive product demands low power operation to reduce power supply and cooling costs. Since 'power' is now one of the design decision variables, the expanded design space required for low power has further increased the complexity of an already non-trivial task. Low power design basically involves two tasks: power estimation and analysis and power minimization. These tasks need to be carried out at each of the levels in the design hierarchy, namely, the behavioral, architectural, logic, circuit and physical levels. In this paper we have done the survey of the current state of the field, many of the salient power estimation and minimization techniques proposed for low power VLSI design are reviewed. For each of the design levels, we provide an overview of several power estimation and minimization approaches and the CAD tools that support them.

Pavg=Pswitching+Pshorcircuit+Pleakage ---(2.1) A.Dynamic dissipation The first two terms in equation (2.1) represent the dynamic source of power dissipation. The switching component, Pswitching, arises when the capacitive load, CL, of a CMOS circuit is charged through PMOS transistors to make a voltage transition from 0 to the high voltage level, which is usually the supply, Vdd . For an inverter circuit as shown in figure 2.1,

Fig.1 cmos inverter the power dissipated because of a 0 to 1 transition can be determined from the product Vdd IC where IC is the transient current drawn from the supply. The time duration for this current flow is T. It can be written in 2.2.

II.SOURCES OF POWER DISSIPATION In CMOS circuits, there are two major sources of power dissipation [9]. Static dissipation is due to leakage current or other current drawn continuously from the power supply Dynamic dissipation, due to switching transient (short-circuit) current, charging and discharging of load capacitances Total power dissipation can be obtained from the sum of these components as summarized in equation (2.1). ---- (2.2) The energy drawn from the power supply is given in 2.3.

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

III.TECHNIQUES FOR LOW POWER The previous section revealed the parameters that the designer needs to change for low power design as shown in equation (2.4): voltage, physical capacitance, and activity. Unfortunately the difficulty for power optimization arises from the fact that these parameters are not completely orthogonal. Therefore they can not be optimized independently. A. Supply voltage With its quadratic relationship to power, voltage reduction offers the most direct means of minimizing power consumption. Without requiring any special circuits or technologies, a factor of two reduction in supply voltage yields a factor of four decrease in energy. Because of this quadratic relationship, designers are willing to sacrifice increased physical capacitance and activity for reduced voltage. Unfortunately supply voltage can not be decreased without bound. In fact several other factors influence the selection of a system supply voltage. The primary determining factors are performance requirements and compatibility issues. Reducing the supply voltage degrades the speed of a CMOS circuit. B. Physical capacitance Dynamic power consumption depends linearly on the physical capacitance being switched. In addition to operating at low voltages, minimizing capacitance offers another technique for minimizing power consumption. The physical capacitance in CMOS circuits stems from two primary sources: devices and interconnect. As technologies continue to scale down, interconnect parasitics will start to dominate over device capacitances. Capacitances can be kept at a minimum by using less logic, smaller devices, and fewer and shorter wires. Some techniques reducing the active area include resource sharing, logic minimization and gate sizing. Techniques for reducing the interconnect include register sharing, common sub-function extraction, placement and routing. However we are not free to optimize capacitance independently. For example reducing device sizes reduces physical capacitance, but it also reduces the current drive ability of the transistors making the circuit operate more slowly. This loss in performance might prevent us from lowering Vdd as much as we might otherwise be able to do. C.Activity A chip can contain a huge amount of physical capacitance, but if it does not switch then no dynamic power will be consumed. The activity determines how often this switching occurs. As given in (2.4) there are two components to switching activity. the first is the data rate, fclk, which reflects how often on average, new data arrives at each node. This data might or might not be different from the previous data value. In this sense, the data rate fclk describes how often on average, switching

--- (2.3) Half of the energy given in (2.3) is stored in the output capacitor and half of it is dissipated in the PMOS transistor [14]. On the 1 to 0 transition at the output, no charge is drawn from the supply, however the energy stored in the output capacitor is consumed. If these transitions occur at a clock rate, fclk, the power drawn from the supply is However, in general the switching will not occur at the clock rate (except for clock buffers), but rather at some reduced rate, which is best described probabilistically. is defined as the average number of times in each clock cycle that a node with a capacitance CL will make a power consuming transition (0 to 1), resulting in an average switching component of power for a CMOS gate to be,

-----(2.4) Another dynamic component of power dissipation is Pshort circuit . At some point during the switching transient, both the NMOS and PMOS devices in fig.1 will be turned on. This occurs for gate voltages between Vtn and Vdd _-- Vt p _ where Vtn and _Vt p _ are threshold voltages of the NMOS and PMOS transistors, respectively. During this time, a short-circuit exists between Vdd and ground. Therefore currents are allowed to flow. If Vdd -Gnd <Vtn -Vt p _ is satisfied then a short circuit path between the power supply and ground will never exist, meaning that this component of (2.1) can be eliminated. But even though Pshort circuit can not always be ignored, it certainly is not the dominant component of power consumption. An analytical derivation for Pshort circuit is given in [4]. B.Static dissipation Ideally, CMOS circuits dissipate no static (DC) power since in the steady state there is no direct path from Vdd to ground. Of course, this scenario can never be realized in practice since in reality the MOS transistor is not a perfect switch. Static power dissipation, Pleakage , stems from the leakage current, Ileakage , which can arise from substrate injection and subthreshold effects and primarily determined by fabrication technology considerations. This current is typically in the nA region and contributes little to the overall power consumption. However, in future deep submicron technologies, leakage power will become a problem. The most dominant component of power dissipation currently is Pswitching given in (2.4). Next section will introduce techniques in order to reduce Pswitching.

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

could occur. For example, in synchronous systems fclk might correspond to the clock frequency. IV.MINIMIZATIO OF POWER CONSUMPTION We have seen the design variables that effect the dynamic power consumption of a CMOS circuit. Now we will investigate the power minimization problem from various design aspects that effect power dissipation: technology, circuit techniques, architectures and algorithms. A.Technology An optimization that could be done at this level is driven by voltage scaling. As it is necessary to scale supply voltage for a quadratic improvement in energy per transition. Unfortunately, we pay a speed penalty for a Vdd reduction with delays increasing, as Vdd approaches the threshold voltage of the devices. The simple first order relationship between Vdd and gate delay, td for a CMOS gate is given in 2.5,

---(2.5) The objective is to reduce power consumption while keeping the throughput of the overall system fixed. Therefore compensation for these delays at low voltages is required. At the technology level, an approach to reduce the supply voltage without loss in throughput is to lower the threshold voltage of the devices. However, lower threshold means higher stand-by power consumption, therefore only transistors that comprise delay-critical paths should be modified. These multi-threshold circuits attract significant research interest [10, 7, 11]. Since a significant power improvement can be gained by the use of lowthreshold devices, another issue to address is how low the thresholds can be reduced. The limit is set by the requirement to retain adequate noise margins and the increase in sub threshold currents. B.Fabrication Level Techniques To minimize the overall static power dissipation, a straightforward way is to minimize the leakage current in each transistor. This can be done through fabrication techniques of transistors.Currently, fabrication techniques, such as high-k insulating materials, retrograde doping and halo doping, are already in use to provide transistors with the best performance and reduce the leakage at the same time. Here we present some examples for these fabrication techniques, illustrated as below: Y. Taur (2000) In [10], Y. Taur found that with deep submicron transistors, to maintain performance, scaling happens not only in the lateral dimension (channel length), but also in the vertical dimension, doping concentration and supply voltage. Thus, as gate oxide thickness got thinner, this results in increased leakage through gate node. To solve this problem, the author proposed to use high-k insulating materials, which increases physical thickness of the

insulator while keeping reduced equivalent electrical thickness and eventually minimizes the leakage current through gate node. _ S. Thompson et al. (1998) As the channel length is scaled down, punch-through current becomes a big issue. At the same time, to maintain device performance, the mobility of the channel surface should be good enough. Thus, a better channel doping profile should be with a low surface doping concentration followed with a highly doped sub-surface doping region. This is called Retrograde Doping. In the study of [11], S. Thompson et al. illustrated the retrograde doping technique and its useful effect on minimizing the punchthrough leakage current. D. Fotty (1997) In the study of [12], D. Fotty suggested using the halo doping technology to reduce the sub-threshold leakage. In these fabrication techniques, high-k gate dielectrics are expected to lower the static leakage [13]. On the other hand, retrograde and halo doping are also used as a means to decrease the static leakage current, by scaling the channel length and increasing the transistor drive current [ 14,15, 16, 17].More detailed discussion of these fabrication techniques can be found in [8]. So far, the fabrication techniques are commonly employed in transistors to provide good performance, and also minimize the overall static leakage. With the advance of technology, more and more fabrication techniques are predicted to be used to reduce the leakage-induced power dissipation in future. C. Circuit techniques There are a number of options available in choosing the basic circuit approach and topology for implementing various logic and arithmetic functions. Choices between static vs. dynamic implementations, pass-transistor vs. conventional CMOS logic styles, and synchronous vs. asynchronous timing are just some of the options open to the system designer. At the RT level, there are also various architectural choices for implementing a given logic function; for example to implement an adder module one can utilize a ripple-carry, carry-select, or carrylookahead topology. D.Dynamic vs. static logic Dynamic logic has some inherent advantages in a number of areas including (1)reduced switching activity due to hazards, (2) elimination of short-circuit dissipation, and (3) reduced parasitic node capacitances. These are explained briefly in the following. (1) Static designs can exhibit spurious transitions (also called dynamic hazards [9]) due to finite propagation delays from one logic block to the next i.e., a node can have multiple transitions in a clock cycle before settling to the correct level. The number of these extra transitions is a function of input patterns, internal state assignment in the logic design, delay skew and logic depth. Though it is possible with careful logic design to eliminate these transitions, dynamic logic does not have this problem, since any node can undergo at most one power consuming transition per clock cycle.

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

E.Pass-transistor vs. static logic Complementary pass-transistor logic (CPL) family is one form of logic that is popular in NMOS-rich circuits [9, 6]. The gate design uses only NMOS transistors and requires the inverted input signals as well to implement Karnaugh maps for logic functions. As logic signals are only passed through NMOS transistors, the high output signal may deteriorate because of threshold voltage drops. This will require the output signals to be regenerated by inverters/buffers. Pass-transistor logic is attractive as fewer transistors are required to implement important logic functions, such as XORs which only require two pass transistors in a CPL implementation. This particularly efficient implementation of an XOR is important since it is key to most arithmetic functions, permitting adders and multipliers to be created using a minimal number of devices. Likewise, multiplexers, registers, and other key building blocks are simplified using pass-gate designs. However, a CPL implementation (explained in detail in [6]) has two basic problems: (1) the threshold drop across the pass transistors results in reduced current drive and hence slower operation at reduced supply voltages and (2) The "high" input voltage level at the regenerative inverters is not Vdd , therefore the PMOS device in the inverter is not fully turned off. This may cause significant static power dissipation. F.Synchronous vs. Asynchronous In synchronous designs, the logic between registers is continuously computing every clock cycle based on its new inputs. To reduce the power consumption in synchronous designs, it is important to minimize switching activity by powering down execution units when they are not performing useful operations. While the design of synchronous circuits requires special design effort and power-down circuitry to detect and shut down unused units (clock gating), asynchronous logic has inherent power-down of unused modules, since transitions occur only when necessary. However, asynchronous implementations require the generation of a completion signal indicating the validity of the output signals. This control logic represents an overhead in terms of silicon area, speed and power consumption. Therefore, one has to ask whether or not, the use of asynchronous techniques result in a substantial improvement over the synchronous counterpart [5]. G.Circuit topology Independent of the logic style used, the topology to implement a given function can affect the capacitance switched. For instance, lets consider a ripple-carry vs. carry select adder. Ideally, it is always better to use a topology that consumes the least amount of energy per operation. Unfortunately, the choice of circuit approach is not independent of circuit speed. At large bit-widths, the CSA is faster than the ripple carry adder. This speed advantage can be used to lower the supply voltage while keeping the throughput of the system constant.

Consequently a CSA could very well be the low power choice even though it switches more capacitance. V.CIRCUIT LEVEL TECHIQUES With the fabrication level techniques applied to extremes, additional leakage power reduction can be achieved by carefully designing the circuit structures. In this section, we will present several popular circuit level techniques which are used to reduce the static leakage current. A.Transistor Stack One promising way to reduce static leakage is by intentionally turning off a series-connected transistor. In general, sub-threshold leakage current can be reduced when more than one transistor in the stack is turned off. This is known as the stacking effect [18]. Furthermore, according to the study in [19], the leakage of a twotransistor stack is an order of magnitude less than the leakage in a single transistor. B.Multiple Vth and Dynamic Vth As the sub-threshold leakage has an exponential dependence upon the threshold voltage, multiple threshold voltages can be provided in a single chip for proper use to reduce the leakage current. In general, higher threshold transistors can suppress the leakage while the lower threshold transistors can provide higher performance. There are various ways to achieve the varied threshold voltage. For example, changing the channel doping, gate oxide thickness, channel length, and body bias can all affect the final threshold voltage of a transistor. Thus, we can change the Vth either statically or dynamically. C.Low-swing Signaling As is discussed in the above, a straight-forward method to achieve dynamic power reduction is to reduce the signal swing. As known, low-swing technology provides high speed and low power at the same time. Instead of driving signals rail-to-rail, special drivers allow reduced signal swing. This may directly result in linearly reduced dynamic power, as expressed by the above equation. At the same time, the time needed to charge or discharge a node is also reduced, enabling faster state switching. In the following, we will present some research work using this technique to reduce dynamic power dissipation of microprocessors. T. Sakurai et al. (1997) In 1997, T. Sakurai et al. described some circuit level techniques for low-power CMOS designs. In particular, the authors discussed the low swing signaling technique, and presented its applications to a clock system, logic part, and I/Os. They concluded that the low swing signaling technique is useful to reduce dynamic power dissipation. VI.ARCHITECTURE OPTIMIZATION As seen in equation 2.5, gate delays increase drastically, when supply voltage approaches the threshold voltage of the MOS transistor. There are two architectural techniques

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

that can improve the speed of the circuit under reduced supply voltage. A.Pipelining It is a powerful transformation of the data path to reduce the critical path of the system and improve the speed. It involves the insertion of delay elements/ flip flops at specific points of a data flow graph of an algorithm/architecture. The speed gained by this transformation can be traded for low power by voltage scaling. B. Parallelism It is similar to pipelining in that it exploits parallelism in a system, however here this is achieved by duplicating hardware in order to perform a number of similar tasks concurrently. The authors of [1] show the advantages of both approaches through an adder comparator example. The original design consists of an adder followed by a comparator with equal circuit delays. There are registers at the input of the adder and the comparator. The pipelined version is created by inserting registers in between the adder and comparator. The supply voltage could be scaled down as the pipeline register allows the delays to increase by a factor of two. This is due to the equal circuit delay assumption for both the adder and the comparator. The parallel version is created by using a pair of adder-comparator structures. Each adder-comparator unit runs two times slower than the original design. By overlapping the operation of each adder-comparator unit, this version selects the available output from the finished adder-comparator unit via a multiplexer. This parallel version still communicates data with the external world using the original clock rate even though the individual units work slower. This speed gain can be traded for low power by scaling the supply voltage. The gains for both approaches in terms of power consumption are similar. However pipelining has a smaller area overhead compared to hardware duplication. One could of course combine both approaches to gain even more improvements in speed. VII.SYTEM LEVEL TECHNIQUES Even higher level low power techniques are proposed by researchers to further reduce static power dissipation. The nature of static power dissipation indicates that it is independent of switching activities and is static all the time. Thus, if the total time needed by a specific job can be considerably reduced, the amount of static energy can also be saved. There are some techniques which attempted to reduce static power dissipation at system level, as is illustrated in the following. A.Pipelining Pipelining saves energy in a straightforward way. When using pipelining, it significantly reduces the overall execution time of a certain program. As a result, the

time of leakage flowing is also reduced, thereby leading to a reduction in leakage-induced static power dissipation. Therefore, although pipelining usually is used for improving the performance of processors, it also is an effective method to reduce static energy consumption. B.Phase Switching In general, modern day microprocessors are designed for the best performance. However, such best performance is not always needed in most applications. If certain periods of an application can be identified as standby or dormant, many circuit level techniques can be applied to significantly reduce the leakage power. Then, identifying such phases in applications is a system level effort toward low power design. Some examples by using phase switching to reduce static power dissipation are presented in the following. M. Powell et al. (2000) In [21], the authors found that there is a large variability in active cell usage both within and across applications. Thus, by using Gated-VDD Caches, they proposed to identify phases with unused SRAM cells and gate their supply voltage and reduce their leakage. Their results indicated that it highly reduces leakage-induced power dissipation with minimal impact on performance. C.Signal Swing As is well-known, in CMOS circuits, frequent signal switching consequently produce much voltage swing. Therefore, some researchers suggested reducing the signal swing to minimize the overall voltage swing.S. Haga et al. (2003) Signal switching usually happens at the input port, output port and inside of the functional unit (FU). In [60], S. Haga et al. proposed a hardware method for functional unit assignment, based on the principle that a functional units power dissipation is approximated by the switching activity of its inputs. It dynamically assigned instructions to carefully selected Functional Units to minimize signal switching that happens in the FU. In their design, instructions are preferably issued to FU where the previous operands are similar to the current operands.. D.State Switching Scaling the supply voltage can considerably reduce the dynamic voltage at the price of slower execution speed. Thus, the best trade-off between power and performance can be achieved by switching between a spectrum of active and standby states. Therefore, some designs are suggested to reduce dynamic power dissipation by carefully switching states during system execution. VIII. ALGORITHM Choosing the algorithm to implement the application at hand represent the most important decision in meeting the power constraints. From the previous section, we can deduce that in order to reap the greatest architectural gains, the ability to parallelize an algorithm will be critical, and the basic computation must be optimized, as the basic theme in low power design is voltage reduction.

RITS ICAEM-2012: RITS-International Conference On Advancements In Engineering & Management

28 & 29 February 2012

Therefore, at the algorithmic level, transformations that can be used to increase speed and allow lower voltages are useful. Often these approaches translate into larger silicon area; hence the approach has been termed trading area for power At the algorithmic level, optimizations that reduce memory access frequency (exploitation of temporal locality [12]), and HW/SW partitioning of a system based on minimizing memory requirements are important aspects of design that effect memory and hence overall system power consumption [2]. IX. CONCLUSIONS Present-day technologies possess computing capabilities that enable the design of powerful work stations, sophisticated computer graphics, and multi-media applications such as real-time audio and video signal processing. Furthermore, users of these applications have the desire to access this computation at any location. Thus, the requirement of portability has put severe restrictions on size, speed and power consumption. Improvements in battery technology are being made, but it is highly unlikely that a dramatic solution to power is forthcoming. Interest in low power has urged the researchers to look at the problem from the designers point of view. Techniques at various levels of design abstraction are being investigated. This paper introduced the source of the problem and presented some of the techniques involved. REFERENCES [1] A.P. Chandrakasan, S. Sheng, and R.W. Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits, 27(4):473484, April 1992. [2] K. Danckaert, K. Masselos, F. Cathoor, H.J. De Man, and C. Goutis. Strategy for power- efficient design of parallel systems. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, volume 7, pages 258 265, June 1999. [3] F.Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. De Man. Global communication and memory optimizing transformations for low power signal processing systems. In 1994 Workshop on VLSI Signal Processing, VII., pages 178 187, 1994. [4] H. J. M. Veendrick. Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits. IEEE Journal of Solid-State Circuits, pages 468473, August 1984. [5] J. Spars and S. Furber, editors. Principles of asynchronous circuit design - A systems perspective. Kluwer Academic Publishers, 2001. [6] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and A. Shimizu. A 3.8-ns CMOS 16x16-b Multiplier Using Complementary Passtransistor

logic . IEEE Journal of Solid-State Circuits, 25(2):388 395, April 1990. [7] L. Wei, Z. Chen, K. Roy, M.C. Johnson, Y. Ye, and V.K. De. Design and optimization of dual-threshold circuits for low-voltage low-power applications In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, volume 7, pages 1624, March 1999. [8] R. Mehra, D.B. Lidsky, A. Abnous, P.E. Landman, and J.M. Rabaey. Low power design methodologies, chapter Algorithm and Architectural Level Methodologies for low power. Kluwer Academic Publishers, 1996. [9] N. H. E. Weste and K. Eshraghian. Principles Of CMOS VLSI Design, A systems perspective. AddisonWesley, Second edition, 1993. [10] Y. Taur. CMOS scaling and issues in sub-0.25 um systems, in Design of High-Performance Microprocessor Circuits. A. Chandrakasan, W.J. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE, 2001, CH. 2, pp. 27-45. [11] S. Thompson, P. Packan, and M. Bohr. MOS scaling: Transistor challenges for the 21st century. Intel Technology Journal, Q3, 1998. [12] D. Fotty. MOSFET Modeling with SPICE. Englewood Cliffs, NJ: PrenticeHall, 1997, CH. 11, pp. 396-397. [13] T. Iwamoto et al. A highly manufacturable low power and high speed HfSiO CMOS FET with dual polySi gate electrodes. In Digest of Technical Papers of International Electron Devices Meeting (IEDM), 2003, pp.27.5.1-27.5.4. [14] S. Thompson, P. Packan, and M. Bohr. Linear versus saturated drive current: Tradeoffs in super steep retrograde well engineering. In Digest of Technical Papers of Symposium on VLSI Technology, 1996, pp. 154-155. [15] S. Venkatesan, J.W. Lutze, C. Lage, and W. J. Taylor. Device drive current degradation observed with retrograde channel profiles. In Proceedings of International Electron Devices Meeting, 1995, pp. 419422. [16] J. Jacobs and D. Antoniadis. Channel profile engineering for MOSFETs with 100 nm channel lengths. IEEE Transaction of Electron Devices, vol. 42, pp. 870875, May 1995. [17] W. Yeh and J. Chou. Optimum halo structure for sub-0.1 m CMOSFETs. IEEE Transaction of Electron Devices, vol. 48, pp. 2357-2362, Oct. 2001. [18] V. De, Y. Ye, A. Keshavarzi, S. Narendra, J. Kao, D. Somasekhar, R. Nair, and S. Borkar. Techniques for leakage power reduction, in Design of High-Performance Microprocessor Circuits. NJ: IEEE, 2001, CH.3, pp. 4852.

You might also like