
Review of the impact of improved efficiency on reliability

INTRODUCTION

There are two important design concepts that should be at the forefront of every data center professional's mind: one is energy efficiency and the other is reliability. The current and future energy market does not have room for inefficient data centers and has put a spotlight on programs such as EPA Energy Star and USGBC LEED certification for data centers. The New York Times Technology section continues to reinforce the idea by publishing articles such as its June 17, 2008 installment, "Demand for Data Puts Engineers in the Spotlight," which comments on our need for energy efficient data centers.

A key concept that has historically been a higher priority than efficiency for data centers is reliability. Without reliability the data center cannot perform its intended function. This paper will discuss some of the ways in which efforts to improve data center energy efficiency can impact operational reliability. The discussion will be primarily qualitative in nature, with examples to illustrate key ideas. Examples of efficiency improvement efforts that have either adverse or beneficial effects on reliability will be discussed. We will start by briefly reviewing the electrical and mechanical service requirements for IT equipment as defined by the Information Technology Industry Council (ITIC) and the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). With these core concepts established, we will also briefly review efficiency, with emphasis on the concept of efficient operation as the delivery of energy to where it is intended to be used. The remaining portion of delivered energy (losses) is converted into heat.

Equipment environment - Class requirements

Class | Air conditioning | Environmental control | Example
1 | Yes | Tight | Servers and storage products
2 | Yes | Loose | Server and storage products (outside of data center)
3 | Yes | No | Workstations, PCs and printers
4 | No | No | Point of sale equipment

Table 1: ASHRAE 2006 Class Requirements

ASHRAE provides, among many other things, guidelines for acceptable data center operating temperature and humidity conditions. These vary depending on end use and are summarized in Table 1 above.

This paper will focus primarily on Class 1 equipment, which requires the largest quantity of cooling and has the tightest environmental parameters. The currently published ASHRAE guidelines state the allowable and recommended Class 1 environmental conditions, summarized in Table 2 below.

Equipment environment specifications - Class 1

Parameter | Allowable | Recommended
Dry bulb (°F) | 59 to 90 | 68 to 77
Relative humidity (%) | 20 to 80 | 40 to 55

Table 2: ASHRAE 2006 Temperature and Humidity for Class 1

Recent recommendations of the ASHRAE committee have raised the recommended dry bulb temperature to 80°F. This reduces the demand on conditioning equipment to supply colder air and also allows for a larger delta T between the supply and return conditions of the heat removal fluid (water or refrigerant).
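As a rough illustration of why a larger delta T matters, consider the basic sensible heat relation for the cooling fluid; the 10 °F and 15 °F temperature differences below are assumed for illustration and are not from this paper:

\[
Q = \dot{m}\, c_p\, \Delta T
\quad\Longrightarrow\quad
\dot{m} = \frac{Q}{c_p\, \Delta T},
\qquad
\frac{\dot{m}_{\Delta T = 15^\circ\mathrm{F}}}{\dot{m}_{\Delta T = 10^\circ\mathrm{F}}} = \frac{10}{15} \approx 0.67
\]

For the same heat load Q, widening the chilled water delta T from 10 °F to 15 °F cuts the required flow rate by roughly one third, with a corresponding reduction in pumping energy.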

Similar standards for the power environment are provided by the Information Technology Industry Council (ITIC), as shown below.

Reliability is a key concern for proper data center operation, and the IEEE Gold Book, Standard 493-2007, Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems, provides the method and data needed to evaluate the reliability and availability of the electrical and mechanical infrastructures. The methodology of the IEEE Gold Book should be used to determine how energy efficiency improvements may affect the reliability of a data center.

The overall success of the data center in operation is dependent on the following five factors:

1. The location of the facility and the inherent risks associated with it.
2. The design of the facility and how well it meets the requirements for the intended use.
3. The overall quality of the construction of the facility.
4. The operation of the facility, including the quality of its documentation, procedures, and staff training.
5. How well the facility and its equipment are maintained.

The most basic method of improving reliability is to provide redundancy in order to eliminate single points of failure (SPOF). A single point of failure is a point in the electrical or mechanical infrastructure at which the failure of a single piece of equipment would result in a loss of critical load for the facility.
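As a minimal sketch of the series/parallel availability arithmetic that the IEEE Gold Book methodology supports, the short example below compares a single power path containing a SPOF with two fully redundant paths. The per-component availability figures are assumed purely for illustration and are not taken from IEEE data.

```python
def series(*availabilities: float) -> float:
    """Availability of components that must all work; any single failure drops the load."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(*availabilities: float) -> float:
    """Availability of redundant paths where any one surviving path carries the load."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability

# Assumed per-component availabilities (illustrative only).
ups_availability = 0.999
switchboard_availability = 0.9999

# Single path: UPS and switchboard in series -- the switchboard is a SPOF.
single_path = series(ups_availability, switchboard_availability)

# 2N arrangement: two complete, independent paths to dual corded IT loads.
dual_path = parallel(series(ups_availability, switchboard_availability),
                     series(ups_availability, switchboard_availability))

print(f"Single path availability: {single_path:.6f}")
print(f"2N dual path availability: {dual_path:.6f}")
```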

Figure 1: N+1 UPS modules, generators and chillers supplying IT loads

In Figure 1 above, there is a redundant UPS module, generator and chiller. In each case the critical load requires only one to be in operation. Assume the IT equipment is dual corded (IT equipment with two power cords, only one of which is needed for it to operate), with each cord powered by a different Power Distribution Unit (PDU): one cord of each unit is connected to PDU 1 and the other to PDU 2. There are single points of failure at Automatic Transfer Switch ATS 1, Input Switchboard MS2 and UPS Bypass Switchboard SBB. If any one of these failed, power would be lost to the critical load. In the case of ATS 1 or Input Switchboard MS2 failing, power would be lost only after the UPS batteries ran out, but since the batteries are designed only to carry the load during the transition from utility power to generators, this would still be considered a SPOF.

Increasing system efficiency can, in some cases, degrade the reliability of a facility. In Figure 1 above, both UPS modules would normally be in operation all of the time, so that if one module failed the other would carry the complete load. This is not the most efficient way to operate; it is more efficient to have just one UPS module carrying the entire load. In this case, improving the efficiency by operating only one UPS module would directly and significantly reduce the system reliability.
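A rough sketch of the efficiency side of this trade-off is shown below, using an assumed double-conversion UPS part-load efficiency curve; the percentages and the 400 kW load are illustrative values, not manufacturer data, and each module is assumed to be rated for the full IT load.

```python
# Assumed UPS efficiency at different fractions of module rating (illustrative only).
ups_efficiency = {0.50: 0.92, 1.00: 0.94}

it_load_kw = 400.0  # total critical IT load

# Redundant operation: two modules share the load, each running at 50% of its rating.
loss_two_modules_kw = it_load_kw * (1.0 / ups_efficiency[0.50] - 1.0)

# Non-redundant operation: one module carries the whole load at 100% of its rating.
loss_one_module_kw = it_load_kw * (1.0 / ups_efficiency[1.00] - 1.0)

print(f"Losses with two shared modules: {loss_two_modules_kw:.1f} kW")
print(f"Losses with one module:         {loss_one_module_kw:.1f} kW")
# The single-module case wastes less energy, but a module failure now drops the load.
```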

Figure 2: 2N UPS modules supplying IT loads

Eliminating the SPOFs in Figure 1 will greatly improve reliability. Assume the dual corded IT loads are connected between the two UPS systems (PDU 1A supplies one cord and PDU 2A supplies the other). The SPOFs have been eliminated, since there are now two complete paths for power to the IT equipment.

Buildings using new or unproven technology, or equipment that introduces additional complexity to a system, may also create additional risk to the facility. The additional complexity when extra components are added increases the total number of points of failure and may add SPOFs. A point of failure is a segment, part or piece of equipment in one of the critical systems that could fail and cause the subsystem to fail. As covered above, if that failure causes the loss of critical load it is considered a SPOF; if not, at least one other failure would be needed for the loss of critical load. With the exception of adding redundancy (by adding spare equipment or multiple paths of power), the more points of failure a system has, the less reliable it will be.

A geothermal field is a good energy efficient substitute for a cooling tower, but because of the great increase in potential points of failure between the cooling plant and the geothermal field, there can be a decrease in reliability. The increase in points of failure can also create the potential for single points of failure which may not be obvious. The designer needs to consider all modes of failure and plan how to mitigate such failures as they occur. In a commercial building, geothermal field wells can range from 400 to 1,000 feet deep. When a well fails, all of the wells on that particular branch typically need to be shut off because they are not repairable. The designer must be aware of this possibility and provide enough redundancy to carry the building load. In some cases the designer will opt to provide supplemental heat dissipation with a cooling tower running in conjunction with, or as a backup for, the geothermal well field. The cooling tower could decrease the energy efficiency and at the same time degrade the reliability, since the system has become more complicated. For most data center facilities, including a cooling tower as a backup would be good design practice, because geothermal design is not an exact science and the capacity of a well field can change over time. In some geothermal applications the cooling tower may need to run at night to cool the geothermal field so it can carry the load during peak summer temperatures. Another disadvantage of the geothermal field is that the staff may not be as familiar with it, because it is not as common as the traditional cooling tower / chiller design.

This is an example where a more energy efficient solution may or may not impact system reliability, and detailed analysis of the specific designs under consideration, as well as the facility location and staff competencies, is required to make informed decisions about the relative risks involved.

SECTION 4

Green design can also positively affect the reliability of a data center facility. Green design technology is not limited to system components but can also result from an effort to tailor the building design to take advantage of the local environment. System components that improve efficiency include variable frequency drives (VFDs), energy recovery wheels, and water side and air side economizers.

The addition of VFDs has been a key component in retrofitting existing buildings and designing greenfield sites to run more efficiently. They allow for a wide range of flexibility because the motor can run at the optimum speed for the building load instead of a fixed point determined by the design or commissioning process. VFDs are available for a wide range of motors used in both air side and water side applications. An example of increased energy efficiency and improved reliability with VFDs is a set of four pumps with three running in parallel. In the system without VFDs, the loss of a pump may cause a chiller to shut down until the standby pump starts. In a comparable system with VFDs, the chiller would probably not shut down, since the remaining three pumps would simply ramp up to maintain the proper pressures and flows.

System A (without VFDs): three pumps at 100% speed deliver 120 gpm each for a total of 360 gpm, with the fourth pump on standby.

System B (with VFDs): four pumps, each rated at 120 gpm, run at about 75% flow (roughly 90 gpm each) with no standby pump, still supporting the 360 gpm load. Each pump has its own VFD.
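A back-of-the-envelope sketch of the pump energy comparison, assuming identical pumps, the idealized affinity laws (shaft power scaling with the cube of speed), and ignoring motor/VFD losses and static head; real savings would be smaller, but the direction holds.

```python
RATED_FLOW_GPM = 120.0  # per-pump flow at 100% speed
RATED_POWER = 1.0       # per-pump power at 100% speed, in arbitrary units

def pump_power(speed_fraction: float) -> float:
    """Idealized affinity law: shaft power scales with the cube of speed."""
    return RATED_POWER * speed_fraction ** 3

# System A: three pumps at full speed, fourth pump idle on standby.
power_a = 3 * pump_power(1.0)
flow_a = 3 * RATED_FLOW_GPM

# System B: four pumps at about 75% speed, sharing the same 360 gpm load.
power_b = 4 * pump_power(0.75)
flow_b = 4 * RATED_FLOW_GPM * 0.75

print(f"System A: {flow_a:.0f} gpm, relative pump power {power_a:.2f}")
print(f"System B: {flow_b:.0f} gpm, relative pump power {power_b:.2f}")
print(f"Estimated pump power saving: {1 - power_b / power_a:.0%}")
```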

In addition to saving energy, the application of VFDs on large motors can also reduce operating expenses by starting motors at low speed and then ramping them up to their setpoint speed. This drastically reduces inrush current to the motor and can reduce peak demand charges with many energy suppliers.
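For rough context, the comparison below uses typical textbook multiples rather than values from this paper: an induction motor started across the line commonly draws several times its full-load current, while a VFD ramp holds the starting current close to the full-load value.

\[
I_{\text{start, across-the-line}} \approx 6\text{–}8 \times I_{\text{FLA}}
\qquad \text{vs.} \qquad
I_{\text{start, VFD ramp}} \approx 1\text{–}1.5 \times I_{\text{FLA}}
\]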

When the data center is in a location with a long period of colder weather, the use of a water side economizer can be an efficient way to reduce cooling costs. A water side economizer, or heat exchanger, reduces the energy demand on the chiller plant and at the same time provides an increase in reliability.

A heat exchanger set up in a series-capable configuration allows for the greatest potential energy savings and flexibility. In this configuration there are three operating options. Option 1: the water runs through the chiller only. Option 2: the water runs through the heat exchanger only. Option 3: a hybrid arrangement in which the water runs first through the heat exchanger and then through the chiller. The two modes of further energy savings are options 2 and 3. In addition to the energy savings modes, the heat exchanger, while adding other points of failure, still increases the reliability of the system by adding multiple means of producing chilled water for the data center space. Running the chiller at a lower speed when possible also results in less wear and tear on the equipment and extends its life. All of this results in a more reliable infrastructure to support the critical load of the data center.

[Figure: water side economizer heat exchanger piped in series with the chiller, showing the control valve, pump, chilled water supply and chilled water return.]
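A simplified sketch of the changeover logic among the three options is shown below; the approach temperature, the 10 °F partial-economizer band, and the function and variable names are illustrative assumptions, not values or terminology from this paper.

```python
def economizer_mode(tower_water_f: float, chw_setpoint_f: float,
                    hx_approach_f: float = 3.0) -> str:
    """Pick an operating mode for a series-configured water side economizer.

    tower_water_f  -- water temperature available from the cooling tower (F)
    chw_setpoint_f -- required chilled water supply temperature (F)
    hx_approach_f  -- assumed heat exchanger approach temperature (F)
    """
    hx_leaving_f = tower_water_f + hx_approach_f  # coldest water the heat exchanger can produce
    if hx_leaving_f <= chw_setpoint_f:
        return "Option 2: heat exchanger only (full economizer)"
    if hx_leaving_f < chw_setpoint_f + 10.0:  # heat exchanger can still pre-cool usefully
        return "Option 3: heat exchanger in series with chiller (partial economizer)"
    return "Option 1: chiller only"

# Example: 40 F tower water easily satisfies a 45 F chilled water setpoint.
print(economizer_mode(tower_water_f=40.0, chw_setpoint_f=45.0))
```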

SECTION 5

This section of the paper discusses system monitoring and automation, investigating how data collection supports verification of energy delivery and system redundancy. The electrical portion of this discussion reviews how monitoring of peak system usage allows verification of system redundancy by mathematically evaluating the loads that will be present on components and sub-systems in the event of a failure. Performing this review on a periodic or automated basis ensures that the designed reserve capacity is maintained, so that when a component or sub-system fails, the remaining infrastructure is able to support the critical and essential loads. Mechanical system monitoring can confirm redundancy in a similar way. Mechanical design engineers cannot confirm actual site conditions until the site is built, but with collected data they are able to more readily predict how a system will behave. Mechanical systems are designed with reserve capacity for peak loads, and collected data can confirm redundancy at those peak loads. The collection of system operational data over a wide range of operation can also help to refine future mechanical designs.
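As a minimal sketch of the electrical check described above, the example below tests whether the measured peak loads on a pair of UPS modules would still fit within the surviving module's capacity after a failure. The capacities, load values, and the assumption that a failed module's load transfers evenly to the survivors are illustrative, not monitoring data from any real facility.

```python
# Measured peak load on each UPS module (kW) and the per-module capacity (kW).
MODULE_CAPACITY_KW = 500.0
peak_loads_kw = {"UPS-A": 230.0, "UPS-B": 220.0}

def redundancy_maintained(loads: dict, capacity: float) -> bool:
    """Check that if any single module fails, the survivors can carry its load."""
    for failed in loads:
        survivors = [name for name in loads if name != failed]
        transferred = loads[failed] / len(survivors)  # assume the load splits evenly
        if any(loads[name] + transferred > capacity for name in survivors):
            return False
    return True

print("Designed reserve capacity maintained:",
      redundancy_maintained(peak_loads_kw, MODULE_CAPACITY_KW))
```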

Even with the most up-to-date energy modeling software, the mechanical designer will need to build in a significant level of redundancy. In addition to the limitations of the energy modeling software, there are inherent limits to how precisely the designer can know how each specified piece of equipment will actually perform in a specific application. Proper design results in each piece of equipment having a slight margin in total capacity. There are also unpredictable environmental and operating conditions that need to be considered. The design should function most efficiently in the range of operating conditions it will normally experience and still be able to handle the worst cases it may encounter.

Once a new data center is built, data collection can confirm whether or not the reliability and availability of the design have been achieved. For a developer this may be an important selling point for customers. The design engineer can use a number of tools to predict what the reliability and availability should be, but until the facility has been operational for a significant period of time, the design, installation and operation have not been confirmed.

Traditionally, simple controls provide data monitoring but not overall system optimization. A thermostat controls when a CRAC unit stages its compressors on and off. A pressure sensor controls the speed at which a set of pumps operates. A temperature probe provides a logic controller with the information needed to engage the next cooling tower when additional cooling is required. The set points used are often determined during commissioning to optimize the system at one specific operating level.

A more advanced level of control now available for data centers is Hewlett Packard's Dynamic Smart Cooling. Dynamic Smart Cooling adapts to provide a more energy efficient means of cooling a specific data center over a wide range of IT load. A great deal of energy is wasted providing cooling where it is not actually needed. The most efficient way to run a chilled water central plant is to create the largest supply and return temperature differential (delta T) possible. A larger delta T also reduces the amount of air that needs to be circulated, which in turn reduces fan power (a rough worked example follows below). Dynamic Smart Cooling does this by controlling where cold air is directed. For most designs, cooling units are distributed evenly throughout the room; in some cases, supplemental cooling may be provided in one part of the data center. Unfortunately, data center layouts are not static; blade servers can easily be changed out and replaced from time to time, shifting the load throughout the room. This is one of the many reasons why Dynamic Smart Cooling is a great tool for the modern data center.
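A rough worked example of the air-side effect, using the standard-air sensible heat equation and the ideal fan affinity laws; the 15 °F and 25 °F temperature differences are assumed for illustration and are not from this paper.

\[
\dot{V}_{\mathrm{cfm}} = \frac{Q_{\mathrm{Btu/h}}}{1.08\,\Delta T},
\qquad
\frac{\dot{V}_{\Delta T = 25^\circ\mathrm{F}}}{\dot{V}_{\Delta T = 15^\circ\mathrm{F}}} = \frac{15}{25} = 0.6,
\qquad
\frac{P_{\mathrm{fan},\,25}}{P_{\mathrm{fan},\,15}} \approx 0.6^{3} \approx 0.22
\]

Under these idealized assumptions, widening the air-side delta T from 15 °F to 25 °F cuts the required airflow by 40 percent and the ideal fan power by roughly three quarters.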

Even though Dynamic Smart Cooling adds another level of complexity to a facility, the end result would be an increase in reliability, especially in making sure that all end-user components receive the necessary cooling. In addition, the system provides a great deal of energy savings, which in some facilities can result in a 25-40% reduction in cooling costs.

CONCLUSION

In the past, reliability has been the primary driver of data center design. Energy efficiency was either a distant second or not an issue at all, since the available cost savings were not considered significant compared to the cost of a business disruption, no matter how small the additional risk. As energy costs have soared and environmental responsibility has become a marketable business attribute, the focus has shifted to a more balanced approach. Through careful and thoughtful evaluation of data center design approaches, efficiency can be improved while the impact on system reliability is kept in balance. A good data center consultant can therefore support evaluation of both the data center and the business model to optimize the data center design for the client's needs. The final result is varying levels of improved efficiency, with the final solution optimized for that business's competing goals of efficiency, reliability, first cost, operating cost, costs associated with risk, and total cost of ownership.
