
Capacity Planning Issues - A Dynamic Situation: Case Study

Industry-FTT Data Center, Mumbai

L.N. Welingkars Institute of Management, Development & Research


Year of Submission: September, 2013

You cannot afford to assume that the data center has unlimited capacity; this is even more true for the internal cloud.
Capacity management ensures that IT capacity meets business requirements cost-effectively. A capacity management process will reduce infrastructure waste while providing a framework for planning future acquisitions and for accurate cost accounting. This research is designed for:
CIOs or IT directors
IT infrastructure / data center managers
Internal utility infrastructure / cloud evangelists

This research will provide you with:


An understanding of why the lost art of capacity management is more critical than ever in consolidated proto-cloud infrastructures.
A process and workbook for cataloging and assessing current capacity in light of the needs of the business.

[Figure: Correlation with consolidation success. Service Tiers: .36*. Capacity Mgmt: .21*. Cost Accounting: .09. Capacity Planning: .15.]

A process checklist for capacity management with links to relevant additional resources and tools at Info-Tech.
A gas gauge model for capacity planning based on reserve capacity and maintenance of service.

Capacity management and service tiers have a positive correlation with consolidation success.

Executive Summary

Understand


Capacity management is a critical step between simple server consolidation/virtualization and creating the internal infrastructure-as-a-service cloud that enterprises are currently focused on building. In an internal private cloud, the organization pays for everything. Unlike an external public cloud, where capacity is open-ended, the organization has to pay for total capacity, not just the capacity that is being used right now. Virtualization does not create capacity or make capacity less expensive. Server virtualization is an important enabler of internal and external cloud computing, but on its own it does not make a cost-effective cloud service.

Assess
Turn data center management outside-in. Cloud computing is associated with delivering IT as a service. Assessing the infrastructure for capacity management and planning starts with the business and ends with a model for total cost to serve and capacity management across service tiers. Use Info-Tech's Capacity Planning Data Collection & Tiering Workbook to assess infrastructure based on dependencies, interdependencies, criticality, and business priority.

Plan/Prepare

Manage capacity by service tiers for cost efficiency. Not all services require the same capacity. Examine variable capacity costs for each tier to see how savings might be realized without compromising service levels. Take a gas gauge approach to capacity planning. Once pools of reserve capacity are established, future capacity acquisitions are based on service maintenance rather than application addition. Capacity management is a process, not a product. Look to system management and internal cloud management tools with an eye to how they might automate your capacity management practice.

Section in Brief: Understand

This section will help you:
Understand why capacity management is a critical activity between consolidation and internal cloud.
Put virtualization in its proper place as a tactical enabler, rather than a management strategy.
See how capacity management prepares the infrastructure for a cloudy future, and aids in ongoing consolidation and virtualization.

Focus on capacity management to optimize cost effectiveness & service, both now and for an increasingly cloudy future
External public clouds will play a role in the future of corporate IT, but right now most IT departments are focusing on developing the internal cloud.
An internal cloud is infrastructure-as-a-service (IaaS) delivered from internal IT resources. Consolidation and virtualization play a role in building an internal cloud, just as they do in external IaaS in a public cloud service (e.g. Amazon Web Services).

[Figure: Cloud focus of IT departments, N = 123. Focus on the internal cloud before external: 43%. Implementing only internal cloud solutions: 33%. Focus on the external cloud before internal: 12%. Implementing only external cloud solutions: 12%. In total, 76% are focusing on the internal cloud.]

Most IT departments engaged in consolidation and virtualization are focused on internal cloud development first. A third (33%) will focus only on the internal cloud. Interest in the external cloud remains strong, but implementation is in its early days. Most expect the external cloud's role to become more important three to five years from now.

Capacity management is important because virtualization does not create capacity, nor does it automatically make all capacity cost-measurable and cost-effective. A capacity management strategy will enable a move from infrastructure as asset management to infrastructure as service management.

Benefits will include:
The capability to document current capacity.
The ability to plan capacity in advance.
The ability to estimate the impact of new apps and modifications.
Cost savings through elimination of over-provisioned capacity, and through planned spending rather than reactive spending.
Service and spending optimized to match business needs.

Capacity management (infrastructure analysis & planning) is recommended to optimize service tiers
[Figure: Correlation with Success in Consolidation/Virtualization Projects. Service Tiers: .36*. Capacity Management: .21*. Cost Accounting: .09. Capacity Planning: .15.]

Success was defined as:
Reduced capital and facility costs.
Reduced man-hours spent on management of infrastructure.
Increased uptime and business continuity.
Reduced virtual server sprawl.
Reduced security concerns.

Note: * = correlation is significant. N = 88. Source:

Capacity management practices lead to greater success in infrastructure consolidation/virtualization projects. Having developed service tiers in infrastructure was the strongest predictor of overall success in consolidation. A capacity management process, such as inventorying resources annually, was also a predictor of success, especially in managing virtual server sprawl, security assurance, and business continuity. Cost accounting and capacity planning were not predictors of current success. However, as we shall see, efficient capacity planning and cost accounting are not direct inputs, but outcomes of capacity management.

Where this solution set fits: Capacity management is a critical part of the larger picture of building the internal cloud
This set is one of a series dedicated to building converged utility infrastructure (see below right). All these sets reference Info-Tech's layer cake model of consolidation (right) and our three laws of utility/cloud investment (below left). A management process that starts with business needs, works through capacity optimization, and ends with a plan for tiered service pooling adheres to Info-Tech's three laws because it relates capacity management directly to servicing the needs of the business. Info-Tech's layer cake model for the internal cloud shows how infrastructure layers and virtualization all contribute to service, but an additional element is efficient management of capacity across the layers.

Three Laws of Cloud Infrastructure Investment
1 Alignment is Software
2 Hardware is Capacity
3 Management is a Differentiator
For more detail on the three laws and how they relate to a capacity planning process, see the slide "Understand that hardware is capacity".

[Figure: Info-Tech's layer cake model of consolidation. How do you slice this cake?]

Related sets that address aspects of building an internal cloud
Build an Optimized Infrastructure-as-a-Service Internal Cloud
Mitigate Costs & Maximize Value with a Consolidated Network Storage Strategy
Evaluate a Backup Architecture Strategy
Build a Server Acquisition Strategy for the Internal Cloud
Craft a Converged Data Center Network Strategy
Select a Consolidated Storage Platform

Compare & contrast the clouds: for the internal cloud, your enterprise pays for everything and shoulders all the risk
External Public Compute Cloud vs. Internal Private Compute Cloud

Similarities
The internal and external clouds are both abstracted environments where applications are provisioned with available and scalable compute capacity. Abstracted compute resources (processor cycles, memory, storage) are typically derived from aggregated and virtualized hardware. Compute resources are presented to the customer as a service. Both models are highly agile and responsive to changing business demands.

Key Differences
External Public Compute Cloud: Infrastructure is owned by an external third party, which is responsible for managing capacity and mitigating risk. Application workloads are provisioned by these abstracted resources, which are elastic (they scale up with need). Customers share access to these resources (typically via the Internet) in a multi-tenant environment and pay only for what they use.
Internal Private Compute Cloud: Infrastructure is entirely owned by the enterprise and managed by IT. Application workloads are provisioned by resources that can also be elastic, but scaling is limited by available capacity. The business, the sole customer of internal IT infrastructure, pays for the whole cloud regardless of how much is used.

It hurts to be alone: total ownership of limited capacity imposes an expensive box that can be invisible to the business
Unused capacity costs are ongoing overhead for the internal cloud.

In an internal infrastructure-as-a-service cloud, the enterprise pays for all capacity, not just a share of a larger third-party pool. Justifying the IT spend for total capacity is difficult when the business is used to a one-to-one relationship between an application and a hardware purchase.

Risk mitigation is a significant component of total cost.

In the external cloud, the third party provider is responsible for risk mitigation of the capacity it rents (availability, recoverability, security). In the internal cloud, IT bears this responsibility. Significant cost drivers are the hardware and data redundancy that are needed to mitigate risk.

When the capacity limits are reached, physical infrastructure needs to be acquired ad hoc.
The public cloud is open-ended. The third-party provider maintains a practically unlimited pool of capacity that is available on demand. In the private cloud, capacity is limited.

Concern about hitting the wall of internal capacity limits leads to over-provisioning. Acquiring more capacity than is needed means wasted spending and maintenance time.

Draw a clear line from business need through software & hardware needs: transparency is not the same as invisibility
The goal of capacity management is to optimize performance and efficiency of the current infrastructure, to plan for future capacity requirements, and to justify the financial investment in the infrastructure. The classic steps in capacity management are:

Analyze current capacity: find out how apps are currently provisioned and what the performance and availability requirements are for each one.
Optimize the infrastructure to ensure the most efficient use of existing capacity.
Analyze the impact of new or updated apps on capacity.
Analyze demand to model service requirements of the infrastructure and predict future growth in demand.
Develop a capacity plan that relates future growth in capacity to maintenance of service levels.

These steps will guide our recommendations in section 3 of this report.

Capacity Management vs. Planning: One Leads to the Other
Capacity management is a tactical activity focused on the present. It enables cost-effective provisioning of IT services by helping organizations match their IT resources to business demands.
Capacity planning is a strategic activity focused on the future. It is the process of determining the amount of hardware resources that will be required to deliver the appropriate level of service for the defined workload at the least cost.


Rediscover the lost art of capacity management & planning after decades of inefficient distributed processing

That was then
Capacity management in IT matured in the mainframe environment, where resources were costly and took considerable time to upgrade. Applications needed to be provisioned from a share of the centrally maintained and expensive compute resource. Resource partitions needed to be rigidly cost-justified and cost-managed because of the high cost of the total capacity. Expanding capacity in this environment was expensive and time-consuming.

This is now
As data centers transitioned to a distributed environment supported by inexpensive UNIX, Linux, and Windows servers, a brute-force approach to provisioning became the norm. Cheap industry-standard servers could be assigned to provision specific new or expanding applications or services. Capacity management and planning skills atrophied in companies accustomed to this "throw some more hardware at it" approach. Unregulated distributed processing bred increased complexity in the form of server sprawl, and waste in poorly utilized silos of processing and storage.

Server virtualization does not equal cloud: the internal cloud is the end of a journey that begins with server CAPEX savings
Server virtualization mitigates waste of distributed servers through better resource utilization and process agility, but virtualization is an enabling tactic, not an infrastructure model.

Organizations typically embark on server virtualization to realize immediate capital savings from reduced server hardware footprint, through consolidation. However, as more of the server infrastructure is virtualized, further benefits such as improvements in provisioning agility and service availability begin to emerge.

To realize these benefits, management capability of both the underlying capacity and the virtualized abstraction layer is critical.

A managed internal cloud is the end of this journey that begins with a simple need to save money on server acquisition.

[Figure: Percent Virtualized (0% to 100%) versus Time (relative to size and hardware refresh rate). Stages: Consolidate (P2V Candidate Identification); Load Balancing, Availability, Recovery; Management (Application Lifecycle, Provisioning, Performance Monitoring, Automation, Metering (Chargeback)); 100% Utility Infrastructure (Internal Cloud).]

The internal cloud is not a product that will be delivered out of a box. It will be developed over time, enabled by consolidation, standardization, virtualization, and capacity management that focuses on service delivery to the business.

Wrap up consolidation efforts and focus on capacity management for the entire infrastructure
Saving time and money on servers only increases as consolidation progresses. However, other layers of the infrastructure do not see the same success. Similarly, management benefits are mainly in server instances.

What this means
Server CAPEX reduction is the greatest benefit of consolidation through virtualization. Virtualization does not lead directly to savings in facility, storage, or network costs.
Organizations that were more than 50% virtualized generally agreed that all types of management took fewer man-hours due to consolidation. However, increased virtualization had the biggest impact on server management. Organizations that were more virtualized spent significantly fewer man-hours on server instance management.
Careful management planning for the entire data center will optimize facility costs, storage costs, network costs, and management complexity.

[Figure: Agreement that the number of hours spent has been reduced, for organizations with < 50% and > 50% of servers virtualized. Axis labels include Server Instances and Physical Infrastructure. Averages above the dotted line indicate agreement that man-hours have been reduced. The difference between low and high virtualization is only significant for server instances.]

[Figure: Agreement that costs have been reduced by consolidation (Server, Facility, Storage, Network), for organizations with < 50% and > 50% of servers virtualized. Averages above the dotted line indicate agreement that costs have been reduced. The difference between low and high virtualization is only significant for server cost reduction.]


Start capacity management now to optimize current infrastructure and boost success in ongoing consolidation
2011 is the year that most companies doing consolidation will cross the line to having more than 50% of their infrastructure virtualized. Many have already crossed that line.

What this means


On the journey from tactical server consolidation to internal cloud management, enterprises are at a point where management is going to matter more than infrastructure effectiveness. With a majority of workloads virtualized, virtual infrastructure is increasingly core infrastructure. Enterprises have likely moved beyond the low-hanging fruit of server consolidation (such as test, dev, and non-critical servers) to virtualizing more mission-critical and resource-demanding workloads. However, a significant proportion of workloads will remain un-virtualized for the immediate future. An infrastructure-as-a-service management model will need to account for all server workloads. Capacity management correlates with consolidation and virtualization success. In addition to orienting toward IT as a service, capacity management will help deal with an increasingly virtualized, consolidated infrastructure.

[Figure: Current & Projected Virtualization. Percent of servers virtualized (0 to 100) now, in 18 months, and in 3 years, by company size: Large (1000+), Medium (250-1000), Small (0-250).]

[Figure: Server Virtualization by Company Size. Values shown: 52%, 50%, 42%.]


Avoid virtual server sprawl & boost success in areas such as business continuity & security with capacity management
Without an idea of the cost and appropriate provisioning of capacity, the benefits of reducing the complexity of physical server management are eradicated by virtual sprawl.

Virtual server sprawl happens when the business loses sight of infrastructure requirements and the costs of running a virtual machine. Fast and easy server deployment becomes confused with cheap server deployment. Negative impacts of virtual server sprawl include:
Wasted capacity. Resource-consuming virtual machines are running that nobody is using or accountable for. Capacity waste is especially seen in storage, where high-end SAN space is being eaten by multiple virtual machine instances.
Performance degradation. As more virtual machines are added to the system, the performance of all virtual machines degrades as more workloads contend for the same resources.
Unplanned capacity additions. As virtual sprawl increases and available resources decrease, there is demand to add more physical capacity.
Having a capacity management plan significantly reduces concerns about virtual sprawl.

[Figure: Correlation with capacity management. Significant correlations: .28, .24, .24. Other correlations: .16, .15, .10. N = 88. Source: .]

Sprawl is alive and well in our organization. Virtualization has allowed application and business teams to buy additional dev/test/staging environments where they haven't been able to afford them before. They're using the same budgets they had before, they're just buying more servers with them now.
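One way to make the sprawl symptoms described above concrete is to scan a VM inventory for machines that have no accountable owner or that sit nearly idle. The sketch below is illustrative only: the `VM` record, its fields, the 5% idle threshold, and the sample inventory are all assumptions, not data from the survey or part of any Info-Tech tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    """Hypothetical inventory record for one virtual machine."""
    name: str
    owner: Optional[str]   # None means nobody is accountable for the VM
    avg_cpu_pct: float     # average CPU utilization over the review period
    storage_gb: float      # SAN space consumed by the VM


def sprawl_candidates(vms: List[VM], idle_cpu_pct: float = 5.0) -> List[VM]:
    """Return VMs that are unowned or below the (assumed) idle threshold."""
    return [vm for vm in vms
            if vm.owner is None or vm.avg_cpu_pct < idle_cpu_pct]


# Made-up sample inventory for illustration.
inventory = [
    VM("web-01", "ops", 35.0, 80),
    VM("test-17", None, 1.0, 200),
    VM("stage-02", "dev", 2.5, 150),
]

flagged = sprawl_candidates(inventory)
reclaimable_gb = sum(vm.storage_gb for vm in flagged)
print([vm.name for vm in flagged], reclaimable_gb)
```

A report like this does not decide anything by itself; it gives the business a list of machines whose cost should be justified or reclaimed.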


Next Section in Brief: Assess

This section will help you:
Turn data center management outside-in. Cloud computing is associated with delivering IT as a service. Assessing the infrastructure for capacity management and planning starts with the business and ends with a model for total cost to serve and capacity management across service tiers.
Get started by using Info-Tech's Capacity Planning Data Collection & Tiering Workbook. Assess infrastructure based on dependencies, interdependencies, criticality, and business priority.


Turn infrastructure management outside-in: work from business needs through app requirements to total capacity requirements
Think like a service provider rather than an asset manager if you are going to offer infrastructure-as-a-service from a utility infrastructure or internal cloud.
The data center is traditionally seen as a room full of assets (servers, networks, and storage arrays) that need to be fed and cared for (with appropriate power, cooling, and configuration management). A capacity management view of the data center starts outside, with the service requirements of the customer, then works through all of the infrastructure assets needed to deliver expected service levels.

Process Map for Developing a Capacity Plan

The total cost of application, storage, server, network, and facilities is the total cost of the service being rendered to the business. This is the total cost to serve or the total cost of all capacity.
Finally, a capacity management strategy looks at how total cost of capacity can be mitigated. The key question is how much capacity is good enough to maintain service now and in the immediate future.
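The total cost to serve described above is a simple sum across infrastructure layers. The following sketch shows that roll-up for a single service; the `LayerCosts` class, the service name, and every dollar figure are hypothetical and chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class LayerCosts:
    """Cost attributed to one service, per layer (illustrative figures)."""
    application: float
    storage: float
    server: float
    network: float
    facilities: float

    def total_cost_to_serve(self) -> float:
        # Total cost to serve = sum of the cost of every layer that
        # contributes to delivering the service.
        return (self.application + self.storage + self.server
                + self.network + self.facilities)


# Assumed numbers for a hypothetical payroll service.
payroll = LayerCosts(application=12_000, storage=8_000, server=15_000,
                     network=3_000, facilities=2_000)
print(payroll.total_cost_to_serve())
```

Keeping the layers as separate fields makes it easier to see which layer dominates the cost before asking how much capacity is good enough.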


Understand that hardware is capacity - customer service drives the process but capacity management is about hardware
Some say that IT hardware doesn't matter in a cloud. However, while hardware doesn't matter to the customer as much as the software and service, hardware is a key concern for the service provider.

Three Laws of Cloud Infrastructure Investment & Capacity Management


Alignment is Software
Apps are where strategic and operational goals of the enterprise meet IT. All investments need to be considered first in how they enable the apps that enable the business.

The value of IT to the business comes from how apps and data serve business needs. The business will have priorities as to which apps and data are more or less valuable based on the relative criticality of the business processes they support. IT infrastructure is seen by the business as the capacity to run the apps and store the data. The unit cost of this capacity includes the cost per unit of processing or storage but also the additional cost of mitigating risk (e.g. ensuring uptime and security requirements). Capacity based on standardized hardware components is not a competitive differentiator. However, automated tools for efficiently allocating capacity to apps, monitoring capacity utilization, and tracking total costs can make one internal cloud more efficient and less expensive than another.

Hardware is Capacity
Apps are provisioned with compute resources derived from underlying hardware, measured in cost per unit of capacity, which includes added value such as guaranteed uptime.

Management is a Differentiator
Efficiently managing the utility infrastructure is a key value add. It can also provide visibility into the infrastructure for compliance and performance monitoring purposes.


Determine service level requirements based on business need


Optimal performance requirements + criticality to the business + future growth potential = total service requirements.
Performance: Maximum computing needed
What this includes:
The current capacity requirements (CPUs, network connections, I/O channels) that the app needs to perform at a level in line with user expectations.
Base level performance requirement: what is the worst acceptable response time or throughput?
Where to get it:
Original configuration requirements from app deployment.
Performance testing and app performance monitoring.
Physical to virtual machine (P2V) migration planning tools.

+

Criticality: Importance and tolerance for downtime
What this includes:
Current importance of the app or service to the business.
Impact to the business of loss of service or poor performance.
Where to get it:
Needs assessment for disaster recovery and business continuity planning.
Business impact analysis (BIA) for disaster recovery.

+

Growth: Long-term planning
What this includes:
Expected growth in demand from the app or service over the next three years.
Accounts for uncertainty.
Where to get it:
Predicted growth in business can help predict growth in transactional processes.
Monitor historical utilization to build a projection of future utilization.
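As an illustration of building a projection from historical utilization, the sketch below fits a straight line to evenly spaced samples and extrapolates it. A linear trend is an assumption made for simplicity, not a method prescribed by this report, and the function name and sample data are hypothetical.

```python
from typing import Sequence


def project_utilization(history: Sequence[float], periods_ahead: int) -> float:
    """Fit a least-squares line to evenly spaced samples and extrapolate.

    `history` holds one utilization reading per period (for example,
    average CPU percent per month). Real workloads may be seasonal or
    bursty, so a straight line is only a first approximation.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # The last sample sits at x = n - 1, so project from there.
    return intercept + slope * (n - 1 + periods_ahead)


# Made-up readings: six months of average CPU utilization (percent).
monthly_cpu_pct = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]
projected = project_utilization(monthly_cpu_pct, periods_ahead=12)
print(round(projected, 1))
```

The projection is only as good as the history behind it, which is one reason the growth requirement also has to account for uncertainty.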


Calculate the total cost of service by accounting for requirements at each layer of the physical infrastructure
[Figure: Server Requirements + Storage Requirements + Network Requirements + Facilities Requirements = Total Cost of Service]

Total capacity requirements are what is needed to meet current performance, future need, and availability/recovery goals of all applications and services. Future need is covered by standby capacity held ready above what is currently being used by the system. Availability/recovery is typically enabled through redundancy. This redundancy can be:
Component redundancy (dual power supplies, dual NICs)
Full resource redundancy (redundant servers, storage arrays, switches, power supplies, UPS, cooling)
Data redundancy (data snapshots, mirrors, backup copies)

[Figure: Total Required Capacity is made up of Capacity in Use (performance: good enough for performance), Standby Capacity (growth), and Redundant Capacity (criticality: good enough for availability). Service levels are not met if demand is more than this.]

If demand exceeds the capacity available for planned growth and/or does not leave enough redundant capacity, service levels will be compromised. Either additional physical capacity will need to be added or another workload will need to be removed from the pool of available capacity.

Total Capacity = Sum of capacity for current performance, future growth, and redundant capacity to meet availability targets
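The relationship above can be rolled up across applications. The sketch below sums performance, growth, and redundancy requirements per app; the `AppRequirement` class, the app names, and the unit-less capacity figures are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AppRequirement:
    """Capacity needed by one app, in arbitrary (assumed) capacity units."""
    name: str
    performance: float   # capacity in use for good-enough performance
    growth: float        # standby capacity held for expected growth
    redundancy: float    # redundant capacity needed for availability targets

    @property
    def total(self) -> float:
        return self.performance + self.growth + self.redundancy


# Hypothetical apps: a critical one with full redundancy, and a
# low-criticality one that carries no redundant capacity.
apps = [
    AppRequirement("email", performance=20, growth=5, redundancy=20),
    AppRequirement("reporting", performance=10, growth=2, redundancy=0),
]

total_required = sum(app.total for app in apps)
print(total_required)
```

Keeping the three components separate per app is what later allows capacity to be managed differently by service tier.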


Seek balance in provisioning: service is a function of adequate capacity for operation, growth, and redundancy

Scenario one: Need is greater than available capacity
[Figure: Total Capacity divided into Capacity in Use (good enough performance), Standby Capacity, and Redundant Capacity, compared against Current Need and Future Need.]
Result: Performance is good enough and available enough right now, but when need expands, service will suffer due to inadequate capacity.

Scenario two: Redundant capacity is less than adequate
[Figure: Total Capacity divided into Capacity in Use (good enough performance), Standby Capacity, and Redundant Capacity, compared against Current Need and Future Need.]
Result: Performance is adequate now and in the future; however, lack of redundant capacity threatens availability. SLA availability guarantees will not be met.
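The two scenarios can be expressed as a simple check on a capacity pool. This is a minimal sketch under stated assumptions: the function name, the finding labels, and all of the numbers are hypothetical, and capacity is treated as a single unit-less quantity.

```python
from typing import List


def assess_pool(total: float, current_need: float, future_need: float,
                redundant_reserved: float,
                redundant_required: float) -> List[str]:
    """Return the provisioning problems found in one capacity pool."""
    findings = []
    # Capacity left for workloads once the redundant reserve is set aside.
    usable = total - redundant_reserved
    if current_need > usable:
        findings.append("current shortfall")
    elif current_need + future_need > usable:
        # Scenario one: fine today, but growth will exceed capacity.
        findings.append("growth shortfall")
    if redundant_reserved < redundant_required:
        # Scenario two: not enough redundancy to meet availability targets.
        findings.append("availability shortfall")
    return findings


scenario_one = assess_pool(total=100, current_need=60, future_need=30,
                           redundant_reserved=25, redundant_required=25)
scenario_two = assess_pool(total=100, current_need=60, future_need=30,
                           redundant_reserved=10, redundant_required=25)
print(scenario_one, scenario_two)
```

In both cases total capacity is the same; what differs is how it is divided between growth headroom and redundancy.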


Case study: City has fully redundant capacity for availability and stays a server ahead for future need
The Situation
Municipal services for a mid-sized northern U.S. city. The city has been replacing traditional servers with virtualized infrastructure and is now nearly fully virtualized. The challenge is to manage adequate capacity to maintain service levels (availability and performance) now and as needs ramp up.

Actions
In communicating service to the organization, a server is still the unit of measure, as it is in a distributed hardware environment; however, a server is now not a physical entity but a package of capacity, which is derived from physical infrastructure. The cost of a server (unit of capacity) shown to the organization includes the physical resources used by the virtual machine plus the cost of redundant physical resources (see the example). This overhead is the cost of guaranteeing maximum availability. The organization seeks to stay a physical server ahead of current capacity requirements to facilitate growth in capacity requirements. The motto is "under-commit, over-deliver."
"Today, we have a lot of snow on the ground, and the servers for our snowplow maps are being overwrought; we're having 25,000 hits an hour on these sites, so I need to throttle up resources. In a Hyper-V environment, that's a couple clicks. We build that in and hold onto that surplus very tightly, knowing that somebody's going to need it. In planning, I like to be a server ahead. One extra. It's really worth my investment."

Example: How much a server costs in a virtual infrastructure environment
Let's say the total physical infrastructure costs $100,000 and can support a maximum of 100 virtual servers ($1,000 per VM). But for redundancy, half of the capacity is reserved. The number of virtual servers actually available for provisioning is 50, at a cost of $2,000 per server.
Because the physical redundancy guarantees higher availability in a clustered virtual environment, the $2,000 server cost includes higher service features like guaranteed uptime. When demand reaches 50, additional virtual servers can be added without new investment, but the reduction in failover capacity will cause performance and criticality to suffer.
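The arithmetic in the example can be reproduced in a few lines. The function name and parameters below are hypothetical; the figures ($100,000 of infrastructure, 100 virtual servers, half reserved) come from the example itself.

```python
from typing import Tuple


def cost_per_vm(infrastructure_cost: float, max_vms: int,
                reserved_vms: int) -> Tuple[int, float]:
    """Return (provisionable VMs, cost per provisionable VM).

    `reserved_vms` is the share of capacity, counted in VMs, that is held
    back for redundancy and therefore not offered for provisioning.
    """
    provisionable = max_vms - reserved_vms
    if provisionable <= 0:
        raise ValueError("no capacity left to provision")
    return provisionable, infrastructure_cost / provisionable


# Without any reserve: 100 VMs share the full cost.
print(cost_per_vm(100_000, max_vms=100, reserved_vms=0))

# With half the capacity reserved for redundancy: 50 VMs share the cost.
vms, unit_cost = cost_per_vm(100_000, max_vms=100, reserved_vms=50)
print(vms, unit_cost)
```

Doubling the unit cost is the visible price of the redundancy that backs the availability guarantee.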

Establish a systems management team to gather baseline information on current capacity and to develop a capacity plan

Capacity assessment requires the combined input of the professionals who manage each layer of the infrastructure, because all layers of the infrastructure contribute to service levels. Consolidation of infrastructure is not just consolidation of physical boxes and data, but also consolidation of skills and personnel. Establish a systems management team with representation from all the individual technology silos to work together. Train them in advance to use any new tools specific to a consolidated environment. Head the team with a sponsor who has control over every aspect of IT, and who has the influence and HR skills necessary to manage a diverse team.

"We're transitioning the staff to a different methodology that's about planning for strategic growth. I sent my staff for training 30 days before we got the first new server in, because the challenges and complications with new tools are pretty huge. But once [the team] has gotten there, the world is wonderful for us. The server guys are saying 'this is great stuff!' because they're able to very quickly meet demand: increasing size, upgrading an app, whatever it is, they can meet those demands a lot more efficiently." -- Assistant director of MIS

Document the systems management team in the data collection plan. Use Info-Tech's Capacity Planning Data Collection & Tiering Workbook to begin planning.

Inventory apps to bring order to capacity coordination chaos


Capacity management can benefit the enterprise regardless of where you are on the consolidation/virtualization curve. Use Info-Tech's comprehensive discovery tool to collect data on the current allocation of capacity to apps and to group apps/capacity by criticality.

An app inventory that provides a clear depiction of the current environment should:
Document how each app connects to other apps.
Demonstrate dependencies of apps on infrastructure components.
Describe the criticality of each app and how much downtime each can afford.
Provide a starting point for analysis and planning for appropriate current and future capacity.

[Figure: Application_1 and Application_2 are dependent on Infrastructure_1 and Infrastructure_2 respectively. Application_2 and Application_3 are interdependent. Servers can be physical or virtual.]

The Application Inventory and Validation worksheet helps you to gather and organize all pertinent data surrounding each of the organization's applications.
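The dependency picture above can be captured in a small data structure and queried. The sketch below is an assumption about how such an inventory might be represented in code, not the format of the Info-Tech worksheet; it reuses the names from the figure, and leaves the direct infrastructure of Application_3 empty because the figure does not state it.

```python
from typing import Dict, Set

# App -> infrastructure it runs on directly (names taken from the figure).
app_to_infra: Dict[str, Set[str]] = {
    "Application_1": {"Infrastructure_1"},
    "Application_2": {"Infrastructure_2"},
    "Application_3": set(),  # not stated in the figure; left empty
}

# App -> other apps it depends on (Applications 2 and 3 are interdependent).
app_to_apps: Dict[str, Set[str]] = {
    "Application_2": {"Application_3"},
    "Application_3": {"Application_2"},
}


def infrastructure_for(app: str) -> Set[str]:
    """Collect every infrastructure component an app relies on,
    directly or through the apps it depends on."""
    seen: Set[str] = set()
    infra: Set[str] = set()
    stack = [app]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles such as App 2 <-> App 3
        seen.add(current)
        infra |= app_to_infra.get(current, set())
        stack.extend(app_to_apps.get(current, ()))
    return infra


print(infrastructure_for("Application_3"))
```

Following interdependencies this way shows that an outage on Infrastructure_2 affects Application_3 as well, which matters when grouping apps by criticality.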

Identify dependencies to assess total capacity requirements


Data collected for a complete infrastructure inventory should include:
Device name and model
Server configuration
Storage configuration
Storage or server dependencies
Application dependencies
Network hardware configuration
Redundancy reviews

[Figure: Infrastructure_1, Infrastructure_2, and Infrastructure_3, grouped under the labels Servers and SAN.]

The Infrastructure Inventory and Validation worksheet will help you to collect and organize data related to the organization's servers, storage, switches, and routers to further understand relationships. This tab has been populated with an example to help you get started. Be sure to validate all infrastructure inventory to ensure accuracy.

Next Section in Brief Understand 1


This section will help you with:
Manage capacity by service tiers for cost efficiency. Not all services require the same capacity examine variable capacity costs for each tier to see how savings might be realized without compromising service levels. Take a gas gauge approach to capacity planning. Once pools of reserve capacity are established, future capacity acquisitions are based on service maintenance rather than app addition. Capacity management is a process, not a product. Look to system management and internal cloud management tools with an eye to how they might automate your capacity management practice.


Using your capacity assessment workbook as a starting point, follow the best-practice steps of developing a capacity plan:

1. Analyze current capacity to determine if needs are being met and to establish a baseline for planning. Recommendation: Analyze appropriateness of current provisioning to apps and needs.

2. Optimize the infrastructure to ensure the most efficient use of existing capacity. Recommendation: Plan to create level-of-service tiers. Orchestrate optimization across infrastructure layers.

3. Analyze the impact of new or updated apps on capacity. Recommendation: Follow a P2V policy of virtualize-unless-otherwise to realize agility benefits in provisioning new and updated apps.

4. Analyze demand to model service requirements of the infrastructure and predict future growth in demand. Recommendation: Forecast future business activity, upcoming new or updated apps, and analyze trends.

5. Develop a capacity plan that relates future growth in capacity to maintenance of service levels. Recommendation: Develop a Reserve Capacity Model that takes a gas gauge approach to maintaining capacity across service tiers.

Plan the plan: use a Business Plan & Process Checklist to get buy-in for the process and track results
The Internal Cloud Business Plan Template will help build a business plan for the enterprise as well as document business justifications for any additional projects that are connected to implementations, such as virtualization, shared storage, and network convergence. The goal is to get all the pieces in place for an overall strategy. The resulting document is therefore intended for initial project scoping and for future reuse, as more consolidation strategies are defined.

Use the Capacity Management Process Checklist to track your organization's progress in developing your internal cloud. Additional activities and checkpoints can be added to the checklist, and others removed, to customize it to your situation.


Analyze current capacity: compare current provisioning to application & business need

Align your catalog of apps and dependencies with business expectations of performance and criticality.

Compare the provisioning of apps between high, medium, and low criticality groupings in the workbook. Are there significant differences between them? Is there a one-size-fits-all approach across apps in servers, networking, and storage?
If you have internal SLAs, compare service levels of any items referenced in the SLA with actual performance. Are the apps meeting expectations, and are they provisioned adequately to meet expectations?
Review each app's and service's usage of CPU, memory, and I/O devices. This analysis will identify high-usage resources that may become a problem if demand increases in the future.
Record resource utilization and determine the major processes consumed by each app. Identify where each workload spends time.
Analyze all components of the process chain to determine which system resources are responsible for the greatest portion of response time for each workload.

Analyze whether current capacity is meeting "good enough" performance requirements by application.
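The utilization review described above can be sketched in a few lines. This is an illustrative sketch only: the 80% threshold, the app names, and the sample values are all assumptions, and real samples would come from whatever monitoring the organization already has.

```python
# Assumed threshold: treat a resource as "high usage" once its peak reaches 80%.
HIGH_USAGE_THRESHOLD = 0.80

# Hypothetical utilization samples (fraction of capacity) per app and resource.
SAMPLES = {
    "billing": {"cpu": [0.55, 0.91, 0.62], "memory": [0.40, 0.45, 0.43], "io": [0.30, 0.35, 0.82]},
    "intranet": {"cpu": [0.10, 0.20, 0.15], "memory": [0.50, 0.60, 0.55], "io": [0.05, 0.10, 0.07]},
}


def high_usage_resources(samples, threshold=HIGH_USAGE_THRESHOLD):
    """Return (app, resource, peak) tuples whose peak meets the threshold, highest peak first."""
    flagged = []
    for app, resources in samples.items():
        for resource, values in resources.items():
            peak = max(values)
            if peak >= threshold:
                flagged.append((app, resource, peak))
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Peak values are used here rather than averages because a resource that saturates briefly is the one most likely to become a problem when demand grows; an organization could equally choose a percentile.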


Optimize the infrastructure: plan to create service tiers to optimize your capacity investment
Resist the temptation to treat infrastructure as one-size-fits-all. The practice of tiering capacity by service levels is significantly correlated with consolidation success.

In assessing current capacity, you have seen that not all apps have the same business criticality and performance requirements. In planning infrastructure, look to tiering services by groupings of capacity requirements.

[Chart: correlations between service tiering and measures of consolidation success: .46, .27, .37, .30, .25, .21]

Example of a three-tier service approach to capacity


Bronze
Low OPEX on a cost-per-unit basis. Provides "just good enough" levels of reliability for services. Highly agile environment suitable to rapid go-to-market business strategies that are still maturing.
Example workloads: Test and development. Short-duration processing projects. Fast and cheap deployment.

Silver
Higher CAPEX and OPEX than bronze. Adequate levels of reliability for services. Production-level agility, but more rigid than bronze.
Example workloads: Regular production servers/apps. New and updated apps brought online in a production environment. Rapid deployment a plus.

Gold
Highest CAPEX and OPEX of all tiers. Highest affordable level of reliability. Rigid change control results in lowest degree of agility.
Example workloads: Mission-critical apps. Apps that require a higher degree of capacity bandwidth. Infrequent updates with long lead times.
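As a rough illustration of the three-tier example, a workload's tier can be derived from a couple of attributes recorded in the app inventory. The rule below is a hypothetical simplification: the attribute names and the decision order are assumptions, not part of the tier definitions.

```python
def assign_tier(mission_critical, production):
    """Map a workload to a service tier using a simple, assumed rule.

    mission_critical: True for apps the business cannot run without.
    production: True for workloads serving live users (as opposed to test/dev).
    """
    if mission_critical:
        return "gold"      # highest reliability, rigid change control
    if production:
        return "silver"    # regular production servers/apps
    return "bronze"        # test, development, short-duration projects


# Hypothetical workloads for illustration.
workloads = {
    "order_processing": assign_tier(mission_critical=True, production=True),
    "hr_portal": assign_tier(mission_critical=False, production=True),
    "build_sandbox": assign_tier(mission_critical=False, production=False),
}
```

In practice the rule would also weigh performance and capacity bandwidth requirements, which the gold tier description calls out.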

Service tiering was correlated with each of the above measures of success in consolidating infrastructure.


Work through infrastructure planning & development efforts to identify opportunities for service tiers
Hardware is capacity. Service is a function of performance and redundancy. Through the systems planning team, look for opportunities for service tiering at every level.

Start with consolidated storage. For many, service tiers are synonymous with storage tiers. Storage can be the most expensive part of a consolidated infrastructure, but it need not be treated as a single monolithic entity. For storage service tiering, look to matching the fastest (and most expensive) disk with the most critical processes and data. Variable redundancy (disk, data, and device, including backup) also defines a service tier. Storage virtualization can also boost utilization and lower costs across tiers. See the Solution Set Mitigate Costs & Maximize Value with a Consolidated Network Storage Strategy.

In networks, variable bandwidth and port and switch redundancy impact classes of service. One way variable storage tiers have been architected is to have tier one storage use faster Fibre Channel ports and switches while a secondary tier uses Ethernet and iSCSI for storage traffic. Converged networking in 10 gigabit Ethernet holds the promise of reducing network complexity while improving performance of both servers and storage through better I/O and I/O management. In converged I/O, variable service becomes a matter of policy rather than hardware. See the Solution Set Craft a Converged Data Center Network Strategy.

In servers, look at on-board redundancy and processing architecture. The server is the base unit of capacity in a consolidated infrastructure, but server pricing can vary depending on the class of processor, number of processors, and other on-board redundancy such as dual power supplies. Form factor advances such as blades also increase density and reduce footprint. See the Solution Set Build a Server Acquisition Strategy for the Internal Cloud.

Calculate the impact of tiering on power and cooling, and examine redundancy needs within the facilities. Facilities are 40% of total cost of the infrastructure. Efficiencies in all the above layers will have an impact on the load requirements of the data center. Also look for opportunities to vary facilities redundancy for each service tier (see case study below). The Solution Set Renovate the Data Center has significant value even if you are not currently renovating. The set has detailed tools for capturing and optimizing facilities costs, including the Power Requirements Calculator and the Standby Power Supply Calculator.

Use this tool for a big picture comparison of total costs for each infrastructure layer
Detailed TCO analysis is best left to strategies for each infrastructure layer. However, this tool can provide a big picture snapshot of cost comparison across infrastructure layers.

Exploring opportunities to tier services in infrastructure layers will yield total cost savings opportunities. In the following case, for example, a mid-sized professional data services firm estimated potential savings of more than $20,000 per year from facilities service tiering alone. Several of the Solution Sets for planning individual infrastructure layers (storage, network, servers, facilities) have detailed TCO comparison calculators. For a big picture at-a-glance comparison across layers, use the Infrastructure TCO Comparison Tool. Using examples and data from case studies, this tool was developed to illustrate the most common TCO comparisons:
TCO of the existing infrastructure vs. TCO of your proposed project.
TCO of multiple proposed projects (e.g. build a new facility vs. co-location).


Case study: Application of server tiers produces potential facility & TCO savings for this mid-sized organization
A data services company was planning a renovation of its 100 square foot data center. The company explored the idea of tiering its facilities according to criticality, and calculated a cost saving of $22,827 per year from doing so.

Before
Electricity rate: $0.093
Electricity usage per hour: 106 kW
Cost per year: $86,415.23
All infrastructure components (e.g. servers) are fed with the same A/B-side power and UPS in a one-size-fits-all approach. There is an opportunity for reducing TCO by assigning less expensive standards to the infrastructure supplying capacity for less critical applications.

After
Electricity rate: $0.093
Electricity usage per hour:
Bronze infrastructure: 16 kW
Silver infrastructure: 11 kW
Gold infrastructure: 51 kW
Cost per year: $63,588.64
Managing capacity in a way that matches criticality with business demand has resulted in service tiers that save the organization money.

Cost per square foot: $864 before, $636 after (a 26% reduction).

These savings consider facilities costs alone. Service tiering can achieve even more savings in areas such as server CAPEX, network costs, and reducing the time needed to manage physical infrastructure.
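The case study figures can be reproduced with simple arithmetic. The sketch below rests on two assumptions that are not stated in the case study: that the electricity rate is in dollars per kWh, and that a year is counted as 8,766 hours (365.25 days), which is the value that reproduces the "before" cost to the cent.

```python
HOURS_PER_YEAR = 8766   # assumed: 365.25 days * 24 hours
RATE_PER_KWH = 0.093    # case study rate, assumed to be dollars per kWh


def annual_cost(load_kw, rate=RATE_PER_KWH, hours=HOURS_PER_YEAR):
    """Yearly electricity cost of a constant load, in dollars."""
    return load_kw * rate * hours


before = annual_cost(106)              # single one-size-fits-all load
after = annual_cost(16 + 11 + 51)      # bronze + silver + gold loads
savings = before - after
reduction = savings / before           # fraction of the "before" cost saved
```

Under these assumptions the "before" cost comes to $86,415.23, the savings round to $22,827, and the reduction is about 26%. The computed "after" cost is a few cents below the $63,588.64 quoted, presumably because the tier loads are rounded to whole kilowatts.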

Analyze the impact of new or updated apps. Pursue a policy of virtualization first for agile provisioning

Virtualize unless otherwise. Virtualization is a tactic for enabling more efficient and agile provisioning. All new or updated workloads should be evaluated for virtual hosting.

A gold, silver, or bronze service tier represents a baseline: what is "good enough" to provision a given workload in line with its performance and criticality requirements. At the server level, a service tier can include both native (non-virtual) servers and clusters of servers that have been partitioned for virtualization. Taking a "virtualize unless otherwise" approach, new and updated apps should be assessed for hosting on the virtualized tier. Updates can include needs for new levels of performance and capacity. Legacy apps on end-of-life hardware should also be evaluated for migration to the virtual tier. In order to assess the impact of new workloads on capacity, careful assessment of requirements is needed. Use the Application Assessment Checklist (modified from Appleton Ideas) as a template for developing your own.

[Figure: Visual example of gold, silver, and bronze service tiers with both virtual and non-virtual servers. Non-virtual gold servers and non-virtual silver servers each have a P2V migration path to a virtualized gold cluster and a virtualized silver cluster; the bronze tier is a virtualized bronze cluster. The tiers sit on a consolidated network with variable service tiers and consolidated storage with variable service tiers.]

Note: Some enterprises may find that virtual infrastructure is not ready for their gold tier. Non-virtual servers can include nonstandard (non-x86) servers. Some workloads may never be virtualized.

A trigger for virtualizing core production workloads in several companies has been the realization that performance and availability (service) for secondary workloads in their virtual server environment was better than that for primary workloads in a non-virtual environment.

Analyze demand to model service requirements; identify trends to forecast future business and new workloads

With current capacity under control, begin looking to the future of the business, and how growth will change the capacity needed to fuel the required workloads.

Forecast business activity. Growth in the business will mean more transactional processing. If growth translates into more staff, it may also translate into more users of applications. Include increased demand in the analysis of requirements for new and updated applications.

Monitor and analyze capacity requirements over time.

"From a capacity standpoint, we hit a wall of CPU saturation before we realized where the practical limit was. We learned, with a bit of pain, to use software to model a trend line telling us 'you're going to hit a wall at this time next year unless you add capacity.'"
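The kind of trend line described in the quote above can be approximated with an ordinary least-squares fit. This is a minimal sketch assuming evenly spaced samples (for example, one average CPU utilization reading per month) and roughly linear growth; the function names and the sample figures are hypothetical, and commercial capacity tools use more sophisticated models.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Requires at least two samples.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, limit):
    """Periods after the last sample until the trend line reaches `limit`.

    Returns None when the trend is flat or falling (no wall in sight).
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical monthly average CPU utilization (percent) and a practical limit.
monthly_cpu = [40, 45, 50, 55, 60]
months_left = periods_until(monthly_cpu, limit=90)
```

With these sample values utilization grows five points a month, so the trend reaches the 90% limit six months after the last reading.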


Develop a capacity plan: use a reserve capacity model for management & planning

The capacity reservation model tiers capacity according to agility, reliability, control, and cost. The idea of reservation reintroduces the importance of justification for capacity usage. Capacity is not open-ended but reserved for certain kinds of workloads. Reserve capacity enables business units to order IT services as they would from a managed service provider (including an external infrastructure-as-a-service cloud). But IT can also show its entire capability in terms of units (server instances) that can be supported at each level (see the case study on slide 22 for an example). Adding a workload to a capacity tier counts against available capacity, a limited resource. Accommodating the addition may require spending to increase capacity, or removal/retirement of another workload to free up capacity.

Each time a unit of capacity from one of the three tiers is provisioned out to the business, it is removed from the pool of available capacity. The remaining capacity can be monitored as a gas gauge or planning point for bringing additional capacity online. The gas gauge approach avoids ad hoc hardware purchases and avoids over-provisioning and overspending, as capacity is brought online at each level commensurate with projected need.
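The reserve capacity model described above can be sketched as a small pool object per tier. This is an illustration under stated assumptions, not a prescribed implementation: the class name, the unit counts, and the 25% reorder level are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TierPool:
    """Reserve capacity for one service tier, counted in units (e.g. server instances)."""
    name: str
    total_units: int
    provisioned: int = 0
    reorder_level: float = 0.25  # assumed planning point: plan new capacity at 25% remaining

    @property
    def available(self):
        return self.total_units - self.provisioned

    @property
    def gauge(self):
        """Fraction of the pool still available: the gas gauge reading."""
        return self.available / self.total_units

    def provision(self, units=1):
        """Hand units to the business, removing them from the available pool."""
        if units > self.available:
            raise ValueError(f"{self.name}: only {self.available} units available")
        self.provisioned += units

    def retire(self, units=1):
        """Return units to the pool when a workload is removed or retired."""
        self.provisioned = max(0, self.provisioned - units)

    def needs_capacity(self):
        return self.gauge <= self.reorder_level


# Hypothetical silver tier with 40 units of capacity.
silver = TierPool("silver", total_units=40)
silver.provision(28)                # gauge reads 30%
ok_for_now = not silver.needs_capacity()
silver.provision(2)                 # gauge reads 25%: time to plan more capacity
```

Refusing a provisioning request that exceeds the pool makes the trade-off explicit: either capacity is added or another workload is retired first.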


Capacity management is a process, not a tool or a product. Look to tools to help automate tasks but have a process in place first
For many (37%), the tools for managing a capacity planning process include a pad and pen, a white board, and a spreadsheet. These are a perfectly legitimate tool set for working with your systems team, recording usage information, and planning the future. A plurality of respondents (47%) use a combination of point tools, such as vendor-specific storage management tools combined with virtual infrastructure management such as VMware vCenter. These provide visibility into the system from application through virtual and physical infrastructure, as well as dynamic provisioning capabilities. Another tool set helpful for modeling and monitoring the impact of new and updated workloads on the virtual environment (as outlined in planning step 3) includes capacity planning and monitoring tools from CiRBA, VMware, Microsoft, PlateSpin, and VKernel. Only 5% use comprehensive software that automates management across the entire consolidated infrastructure. These tended to be organizations with the largest proportion of virtualized infrastructure.
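[Chart: tools used for capacity management: none 10%, manual 37%, some software 47%, comprehensive software 5% (software combined 52%).]

However, organizations that did use comprehensive software were the most successful in their consolidation efforts.

[Chart: average success percentile by tool set: comprehensive software 78, some software 65, manual 60, none 51 (+53% from none to comprehensive software).]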

Dependent variable is average success percentile. See slide 6 for explanation of success.

Ensure Service Delivery with Systems Management


Case study: This manufacturer has deployed tiered services & capacity monitoring as it closes on a goal of 99% virtualization
The client: Manufacturer of specialty paper products. They have been virtualizing infrastructure for over five years, and now over 96% of their servers are virtual. The plan is to have 99.5% of the infrastructure virtualized as soon as possible, only avoiding virtualization when hardware limitations absolutely prevent it.

Follow a virtualize unless otherwise physical-to-virtual policy


The organization identified a wide range of benefits from virtualizing nearly 100% of their infrastructure, such as:
Lower costs.
Simplified management of physical resources.
Less disruption in service.
Greater agility. This was the number one reason to upgrade.

Tier services for cost savings


From storage on up, resources are divided according to criticality. The main resource pooling follows a three-tier system similar to the one recommended by Info-Tech:
High-priority production: workloads critical to the business.
Production: minimum standard for in-use workloads.
Test/dev: non-critical for in-progress workloads.

Monitor capacity to forecast expansion needs


Current infrastructure load is monitored in order to know when to add new hardware. Trend lines output by automated software predict when storage capacity will run out. Metrics such as CPU and memory load are pulled from monitoring tools, further informing capacity planning.

Key management tools include Compellent Storage Management for dynamic storage tiering and VMware vCenter.

Good news: the advice laid out in this Solution Set works as well in practice as it does in theory. With a solid capacity management plan in place, the organization reports success in realizing benefits and almost no pitfalls in their comprehensive consolidation efforts.

Prepare for a future of hybrid clouds & cloud bursting


The external cloud will continue to develop and mature as the enterprise focuses on internal cloud development. Look for future management solutions to span internal and external clouds.
This Solution Set has focused on internal cloud capacity management, because for most (76%), internal cloud development comes first. However, opportunities in the external public cloud will continue to develop and mature over the next three to five years. Opportunities include:
Hybrid orchestration across internal and external cloud environments. Internal and external capacity will be connected and management will span both.
External capacity may become a service tier. For example, an external infrastructure-as-a-service cloud could become the bronze tier, so long as redundancy and performance meet internal requirements.
External cloud capacity will be used on demand to meet spikes in capacity need. This cloud bursting (see figure) will bring the open-ended scalability of external cloud to internal requirements.

[Figure: How Cloud Bursting Works. Labels: capacity in external cloud, capacity in use, standby capacity, redundant capacity, total capacity, "good enough" performance, current need, future need.]

In a cloud bursting scenario, available and appropriately redundant capacity is maintained in a public cloud for spikes in need for capacity from internally hosted applications.
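The core of the cloud bursting idea is a split of demand between internal capacity and the external cloud. The following is a deliberately simplified sketch: the function name and the unit counts are hypothetical, and it ignores redundancy, latency, and data placement, all of which a real hybrid orchestration layer would need to handle.

```python
def split_demand(demand_units, internal_units):
    """Return (served_internally, burst_to_external_cloud) for a given demand.

    Demand is served from internal capacity first; only the excess bursts out.
    """
    served_internally = min(demand_units, internal_units)
    burst = max(0, demand_units - internal_units)
    return served_internally, burst


# Hypothetical internal pool of 100 units.
normal_day = split_demand(demand_units=80, internal_units=100)
spike_day = split_demand(demand_units=120, internal_units=100)
```

On the normal day nothing leaves the data center; on the spike day the 20 units above internal capacity are served from the external cloud.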

Conclusions
Capacity management is a critical process

Start with business needs
Turn data center management outside-in; think like a service provider delivering IaaS to the business. Determine total service requirements as the total of performance, criticality, and growth. Gather a team and document apps and infrastructure to prepare for advanced capacity management.

Follow the five steps to developing a capacity plan
Analyze current capacity, optimize the infrastructure, analyze impact, determine demand, then develop a plan that takes future growth into account. Think of capacity as a gas gauge, and divide it into tiers for optimum success. Consider automation tools, but make sure there is a process in place for automation to have benefit.

In an internal cloud, the organization bears the full burden of all capacity used or unused. Virtualization does not create capacity. Its benefits can only be fully realized with careful capacity management. Begin capacity management now to prepare for an increasingly cloudy future.
