Cloud Infrastructure Management

Performance and Availability

Data Center
Capital-intensive Infrastructure: CAPEX
Dedicated Ops Team
Custom Software Stack

Cloud
Lean Infrastructure: OPEX
Dedicated Ops Guy
OSS and SaaS

What does it mean?

Cloud Infrastructure is agile
New servers take minutes, not months
Provision and deploy at will
Operations must be agile

Agile Operations
In the face of constant change, infrastructure must sustain business goals

OODA [John Boyd] Observe, Orient, Decide, Act
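
As an illustration of how the Observe, Orient, Decide, Act cycle might map onto operations work, here is a minimal sketch. The metric (`response_ms`), the target of 200 ms, and the scale-by-one policy are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    response_ms: float  # hypothetical application metric (Observe)
    servers: int        # hypothetical current size of the tier


def orient(obs, target_ms=200.0):
    """Orient: relate the observation to an assumed business goal.

    Returns a pressure ratio; a value above 1.0 means the goal is being missed.
    """
    return obs.response_ms / target_ms


def decide(pressure, servers):
    """Decide: grow when over target, shrink when well under, else hold."""
    if pressure > 1.0:
        return servers + 1
    if pressure < 0.5 and servers > 1:
        return servers - 1
    return servers


def ooda_step(obs):
    """One pass through the loop; the returned tier size is what Act would apply."""
    return decide(orient(obs), obs.servers)
```

Running `ooda_step` on each fresh observation closes the loop: the action changes the system, and the next observation reflects that change.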

Observe
Business Metrics
Application Performance
Cloud Resources

Business Metrics
Specific to your business
May change over time
Should influence all decisions
Should be correlated to OPEX

App Performance
Uptime / Availability - Pingdom
Response Time - New Relic, AppDynamics, JMX
Logging - Loggly, Papertrail, Splunk

Cloud Resources

OSS - Ganglia, Nagios, collectd
SaaS - Cloudkick, Server Density, Circonus

Observe
Business Metrics

...but..

Application Performance

Cloud Resources

Cloud Monitoring and Management as a Service

Silverline
Application Resource Management

Silverline
Application Resource Management - Delivered as a Service

Application Monitoring
Tier Monitoring
Application Management

Transparent Instrumentation
Stack, top to bottom:
Application
SL Library
C Standard Library (libc)
OS

Delivered as-a-Service

Application Monitoring
Metrics for Unprecedented Insight

Server Monitoring

Application Monitoring

Resources Monitored
CPU
Memory
Disk I/O
Network I/O

Multi-Resource View

Continuous Deployment

Capacity Planning

Monitoring Abstractions
Tags
Multiple Tags per application
Aggregation/Breakout across scale-out tier
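
To make the tag abstraction concrete, here is a small sketch of aggregating a metric by tag and breaking a tag back out per host. The sample data, host names, and function names are hypothetical; this does not represent Silverline's API.

```python
from collections import defaultdict

# Hypothetical samples: (host, tags, cpu_percent). An application may carry several tags.
SAMPLES = [
    ("web-1", {"web", "prod"}, 40.0),
    ("web-2", {"web", "prod"}, 60.0),
    ("worker-1", {"worker", "prod"}, 30.0),
]


def aggregate_by_tag(samples):
    """Aggregation: sum the metric across every host that carries each tag."""
    totals = defaultdict(float)
    for _host, tags, cpu in samples:
        for tag in tags:
            totals[tag] += cpu
    return dict(totals)


def breakout(samples, tag):
    """Breakout: per-host values for the hosts in one tagged tier."""
    return {host: cpu for host, tags, cpu in samples if tag in tags}
```

Because each sample carries multiple tags, the same data can be viewed per tier (`web`), per environment (`prod`), or per host without re-instrumenting anything.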

Tier Monitoring

Application Management
Providing the Ability to Act

Hardware fails
Services have downtime
Traffic spikes suddenly

Act
Scale-out is only part of the picture
Most scalable components have multiple moving parts
Affected services should degrade predictably
Policies should be in place to divide system resources under contention

Application Management
Application-level resource policies
Weighted fair-share scheduler
Predictable service degradation
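
The slides do not describe how the scheduler works internally. As a generic illustration of weighted fair sharing, the sketch below divides a fixed capacity in proportion to per-application weights, caps each application at its demand, and redistributes any surplus. The application names, weights, and demands are hypothetical.

```python
def weighted_fair_share(capacity, demands, weights):
    """Divide `capacity` among apps in proportion to `weights`.

    No app receives more than it demands; capacity an app cannot use is
    redistributed among the remaining apps by weight.
    """
    alloc = {name: 0.0 for name in demands}
    active = {n for n in demands if demands[n] > 0 and weights[n] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[n] for n in active)
        share = {n: remaining * weights[n] / total_weight for n in active}
        # Apps whose whole demand fits inside their weighted share.
        satisfied = {n for n in active if demands[n] <= share[n]}
        if not satisfied:
            # Everyone is constrained: hand out the weighted shares and stop.
            for n in active:
                alloc[n] = share[n]
            break
        for n in satisfied:
            alloc[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied
    return alloc
```

With weights 3:1 and both applications demanding more than is available, a 100% CPU budget splits 75/25. That is the predictable part of the degradation: under contention each application slows in a known proportion rather than arbitrarily.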

Policy Configuration

Use Case: Harvesting


Explicit allocation of 0% CPU
High Frequency Trading and Risk Analysis
Unicorn/Nginx and Resque
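
One way to read "explicit allocation of 0% CPU" is that the background application has no guaranteed share and runs only on cycles the primary application leaves idle. The sketch below models that reading; it is an assumption made for illustration, and the function and parameter names are hypothetical.

```python
def harvest(capacity, primary_demand, background_demand):
    """Return (primary, background) CPU under a 0% background allocation.

    The primary app is served first; the background app only harvests
    whatever capacity is left idle.
    """
    primary = min(primary_demand, capacity)
    idle = capacity - primary
    background = min(background_demand, idle)
    return primary, background
```

For the Unicorn/Nginx and Resque pairing, this would mean background jobs soak up spare CPU when web traffic is light and yield all of it when traffic spikes.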

Use Case: Harvesting


[Figure panels: Primary App; Primary App and Background App; Primary App and Background App with Harvesting]

Silverline
Supported Platforms

Ubuntu 10.04 LTS / 10.10 (lucid/maverick)
Debian Stable (lenny)
RHEL/CentOS 5.x, Fedora 12/13
Amazon Linux AMI

Silverline
Use Cases

Continuous deployment feedback loop
Predictable service degradation
Capacity planning
Root cause analysis

Conclusion

The DevOps Tool-kit


Continuous deployment feedback loop
Predictable service degradation
Capacity planning
Root cause analysis
Automated alerting and resolution

Company Overview
Venture-backed: Acartha, Menlo Ventures
10 employees
Headquartered in San Francisco SOMA

Brief History
Silverline development started in 2007
Launched Q4 2008 as "Load Manager"
Enterprise dev & sales model
Switched to SaaS-based model Q3 2010

Demo
