
capacity planning for

LAMP
what happens after you’re scalable

MySQL Conf and Expo


April 2007
John Allspaw

• Engineering Manager (Operations) at flickr (Yahoo!)


Yay!

• You’re scalable! (or not)


• Now you can simply add hardware as
you need capacity.

• (right ?)
• But:
BUT, um, wait....
• How many servers ?
• How many databases ?
• How many webservers ?
• How much shared storage ?
• How many network switches ?
• What about caching ?
• How many CPUs in all of these ?
• How much RAM ?
• How many drives in each ?
• WHEN should we order all of these ?
some stats
• - ~35M photos in squid cache (total)
• - ~2M photos in squid’s RAM
• - ~470M photos, 4 or 5 sizes of each
• - 38k req/sec to memcached (12M
objects)
• - 2 PB raw storage (consumed ~1.5 TB on Sunday)

capacity
capacity
doesn’t
mean
speed
capacity is for business
[Diagram: "Buying enough for now", surrounded by "too much", "not enough", "too soon", "too late"]


3 main parts

• - Planning (what ?/why ?/when ?)


• - Deployment (install/config/manage)
• - Measurement (graph the world)
boring queueing theory
• Forced Flow Law:
• X_i = V_i × X_0
• Little’s Law:
• N = X × R
• Service Demand Law:
• D_i = V_i × S_i = U_i / X_0
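A minimal sketch of the three laws as plain arithmetic, in case a worked example helps. The function names and all of the numbers (100 page requests/sec, 4 database queries per page, 2 ms per query, 250 ms response time) are made up for illustration and do not come from the slides.

```python
def forced_flow(visits_i, system_throughput):
    """Forced Flow Law: X_i = V_i * X_0."""
    return visits_i * system_throughput


def littles_law(throughput, response_time):
    """Little's Law: N = X * R."""
    return throughput * response_time


def service_demand(visits_i, service_time_i):
    """Service Demand Law, first form: D_i = V_i * S_i."""
    return visits_i * service_time_i


def utilization(service_demand_i, system_throughput):
    """Service Demand Law rearranged: U_i = D_i * X_0."""
    return service_demand_i * system_throughput


# Hypothetical workload, for illustration only.
x0 = 100.0    # page requests per second for the whole system
v_db = 4.0    # database queries issued per page request
s_db = 0.002  # seconds of database service time per query

x_db = forced_flow(v_db, x0)     # queries per second hitting the database
d_db = service_demand(v_db, s_db)  # database seconds consumed per page
u_db = utilization(d_db, x0)     # fraction of the database that is busy
n_in_flight = littles_law(x0, 0.25)  # requests in the system at 250 ms each

print(x_db, d_db, u_db, n_in_flight)
```

With these assumed inputs the database would see 400 queries/sec and run at roughly 80% utilization, with 25 requests in flight.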

my theory

• capacity planning math is based on real things, not abstract ones.
predicting the future
consumable
concurrent usage
considerations:
social applications

• - Have the ‘network effect’


• - Exponential growth


considerations:
social applications
• Event-related growth
• (press, news event, social trends, etc.)

• Examples:

• London bombing, holidays, tsunamis, etc.



What do you have
NOW ?

• When will your current capacity be depleted or outgrown ?
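One rough way to put a date on that question, sketched under stated assumptions. The 2 PB raw storage and ~1.5 TB/day figures echo the stats earlier in the deck; the "currently used" figure, the 1% daily growth rate, and both function names are hypothetical. The linear version assumes steady consumption, the compound version assumes the exponential growth mentioned for social applications.

```python
import math


def days_until_full_linear(capacity, used, growth_per_day):
    """Days left if a fixed amount is consumed every day."""
    if growth_per_day <= 0:
        return math.inf
    return max(capacity - used, 0.0) / growth_per_day


def days_until_full_compound(capacity, used, daily_rate):
    """Days left if usage grows by a fixed fraction every day."""
    if used >= capacity:
        return 0.0
    if daily_rate <= 0 or used <= 0:
        return math.inf
    # Solve used * (1 + rate)^days = capacity for days.
    return math.log(capacity / used) / math.log(1.0 + daily_rate)


# 2 PB raw (taken as 2000 TB) and 1.5 TB/day; 1400 TB used is an assumption.
linear_days = days_until_full_linear(2000.0, 1400.0, 1.5)

# Hypothetical: half full, growing 1% per day.
compound_days = days_until_full_compound(2000.0, 1000.0, 0.01)

print(linear_days, round(compound_days, 1))
```

The gap between the two answers is the reason to check which growth shape the real graphs follow before ordering hardware.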
finding ceilings

• MySQL (disk IO ?)
• SQUID (disk IO ? or CPU ?)
• memcached (CPU ? or network ?)
forget benchmarks

• boring
• to use in capacity planning...not usually
worth the time
• not representative of real load
• test in production
what do you expect ?
• define what is acceptable
• examples:
• squid hits should take less than X
milliseconds
• SQL queries less than Y
milliseconds, and also keep up with
replication
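A small sketch of writing those expectations down as explicit limits, so they can be checked against measurements. The slides leave X and Y open, so they stay parameters here; the `Limits` class, the `violations` function, the slave-lag threshold as a stand-in for "keeping up with replication", and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    squid_hit_ms: float     # "X" from the slide
    sql_query_ms: float     # "Y" from the slide
    max_slave_lag_s: float  # assumed meaning of "keep up with replication"


def violations(limits, squid_hit_ms, sql_query_ms, slave_lag_s):
    """Return the names of the expectations that are not being met."""
    problems = []
    if squid_hit_ms >= limits.squid_hit_ms:
        problems.append("squid hit time")
    if sql_query_ms >= limits.sql_query_ms:
        problems.append("sql query time")
    if slave_lag_s > limits.max_slave_lag_s:
        problems.append("replication lag")
    return problems


# Hypothetical limits and one hypothetical measurement.
limits = Limits(squid_hit_ms=50.0, sql_query_ms=100.0, max_slave_lag_s=5.0)
print(violations(limits, squid_hit_ms=20.0, sql_query_ms=150.0, slave_lag_s=0.0))
```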
measurement
accept the
observer effect

• measurement is a necessity.
• it’s not optional.
http://ganglia.sf.net
[Diagram: gmetad collects from two clusters via XML over TCP. Nodes db1, db2, db3 share xml over UDP on 239.2.11.84 (multicast); nodes www1, www2, www3 share xml over UDP on 239.2.11.83 (multicast).]
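[Diagram, repeated with a node failure marked "boom!": gmetad collects from two clusters via XML over TCP. Nodes db1, db2, db3 share xml over UDP on 239.2.11.84 (multicast); nodes www1, www2, www3 share xml over UDP on 239.2.11.83 (multicast).]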
super simple graphing

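#!/bin/sh
# Take two extended iostat samples 4 seconds apart and keep the tail of the output.
/usr/bin/iostat -x 4 2 sda | grep -v ^$ | tail -4 > /tmp/disk-io.tmp
# Field 14 is taken to be %util here; the column position can differ between iostat versions.
UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $14}'`
# Publish the value to ganglia as a custom metric named disk-util.
/usr/bin/gmetric -t uint16 -n disk-util -v$UTIL -u '%'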
memcached
what if you have graphs
but no raw data ?

• GraphClick
• http://www.arizona-software.ch/applications/graphclick/en/

application usage
• Usage stats are just as important
• as server stats!
• Examples:
• # of user registrations
• # of photos uploaded every hour
not a straight line
another not straight line
but straight relationships!
measurement examples
queries
disk I/O
What we know now

• we can do at least 1500 qps (peak)


without:
- slave lag
- unacceptable avg response time
- waiting on disk IO
MySQL capacity
1. find ceilings of existing h/w
2. tie app usage to server stats
3. find ceiling:usage ratio
4. do this again:
- regularly (monthly)
- when new features are released
- when new h/w is deployed
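Steps 2 and 3 above can be sketched as a straight-line fit between an application metric and a server metric. The 1500 qps ceiling is the figure from the earlier slide; the sample data points, the choice of photos uploaded per hour as the usage metric, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def usage_at_ceiling(slope, intercept, ceiling):
    """Application usage at which the server metric reaches its ceiling."""
    return (ceiling - intercept) / slope


# Hypothetical observations: uploads per hour vs. peak queries per second.
uploads_per_hour = [1000, 2000, 3000, 4000]
peak_qps = [300, 500, 700, 900]

slope, intercept = fit_line(uploads_per_hour, peak_qps)
max_uploads = usage_at_ceiling(slope, intercept, 1500)  # 1500 qps ceiling
headroom = max_uploads / uploads_per_hour[-1]           # ceiling:usage ratio

print(round(max_uploads), round(headroom, 2))
```

With this made-up data the database would reach 1500 qps at about 7000 uploads per hour, or roughly 1.75 times the latest observed usage. Re-running the fit after each release or hardware change is what step 4 asks for.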
caching maximums
caching ceilings
squid, memcache
• working-set specific:
• - tiny enough to all fit in memory ?
• - some/more/all on disk ?
• - watch LRU churn
churning full caches

• Ceilings at:
• - LRU ref age small enough to affect
hit ratio too much
• - Request rate large enough to affect
disk IO (to 100%)
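The two ceilings above can be expressed as a simple check, sketched here with made-up thresholds. The slides do not say what LRU reference age or hit ratio counts as "too much", so `min_hit_ratio`, `min_lru_ref_age_s`, and `max_disk_util_pct` are assumptions to be tuned per cache, and the function names are hypothetical.

```python
def hit_ratio(hits, requests):
    """Fraction of requests served from cache."""
    return hits / requests if requests else 0.0


def ceiling_reasons(hits, requests, lru_ref_age_s, disk_util_pct,
                    min_hit_ratio, min_lru_ref_age_s, max_disk_util_pct):
    """Return which of the two full-cache ceilings have been reached."""
    reasons = []
    # Objects are evicted so quickly that the hit ratio has dropped.
    churning = lru_ref_age_s < min_lru_ref_age_s
    if churning and hit_ratio(hits, requests) < min_hit_ratio:
        reasons.append("LRU churn is hurting the hit ratio")
    # The request rate is driving the disks toward saturation.
    if disk_util_pct >= max_disk_util_pct:
        reasons.append("disk IO is saturated")
    return reasons


# Hypothetical sample: 70% hit ratio, 10 minute LRU reference age, 55% disk util.
print(ceiling_reasons(hits=700, requests=1000, lru_ref_age_s=600,
                      disk_util_pct=55, min_hit_ratio=0.8,
                      min_lru_ref_age_s=3600, max_disk_util_pct=95))
```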
squid requests and hits
squid hit ratio
LRU reference age
hit response times
What we know now

• we can do at least 620 req/sec (peak)


without:
- LRU affecting hit ratio
- unacceptable avg response time
- waiting too much on disk IO
not full caches

• (working set smaller than max size)


• - request rate large enough to bring
network or CPU to 100%
deployment
Automated Deploy
Tools
• SystemImager/SystemConfigurator:
• - http://wiki.systemimager.org
• CVSup:
• - http://www.cvsup.org
• Subcon:
• - http://code.google.com/p/subcon/

questions ?

•http://flickr.com/photos/gaspi/62165296/
•http://flickr.com/photos/marksetchell/27964330/
•http://flickr.com/photos/sheeshoo/72709413/
•http://flickr.com/photos/jaxxon/165559708/
•http://flickr.com/photos/bambooly/298632541/
•http://flickr.com/photos/colloidfarl/81564759/
•http://flickr.com/photos/sparktography/75499095/
