MySQLConf2007 Capacity

capacity planning for
LAMP
what happens after you’re scalable
MySQL Conf and Expo

April 2007
John Allspaw
• Engineering Manager (Operations) at flickr (Yahoo!)
Yay!
• You’re scalable! (or not)
• Now you can simply add hardware as you need capacity.
• (right ?)
BUT, um, wait....
• But:
• How many servers ?
• How many databases ?
• How many webservers ?
• How much shared storage ?
• How many network switches ?
• What about caching ?
• How many CPUs in all of these ?
• How much RAM ?
• How many drives in each ?
• WHEN should we order all of these ?
some stats
• - ~35M photos in squid cache (total)
• - ~2M photos in squid’s RAM
• - ~470M photos, 4 or 5 sizes of each
• - 38k req/sec to memcached (12M objects)
• - 2 PB raw storage (consumed ~1.5 TB on Sunday)
capacity
capacity doesn’t mean speed
capacity is for business
[diagram labels: "too much", "not enough", "too soon", "too late"; target: "Buying enough for now"]

3 main parts
• - Planning (what ?/why ?/when ?)

• - Deployment (install/config/manage)
• - Measurement (graph the world)
boring queueing theory
• Forced Flow Law:
• $X_i = V_i \times X_0$
• Little’s Law:
• $N = X \times R$
• Service Demand Law:
• $D_i = V_i \times S_i = U_i / X_0$
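A worked example may make the three laws less boring. Using the standard operational-analysis meanings (X_0 = system throughput, V_i = visits to resource i per request, X_i = throughput at resource i, S_i = service time per visit, D_i = service demand per request, U_i = utilization of resource i, N = requests in the system, R = response time), and numbers that are purely illustrative assumptions, not measurements from these slides:

```latex
% Assumed: 100 page requests/s, 15 DB queries per request,
% 0.5 ms of disk service time per query, 0.2 s average response time.
\begin{align*}
X_{db} &= V_{db} \times X_0 = 15 \times 100 = 1500 \ \text{queries/s} \\
D_{db} &= V_{db} \times S_{db} = 15 \times 0.0005 = 0.0075 \ \text{s per request} \\
U_{db} &= D_{db} \times X_0 = 0.0075 \times 100 = 0.75 \\
N      &= X_0 \times R = 100 \times 0.2 = 20 \ \text{requests in flight}
\end{align*}
```

Under these assumptions the database disk would be 75% busy at 100 requests/s, so the ceiling would sit near 133 requests/s for that resource.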
my theory
• capacity planning math is based on real things, not abstract ones.
predicting the future
consumable
concurrent usage
considerations:
social applications
• - Have the ‘network effect’
• - Exponential growth
considerations:
social applications
• Event-related growth
• (press, news event, social trends, etc.)
• Examples:
• London bombing, holidays, tsunamis, etc.
What do you have NOW ?
• When will your current capacity be depleted or outgrown ?
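The question above can be roughed out with a straight-line projection. This is only a sketch: the function name `days_until_full` and the 1400 TB "used" figure are hypothetical, and it assumes constant daily growth, which is optimistic for a social application whose growth is closer to exponential.

```shell
#!/bin/sh
# days_until_full CAPACITY USED DAILY_GROWTH
# All three arguments share one unit (TB here). Assumes linear growth.
days_until_full() {
    awk -v cap="$1" -v used="$2" -v growth="$3" 'BEGIN {
        if (growth <= 0) { print "never"; exit }
        days = (cap - used) / growth
        if (days < 0) days = 0
        printf "%d\n", days
    }'
}

# 2000 TB raw and 1.5 TB/day echo the stats slide; 1400 TB used is made up.
days_until_full 2000 1400 1.5
```

Re-running this with a fresh growth figure every month is likely more useful than trusting any single projection.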
finding ceilings
• MySQL (disk IO ?)
• squid (disk IO ? or CPU ?)
• memcached (CPU ? or network ?)
forget benchmarks
• boring
• to use in capacity planning... not usually worth the time
• not representative of real load
• test in production
what do you expect ?
• define what is acceptable
• examples:
• squid hits should take less than X milliseconds
• SQL queries less than Y milliseconds, and also keep up with replication
measurement
accept the observer effect
• measurement is a necessity.
• it’s not optional.
http://ganglia.sf.net
[diagram: gmetad collects from two clusters (db1, db2, db3 and www1, www2, www3) using XML over TCP; nodes within each cluster exchange XML over UDP on 239.2.11.84 (multicast)]
[diagram: the same layout with one node failing ("boom!")]
super simple graphing
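    #!/bin/sh
    # Sample sda for 4 seconds; tail keeps only the second (interval) report.
    /usr/bin/iostat -x 4 2 sda | grep -v '^$' | tail -4 > /tmp/disk-io.tmp
    # %util is the last column of iostat -x output.
    UTIL=$(grep sda /tmp/disk-io.tmp | awk '{print $NF}')
    # Publish as a float, since iostat reports fractional percentages.
    /usr/bin/gmetric -t float -n disk-util -v "$UTIL" -u '%'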
memcached
what if you have graphs but no raw data ?
• GraphClick
• http://www.arizona-software.ch/applications/graphclick/en/
application usage
• Usage stats are just as important as server stats!
• Examples:
• # of user registrations
• # of photos uploaded every hour
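Application counters like these can go into the same graphing system as the server stats. A minimal sketch, assuming gmetric is installed on the host: the helper name `publish_app_metric`, the metric name, and the `DRY_RUN` switch are all hypothetical, and how the count itself is obtained (a database query, a log scan) is left out.

```shell
#!/bin/sh
# publish_app_metric NAME VALUE UNITS
# Sends an application-level counter to ganglia via gmetric.
# With DRY_RUN=1 it only prints the command it would run.
publish_app_metric() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "gmetric -t uint32 -n $1 -v $2 -u $3"
    else
        gmetric -t uint32 -n "$1" -v "$2" -u "$3"
    fi
}

# Hypothetical value; in practice this would come from the application.
DRY_RUN=1
publish_app_metric photos_uploaded_per_hour 1234 photos
```

Run from cron once an hour, this puts usage on the same time axis as disk and query graphs, which is what makes the later ceiling:usage comparison possible.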
not a straight line
another not straight line
but straight relationships!
measurement examples
queries
disk I/O
What we know now
• we can do at least 1500 qps (peak) without:
- slave lag
- unacceptable avg response time
- waiting on disk IO
MySQL capacity
1. find ceilings of existing h/w
2. tie app usage to server stats
3. find ceiling:usage ratio
4. do this again:
- regularly (monthly)
- when new features are released
- when new h/w is deployed
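Steps 1 to 3 boil down to simple arithmetic once the numbers exist. The sketch below is an illustration under assumptions: the function name `servers_needed` is hypothetical, the 0.5 qps-per-unit ratio and the usage forecast are made up, and it assumes load spreads evenly across identical servers. Only the 1500 qps ceiling comes from the slides.

```shell
#!/bin/sh
# servers_needed CEILING_QPS QPS_PER_USAGE_UNIT FORECAST_USAGE
# CEILING_QPS:        measured per-server ceiling (step 1)
# QPS_PER_USAGE_UNIT: peak qps observed per unit of app usage (step 2)
# FORECAST_USAGE:     expected app usage, in the same unit
servers_needed() {
    awk -v ceiling="$1" -v per_unit="$2" -v usage="$3" 'BEGIN {
        need = usage * per_unit / ceiling
        n = int(need)
        if (n < need) n++      # round up: a partial server is a whole one
        print n
    }'
}

# 1500 qps ceiling (from the slides), hypothetical 0.5 qps per usage unit,
# hypothetical forecast of 10000 usage units at peak.
servers_needed 1500 0.5 10000
```

Because the ceiling and the ratio both drift with new features and new hardware, the inputs need re-measuring on the schedule in step 4.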
caching maximums
caching ceilings
squid, memcached
• working-set specific:
• - tiny enough to all fit in memory ?
• - some/more/all on disk ?
• - watch LRU churn
churning full caches
• Ceilings at:
• - LRU ref age small enough to affect hit ratio too much
• - Request rate large enough to affect disk IO (to 100%)
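Hit ratio is one of the numbers worth watching against these ceilings. As a rough sketch, it can be computed from squid's native access log, where the fourth field holds the result code (for example TCP_HIT/200 or TCP_MISS/200). The function name `squid_hit_ratio` and the log path in the comment are assumptions; squid's own counters are the more direct source when available.

```shell
#!/bin/sh
# squid_hit_ratio: reads squid native-format access log lines on stdin
# and prints the percentage of requests whose result code contains HIT.
squid_hit_ratio() {
    awk '{ total++; if ($4 ~ /HIT/) hits++ }
         END {
             if (total > 0) printf "%.1f\n", 100 * hits / total
             else print "n/a"
         }'
}

# Example use (path is an assumption and varies by install):
#   tail -10000 /var/log/squid/access.log | squid_hit_ratio
```

Feeding the result to the graphing system at a fixed interval gives a hit ratio graph that can be laid next to LRU reference age and request rate.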
squid requests and hits
squid hit ratio
LRU reference age
hit response times
What we know now
• we can do at least 620 req/sec (peak) without:
- LRU affecting hit ratio
- unacceptable avg response time
- waiting too much on disk IO
not full caches
• (working set smaller than max size)
• - request rate large enough to bring network or CPU to 100%
deployment
Automated Deploy Tools
• SystemImager/SystemConfigurator:
• - http://wiki.systemimager.org
• CVSup:
• - http://www.cvsup.org
• Subcon:
• - http://code.google.com/p/subcon/
questions ?
•http://flickr.com/photos/gaspi/62165296/
•http://flickr.com/photos/marksetchell/27964330/
•http://flickr.com/photos/sheeshoo/72709413/
•http://flickr.com/photos/jaxxon/165559708/
•http://flickr.com/photos/bambooly/298632541/
•http://flickr.com/photos/colloidfarl/81564759/
•http://flickr.com/photos/sparktography/75499095/
