Riak at Formspring

Why and How

Wednesday, July 27, 2011

Hi, I'm Tim.
@pims

Formspring helps people find out more about each other through sharing interesting & personal responses.

25M users

3.4B responses

Two upcoming features

1. Notification Feed

2. (Magic) Inbox

Notification Feed
Very similar to what @coda and @rckenne built at Yammer.

Check out their slides & video


http://blog.basho.com/2011/03/28/Riak-and-Scala-at-Yammer/

tl;dr
Notification feed can be bounded in size
Monotonic
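One way to read "bounded in size" and "monotonic", as a minimal sketch: the feed is assumed to be a list of (timestamp, event_id) pairs, entries are only ever added, and the list is trimmed to a cap. The names `MAX_FEED_SIZE`, `append_to_feed` and `merge_feeds` are hypothetical and do not come from the talk.

```python
MAX_FEED_SIZE = 3  # hypothetical cap, kept tiny for illustration


def append_to_feed(feed, timestamp, event_id, max_size=MAX_FEED_SIZE):
    """Return a new feed with the entry added, newest first, trimmed to max_size."""
    entries = set(feed)
    entries.add((timestamp, event_id))  # adding the same entry twice is a no-op
    return sorted(entries, reverse=True)[:max_size]


def merge_feeds(a, b, max_size=MAX_FEED_SIZE):
    """Merge two divergent copies of a feed: union the entries, then trim."""
    return sorted(set(a) | set(b), reverse=True)[:max_size]
```

Because entries are only added and never edited in place, two conflicting copies can be reconciled with a plain union, and the cap keeps the stored value from growing without limit.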

Perfect fit for Riak

A few differences

Python instead of Scala

Why Python?
already part of our codebase
lots of experience
batteries included

Riak Python client

Great, but no HTTP connection pool

So we added it

You can thank


@gillesdevaux
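For readers unfamiliar with the idea, a connection pool can be sketched generically as below. This is not the code that was added to the Riak Python client; `ConnectionPool` and `factory` are hypothetical names, and the pool simply hands out pre-created connection objects and takes them back.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool; `factory` is any callable that opens a connection."""

    def __init__(self, factory, size=4):
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks for at most `timeout` seconds, then raises queue.Empty
        # instead of opening yet another connection.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back
```

The point of the pattern is that requests reuse a small, fixed set of open connections rather than paying for a new one each time.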

We switched to Protocol Buffers
(and @sku_ added Protocol Buffers connection caching to the Riak Python client)

2. (Magic) Inbox

First, let's talk about our classic inbox

items sorted by timestamp *

8 MySQL shards
A B C D E F G H

Basic Inbox flow
[Diagram: Frontend, QUEUE, and MySQL shards A through H]

shard D fails

Inbox becomes inaccessible for 1/8th of our users

(that's 3.1M users)

We need higher availability


no SPOF

We need better* fault-tolerance

Clustering with Riak

Replication for free*

storing indexes only

(data was already stored in Cassandra)

3 Buckets (classic, magic, flagged)

keyed by account ID

(score, deleted, origin_id)

easy

in classic inbox, score = timestamp

in flagged inbox, score = timestamp

in magic inbox, score = timestamp +/- points
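The scoring rules above can be sketched as follows. This assumes points are expressed in the same unit as the timestamp and that a higher score sorts first; both are assumptions, and `build_index` is a hypothetical helper, not Formspring's code.

```python
def build_index(items, use_points):
    """Build a sorted index of (score, deleted, origin_id) entries.

    items: iterable of (origin_id, timestamp, points).
    use_points: False for the classic and flagged inboxes, True for the magic inbox.
    """
    index = []
    for origin_id, timestamp, points in items:
        score = timestamp + points if use_points else timestamp
        index.append((score, False, origin_id))
    # Assumption: highest score (newest or most boosted) comes first.
    return sorted(index, reverse=True)


items = [("q1", 100, 0), ("q2", 90, 50), ("q3", 110, -30)]
classic_order = [oid for _, _, oid in build_index(items, use_points=False)]
magic_order = [oid for _, _, oid in build_index(items, use_points=True)]
```

With this sample data the classic inbox orders purely by time, while the magic inbox lets a boosted older question overtake newer ones.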

def inbox_filter(seq):
    """Representation that we pass to the client, i.e. filter out tombstones"""
    if seq is None:
        return []
    return [(score, origin_id, askers)
            for score, deleted, origin_id, askers in seq
            if not deleted]
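A short usage sketch with made-up data shows how a tombstone hides an entry without removing it from the stored index. `mark_deleted` and the sample values are hypothetical; only the filter function comes from the slide.

```python
def inbox_filter(seq):
    """Filter function from the slide: drop tombstoned entries."""
    if seq is None:
        return []
    return [(score, origin_id, askers)
            for score, deleted, origin_id, askers in seq
            if not deleted]


def mark_deleted(seq, origin_id):
    """Hypothetical helper: return a copy with the matching entry tombstoned."""
    return [(score, deleted or oid == origin_id, oid, askers)
            for score, deleted, oid, askers in seq]


stored = [
    (1311750000, False, "q-1", ["alice"]),
    (1311749000, False, "q-2", ["bob"]),
]
stored = mark_deleted(stored, "q-2")  # the entry stays stored, flagged as deleted
visible = inbox_filter(stored)        # only "q-1" reaches the client
```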

Things we've learned

(the hard way)

Benchmark on real data


(eventually all indexes will reach the limit)

understanding traffic patterns is crucial


activity varies greatly between 7am & 7pm

Can it go higher?
[Chart legend: max, min, avg]

Start at 1%

Roll out = 1%

Still 1%

Roll out = 1%

a 7000x variation for just 1% of our users


Roll out = 1%

Fail fast

no, seriously, fail fast


(too many retries will kill you when requests time out)
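A minimal sketch of failing fast: cap both the number of attempts and the total time spent, then give up. The names `call_with_budget` and `GaveUp` and the default limits are hypothetical, chosen only for illustration.

```python
import time


class GaveUp(Exception):
    """Raised when the retry budget is exhausted."""


def call_with_budget(operation, max_attempts=2, deadline_s=0.5, clock=time.monotonic):
    """Run operation(), retrying on TimeoutError within a small, fixed budget."""
    start = clock()
    last_error = None
    for _ in range(max_attempts):
        if clock() - start >= deadline_s:
            break  # out of time: stop retrying
        try:
            return operation()
        except TimeoutError as exc:
            last_error = exc
    raise GaveUp("giving up after bounded retries") from last_error
```

Keeping the budget small means a slow backend produces a quick error instead of a growing pile of retried requests.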

Idempotence is key to dealing with failure


(retry as many times as you want)

deduplication for free!


(if a question gets reasked a dozen times, you'll only receive it once)
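Idempotence and deduplication can be illustrated with an index keyed by origin_id: writing the same question again changes nothing, so retries are safe. This is a sketch under assumptions. The dict layout, the `add_question` name, and the choice to keep the first score are all hypothetical.

```python
def add_question(index, origin_id, score, asker):
    """Idempotent insert into an index mapping origin_id -> (score, deleted, askers)."""
    updated = dict(index)
    if origin_id in updated:
        # Already present: keep the existing entry and only record the asker.
        old_score, deleted, askers = updated[origin_id]
        updated[origin_id] = (old_score, deleted, askers | {asker})
    else:
        updated[origin_id] = (score, False, frozenset([asker]))
    return updated


index = add_question({}, "q1", 100, "alice")
retried = add_question(index, "q1", 100, "alice")  # same write, replayed
reasked = add_question(index, "q1", 105, "bob")    # same question, new asker
```

A replayed write leaves the index unchanged, and a reasked question still occupies a single slot in the inbox.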

How?
log and process later
queue and process later

Setup (per feature)


2 app servers (c1.xlarge)
4 Riak nodes (m1.xlarge)
HAProxy

Things you'd never expect

(true and sad story)

Our default AMI had the open files limit set to

1024

your service will die in no time


(and in the middle of the night, obviously)
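A quick way to catch this before it bites is to check the limit from inside the process at startup. The sketch below uses Python's standard `resource` module (Unix only); the `check_open_files_limit` name and the 4096 threshold are hypothetical, not a recommendation from the talk.

```python
import resource


def check_open_files_limit(minimum=4096):
    """Return (soft, hard, ok) for this process's open-files limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always passes the check.
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return soft, hard, ok


soft, hard, ok = check_open_files_limit()
if not ok:
    print("open files limit is only %d; raise it before going live" % soft)
```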

OPS RIAK

If we had one request for Riak


(and Basho)

Opscenter for Riak?

Opscenter for Riak?

(from one of our Cassandra clusters)

Opscenter for Riak?
