PLAY/TYPE

Starling + Workling: simple distributed background jobs with Twitter’s queuing system

morning, hi, my name is rany keddo, i run a little startup in frankfurt called play/type.
PLAY/TYPE

git clone \
git://github.com/purzelrakete/cows-not-kittens.git

OR...

• sudo gem install gitjour


• gitjour list
• gitjour clone cows_not_kittens

you might want to start by grabbing the demo project. i’m serving it.
PLAY/TYPE

WTF?

this talk is about running code asynchronously in your rails application. this means removing
long running or side effect code from your request cycle.
for some reason cows started to creep into the slides while i was working on them. hoping to
start a trend *away* from cats in tech presentations... you’ll see this reflected in the example
project. make sure it’s working: run rake db:migrate, start irb in the project, then
type
PLAY/TYPE

>> CowSubsystem.moo
PLAY/TYPE

Cows not Kittens

this is an example app to demonstrate why you need background work, and how you can do
this. also, you can milk cows with this application.
PLAY/TYPE

class CowsController < ApplicationController
  resource_this

  # milking has the side effect of causing
  # the cow to moo. we don't want to
  # wait for this while milking, though,
  # it would be a terrible waste of our time.
  def milk
    @cow = Cow.find(params[:id])
    @cow.milk
  end
end
PLAY/TYPE

class Cow < ActiveRecord::Base

  # TODO: SAP integration
  def milk
    moo
  end

  # Bothersome side-effect
  def moo
    CowSubsystem.moo
  end
end
PLAY/TYPE

Milk it

show the application


PLAY/TYPE

Real Examples

let’s look at some real examples!


PLAY/TYPE

AnalyticsHit.create \
  :potential_user_id => potential_user_id,
  :event => "converted"

thinking: statistics don’t really belong in the request cycle.


PLAY/TYPE

class PageView < ActiveRecord::Base
  belongs_to :viewer
  belongs_to :viewable, :polymorphic => true
end

nor does this: stuff that does not have to have an immediate effect on the page you’re
rendering.
PLAY/TYPE

CommentMailer.deliver_created(comment)

or this, thinking: this should be put in the background, really.


PLAY/TYPE

Blackbook.get \
  :username => "moo@cheapmail.com",
  :password => "milky"

and especially this sort of long running process: scraping contacts from a webmailer.
PLAY/TYPE

Wherefore art thou,


Rails?

there is no active* way of handling this in rails: nothing consistent that works for almost everybody.
instead: too many options. that’s why people come to this sort of talk.
You are a snowflake again.

which solution will you tie yourself to?


decide now, because doing background stuff last is like deciding to write your tests *after*
the code is done.
PLAY/TYPE

Trust nobody!

my solution to this: remain independent of all these background technologies, by building a
little worker framework with providers, like active record.
PLAY/TYPE

Workling
• Easy plugging of new Job Runners
• Nice Rails integration
• Plays nicely with tests
• Lightweight and hackable

wrote workling. aims of workling are...


PLAY/TYPE

script/plugin install \
  git://github.com/purzelrakete/workling.git

script/plugin install \
  git://github.com/tra/spawn.git

workling will automatically use spawn if it is installed.


PLAY/TYPE

create a worker class in app/workers


PLAY/TYPE

#
# handle asynchronous mooing.
#
class CowWorker < Workling::Base

  # let the moo-ings begin!
  def moo(options = {})
    cow = Cow.find(options[:id])
    cow.moo
  end
end

subclass Workling::Base and add a method. the method needs to take an options argument.


PLAY/TYPE

class Cow < ActiveRecord::Base

  # TODO: SAP integration
  def milk
    CowWorker.async_moo(:id => id)
  end

  # bothersome side-effect
  def moo
    CowSubsystem.moo
  end
end

now make the async call in your milk method.
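
one way to picture what a call like CowWorker.async_moo does is the minimal sketch below. it is not workling's actual code: SketchWorker, InlineRunner and ThreadRunner are hypothetical names. the only things taken from the slides are the async_ prefix, the single options hash, and the idea of a swappable dispatcher.

```ruby
require "thread"

# Hypothetical sketch, not Workling's implementation.
# A runner is any object that responds to run(worker_class, method, options).

# Runs the job immediately in the calling thread (handy in tests).
class InlineRunner
  def run(worker_class, method, options = {})
    worker_class.new.send(method, options)
    nil
  end
end

# Runs the job in a background thread of the same process.
class ThreadRunner
  attr_reader :threads

  def initialize
    @threads = []
  end

  def run(worker_class, method, options = {})
    @threads << Thread.new { worker_class.new.send(method, options) }
    nil
  end
end

class SketchWorker
  class << self
    # The runner is swappable, so workers stay independent of the backend.
    attr_writer :dispatcher

    def dispatcher
      @dispatcher ||= InlineRunner.new
    end

    # Turn SomeWorker.async_foo(opts) into dispatcher.run(SomeWorker, :foo, opts).
    def method_missing(name, *args)
      if name.to_s.start_with?("async_")
        target = name.to_s.sub(/\Aasync_/, "").to_sym
        SketchWorker.dispatcher.run(self, target, args.first || {})
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      name.to_s.start_with?("async_") || super
    end
  end
end

class CowWorker < SketchWorker
  MOOS = Queue.new # thread-safe record of what happened

  def moo(options = {})
    MOOS << "moo from cow #{options[:id]}"
  end
end

CowWorker.async_moo(:id => 1)
```

swapping the backend is then one assignment, e.g. SketchWorker.dispatcher = ThreadRunner.new, and the calling code in the model does not change.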


PLAY/TYPE

Milk it!
PLAY/TYPE

What’s Spawn?
script/plugin install \
  git://github.com/tra/spawn.git

explain what’s going on here... we’ve used spawn as a runner for workling. what’s spawn?
PLAY/TYPE

spawn do
  logger.info("I feel sleepy...")
  sleep 11
  logger.info("Time to wake up!")
end

by itself you can run it like this. it will fork the process....
PLAY/TYPE

>> fork { sleep 100 }
=> 1060

like this, basically, but with all rails fixes and tweaks in place. above: drops to unix, the OS
copies the process & creates a child process. try this in your console and use top to look at
the processes.
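
a slightly fuller plain-ruby sketch of the same idea, assuming a unix-like system (Kernel#fork is not available on windows). the pipe is only there so the parent can see that the child really ran. it says nothing about how spawn itself is implemented.

```ruby
# Unix only: the OS copies the process and the block runs in the child.
reader, writer = IO.pipe

pid = fork do
  # Child: a copy of the parent process with its own pid.
  reader.close
  writer.puts "child #{Process.pid} finished the slow work"
  writer.close
  exit! 0 # leave without running at_exit handlers inherited from the parent
end

# Parent: fork returns immediately with the child's pid.
writer.close
message = reader.read # blocks until the child closes its end of the pipe
Process.wait(pid)     # reap the child so it does not linger as a zombie
status = $?
puts message
```

for real fire and forget you would call Process.detach(pid) instead of Process.wait(pid), so the parent never blocks on the child.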
PLAY/TYPE

• Fast. Happens at OS level


• Rails copy can be big. Irb says ~35MB
• Local. Happening on same Machine
• Kill scenario - no persistence, job lost

workling + spawn inherits these traits.


PLAY/TYPE


Twitter’s Evan Weaver and nesting friend.

...If you just want to fire and forget a local process as you say, I think Spawn is pretty good.

before i started on workling, i asked evan weaver of chow fame (twitter now) what he
thought. this is what he said about spawn.
PLAY/TYPE

BackgroundJob

new kid on the block. very nice take on things.


PLAY/TYPE

script/plugin install \
  git://github.com/purzelrakete/workling.git

./script/plugin install \
  http://codeforpeople.rubyforge.org/svn/rails/plugins/bj

./script/bj setup

let’s start over with workling + bj. you don’t need to do anything else, since bj is
automatically detected.
PLAY/TYPE

Workling::Remote.dispatcher =
  Workling::Remote::Runners::BackgroundjobRunner.new

however, the workling runner can also be set manually like this, inside of environment.rb or
under config/initializers. this is being done automatically for you.
PLAY/TYPE

Milk it!
PLAY/TYPE

Why the lag?

i will explain... first of all, what is backgroundjob.


PLAY/TYPE

• Written by Ara T. Howard (codeforpeople)
• Sponsored by Engine Yard
• Lightweight, persistent.

next slide: installing. already did this.


PLAY/TYPE

./script/plugin install \
  http://codeforpeople.rubyforge.org/svn/rails/plugins/bj

./script/bj setup

create_table :bj_job do |t|
  t.column "command"        , :text
  t.column "state"          , :text
  t.column "priority"       , :integer
  t.column "tag"            , :text
  t.column "is_restartable" , :integer
  t.column "submitter"      , :text
  t.column "runner"         , :text
  t.column "pid"            , :integer
  t.column "submitted_at"   , :datetime
  t.column "started_at"     , :datetime
  t.column "finished_at"    , :datetime
  t.column "env"            , :text
  t.column "stdin"          , :text
  t.column "stdout"         , :text
  t.column "stderr"         , :text
  t.column "exit_status"    , :integer
end

setup runs this migration, which creates the jobs table.
PLAY/TYPE

job = Bj.submit 'cat /etc/passwd'
Bj.table.job.find(:all) # jobs table
PLAY/TYPE

if(job.finished) ...

t.column "pid"         , :integer
t.column "finished_at" , :datetime
t.column "stdin"       , :text
t.column "stdout"      , :text
t.column "stderr"      , :text
t.column "exit_status" , :integer

If you want something back... these are some useful columns in the db. they are available on
the job object, too.
PLAY/TYPE

• Warmup speed: load Rails 1x / Request


• Memory: copy of Rails / Request. No leaks.
• Kill scenario - Persistent over DB
• Jobs runner process manages itself
• Runner can be on another machine

workling + bj inherits these traits.


PLAY/TYPE

• Starts a thread for each job


• The thread invokes a new OS process
• ./script/runner loads rails
• Results written to DB
• Client side gets results from DB

how does it work? this is why the moo came later than with spawn.
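
the steps on this slide can be sketched in plain ruby. this is an illustration only, not bj's code: the JOBS hash stands in for the database table, submit is a hypothetical helper, and a bare ruby process stands in for ./script/runner (which would also have to load all of rails).

```ruby
require "open3"
require "rbconfig"

# Stand-in for the jobs table; a real setup would use the database.
JOBS = {}
JOBS_LOCK = Mutex.new

# Start a thread that launches a fresh OS process for the command and
# records stdout, stderr and the exit status once the process finishes.
def submit(id, *command)
  JOBS_LOCK.synchronize { JOBS[id] = { :state => "pending" } }
  Thread.new do
    stdout, stderr, status = Open3.capture3(*command)
    JOBS_LOCK.synchronize do
      JOBS[id] = { :state => "finished", :stdout => stdout,
                   :stderr => stderr, :exit_status => status.exitstatus }
    end
  end
end

# RbConfig.ruby is the path of the running interpreter. Every job pays the
# startup cost of a brand new process before any work happens.
thread = submit(1, RbConfig.ruby, "-e", "puts 'moo'")
thread.join

# Client side: read the result back from the "table".
result = JOBS_LOCK.synchronize { JOBS[1] }
```

the process startup per job is the lag: with rails loaded on every run it is far slower than a fork, but the child starts clean and its result survives in the table.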
PLAY/TYPE

Added Bj Runner to
Workling like this...

Added the BJ runner yesterday. here’s how it was done...


module Workling
  module Remote
    module Runners
      class BackgroundjobRunner < Workling::Remote::Runners::Base
        cattr_accessor :routing

        def initialize
          BackgroundjobRunner.routing =
            Workling::Starling::Routing::ClassAndMethodRouting.new
        end

        def run(clazz, method, options = {})
          stdin = @@routing.queue_for(clazz, method) +
                  " " +
                  options.to_xml(:indent => 0, :skip_instruct => true)

          Bj.submit "./script/runner ./script/bj_invoker.rb",
            :stdin => stdin

          return nil # that means nothing!
        end
      end
    end
  end
end
explain what’s going on.
@routing = Workling::Starling::Routing::ClassAndMethodRouting.new
unnormalized = REXML::Text::unnormalize(STDIN.read)
message, command, args = *unnormalized.match(/(^[^ ]*) (.*)/)
options = Hash.from_xml(args)["hash"]

if workling = @routing[command]
  workling.send @routing.method_name(command), options.symbolize_keys
end
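
the message format used between the runner and the invoker is: queue name, one space, then the serialized options. a standalone sketch of that round trip follows. to keep it free of rails it uses JSON where the runner uses to_xml / Hash.from_xml, and the queue name "cow_workers__moo" plus the encode / decode helpers are made up for illustration.

```ruby
require "json"

# Runner side: queue name, one space, then the serialized options.
def encode(queue, options)
  "#{queue} #{JSON.generate(options)}"
end

# Invoker side: everything before the first space is the queue name,
# the rest is the payload.
def decode(line)
  match = line.match(/\A([^ ]*) (.*)\z/m)
  raise ArgumentError, "malformed message" unless match
  [match[1], JSON.parse(match[2], :symbolize_names => true)]
end

line = encode("cow_workers__moo", :id => 42)
queue, options = decode(line)
```

because the queue name never contains a space, splitting on the first space is enough, whatever the payload looks like.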
PLAY/TYPE

Starling
PLAY/TYPE

gem sources -a http://gems.github.com/
sudo gem install starling-starling
sudo gem install fiveruns-memcache-client

script/plugin install \
  git://github.com/purzelrakete/workling.git

add github to your sources if you haven’t already done so. explain fiveruns client.
PLAY/TYPE

mkdir /var/spool/starling
sudo starling -d
script/workling_starling_client start

need 2 processes running. 1: starling. 2: workling starling client.


PLAY/TYPE

Workling::Remote.dispatcher =
  Workling::Remote::Runners::StarlingRunner.new
PLAY/TYPE

Milk it already...
PLAY/TYPE

Starling

a lightweight queue that speaks memcached. developed at twitter by blaine cook to make
twitter’s architecture more message-oriented.
PLAY/TYPE

# Put messages onto a queue:
require 'memcache'
starling = MemCache.new('localhost:22122')
starling.set('my_queue', 1)

# Get messages from the queue:
require 'memcache'
starling = MemCache.new('localhost:22122')
loop { puts starling.get('my_queue') }
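
what makes this a queue rather than a cache is the meaning of set and get: set appends, get takes the oldest message off. a tiny in-memory stand-in shows those semantics without a running server. SketchQueueServer is a hypothetical class, not part of starling.

```ruby
# In-memory stand-in for a Starling-style queue server (hypothetical class).
class SketchQueueServer
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # set appends to the named queue instead of overwriting a value.
  def set(name, value)
    @queues[name].push(value)
    true
  end

  # get removes and returns the oldest message, or nil if the queue is empty.
  def get(name)
    @queues[name].shift
  end
end

server = SketchQueueServer.new
server.set("my_queue", 1)
server.set("my_queue", 2)
first  = server.get("my_queue")
second = server.get("my_queue")
empty  = server.get("my_queue")
```

since an empty queue answers nil, a real consumer loop would sleep briefly on nil rather than spin.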
13
PLAY/TYPE

Memcache Client

• Errors in Memcache Client (Robot Co-Op 1.5.0)
• Solution: http://github.com/fiveruns/memcache-client/tree/master
PLAY/TYPE

• Warmup speed: very fast.


• Memory low, unless you’re leaking. Use
God to monitor / restart your workers.
• Kill scenario - Persistent over Starling
• Need to manage processes

workling + starling inherits these traits.


PLAY/TYPE


... The main things lacking in
Starling are non-destructive
reads (transactions), and
speed.

• Transactions. Imagine Starling is killed just after reading a msg off a queue... not
reliable. Doesn’t map nicely onto memcache
• It can take 20 minutes to play back a Starling journal after a crash on a very powerful
machine. In production, this is about 19.5 minutes too many.

twitter is moving away from starling. putting msgs back onto the queue is not possible after a
kill / crash.
that said, it’s apparently stable, with millions of messages / day on workling + starling.
we are using starling at play/type and for us, it’s fine. but if journal replay under huge traffic or
destructive reads are an issue, starling isn’t for you.
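
the destructive read problem is easy to show in a few lines. the sketch below puts a queue with a memcache-style destructive get next to one that holds a message until it is acknowledged. both classes are hypothetical and neither is starling's code. the rescued exception stands in for a worker being killed mid-job.

```ruby
# Destructive read: once get returns, the queue has forgotten the message.
class DestructiveQueue
  def initialize
    @items = []
  end

  def push(msg)
    @items.push(msg)
  end

  def get
    @items.shift
  end

  def size
    @items.size
  end
end

# Non-destructive read: a reserved message is kept until it is acknowledged.
class ReservingQueue
  def initialize
    @items = []
    @reserved = {}
    @next_id = 0
  end

  def push(msg)
    @items.push(msg)
  end

  def reserve
    msg = @items.shift
    return nil if msg.nil?
    @next_id += 1
    @reserved[@next_id] = msg
    [@next_id, msg]
  end

  # Called by the consumer once the job is done.
  def ack(id)
    @reserved.delete(id)
  end

  # Called after a crash: unacknowledged messages go back to the front.
  def recover
    @items.unshift(*@reserved.values)
    @reserved.clear
  end

  def size
    @items.size
  end
end

destructive = DestructiveQueue.new
destructive.push("moo")
begin
  destructive.get
  raise "worker killed before the job ran"
rescue RuntimeError
  # nothing to put back: the message is gone
end
lost = destructive.size

reserving = ReservingQueue.new
reserving.push("moo")
begin
  reserving.reserve
  raise "worker killed before the job ran"
rescue RuntimeError
  reserving.recover
end
kept = reserving.size
```

reserve / ack / recover has no natural counterpart in plain memcache get and set, which is the sense in which transactions don't map nicely onto memcache.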
PLAY/TYPE

TODOs

workling is up on github. fork it! here’s what needs to be done, come join the project.
PLAY/TYPE

MemcachelikeRunner

• Sparrow (“a really fast lightweight queue written in Ruby that speaks memcache.”)
• RudeQ (DB based, no process for queue)

take the StarlingRunner and refactor it to be generic for all queue systems that imitate the
memcache api. once this is done, we’ll be able to plug in queues like these.
sparrow + workling is running out there, but there’s no code, unfortunately.
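
a rough idea of what such a generic runner could look like, under the assumption that the queue client offers nothing but memcache-style set and get. every name here (FakeQueueClient, MemcacheLikeRunnerSketch, work_off, QueuedCowWorker, the "Class__method" queue naming) is invented for this sketch and does not exist in workling.

```ruby
# Hypothetical stand-in for any client with memcache-style set/get,
# where get pops the oldest message off the named queue.
class FakeQueueClient
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def set(key, value)
    @queues[key] << value
  end

  def get(key)
    @queues[key].shift
  end
end

# Producer side: the runner only ever needs set.
class MemcacheLikeRunnerSketch
  def initialize(client)
    @client = client
  end

  def run(worker_class, method, options = {})
    @client.set(queue_name(worker_class, method), options)
    nil
  end

  # One queue per worker class and method.
  def queue_name(worker_class, method)
    "#{worker_class.name}__#{method}"
  end
end

# Consumer side: drain one queue, invoking the worker method per message.
def work_off(client, runner, worker_class, method)
  queue = runner.queue_name(worker_class, method)
  handled = 0
  while (options = client.get(queue))
    worker_class.new.send(method, options)
    handled += 1
  end
  handled
end

class QueuedCowWorker
  MOOS = []

  def moo(options = {})
    MOOS << options[:id]
  end
end

client = FakeQueueClient.new
runner = MemcacheLikeRunnerSketch.new(client)
runner.run(QueuedCowWorker, :moo, :id => 1)
runner.run(QueuedCowWorker, :moo, :id => 2)
handled = work_off(client, runner, QueuedCowWorker, :moo)
```

anything that answers set and get could then be passed in as the client, which is the whole point of the refactoring.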
PLAY/TYPE

BeanstalkdRunner

• Fast non persistent Queue written in C.


• Written for “Causes” on Facebook

might be possible to run this with a MemcachelikeRunner.


PLAY/TYPE

AMQPRunner
PLAY/TYPE

BackgrounDRb

heavyweight of backgrounding, oldest solution. lots of people using this.


PLAY/TYPE

FUD?


I wish, people will check their facts before
making any claims, I am kinda getting tired of
fighting this FUD within community. There are
few outstanding issues, but BackgrounDRb
supports many features that other similar
alternatives doesn’t offer. And I am working on
it.

- Hemant

backgroundrb comes with emotional baggage, for me.


who’s running backgroundrb in the room, hands up? who has problems with it? who has NO
problems?
PLAY/TYPE

BackgrounDRb
• As of version 1.0.3: complete rewrite with
Packet, no DRb code in there anymore.
PLAY/TYPE

“ Packet is a network programming library in the


spirit of EventMachine and yet it has nice
functionality of letting you attach callbacks to
workers running in separate process. It can
even let you invoke callbacks running on
worker in different machine and stuff like that.
When I took over project it was based on DRb,
but since then I have removed DRb and
BackgrounDRb is 100% based on evented
model of network programming.

- Hemant

my personal impression: still heavy. waiting for somebody to try integrating it into workling,
no personal need.
PLAY/TYPE

Okay, but what about Workling Status and Return?

here’s a real world example. old school, circa Feb. 2008: social network imports via gmail
scraping... this needs to be out of the request, but the response has to be shown, too.
PLAY/TYPE

class NetworkWorker < Workling::Base
  def search(options)
    accounts = options[:accounts]
    uid = options[:uid]

    # scrape each account and keep the scraped contacts
    results = accounts.map do |network|
      Blackbook.get \
        :username => network[:username],
        :password => network[:password]
    end

    # store the results so the poll action can pick them up
    Workling::Return::Store.set(uid, results)
  end
end

explain how this works - scraping gmail. return store: again, using memcache api.
PLAY/TYPE

def poll
  @results = Workling::Return::Store.get \
    params[:workling_uid]

  # TODO: handle no results, results
  # and results with errors
end
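
one possible way to fill in that TODO, sketched without rails. SketchReturnStore is a hypothetical in-memory store with the same set/get shape. the convention that a worker reports failure by storing a hash with an :error key is an assumption made for this sketch, not something workling prescribes.

```ruby
# Hypothetical in-memory return store with the same set/get shape.
class SketchReturnStore
  def initialize
    @values = {}
  end

  def set(uid, value)
    @values[uid] = value
  end

  # Returns the stored value, or nil while the worker has not finished.
  def get(uid)
    @values[uid]
  end
end

# Map the three cases from the TODO onto a small status hash.
def poll_status(store, uid)
  result = store.get(uid)
  if result.nil?
    { :status => :pending }
  elsif result[:error]
    { :status => :failed, :message => result[:error] }
  else
    { :status => :done, :contacts => result[:contacts] }
  end
end

store = SketchReturnStore.new
pending = poll_status(store, "abc")

store.set("abc", :contacts => ["daisy@cheapmail.com"])
done = poll_status(store, "abc")

store.set("xyz", :error => "login failed")
failed = poll_status(store, "xyz")
```

a poll action could render the status hash as json and let the page keep polling while the status is :pending.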
PLAY/TYPE

Rany Keddo
rany@playtype.net

Questions?
Lunch!
