
SCALING LINKEDIN

A BRIEF HISTORY

Josh Clemm
www.linkedin.com/in/joshclemm

Scaling = replacing all the components of a car while driving it at 100mph

Via Mike Krieger, Scaling Instagram

LinkedIn started back in 2003 to connect you to your network for better job opportunities.
It had 2,700 members in its first week.

First week growth guesses from founding team

Fast forward to today...

[Chart: LinkedIn member growth, 2003-2015, climbing from 0 to over 400M members]

LINKEDIN SCALE TODAY

LinkedIn is a global site with over 400 million members
Web pages and mobile traffic are served at tens of thousands of queries per second
Backend systems serve millions of queries per second

How did we get there?

Let's start from the beginning

LINKEDIN'S ORIGINAL ARCHITECTURE

[Diagram: the LEO monolith in front of a single DB]

Huge monolithic app called Leo
Java, JSP, Servlets, JDBC
Served every page against the same SQL database
Circa 2003

So far so good, but two areas to improve:

1. The growing member-to-member connection graph
2. The ability to search those members

MEMBER CONNECTION GRAPH

Needed to live in-memory for top performance
Used graph traversal queries not suitable for the shared SQL database
Different usage profile than other parts of the site
So, a dedicated service was created. LinkedIn's first service.


MEMBER SEARCH

Social networks need powerful search
Lucene was used on top of our member graph

LinkedIn's second service.

LINKEDIN WITH CONNECTION GRAPH AND SEARCH

[Diagram: LEO calls the Member Graph (Lucene) over RPC and still reads the DB; connection / profile updates feed the graph and search]

Circa 2004

Getting better, but the single database was under heavy load.
Vertically scaling helped, but we needed to offload the read traffic...

REPLICA DBs

Master/slave concept
Read-only traffic served from replicas
Writes go to the main DB
An early version of Databus kept the DBs in sync

[Diagram: Main DB -> Databus relay -> replica DBs]
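The master/slave split above can be sketched as a tiny read/write router. This is a toy illustration, not LinkedIn's actual code; the class and method names are invented, and the synchronous loop stands in for asynchronous Databus replication:

```python
class ReplicatedStore:
    """Toy master/slave router: writes hit the main DB,
    reads round-robin across read-only replicas."""

    def __init__(self, main_db, replicas):
        self.main_db = main_db    # single source of truth for writes
        self.replicas = replicas  # read-only copies kept in sync
        self._next = 0

    def write(self, key, value):
        # All writes go to the main DB; replication follows.
        self.main_db[key] = value
        for replica in self.replicas:  # stand-in for a Databus-style relay
            replica[key] = value

    def read(self, key):
        # Round-robin reads across replicas to offload the main DB.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)


store = ReplicatedStore({}, [{}, {}, {}])
store.write("member:1", "Reid")
print(store.read("member:1"))  # -> Reid
```

The key property is that read traffic never touches the master, which is why the approach buys time before the master's own write limits are reached.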

REPLICA DBs TAKEAWAYS

Good medium-term solution
We could vertically scale servers for a while
Master DBs have finite scaling limits
These days, LinkedIn DBs use partitioning

LINKEDIN WITH REPLICA DBs

[Diagram: LEO performs R/W against the Main DB and R/O against replica DBs, which a Databus relay keeps in sync; connection updates flow to the Member Graph, profile updates to Search]

Circa 2006

As LinkedIn continued to grow, the monolithic application Leo was becoming problematic.
Leo was difficult to release, hard to debug, and the site kept going down...

IT WAS TIME TO...

Kill LEO

SERVICE ORIENTED ARCHITECTURE

Extracting services (Java Spring MVC) from the legacy Leo monolithic application

[Diagram: Recruiter Web App, Public Profile Web App, Profile Service, and yet another service split out of LEO]

Circa 2008 onward

SERVICE ORIENTED ARCHITECTURE

[Diagram: Profile Web App -> Profile Service -> Profile DB]

Goal - create vertical stacks of stateless services
Frontend servers fetch data from many domains and build the HTML or JSON response
Mid-tier services host APIs and business logic
Data-tier or back-tier services encapsulate data domains

EXAMPLE MULTI-TIER ARCHITECTURE AT LINKEDIN

[Diagram: Browser / App -> Frontend Web App -> mid-tier and content services for the Profile, Connections, and Groups domains -> data services (incl. Edu) backed by DB, Voldemort, and Hadoop; events flow to Kafka]
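The frontend tier's job in this picture can be sketched as a fan-out-and-assemble step. This is a hedged illustration: the domain names mirror the diagram, but the functions and their payloads are invented stand-ins for real mid-tier RPCs:

```python
import json

# Stand-ins for stateless mid-tier domain services.
def profile_service(member_id):
    return {"name": "Josh", "headline": "Engineer"}

def connections_service(member_id):
    return {"count": 500}

def groups_service(member_id):
    return {"groups": ["Java", "Scala"]}

def frontend_profile_page(member_id):
    """Frontend web app: fan out to several mid-tier domains,
    then build one JSON response for the browser / app."""
    page = {
        "profile": profile_service(member_id),
        "connections": connections_service(member_id),
        "groups": groups_service(member_id),
    }
    return json.dumps(page)

print(frontend_profile_page(42))
```

Because every tier is stateless, any of these services can be scaled horizontally on its own, which is exactly the "pro" the next slide lists; the fan-out itself is the corresponding "con."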

SERVICE ORIENTED ARCHITECTURE COMPARISON

PROS
Stateless services easily scale
Decoupled domains
Build and deploy independently

CONS
Ops overhead
Introduces backwards compatibility issues
Leads to complex call graphs and fanout

SERVICES AT LINKEDIN

In 2003, LinkedIn had one service (Leo)
By 2010, LinkedIn had over 150 services
Today in 2015, LinkedIn has over 750 services

bash$ eh -e %%prod | awk -F. '{ print $2 }' | sort | uniq | wc -l
756

Getting better, but LinkedIn was experiencing hypergrowth...

CACHING

[Diagram: Frontend Web App and Mid-tier Service each fronted by a cache, in front of the DB]

Simple way to reduce load on servers and speed up responses
Mid-tier caches store derived objects from different domains and reduce fanout
Caches also live in the data layer
We use memcache, couchbase, even Voldemort
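The usual shape of these caches is the cache-aside pattern. A minimal sketch, with in-memory dicts standing in for a real cache cluster and database; the function names are invented:

```python
cache = {}                                  # stand-in for memcache/couchbase
database = {"member:1": {"name": "Josh"}}   # stand-in for the backing DB

def get_member(key):
    """Cache-aside read: check the cache first, fall back to the DB,
    then populate the cache for the next request."""
    if key in cache:
        return cache[key]
    value = database[key]   # services must still handle this full-load path
    cache[key] = value
    return value

def update_member(key, value):
    """Update the DB and drop the stale cache entry --
    the invalidation step is where the complexity lives."""
    database[key] = value
    cache.pop(key, None)
```

The `update_member` path hints at why the next slides matter: once several mid-tier caches hold derived objects from multiple domains, knowing which entries to drop on a write becomes genuinely hard.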


There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors.

Via Twitter by Kellan Elliott-McCrea and later Jonathan Feinberg

CACHING TAKEAWAYS

Caches are easy to add in the beginning, but the complexity adds up over time
Over time LinkedIn removed many mid-tier caches because of the complexity around invalidation
We kept caches closer to the data layer

CACHING TAKEAWAYS (cont.)

Services must handle the full load - caches improve speed; they are not permanent load-bearing solutions
We'll use a low-latency solution like Voldemort when appropriate and precompute results

LinkedIn's hypergrowth was extending to the vast amounts of data it collected.
Individual pipelines to route that data weren't scaling. A better solution was needed...

KAFKA MOTIVATIONS

LinkedIn generates a ton of data
Pageviews
Edits on profiles, companies, schools
Logging, timing
Invites, messaging
Tracking
Billions of events every day
Separate and independently created pipelines routed this data

A WHOLE LOT OF CUSTOM PIPELINES...


As LinkedIn needed to scale, each pipeline needed to scale.

KAFKA

Distributed pub-sub messaging platform as LinkedIn's universal data pipeline

[Diagram: frontend and backend services publish to Kafka; the DWH, Oracle, monitoring, analytics, and Hadoop all consume from it]

KAFKA AT LINKEDIN

BENEFITS
Enabled near-realtime access to any data source
Empowered Hadoop jobs
Allowed LinkedIn to build realtime analytics
Vastly improved site monitoring capability
Enabled devs to visualize and track call graphs
Over 1 trillion messages published per day, 10 million messages per second
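The core idea that makes one universal pipeline possible is the partitioned, append-only log with per-consumer offsets. A toy in-process sketch of that idea (not Kafka's API; the class is invented for illustration):

```python
from collections import defaultdict

class ToyLog:
    """In-process sketch of a Kafka-style topic: an append-only log
    that many independent consumers read at their own offsets."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def consume(self, topic, offset):
        # Each consumer (DWH, monitoring, Hadoop...) tracks its own
        # offset, so adding a consumer never requires a new pipeline.
        return self.topics[topic][offset:]


log = ToyLog()
log.publish("pageviews", {"member": 1, "page": "/feed"})
log.publish("pageviews", {"member": 2, "page": "/jobs"})
print(len(log.consume("pageviews", 0)))  # -> 2
```

Decoupling producers from consumers this way is what collapsed the "whole lot of custom pipelines" into one: producers publish once, and any number of downstream systems replay the log independently.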

OVER 1 TRILLION PUBLISHED DAILY
Let's end with the modern years

REST.LI

Services extracted from Leo or created new were inconsistent and often tightly coupled
Rest.li was our move to a data-model-centric architecture
It ensured a consistent stateless RESTful API model across the company

REST.LI (cont.)

By using JSON over HTTP, our new APIs supported non-Java-based clients
By using Dynamic Discovery (D2), we got load balancing, discovery, and scalability of each service API
Today, LinkedIn has 1130+ Rest.li resources and over 100 billion Rest.li calls per day

REST.LI (cont.)

[Image: Rest.li automatic API documentation]

REST.LI (cont.)

[Image: Rest.li R2/D2 tech stack]

LinkedIn's success with data infrastructure like Kafka and Databus led to the development of more and more scalable data infrastructure solutions...

DATA INFRASTRUCTURE

It was clear LinkedIn could build data infrastructure that enables long-term growth
LinkedIn doubled down on infra solutions like:
Storage solutions - Espresso, Voldemort, Ambry (media)
Analytics solutions like Pinot
Streaming solutions - Kafka, Databus, and Samza
Cloud solutions like Helix and Nuage

DATABUS

LinkedIn is a global company and was continuing to see large growth. How else to scale?

MULTIPLE DATA CENTERS

Natural progression of horizontal scaling
Replicate data across many data centers using storage technology like Espresso
Pin users to a geographically close data center
Difficult but necessary

MULTIPLE DATA CENTERS

Multiple data centers are imperative to maintain high availability
You need to avoid any single point of failure - not just for each service, but for the entire site
LinkedIn runs out of three main data centers, with additional PoPs around the globe, and more coming online every day...

MULTIPLE DATA CENTERS

[Map: LinkedIn's operational setup as of 2015 - circles represent data centers, diamonds represent PoPs]

Of course LinkedIn's scaling story is never this simple, so what else have we done?

WHAT ELSE HAVE WE DONE?

Each of LinkedIn's critical systems has undergone its own rich history of scale (graph, search, analytics, profile backend, comms, feed)
LinkedIn uses Hadoop / Voldemort for insights like People You May Know, Similar Profiles, Notable Alumni, and profile browse maps

WHAT ELSE HAVE WE DONE? (cont.)

Re-architected the frontend approach using:
Client templates
BigPipe
Play Framework
LinkedIn added multiple tiers of proxies using Apache Traffic Server and HAProxy
We improved server performance with new hardware, advanced system tuning, and newer Java runtimes

Scaling sounds easy and quick to do, right?

Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

Via Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid

THANKS!
Josh Clemm
www.linkedin.com/in/joshclemm

LEARN MORE

Blog version of this slide deck
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin

Visual story of LinkedIn's history
https://ourstory.linkedin.com/

LinkedIn Engineering blog
https://engineering.linkedin.com

LinkedIn Open-Source
https://engineering.linkedin.com/open-source

LinkedIn's communication system slides, which include the earliest LinkedIn architecture
http://www.slideshare.net/linkedin/linkedins-communication-architecture

Slides which include the earliest LinkedIn data infra work
http://www.slideshare.net/r39132/linkedin-data-infrastructure-qcon-london-2012

LEARN MORE (cont.)

Project Inversion - internal project to enable developer productivity (trunk-based model), faster deploys, unified services
http://www.bloomberg.com/bw/articles/2013-04-10/inside-operation-inversion-the-codefreeze-that-saved-linkedin

LinkedIn's use of Apache Traffic Server
http://www.slideshare.net/thenickberry/reflecting-a-year-after-migrating-to-apache-trafficserver

Multi data center - testing failovers
https://www.linkedin.com/pulse/armen-hamstra-how-he-broke-linkedin-got-promotedangel-au-yeung

LEARN MORE - KAFKA

History and motivation around Kafka
http://www.confluent.io/blog/stream-data-platform-1/

Thinking about streaming solutions as a commit log
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineershould-know-about-real-time-datas-unifying

Kafka enabling monitoring and alerting
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection

Kafka enabling real-time analytics (Pinot)
http://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot

Kafka's current use and future at LinkedIn
http://engineering.linkedin.com/kafka/kafka-linkedin-current-and-future

Kafka processing 1 trillion events per day
https://engineering.linkedin.com/apache-kafka/how-we_re-improving-and-advancingkafka-linkedin

LEARN MORE - DATA INFRASTRUCTURE

Open sourcing Databus
https://engineering.linkedin.com/data-replication/open-sourcing-databus-linkedins-lowlatency-change-data-capture-system

Samza streams to help LinkedIn view call graphs
https://engineering.linkedin.com/samza/real-time-insights-linkedins-performance-usingapache-samza

Real-time analytics (Pinot)
http://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot

Introducing the Espresso data store
http://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-newdistributed-document-store

LEARN MORE - FRONTEND TECH

LinkedIn's use of client templates
Dust.js
http://www.slideshare.net/brikis98/dustjs
Profile
http://engineering.linkedin.com/profile/engineering-new-linkedin-profile

BigPipe on LinkedIn's homepage
http://engineering.linkedin.com/frontend/new-technologies-new-linkedin-home-page

Play Framework
Introduction at LinkedIn
https://engineering.linkedin.com/play/composable-and-streamable-play-apps
Switching to a non-blocking asynchronous model
https://engineering.linkedin.com/play/play-framework-async-io-without-thread-pooland-callback-hell

LEARN MORE - REST.LI

Introduction to Rest.li and how it helps LinkedIn scale
http://engineering.linkedin.com/architecture/restli-restful-service-architecture-scale

How Rest.li expanded across the company
http://engineering.linkedin.com/restli/linkedins-restli-moment

LEARN MORE - SYSTEM TUNING

JVM memory tuning
http://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-highthroughput-and-low-latency-java-applications

System tuning
http://engineering.linkedin.com/performance/optimizing-linux-memory-managementlow-latency-high-throughput-databases

Optimizing JVM tuning automatically
https://engineering.linkedin.com/java/optimizing-java-cms-garbage-collections-itsdifficulties-and-using-jtune-solution

WE'RE HIRING

LinkedIn continues to grow quickly and there's still a ton of work we can do to improve.
We're working on problems that very few ever get to solve - come join us!