July 8th, 2014
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg
When your software fails...
will your system survive?
The Netflix way
Application startup
Configuration
Code deployment
System deployment
Automate all the things!
Reduce errors through reproducibility
Automation
Shared state should be stored in a shared service
Data on an instance should be replicated to other instances
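A minimal sketch of the replication idea, assuming three in-memory replicas and a simple majority quorum for writes. The `Replica` and `ReplicatedStore` names are hypothetical and are not taken from the talk or from any Netflix library; they only illustrate why data held on one instance survives when that instance dies.

```python
class Replica:
    """Hypothetical in-memory stand-in for one instance's local store."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def put(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Writes go to every replica; a write succeeds if a majority acknowledge."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acks += 1
            except ConnectionError:
                pass  # a dead replica must not fail the whole write
        quorum = len(self.replicas) // 2 + 1
        if acks < quorum:
            raise RuntimeError("write failed: quorum not reached")
        return acks

    def get(self, key):
        # Read from the first replica that is up and has the key.
        for replica in self.replicas:
            try:
                return replica.get(key)
            except (ConnectionError, KeyError):
                continue
        raise KeyError(key)
```

With three replicas, any single instance can be lost and both reads and writes keep working, which is one way to read "build for three".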
Build for three
We hold a boot camp for new engineers to teach them how to build for a highly distributed environment.
Billions of outbound requests per day to API dependencies
Movie Ratings
Personalization Engine
User Info
Movie Metadata
Similar Movies
Reviews
A/B Test Engine
Billions of requests per day into the Netflix API
Discovery API
Streaming API
Content
Encoding
CDN
Management
QOS
Logging
DRM
OpenConnect
Edge Locations
Browse
Play
Watch
Easier auto-scaling
Simulate things that go wrong
Instances
Machine Images
Elastic IPs
Load Balancers
Service Oriented Architecture
HTTP/REST interfaces between services
Netflix built a global PaaS
Netflix OSS
Multiple accounts
Simulate things that go wrong
Latency -- Degrades network and injects faults
Conformity -- Looks for outliers
The simian army
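To make the latency item concrete, here is a small sketch of fault injection at the function-call level. It assumes a hypothetical `inject_faults` decorator and `InjectedFault` exception; neither is part of the Simian Army tools, which work against real network traffic and instances rather than in-process calls.

```python
import functools
import random
import time


class InjectedFault(Exception):
    """Raised in place of the real call to simulate a failing dependency."""


def inject_faults(latency_s, failure_rate, rng=None, sleep=time.sleep):
    """Wrap a call so it is delayed by latency_s and fails with probability failure_rate."""
    rng = rng if rng is not None else random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            sleep(latency_s)  # degrade the "network" before the call
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_faults(latency_s=0.05, failure_rate=0.1)
def fetch_ratings(movie_id):
    # Hypothetical dependency call; a real one would go over HTTP.
    return {"movie_id": movie_id, "stars": 4}
```

Callers of `fetch_ratings` then have to cope with slow responses and occasional `InjectedFault` errors, which is the behavior such a test is meant to exercise.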
Global variables
Feature flags
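As a rough illustration of a feature flag, the sketch below enables a feature for a percentage of users. The `FeatureFlags` class and its hashing scheme are assumptions made for this example, not the mechanism described in the talk.

```python
import hashlib


class FeatureFlags:
    """Hypothetical percentage-based flags; unknown flags are always off."""

    def __init__(self, rollout_percent=None):
        # Maps flag name -> percentage of users (0 to 100) who see the feature.
        self._rollout = dict(rollout_percent or {})

    def set_rollout(self, name, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_enabled(self, name, user_id):
        percent = self._rollout.get(name, 0)
        # Hash flag name and user together so each user lands in a stable bucket.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent
```

Because the bucket depends only on the flag name and the user, raising the percentage adds users without flipping anyone who already had the feature.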
Netflix OSS
We know Java
Priam
State management
Token assignment
Node replacement
Backup/restore to/from S3
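Token assignment can be illustrated with the usual evenly spaced layout for a Cassandra ring. This sketch assumes the RandomPartitioner token range of 0 to 2^127 and a hypothetical `assign_tokens` helper; it is not Priam's actual algorithm, which also has to account for regions and zones.

```python
RING_SIZE = 2 ** 127  # assumed token space of Cassandra's RandomPartitioner


def assign_tokens(node_count):
    """Return evenly spaced initial tokens, one per node, around the ring."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * RING_SIZE // node_count for i in range(node_count)]


def replacement_token(tokens, dead_index):
    """A replacement node takes over the dead node's token, so no data moves between the others."""
    return tokens[dead_index]
```

Even spacing gives each node an equal share of the key space, and reusing the dead node's token keeps the ring layout unchanged during node replacement.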
Using Cassandra at Netflix
Astyanax
OO abstraction to Cassandra
Multi-region support
Cassandra Architecture
Going Multi-region
Peaks at 1.4B / min
Developers choose what metrics to submit
What to alert on
Example Alert Config
Atlas
When something breaks...
Breakdown of an outage
Is something wrong? Alerting
Where is the problem? Telemetry and Dashboards
What changed? ???
Breakdown of an outage
Is something wrong? Alerting
Where is the problem? Telemetry and Dashboards
What changed? Change control?
Change control, the good
It's manual
It forces you to serialize your changes to an extent
Breakdown of an outage
Is something wrong? Alerting
Where is the problem? Telemetry and Dashboards
What changed? Chronos
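The "what changed?" question amounts to querying a log of change events around the time of the incident. The sketch below assumes hypothetical `ChangeEvent` and `ChangeLog` types; it does not reflect Chronos's real API or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: datetime
    source: str        # e.g. "deploy", "config", "autoscaling" (illustrative values)
    description: str


class ChangeLog:
    """Hypothetical central record of every change made to the system."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def changes_before(self, incident_time, window):
        """Return events in the window leading up to the incident, newest first."""
        start = incident_time - window
        recent = [e for e in self._events if start <= e.timestamp <= incident_time]
        return sorted(recent, key=lambda e: e.timestamp, reverse=True)


log = ChangeLog()
log.record(ChangeEvent(datetime(2014, 7, 8, 9, 0), "deploy", "api v42 pushed"))
log.record(ChangeEvent(datetime(2014, 7, 8, 9, 50), "config", "timeout lowered"))
suspects = log.changes_before(datetime(2014, 7, 8, 10, 0), timedelta(minutes=30))
```

Here `suspects` contains only the configuration change, since the deploy falls outside the 30-minute window before the incident.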
(Some of) Netflix is open source:
https://netflix.github.io
Just a quick reminder...
Netflix is hiring!
If you like what you see here, feel free to reach out!
Questions?
Getting in touch
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg