
NoSQL is Dead

Eric Redmond
@coderoshi

Terminology

Follow Along at Home

basho

(btw, we're hiring)

All models are wrong, but some are useful.
G.E.P. Box

Models that aren't useful should die.
Me

NoSQL is not a useful model.
(the premise)

Sorry

NoSQL is ill-defined

NoSQL is a bad classifier

What this means for you

NoSQL is ill-defined

A NoSQL (often interpreted as Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
http://en.wikipedia.org/wiki/NoSQL

NAASQL
Pronounced: Nazgûl

People desire NoSQL because of some perceived or actual deficiency in the SQL model.

Perceived Deficiencies

Can't horizontally scale

Can't support semi-structured data

Slower development

Can't use modern tools (like MapReduce/Hadoop)

Actual Deficiencies

Terrible at representing complex graphs

High Availability

Require indexes for speed

Distributed SQL is often a subset, or has caveats

SQL isn't always required or helpful

NoSQL is a Bad Classifier

DB Classifications

By Models (Graph, Doc, Col, KV)

By Network Topology (Mesh, Partial-Mesh, Tree)

By Natural Distribution (RDB/Graph, Doc/Col/KV)

Classify by Models

Graph

Columnar

Key/Value

Document

Graph

http://en.wikipedia.org/wiki/Graph_database

Graph Modeling Kit

Cypher
START x=node(0)
MATCH x
RETURN x.name
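The Cypher snippet starts at node 0 and returns its name. As a rough illustration of what a graph store models, here is a minimal in-memory property graph in Python: nodes carry properties, and edges live in adjacency lists, so a multi-hop traversal follows stored references instead of performing joins. The `PropertyGraph` class, its methods, and the sample data are hypothetical and are not Neo4j's API.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical in-memory property graph (not any real store's API)."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.edges = defaultdict(list)  # node id -> list of (label, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def name_of(self, node_id):
        # Rough analogue of: START x=node(0) RETURN x.name
        return self.nodes[node_id].get("name")

    def neighbors(self, node_id, label=None):
        # Follow outgoing edges straight from the adjacency list; no join needed.
        return [dst for (lbl, dst) in self.edges[node_id]
                if label is None or lbl == label]


g = PropertyGraph()
g.add_node(0, name="Alice")
g.add_node(1, name="Bob")
g.add_node(2, name="Carol")
g.add_edge(0, "KNOWS", 1)
g.add_edge(1, "KNOWS", 2)

print(g.name_of(0))  # Alice
# Friends of friends: two hops out from node 0.
fof = [g.name_of(n) for f in g.neighbors(0, "KNOWS") for n in g.neighbors(f, "KNOWS")]
print(fof)  # ['Carol']
```

In a relational schema the same two-hop question typically needs a self-join per hop, which is the "terrible at representing complex graphs" point from the earlier slide.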

Graph Stores

Neo4j (high perf, ACID)

HypergraphDB (directed hypergraph)

Titan (distributed)

ArangoDB (flexible modeling)

SparkleDB (RDF, SPARQL)

InfinityDB (distributable, embeddable)

Key/Value Stores

Riak (critical data, simple operations)

Aerospike (specialized for SSD+DRAM)

Redis (speed, fancy datatypes, messaging)

LevelDB (embeddable)

Column Store
[Diagram: each row key (e.g. "a key") maps to one or more column families; each column family holds column: "value" pairs]
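The column store layout on this slide (row key -> column family -> column -> value) can be sketched as nested maps. This is a minimal sketch assuming an in-memory store; `ColumnStore`, `put`, `get_family`, and `scan` are hypothetical names, and real column stores such as HBase add versioned cells, on-disk sorted files, and region splitting on top of this shape.

```python
from collections import defaultdict


class ColumnStore:
    """Hypothetical sketch: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Column families are declared up front; columns inside them are free-form.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        # Only columns that were actually written exist (rows are sparse).
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def scan(self, start, stop):
        # Ordered range scan over row keys in [start, stop).
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, {f: dict(cols) for f, cols in self.rows[key].items()}


store = ColumnStore(["info", "stats"])
store.put("a key", "info", "name", "value")
store.put("a key", "stats", "visits", 3)
store.put("b key", "info", "name", "other")
print(store.get_family("a key", "info"))     # {'name': 'value'}
print([k for k, _ in store.scan("a", "b")])  # ['a key']
```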

Columnar Stores

Cassandra (random access; Dynamo, CQL)

HBase (ordered, sparse; Bigtable)

Accumulo (compressed)

Hypertable (HQL, auto-migration)

Document Datastore
{
"_id" : "2612672603",
"_rev" : "4db7ca268e236e5bf9a52224",
"name" : "Sant Julià de Lòria",
"country" : "AD",
"timezone" : "Europe/Andorra",
"population" : 8022,
"location" : {
"latitude" : 42.46372,
"longitude" : 1.49129
}
}
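A document store lets you query on fields inside the document, including nested ones such as the location block above. The sketch below filters an in-memory list of documents by a dotted path like `location.latitude`. The `get_path` and `find` helpers are hypothetical and only mimic the general idea of document queries; they are not MongoDB's or CouchDB's query API. The second document is made-up sample data.

```python
def get_path(doc, path):
    """Return the value at a dotted path like 'location.latitude', or None."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, predicate):
    """Return every document whose value at `path` satisfies `predicate`."""
    return [d for d in docs
            if (v := get_path(d, path)) is not None and predicate(v)]


docs = [
    {"_id": "2612672603", "name": "Sant Julià de Lòria", "country": "AD",
     "population": 8022,
     "location": {"latitude": 42.46372, "longitude": 1.49129}},
    # Hypothetical document with no location: schemaless documents can differ.
    {"_id": "0000000001", "name": "Example town", "country": "AD",
     "population": 500},
]

north = find(docs, "location.latitude", lambda lat: lat > 42.0)
print([d["name"] for d in north])  # ['Sant Julià de Lòria']
```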

Document Stores

CouchDB (embeddable, replication)

Couchbase (distributed, failover)

MongoDB (easy to program)

RethinkDB (distributed joins, atomic updates)

KV/Doc/Col Too Limited

Riak + Search (Document)

Cassandra (K/V)

ArangoDB (Document)

PostgreSQL (Columnar -> Hadoop)

ElasticSearch (Inverted index?)

Model Types: Revised

Key/Value (with or without indexing)

Graph

Other
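The revised list folds documents and columns into "Key/Value (with or without indexing)". One way to read that: a document or column store is a key/value store plus indexes over parts of the value. A minimal sketch, assuming an in-memory dict and a hypothetical `IndexedKV` class (loosely in the spirit of secondary indexes on a KV store, not any product's API):

```python
from collections import defaultdict


class IndexedKV:
    """Hypothetical key/value store with optional secondary indexes."""

    def __init__(self, indexed_fields=()):
        self.data = {}
        self.indexed_fields = tuple(indexed_fields)
        # field -> field value -> set of primary keys
        self.indexes = {f: defaultdict(set) for f in self.indexed_fields}

    def put(self, key, value):
        self.delete(key)  # drop stale index entries if the key is overwritten
        self.data[key] = value
        for field in self.indexed_fields:
            if isinstance(value, dict) and field in value:
                self.indexes[field][value[field]].add(key)

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        old = self.data.pop(key, None)
        if isinstance(old, dict):
            for field in self.indexed_fields:
                if field in old:
                    self.indexes[field][old[field]].discard(key)

    def query(self, field, value):
        # Without an index, the only option is a full scan of every value.
        if field not in self.indexes:
            return sorted(k for k, v in self.data.items()
                          if isinstance(v, dict) and v.get(field) == value)
        return sorted(self.indexes[field].get(value, ()))


plain = IndexedKV()                              # pure key/value
indexed = IndexedKV(indexed_fields=["country"])  # key/value plus an index
for kv in (plain, indexed):
    kv.put("andorra-1", {"name": "Sant Julià de Lòria", "country": "AD"})
    kv.put("spain-1", {"name": "Example town", "country": "ES"})

print(indexed.query("country", "AD"))  # ['andorra-1'] via the index
print(plain.query("country", "AD"))    # ['andorra-1'] via a full scan
```

Both stores return the same answer; the index only changes how much data has to be touched to get it.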

Classify by Topology

Single Node

Mesh Network

Partial Mesh Network

Tree/Star Topologies

Single-Node

Neo4j (graph)

LevelDB (key/value)

HBase (columnar)

CouchDB (document)

Why Distribution?

Sharding (distributing a subset of a class of data across multiple servers)

Replication (duplicating data across multiple servers)
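Sharding and replication are usually combined: a key's hash picks its home shard, and copies go to the next few nodes. The sketch below assumes a simple hash ring with a fixed number of partitions assigned round-robin to nodes, loosely in the style of Dynamo-like stores such as Riak. The partition count, node names, and `n_val` default are illustrative assumptions, and `partition_for` / `preference_list` are hypothetical helpers.

```python
import hashlib


def partition_for(key, num_partitions):
    """Hash a key onto one of num_partitions ring positions (deterministic)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


def preference_list(key, nodes, num_partitions=64, n_val=3):
    """Return the n_val distinct nodes that should hold copies of `key`.

    Sharding: the key's hash picks a starting partition.
    Replication: copies go to the owners of the following partitions.
    Partitions are assigned to nodes round-robin (an assumption of this sketch).
    """
    if n_val > len(nodes):
        raise ValueError("n_val cannot exceed the number of nodes")
    if num_partitions < len(nodes):
        raise ValueError("need at least one partition per node")
    owners = [nodes[p % len(nodes)] for p in range(num_partitions)]
    replicas = []
    p = partition_for(key, num_partitions)
    while len(replicas) < n_val:
        node = owners[p % num_partitions]
        if node not in replicas:
            replicas.append(node)
        p += 1
    return replicas


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(preference_list("user:42", nodes))  # some list of three distinct nodes
```

Losing one node still leaves two copies of every key, which is the availability side of the trade-offs on the following slides.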

The CAP Theorem


A topic so boring I took a nap in the middle of writing this slide

http://aphyr.com/posts/313-strong-consistency-models

Harvest/Yield Tradeoff
You can't guarantee 100% harvest and 100% yield
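Fox and Brewer define yield as the fraction of requests that get answered and harvest as the fraction of the full data reflected in an answer. The sketch below shows the trade-off for a query that touches every shard while some shards are unreachable. The two policy names, the hypothetical `answer_query` / `measure` helpers, the workload, and averaging harvest over answered requests are all assumptions for illustration.

```python
def answer_query(shards_up, total_shards, policy):
    """Return (answered, harvest) for one query that touches every shard.

    policy="complete": refuse unless every shard is reachable (protects harvest).
    policy="partial": answer with whatever shards are reachable (protects yield).
    """
    if policy == "complete":
        if shards_up == total_shards:
            return True, 1.0
        return False, 0.0
    if policy == "partial":
        if shards_up == 0:
            return False, 0.0
        return True, shards_up / total_shards
    raise ValueError(f"unknown policy: {policy}")


def measure(requests, total_shards, policy):
    """requests: shards-up count per request.

    Yield = answered / offered; harvest = mean data fraction of answered requests.
    """
    results = [answer_query(up, total_shards, policy) for up in requests]
    answered = [h for ok, h in results if ok]
    yield_ = len(answered) / len(requests)
    harvest = sum(answered) / len(answered) if answered else 0.0
    return yield_, harvest


# Hypothetical workload: 4 shards, one shard down for half of the requests.
workload = [4, 4, 3, 3]
print(measure(workload, 4, "complete"))  # (0.5, 1.0)
print(measure(workload, 4, "partial"))   # (1.0, 0.875)
```

Neither policy reaches (1.0, 1.0) once a shard is down; you pick which number to sacrifice.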

FLP Impossibility Proof


Safety or Liveness, but not both

Synchronicity is not
the only problem
Intention is the problem

Mesh Networks

Riak

HBase

Couchbase

BigCouch

Partial Mesh

Riak + MDC

HBase

Cassandra

Tree/Star

Mongo

HBase

PostgreSQL/MySQL cluster

What about Topology?

CouchDB (Single, Mesh [BigCouch])

MongoDB (Single, Tree)

Riak (Mesh, Partial Mesh)

HBase (Single, Mesh, Partial Mesh, Tree)

PostgreSQL (Single, Tree)

OceanBase (Partial Mesh)

Classify by Natural Distribution

Hard to distribute

Graphs

Relational Joins

Easy to distribute

Key/values
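Why graphs and joins are hard to distribute while key/values are easy can be seen by counting network hops. Assuming items are placed by a simple `id % num_partitions` rule (an illustrative choice, not any product's placement), a key lookup always touches exactly one partition, but a traversal or join crosses partitions whenever an edge's endpoints land on different servers. The helper names are hypothetical.

```python
from itertools import combinations


def partition(item_id, num_partitions):
    # Illustrative placement rule: integer id modulo the partition count.
    return item_id % num_partitions


def partitions_for_lookup(key, num_partitions):
    """A key/value get touches exactly one partition."""
    return {partition(key, num_partitions)}


def cross_partition_edges(edges, num_partitions):
    """Count edges whose endpoints live on different partitions.

    Each such edge is a network hop during a traversal (or a distributed join).
    """
    return sum(1 for a, b in edges
               if partition(a, num_partitions) != partition(b, num_partitions))


# A densely connected graph: each of 8 nodes linked to every other node.
nodes = range(8)
edges = list(combinations(nodes, 2))
print(len(edges))                       # 28
print(cross_partition_edges(edges, 4))  # 24 of the 28 edges span two servers
print(partitions_for_lookup(5, 4))      # {1}
```

Smarter graph partitioning can reduce the cross-partition count, which is why distributed graphs are "hard, but not impossible" rather than ruled out.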

Hard, but not Impossible

Titan (distributed graph)

InfinityDB (distributed graph)

VoltDB (distributed SQL database)

PostgreSQL cluster (distributed SQL database)

Easy, but Not Always There

Redis (KVs are easy to distribute, but Redis Cluster sucks)

MongoDB (documents can distribute, but Mongo tends to be tough to scale/admin)

What else?

Time series, FTS, Ranges

Defined Schema / Schemaless / Opaque binary

Large object storage

HA/SC, Harvest/Yield

Message patterns (req/rep, pub/sub)

Stream processing, Data stream mining

Developer friendliness / Operational simplicity, Self healing

What this means for you

The Future

Polyglot DBs with Data Oriented Middleware
(hint, it's not a thing. Please make it)
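Since this layer doesn't exist yet, here is only a hypothetical sketch of its simplest possible shape: a router that sends each request to whichever store suits the data's shape. `DataRouter`, `RecordingBackend`, and the shape names are invented for illustration; the backends are stand-ins that record calls rather than real database clients.

```python
class RecordingBackend:
    """Stand-in for a real database client; it just records what it was asked."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def execute(self, operation, payload):
        self.calls.append((operation, payload))
        return f"{self.name} handled {operation}"


class DataRouter:
    """Hypothetical data-oriented middleware: route each request by data shape."""

    def __init__(self):
        self.routes = {}

    def register(self, shape, backend):
        self.routes[shape] = backend

    def execute(self, shape, operation, payload):
        backend = self.routes.get(shape)
        if backend is None:
            raise LookupError(f"no backend registered for shape {shape!r}")
        return backend.execute(operation, payload)


router = DataRouter()
router.register("key/value", RecordingBackend("kv-store"))
router.register("graph", RecordingBackend("graph-store"))
router.register("search", RecordingBackend("search-index"))

print(router.execute("graph", "friends_of", {"user": 42}))
# graph-store handled friends_of
print(router.execute("key/value", "get", {"key": "session:abc"}))
# kv-store handled get
```

The hard parts a real version would need (cross-store consistency, keeping indexes in sync, failure handling) are exactly what this sketch leaves out.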

Jessica & Dan's keynote

This is where you all learn a secret

Who else builds this? You!

#emotionalAppeal

CAP Theorem: http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf

Harvest/Yield: http://radlab.cs.berkeley.edu/people/fox/static/pubs/pdf/c18.pdf

SC Models: http://aphyr.com/posts/313-strong-consistency-models

FLP Impossibility: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/

RAMP transactions: http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/

7 DBs 7 Weeks: https://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
