NoSQL is Dead

Eric Redmond
@coderoshi
Terminology
Follow Along at Home
basho
(btw, we're hiring)
All models are wrong
but some are useful.
G.E.P. Box
Models that aren't
useful should die.
Me
NoSQL is not a
useful model.
the premiss
Sorry
NoSQL is ill-defined
NoSQL is a bad classifier
What this means for you
NoSQL is ill-defined
A NoSQL (often interpreted as
Not Only SQL) database
provides a mechanism for storage
and retrieval of data that is
modeled in means other than the
tabular relations used in relational
databases.
http://en.wikipedia.org/wiki/NoSQL
NAASQL
Pronounced: Nazgûl
People desire NoSQL because
of some perceived or actual
deficiency in the SQL model
Perceived Deficiencies
Can't horizontally scale
Can't support semi-structured data
Slower development
Can't use modern tools (like MR/Hadoop)
Actual Deficiencies
Terrible at representing complex graphs
High Availability
Require indexes for speed
Distributed SQL is often a subset, or has caveats
SQL isn't always required or helpful
NoSQL is a
Bad Classifier
DB Classifications
By Models (Graph, Doc, Col, KV)
By Network Topology (Mesh, Partial-Mesh, Tree)
By Natural Distribution (RDB/Graph, Doc/Col/KV)
Classify by Models
Graph
Columnar
Key/Value
Document
Graph
http://en.wikipedia.org/wiki/Graph_database
Graph Modeling Kit
Cypher
START x=node(0)
MATCH x
RETURN x.name
Graph Stores
Neo4j (high perf, ACID)
HypergraphDB (directed hypergraph)
Titan (distributed)
ArangoDB (flexible modeling)
SparkleDB (RDF, SPARQL)
InfinityDB (distributable, embeddable)
Key/Value Stores
Riak (critical data, simple operations)
Aerospike (specialized for SSD+DRAM)
Redis (speed, fancy datatypes, messaging)
LevelDB (embeddable)
Column Store
[Diagram: each row is addressed by a row key ("a key"); a row holds column families; each column family holds columns of the form column: "value"]
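One way to picture the layout above is as nested maps: row key, then column family, then column. The sketch below is a toy in-memory illustration only. The names `ColumnStore`, `put`, `get`, and `scan` are assumptions made up for this example and are not the API of Cassandra, HBase, or any other product.

```python
from collections import defaultdict


class ColumnStore:
    """Toy model: row key -> column family -> column -> value."""

    def __init__(self):
        # Rows are sparse: a row stores only the columns that were written.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        # Chained .get() calls avoid creating empty rows on a miss.
        return self.rows.get(row_key, {}).get(family, {}).get(column, default)

    def scan(self, start, stop):
        """Yield (row_key, families) for start <= row_key < stop, in key order."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]


store = ColumnStore()
store.put("a key", "column family", "column", "value")
store.put("b key", "column family", "other column", "value")
```

The ordered `scan` hints at why an ordered store suits range queries, while a store built for random access only needs `get`.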
Columnar Stores
Cassandra (random access; Dynamo, CQL)
HBase (ordered, sparse; Bigtable)
Accumulo (compressed)
Hypertable (HQL, auto-migration)
Document Datastore
{
  "_id" : "2612672603",
  "_rev" : "4db7ca268e236e5bf9a52224",
  "name" : "Sant Julià de Lòria",
  "country" : "AD",
  "timezone" : "Europe/Andorra",
  "population" : 8022,
  "location" : {
    "latitude" : 42.46372,
    "longitude" : 1.49129
  }
}
Document Stores
CouchDB (embeddable, replication)
Couchbase (distributed, failover)
MongoDB (easy to program)
RethinkDB (distributed joins, atomic updates)
KV/Doc/Col Too Limited
Riak + Search (Document)
Cassandra (K/V)
ArangoDB (Document)
PostgreSQL (Columnar -> Hadoop)
ElasticSearch (Inverted index?)
Model Types: Revised
Key/Value (with or without indexing)
Graph
Other
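To make "Key/Value (with or without indexing)" concrete, here is a minimal sketch of a store whose values are opaque bytes, with an optional secondary index supplied by the caller. The class `IndexedKV` and its methods are hypothetical names for this illustration, not the interface of Riak or any real database.

```python
class IndexedKV:
    """Toy key/value store: opaque values, plus an optional secondary index."""

    def __init__(self):
        self.values = {}   # key -> opaque bytes
        self.fields = {}   # key -> indexed fields recorded for that key
        self.index = {}    # (field, value) -> set of keys

    def put(self, key, value, indexed=None):
        self.delete(key)  # drop stale index entries when overwriting
        self.values[key] = value
        self.fields[key] = dict(indexed or {})
        for entry in self.fields[key].items():
            self.index.setdefault(entry, set()).add(key)

    def get(self, key):
        return self.values[key]

    def delete(self, key):
        self.values.pop(key, None)
        for entry in self.fields.pop(key, {}).items():
            self.index[entry].discard(key)

    def find(self, field, value):
        """Return the keys whose indexed field equals value."""
        return sorted(self.index.get((field, value), ()))


kv = IndexedKV()
kv.put("plain", b"\x00\x01")                                   # without indexing
kv.put("2612672603", b"<serialized document>", indexed={"country": "AD"})
```

With the index in place, the same store answers document-style lookups such as `kv.find("country", "AD")`, which is one reason the Key/Value and Document labels blur together.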
Classify by Topology
Single Node
Mesh Network
Partial Mesh Network
Tree/Star Topologies
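A rough way to compare these topologies is to count the node-to-node links each one needs. The sketch below uses the standard counts for a full mesh and a star. The partial-mesh function assumes one particular shape (full-mesh clusters chained by a single link each), which is an assumption for illustration and not a description of any specific product.

```python
def full_mesh_links(n):
    """Every node talks to every other node: n * (n - 1) / 2 links."""
    return n * (n - 1) // 2


def star_links(n):
    """One hub plus n - 1 leaves: one link per leaf."""
    return max(n - 1, 0)


def partial_mesh_links(cluster_sizes):
    """Assumed shape: each cluster is a full mesh, and neighbouring
    clusters are joined by exactly one inter-cluster link."""
    inside = sum(full_mesh_links(size) for size in cluster_sizes)
    between = max(len(cluster_sizes) - 1, 0)
    return inside + between


mesh = full_mesh_links(6)             # six nodes, one cluster
star = star_links(6)                  # six nodes, one hub
partial = partial_mesh_links([3, 3])  # two three-node clusters
```

The mesh cost grows quadratically with node count and the star cost linearly, so the choice trades link count against dependence on the hub.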
Single-Node
Neo4j (graph)
LevelDB (key/value)
HBase (columnar)
CouchDB (document)
Why Distribution?
Sharding (distributing a subset of a class of data across multiple servers)
Replication (duplicating data across multiple servers)
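The two ideas can be sketched in a few lines: hash the key to pick a shard, then copy the value to the next few servers. This is a minimal illustration under simple assumptions (a fixed server list and modulo placement). The function names and server names are made up for the example.

```python
import hashlib


def shard_for(key, num_shards):
    """Sharding: map a key to one shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key, servers, n_replicas):
    """Replication: the owning server plus the next servers in the list."""
    start = shard_for(key, len(servers))
    count = min(n_replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]


servers = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical names
owners = replicas_for("user:42", servers, 3)
```

Modulo placement moves most keys whenever the server count changes, which is why distributed stores commonly use consistent hashing instead.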
The CAP Theorem

A topic so boring I took a nap in the
middle of writing this slide
http://aphyr.com/posts/313-strong-consistency-models
Harvest/Yield Tradeoff
You can't guarantee 100% harvest and 100% yield
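A small worked example of the tradeoff, using the usual definitions: yield is the fraction of queries that complete, and harvest is the fraction of the full data set reflected in an answer. The scenario (four shards, one unreachable, 100 queries) and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def yield_fraction(completed, offered):
    """Fraction of offered queries that were answered at all."""
    return Fraction(completed, offered)


def harvest_fraction(data_reflected, data_total):
    """Fraction of the complete data set reflected in an answer."""
    return Fraction(data_reflected, data_total)


# Four shards, one unreachable.
# Strategy 1: answer all 100 fan-out queries from the 3 live shards.
partial_answers = (yield_fraction(100, 100), harvest_fraction(3, 4))

# Strategy 2: 100 single-key lookups spread evenly over the shards;
# reject the 25 that need the dead shard, answer the rest in full.
strict_answers = (yield_fraction(75, 100), harvest_fraction(1, 1))
```

Strategy 1 keeps yield at 1 and drops harvest to 3/4. Strategy 2 keeps harvest at 1 and drops yield to 3/4. Neither reaches 100% on both while the shard is down.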
FLP Impossibility Proof

Safety or Liveness, but not both
Synchronicity is not
the only problem
Intention is the problem
Mesh Networks
Riak
HBase
Couchbase
BigCouch
Partial Mesh
Riak + MDC
HBase
Cassandra
Tree/Star
Mongo
HBase
PostgreSQL/MySQL cluster
What about Topology?
CouchDB (Single, Mesh [BigCouch])
MongoDB (Single, Tree)
Riak (Mesh, Partial Mesh)
HBase (Single, Mesh, Partial Mesh, Tree)
PostgreSQL (Single, Tree)
OceanBase (Partial Mesh)
Classify by Natural
Distribution
Hard to distribute
Graphs
Relational Joins
Easy to distribute
Key/values
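One way to see the difference: a key lookup touches exactly one partition, while a graph traversal pays a network hop for every edge whose endpoints live on different partitions. The sketch below counts those cross-partition edges for a small ring graph under two placement schemes. The graph, the two placement functions, and the two-partition setup are all assumptions for illustration.

```python
def cross_partition_edges(edges, owner):
    """Count edges whose endpoints are owned by different partitions."""
    return sum(1 for a, b in edges if owner(a) != owner(b))


def by_modulo(node):
    """Hash-like placement: alternate nodes between 2 partitions."""
    return node % 2


def by_range(node):
    """Locality-aware placement: nodes 0-3 on one partition, 4-7 on the other."""
    return node // 4


# A ring of 8 nodes: 0-1, 1-2, ..., 7-0.
ring = [(i, (i + 1) % 8) for i in range(8)]

scattered = cross_partition_edges(ring, by_modulo)
clustered = cross_partition_edges(ring, by_range)
```

Placement that works well for keys (scatter them evenly) is the worst case for this graph, and a good graph placement depends on the shape of the data, which is what makes graphs and joins hard to distribute.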
Hard, but not Impossible
Titan (distributed graph)
InfinityDB (distributed graph)
VoltDB (distributed SQL database)
PostgreSQL cluster (distributed SQL database)
Easy, but Not Always There
Redis (KVs are easy to distribute, but Redis Cluster sucks)
MongoDB (Document can distribute, but Mongo tends to be tough to scale/admin)
What else?
Time series, FTS, Ranges
Defined Schema / Schemaless / Opaque binary
Large object storage
HA/SC, Harvest/Yield
Message patterns (req/rep, pub/sub)
Stream processing, Data stream mining
Developer friendliness / Operational simplicity, Self healing
What this means for you
The Future
Polyglot DBs with Data Oriented Middleware
(hint, it's not a thing. Please make it)
Jessica & Dan's keynote
This is where you all learn a secret
Who else builds this? You!
#emotionalAppeal
CAP Theorem: http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf
Harvest/Yield: http://radlab.cs.berkeley.edu/people/fox/static/pubs/pdf/c18.pdf
SC Models: http://aphyr.com/posts/313-strong-consistency-models
FLP Impossibility: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
RAMP transactions: http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/
7 DBs 7 Weeks: https://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
