
NoSQL

William Horner, Chris Little, Brandon Bowen

Overview

What is NoSQL
History of NoSQL
Differences from SQL
Handling the Lack of Joins
ACID
CAP Theorem/Brewer's Theorem
Types of NoSQL
Benchmarks
Why NoSQL
Sharding
When to Use
Application Uses
Amazon Company Example

What is NoSQL
Not only SQL
Does not use a formal, fixed structure
Is joinless
Database without the limitations of SQL rules
Uses the CAP Theorem/Brewer's Theorem to balance consistency
against availability

History of NoSQL
Started with relational databases
Growing need to handle larger amounts of data with
realtime performance
Term "NoSQL" first used by Carlo Strozzi in 1998 for a
lightweight relational database without an SQL interface
Reintroduced in 2009 as NoSQL, Not Only SQL, by Eric
Evans to discuss open-source non-relational database
systems

Companies that use NoSQL

Cisco
Dell
Best Buy
Call of Duty
FedEx
NASA
Netflix

Apache Cassandra Project
Scalability and high availability without compromising
performance
Uses column indexes
Denormalization
Materialized Views
Built-in caching

Apache Cassandra Project


Used in over 1500 companies with large, active data sets
Largest cluster has 300 TB of data on over 400 machines
Replication across multiple data centers allows failed nodes to be
replaced with no downtime
Every node is identical, allowing no single point of failure
Users can choose between synchronous and asynchronous
replication

Benefits

Schemas are dynamic


Scaling is easier and more cost efficient
Data Manipulation conforms to the language a program
uses

Differences from SQL

Handling the Lack of Joins


Multiple Queries
Caching
Nesting data
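The three approaches above can be sketched in plain Python. This is only an illustration with made-up data: the `users` and `orders` structures and the `orders_with_names` helper are hypothetical and stand in for reads against a real data store.

```python
# Hypothetical data; the names and fields are made up for illustration.
users = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
    {"order_id": 12, "user_id": 2, "total": 15.0},
]


def orders_with_names(users, orders):
    """Multiple queries: read the orders, then look up each user separately."""
    cache = {}  # caching: remember users that were already fetched
    result = []
    for order in orders:
        uid = order["user_id"]
        if uid not in cache:
            cache[uid] = users[uid]  # stands in for a second query
        result.append({"order_id": order["order_id"], "name": cache[uid]["name"]})
    return result


# Nesting data: keep the orders inside the user document so one read is enough.
nested_user = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

joined = orders_with_names(users, orders)
nested_total = sum(order["total"] for order in nested_user["orders"])
```

Nesting trades a join at read time for duplicated or grouped data at write time, which suits data that is usually read together.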

ACID

Atomicity - transactions must be all or nothing; if a failure occurs,
then nothing happens
Consistency - data must follow all rules, including constraints,
triggers, and cascades
Isolation - determines how and when the changes made by one
transaction become visible to other users and systems
Durability - a transaction that has been committed will remain so
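Atomicity and consistency can be seen in a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a rule (CHECK constraint) that balances stay >= 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Alice cannot cover 500, so the debit breaks the rule and the credit is undone.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit to the second account runs first on purpose: when the debit fails, the rollback shows that the earlier update does not survive on its own.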

CAP Theorem/Brewer's Theorem


Consistency
Availability
Partition Tolerance

Challenges of NoSQL
Maturity - RDBMSs, by comparison, have been around for a
long time. Most NoSQL alternatives are still in
pre-production versions, with many key features yet to be
implemented.
Support - Most NoSQL systems are Open Source projects,
and the companies that offer support are small start-ups
without global reach, support services, or the credibility of
Oracle, Microsoft, or IBM.

Challenges of NoSQL
Analytics and Business Intelligence - NoSQL databases
have evolved to meet the scaling demands of Web 2.0
applications, so they offer little support for ad hoc
queries and analysis.
Administration - The design goal for NoSQL is to provide
a zero-admin solution, but as of today it requires a lot of
skill to install and a lot of effort to maintain.
Expertise - Almost all NoSQL developers are still learning
how to use and develop for NoSQL.

Types of NoSQL
Document databases
Graph Stores
Key-value stores
Wide-column stores

Document databases
MongoDB
CouchDB
RethinkDB
SequoiaDB
RavenDB
NeDB
AmisaDB
JasDB
RaptorDB
djonDB

Graph Stores
Neo4j
InfiniteGraph
Sparksee
Titan
InfoGrid
HyperGraphDB
GraphBase
Trinity
AllegroGraph

Key-value stores
DynamoDB
Azure Table Storage
Riak
Redis
Aerospike
LevelDB
BerkeleyDB
Oracle NoSQL Database
GenieDB

Wide-column stores
HBase
MapR/Hortonworks/Cloudera
Cassandra
Hypertable
Accumulo
Amazon SimpleDB
Cloudata
MonetDB
HPCC
Apache Flink

Why NoSQL
Performance
Scalability
High availability
Automatic scaling

Benchmark Table

Data Model               Performance  Scalability      Flexibility  Complexity  Functionality
Key-Value Store          high         high             high         none        variable (none)
Column-Oriented Store    high         high             moderate     low         minimal
Document-Oriented Store  high         variable (high)  high         low         variable (low)
Graph Database           variable     variable         high         high        graph theory
Relational Database      variable     variable         low          moderate    relational algebra

Other Important Terms


Denormalization - adding redundant data or grouping data in order
to optimize read performance and scalability
does NOT mean that the data was never normalized
denormalization should ideally take place after 3NF has been
achieved
constraints are used to ensure that redundant copies of data
stay synchronized
Materialized View - a database object that contains the results
of a query
the query result is cached but can be refreshed from the original
query as necessary
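Both terms can be illustrated with a short Python sketch. The data, the `denormalize` helper, and the `MaterializedView` class are hypothetical and only mimic what a database would do internally.

```python
# Hypothetical normalized data: orders refer to customers by id.
customers = {1: {"name": "Ada", "city": "Austin"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def denormalize(orders, customers):
    """Copy the customer name into every order so reads need no lookup."""
    return [
        dict(order, customer_name=customers[order["customer_id"]]["name"])
        for order in orders
    ]


class MaterializedView:
    """Caches the result of a query; refresh() re-runs the original query."""

    def __init__(self, query):
        self._query = query
        self.rows = query()

    def refresh(self):
        self.rows = self._query()


def order_totals():
    return {
        "order_count": len(orders),
        "revenue": sum(order["total"] for order in orders),
    }


flat_orders = denormalize(orders, customers)
totals = MaterializedView(order_totals)
before = dict(totals.rows)

orders.append({"order_id": 12, "customer_id": 1, "total": 5.0})
stale = dict(totals.rows)  # still the cached result from before the insert
totals.refresh()
after = dict(totals.rows)  # recomputed from the original query
```

The redundant `customer_name` copy is the cost of denormalization: if the customer is renamed, every copy has to be updated as well.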

Other Important Terms


Keyspace - object that holds together all column families of a
design
outermost grouping of data in the data store
resembles a schema in an RDBMS
Column Family - a key-value pair in which the key is mapped
to a value that is a set of columns
object that contains columns of related data
resembles a table in an RDBMS

Other Important Terms


Super Column Family - a key-value pair in which the key is
mapped to a value that is a set of column families
similar to a view in an RDBMS
Column (data store) - a tuple (triplet) consisting of
a unique name, a value, and a timestamp
the timestamp distinguishes old data from new data
not to be confused with a standard relational database column
lowest-level object in a keyspace
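The nesting of these terms can be sketched with plain Python dictionaries. This is a simplified model, not how any particular store lays out data on disk; the `write_column` helper and the sample values are assumptions made for illustration.

```python
# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"users": {}}


def write_column(keyspace, family, row_key, name, value, timestamp):
    """Store a column, keeping whichever write carries the newest timestamp."""
    row = keyspace[family].setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)


write_column(keyspace, "users", "u1", "email", "old@example.com", 100)
write_column(keyspace, "users", "u1", "email", "new@example.com", 200)
# A write that arrives late but carries an older timestamp is ignored.
write_column(keyspace, "users", "u1", "email", "late@example.com", 150)

email_column = keyspace["users"]["u1"]["email"]
```

Each column is the triplet described above: the name is the dictionary key, and the value and timestamp are stored together so old data can be told apart from new data.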

Other Important Terms


Database Shard - a horizontal partition of data in a database or
search engine. Each partition is a separate shard.
shards can be distributed to separate hardware, reducing the
number of rows in each table
not to be confused with plain horizontal partitioning, which
refers to splitting one or more tables by rows within a single
schema or database server
Sharding - the process of forming shards within a distributed
database system
traditionally done by hand coding
automatic sharding (auto-sharding) is highly sought after
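A minimal sketch of hash-based sharding, assuming four shards held as in-memory dictionaries. The `shard_for`, `put`, and `get` names are hypothetical; a real system would route each call to a separate server.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each dictionary stands in for a shard on separate hardware.
shards = [dict() for _ in range(4)]


def put(key, row):
    shards[shard_for(key, len(shards))][key] = row


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for i in range(100):
    put(f"user:{i}", {"id": i})

row = get("user:42")
rows_per_shard = [len(shard) for shard in shards]
```

With simple modulo routing, changing the number of shards moves most keys to a different shard, which is one reason hand-coded sharding is hard to maintain.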

Application Uses
Session Store
User Profile Store
Content and Metadata Store
Mobile Applications
Third-Party Data Aggregation
High Availability Cache
Globally Distributed Data Repository
E-Commerce
Social Gaming
Ad Targeting

Sample Language of NoSQL (MongoDB)

Sample Language of NoSQL (MongoDB, Cont.)

Amazon DynamoDB
Cloud-based NoSQL service
Supports document and key-value data models
Stores 3 geographically distributed replicas of each table to
enable high availability and data durability
Reads may not reflect the latest write: eventually consistent
reads are the default, but strongly consistent reads are also
supported
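The difference between the two read modes can be shown with a toy model. This is not how DynamoDB is implemented and it does not use the AWS API; the `ReplicatedTable` class, its methods, and the idea that the first replica accepts all writes are assumptions made purely for illustration.

```python
class ReplicatedTable:
    """Toy table with three replicas; replica 0 receives writes first."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def put(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Copy pending writes to the other replicas (asynchronous step)."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def get(self, key, consistent=False, replica=1):
        """Strong reads use the up-to-date replica; others may be stale."""
        source = self.replicas[0] if consistent else self.replicas[replica]
        return source.get(key)


table = ReplicatedTable()
table.put("cart", ["book"])
table.replicate()
table.put("cart", ["book", "pen"])

strong = table.get("cart", consistent=True)  # sees the latest write
eventual = table.get("cart")                 # may still see the old value
table.replicate()
caught_up = table.get("cart")                # replicas have converged
```

The eventually consistent read returns older data only until replication catches up, which is the trade-off accepted in exchange for availability.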

Amazon DynamoDB data model


Tables: unlike relational tables, these hold data objects
(items) rather than rows with a fixed set of columns
Item: has a primary key value and can also have many
attributes
Attributes: have a name and one or more values
Data: items have a size limit of 400 KB

Global Secondary Indexes


Allow searching on an attribute other than the primary key,
for instance a zip code
The indexed value does not have to be unique
Allow rapid searches for groups of items based on the index
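The idea behind a secondary index can be sketched in plain Python. This does not use the DynamoDB API; the `items` table, the `build_index` helper, and the attribute names are hypothetical.

```python
from collections import defaultdict

# Items keyed by primary key; "zip" is an ordinary, non-unique attribute.
items = {
    "c1": {"name": "Ada", "zip": "78701"},
    "c2": {"name": "Lin", "zip": "10001"},
    "c3": {"name": "Sam", "zip": "78701"},
    "c4": {"name": "Kim"},  # items without the attribute are not indexed
}


def build_index(items, attribute):
    """Map each value of the attribute to the primary keys that carry it."""
    index = defaultdict(list)
    for key, item in items.items():
        if attribute in item:
            index[item[attribute]].append(key)
    return index


by_zip = build_index(items, "zip")
austin_customers = [items[key]["name"] for key in by_zip["78701"]]
```

Looking up a zip code now touches only the matching keys instead of scanning every item, at the cost of keeping the index updated on each write.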

Summary
The growing need to handle large amounts of data with real-time
performance is increasing the need for NoSQL.
With many variants available, each style of NoSQL has its own
costs and benefits.
NoSQL and cloud computing complement each other.

Questions
?
Tough!
