
MongoDB: scalable, high-performance, open source, document-oriented database

www.doodle.com

Agenda
1. Introduction to NoSQL
2. MongoDB Basics
3. Interacting with MongoDB
5. MongoDB at Doodle

NoSQL is a trend, not a database

- NoSQL stands for "no SQL" or "not only SQL"
- Describes alternatives to
  - Current relational data models
  - Common database technologies like transactions
  - Classic scalability approaches

NoSQL to the rescue

- Schema-free or only weak restrictions
- Designed with horizontal scalability in mind
- Good support for data redundancy and high availability
- Provide a simple API

Different consistency model (BASE)

SQL databases are all about consistency

- Database provides ACID properties for transactions
  - Atomicity
  - Consistency
  - Isolation
  - Durability
- But applications often relax this, for example by using lower isolation levels for higher performance
- Yet, widely used for web applications

- Brewer's CAP theorem (1) introduced a new way of thinking
- CAP stands for
  - Consistency
  - Availability
  - Partition tolerance (node failures)
- Any distributed database can fulfill at most two of these properties at the same time
  - Proven by Gilbert and Lynch (2)

1) http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
2) http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf

- NoSQL databases trade consistency for availability and failure tolerance
- BASE is a counterpart to the classic ACID consistency model
- Stands for
  - Basically Available
  - Soft state
  - Eventually consistent
- NoSQL databases typically optimize for availability
- Consistency only comes second
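What "eventually consistent" means in practice can be sketched with a toy model. This is not MongoDB code: the `Cluster` class, its two nodes, and the key names are hypothetical, and replication is triggered by hand so the stale window is easy to see.

```javascript
// Toy model of eventual consistency (hypothetical names, no real database).
// Writes land on the primary at once and reach the replica only when
// replicate() runs, so a replica read in between can be stale.
class Cluster {
  constructor() {
    this.primary = new Map();
    this.replica = new Map();
    this.pending = []; // writes not yet applied on the replica
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  readFromReplica(key) {
    return this.replica.get(key); // may lag behind the primary
  }

  replicate() {
    for (const [key, value] of this.pending) {
      this.replica.set(key, value);
    }
    this.pending = [];
  }
}

const cluster = new Cluster();
cluster.write("poll:1", "open");
const staleRead = cluster.readFromReplica("poll:1"); // replica has not caught up yet
cluster.replicate();
const freshRead = cluster.readFromReplica("poll:1"); // replica now agrees with primary
console.log(staleRead, freshRead);
```

The system stays available for reads the whole time; the price is that a read may briefly return old data.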

MongoDB is one of the stars in the NoSQL sky


- Aims at closing the gap between relational databases and key/value stores
- Provides
  - Document store without schema restrictions
  - Sophisticated query interface
  - Good scalability and performance
  - Drivers for many programming languages
- Does not support
  - Transactions
  - Many common SQL features (e.g. joins)

MongoDB is already in widespread use

and many more...

Terminology

MySQL      MongoDB
Database   Database
Table      Collection
Row        Document
Column     Node of Document

- Data is stored in BSON format
  - BSON stands for Binary JSON
- Maximum document size is 4 MB (16 MB in later MongoDB versions)

Comparison of relational and MongoDB data models

Relational database
PostID  UserID  Comment
1       6       Awesome
2       7       Well done

MongoDB
{ _id: ObjectId("43asd..."), userId: 6, comment: "Awesome" }
{ _id: ObjectId("ljkh56..."), userId: 7, comment: "Well done" }
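The mapping between the two models can be sketched in a few lines. The `rowToDocument` helper is hypothetical, and the sketch reuses the PostID as `_id` only to stay dependency-free; a real MongoDB driver would normally generate an ObjectId instead.

```javascript
// Rows as a relational query might return them (values from the table above).
const rows = [
  { PostID: 1, UserID: 6, Comment: "Awesome" },
  { PostID: 2, UserID: 7, Comment: "Well done" },
];

// Hypothetical mapping: each row becomes one self-contained document.
// Assumption: PostID is reused as _id; MongoDB would usually assign an ObjectId.
function rowToDocument(row) {
  return { _id: row.PostID, userId: row.UserID, comment: row.Comment };
}

const documents = rows.map(rowToDocument);

// Documents in one collection need not share a schema, so an extra field
// on a single document is fine.
documents.push({ _id: 3, userId: 6, comment: "Thanks", likes: 2 });

console.log(JSON.stringify(documents, null, 2));
```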

Data types
Data type  Example
ObjectId   { _id: ObjectId("4caf56gdet43hb764fd3df3s") }
String     { _id: ObjectId("4c"), name: "Hans" }
Integer    { _id: ObjectId("4c"), temperature: 36 }
Float      { _id: ObjectId("4c"), temperature: 26.5 }
Boolean    { _id: ObjectId("4c"), rainy: false }
Array      { _id: ObjectId("4c"), fibonacci: [0, 1, 1, 2, 3, 5] }
Document   { _id: ObjectId("4c"), car: { color: "red", doors: 5 } }
Date       { _id: ObjectId("4c"), date: ISODate("2011-09-15T22:00:00.0Z") }
DBRef      { _id: ObjectId("4c"), carOwner: { $ref: "owners", $id: ObjectId("z5") } }

MongoDB supports special collections and indexes

- Indexes
  - Similar to RDBMS indexes
  - Set on one or more attributes of a BSON document
- Capped collections
  - Can only hold a certain number of documents
  - Very fast for sequential access
  - Used for logging, caching, etc.
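The behaviour of a capped collection can be sketched as a bounded queue. The `CappedCollection` class below is a hypothetical stand-in, not MongoDB's implementation: once the limit is reached, each insert pushes out the oldest document.

```javascript
// Toy capped collection (hypothetical class, not the MongoDB implementation).
// It holds at most maxDocs documents; inserting beyond that drops the oldest.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // discard the oldest document
    }
  }

  // Documents come back in insertion order, which is what makes
  // sequential access cheap.
  find() {
    return [...this.docs];
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, msg: `event ${i}` });
}
const kept = log.find().map((d) => d.seq);
console.log(kept);
```

In the mongo shell, a real capped collection is created with `db.createCollection(name, { capped: true, size: ... })`, where `size` is a byte limit and an optional `max` limits the number of documents.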

MongoDB supports replication...

- Why replication?
  - High availability
  - Performance
  - Backups
  - Batch processing
- MongoDB uses replica sets
  - Primary-secondary cluster
  - Automatic failover

Very easy to set up

... as well as auto sharding

- Sharding
  - Process of splitting up data
  - Storing different portions of the data on different machines
- Autosharding
  - Cluster handles splitting up data and rebalancing automatically
  - Single collections are broken into smaller chunks
  - Sharding is transparent for the application
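How chunks, splitting, and rebalancing fit together can be sketched with a small routing table. The shard names, key ranges, and helper functions are all hypothetical; MongoDB's actual balancer is considerably more involved.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max)
// and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Routing: find the chunk whose range contains the shard key.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Splitting: one chunk becomes two smaller ones on the same shard.
function splitChunk(index, at) {
  const c = chunks[index];
  chunks.splice(
    index,
    1,
    { min: c.min, max: at, shard: c.shard },
    { min: at, max: c.max, shard: c.shard }
  );
}

const before = shardFor(3000);  // served by shardB
splitChunk(1, 3000);            // [1000, 5000) -> [1000, 3000) + [3000, 5000)
chunks[2].shard = "shardC";     // "rebalancing": move the upper half
const after = shardFor(3000);   // same key, now served by shardC
console.log(before, after);
```

The application keeps calling the same lookup before and after the move, which is the sense in which sharding is transparent.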

... and even Map/Reduce

Map/Reduce is the Uzi of aggregation tools.

A way of processing large data sets in parallel


- Map/Reduce is a two-step process
  - Map step: transform a document into an intermediate representation
  - Reduce step: combine multiple intermediate values into one intermediate value
- Typical example: process and aggregate statistics
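The two steps can be sketched in plain JavaScript, counting comments per user. The sample documents and the `mapReduce` driver function are made up for illustration. In MongoDB's own Map/Reduce the map function reads the document through `this` and calls a global `emit`; here both are passed as parameters so the sketch runs without a server.

```javascript
// Sample documents (made-up data in the shape of the comment documents above).
const comments = [
  { userId: 6, comment: "Awesome" },
  { userId: 7, comment: "Well done" },
  { userId: 6, comment: "Thanks" },
];

// Map step: turn one document into (key, value) pairs.
function map(doc, emit) {
  emit(doc.userId, 1);
}

// Reduce step: combine all values emitted for one key into a single value.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Minimal single-process driver: group emitted values by key, then reduce.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const commentsPerUser = mapReduce(comments, map, reduce);
console.log(commentsPerUser);
```

Because the map calls are independent of each other, they can run in parallel on different machines; only the grouping and reducing need to bring values for the same key together.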

Replica set internals

- New secondaries initialize automatically
  - No need to restore backups or copy data files
- Clients can be provided with a list of node IPs, not only one server
  - Client can still connect if one server is down
- A replica set can contain different MongoDB versions
  - Individual nodes can be taken down for software updates and added back into the set
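Why a list of node addresses helps can be sketched as follows. The addresses, the health table, and the `connect` function are placeholders, not driver code; real drivers also discover the current primary, which this sketch ignores.

```javascript
// Hypothetical seed list and health table; the addresses are placeholders.
const seeds = ["10.0.0.1:27017", "10.0.0.2:27017", "10.0.0.3:27017"];
const isUp = {
  "10.0.0.1:27017": false, // this node is down, e.g. for a software update
  "10.0.0.2:27017": true,
  "10.0.0.3:27017": true,
};

// Try each seed in order and return the first reachable one.
function connect(seedList, reachable) {
  for (const host of seedList) {
    if (reachable(host)) return host;
  }
  throw new Error("no replica set member reachable");
}

const connectedTo = connect(seeds, (host) => isUp[host]);
console.log(connectedTo);
```

With official drivers, the seed list is typically given in the connection string, in the form `mongodb://host1,host2,host3/?replicaSet=name`.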

MongoDB: JavaScript built in

- MongoDB uses JSON/BSON extensively
- MongoDB also provides a fully featured JavaScript interpreter
- JavaScript can be executed on the server
  - Can also be stored on the server (system.js)
  - Is used in some database commands (e.g. Map/Reduce)
- In 2009 they switched from Google V8 to SpiderMonkey (1)
  - Extended with MongoDB-specific functions

1) http://blog.mongodb.org/post/101474817/migrating-to-spider-monkey

RockMongo is a MongoDB administration GUI tool, written in PHP

MongoDB can be used with a lot of programming languages


- MongoDB offers drivers for a large variety of languages
  - C, C# and .NET, C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, REST, ActionScript, Clojure, ColdFusion, D, Go, Delphi, Groovy, Lua, Objective-C, MATLAB, Smalltalk, ...

- Official Java driver has a complicated, verbose syntax
- In 2010 Scott Hernandez started the Morphia project
  - Tries to be for the MongoDB Java driver what JPA is for JDBC
  - A type-safe Java library for MongoDB
  - Runs on top of the official driver

Why did we (1) choose MongoDB?

- Easy to install, set up, and understand

- Documents based on JSON format
  - We are working with JSON data structures in our apps
- Indexable JSON keys
  - Heavily used at Doodle
- Better API than CouchDB
  - Not everything has to be done using Map/Reduce

1) Actually Paul Sevinc, CTO at Doodle AG

Lessons learned

- Generally really fast reads and quite fast writes
- Replication works super easy
- Really only any good in 64-bit mode
- MongoDB and Morphia are not documented well
  - In-depth documentation hard to find or not available
- Map/Reduce only seems to be easy...
  - Hard to debug

Lessons learned: Don't forget your indexes

- Non-indexed random accesses are slow, just as with any traditional RDBMS
- 10 DB connections are sufficient for a busy site like Doodle
  - But the limit is reached pretty quickly if queries take too long due to missing indexes
- MongoDB supports database profiling (1) to help debug queries
  - Similar to the MySQL slow query log

1) http://www.mongodb.org/display/DOCS/Database+Profiler
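The cost of a missing index can be sketched by counting how many documents a lookup has to examine. The data is synthetic and the field name `pollId` is made up. The index here is a plain `Map`, chosen for brevity; MongoDB's indexes are B-trees, so the real lookup cost is logarithmic rather than constant, but the contrast with a full scan is the same.

```javascript
// 100,000 synthetic documents (made-up data; pollId is a hypothetical field).
const docs = Array.from({ length: 100000 }, (_, i) => ({ _id: i, pollId: `p${i}` }));

// Without an index: scan the collection, counting examined documents.
function findByScan(pollId) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.pollId === pollId) return { doc, examined };
  }
  return { doc: null, examined };
}

// With an index: built once, then each lookup goes straight to the document.
// Assumption: a Map stands in for MongoDB's B-tree index.
const pollIdIndex = new Map(docs.map((d) => [d.pollId, d]));

function findByIndex(pollId) {
  return { doc: pollIdIndex.get(pollId) || null, examined: 1 };
}

const scan = findByScan("p99999");
const indexed = findByIndex("p99999");
console.log(scan.examined, indexed.examined);
```

A query that scans every document also holds its connection for longer, which is how a small connection pool fills up when indexes are missing.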

Lessons learned: Space is not freed when collections shrink


- MongoDB allocates new space when needed
- When documents are inserted and removed randomly, the data files remain partially occupied (fragmentation) and thus the disk space cannot be freed
- MongoDB does not defragment the data files automatically
- Workaround
  - Remove secondary from replica set
  - Delete data file and add back into set
- Alternative: db.repairDatabase(); (1) (offline)

1) http://www.mongodb.org/display/DOCS/Durability+and+Repair
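Why removed documents do not give space back can be sketched with a toy data file. The `DataFile` class and its fixed-size slots are a strong simplification and entirely hypothetical; the point is only that freed slots are reusable but the file itself never shrinks until its contents are rewritten.

```javascript
// Toy data file with fixed-size slots (hypothetical, heavily simplified).
class DataFile {
  constructor() {
    this.slots = [];
    this.free = []; // indexes of slots that can be reused
  }

  insert(doc) {
    if (this.free.length > 0) {
      const i = this.free.pop();
      this.slots[i] = doc;
      return i;
    }
    this.slots.push(doc); // allocate new space when needed
    return this.slots.length - 1;
  }

  remove(i) {
    this.slots[i] = null; // slot becomes reusable, file size is unchanged
    this.free.push(i);
  }

  get allocated() {
    return this.slots.length;
  }

  get used() {
    return this.slots.filter((s) => s !== null).length;
  }

  // Rewrite only the live documents into a fresh file. In this model,
  // that is what resyncing a secondary or running a repair achieves.
  compact() {
    const fresh = new DataFile();
    for (const s of this.slots) {
      if (s !== null) fresh.insert(s);
    }
    return fresh;
  }
}

const file = new DataFile();
for (let i = 0; i < 10; i++) file.insert({ n: i });
for (const i of [1, 3, 5, 7, 9]) file.remove(i);
const fragmented = { allocated: file.allocated, used: file.used };
const compacted = file.compact();
console.log(fragmented, compacted.allocated);
```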

THANK YOU
