www.doodle.com
Agenda
1. Introduction to NoSQL
2. MongoDB Basics
3. Interacting with MongoDB
5. MongoDB at Doodle
NoSQL stands for "no SQL" or "not only SQL"
Describes alternatives to:
  Current relational data models
  Common database technologies like transactions
  Classic scalability approaches
Schema-free or only weak restrictions
Designed with horizontal scalability in mind
Good support for data redundancy and high availability
Provide a simple API
A relational database provides ACID properties for transactions:
  Atomicity
  Consistency
  Isolation
  Durability
But applications often use lower isolation levels for higher performance
Yet, still widely used for web applications
Brewer's CAP theorem [1] introduced a new way of thinking
CAP stands for:
  Consistency
  Availability
  Partition tolerance (node failures)
Any distributed database can fulfill at most two of these properties at the same time
Proven by Gilbert and Lynch [2]
[1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
NoSQL databases give up some consistency in exchange for availability and failure tolerance
BASE is a counterpart to the classic ACID consistency model
Stands for:
  Basically Available
  Soft state
  Eventually consistent
NoSQL databases typically optimize for availability
Consistency only comes second
Terminology
Data is stored in BSON format
BSON stands for Binary JSON
Maximum document size is 4MB (16MB since MongoDB 1.8)
Relational database
PostID  UserID  Comment
1       6       Awesome
2       7       Well done
MongoDB
{ _id: ObjectId("43asd..."), userId: 6, comment: "Awesome" }
{
Data types
Data type  Example
ObjectId   {_id: ObjectId("4caf56gdet43hb764fd3df3s")}
String     {_id: ObjectId("4c"), name: "Hans"}
Integer    {_id: ObjectId("4c"), temperature: 36}
Float      {_id: ObjectId("4c"), temperature: 26.5}
Boolean    {_id: ObjectId("4c"), rainy: false}
Array      {_id: ObjectId("4c"), fibonacci: [0,1,1,2,3,5]}
Doc        {_id: ObjectId("4c"), car: {color: "red", doors: 5}}
Date       {_id: ObjectId("4c"), date: ISODate("2011-09-15T22:00:00.0Z")}
DBRef      {_id: ObjectId("4c"), carOwner: {$ref: "owners", $id: ObjectId("z5")}}
MongoDB supports special collections and indexes
Indexes
  Similar to RDBMS indexes
  Set on one or more attributes of a BSON document
Capped collections
  Can only hold a certain number of documents
  Very fast for sequential access
  Used for logging, caching, etc.
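The capped collection behavior described above can be pictured as a fixed-size ring buffer. The following is only a minimal in-memory sketch in plain JavaScript, not MongoDB code: the class `CappedCollection` and its `maxDocs` parameter are hypothetical names chosen for illustration, and real capped collections are configured on the server.

```javascript
// Hypothetical in-memory model of a capped collection (not a MongoDB API).
class CappedCollection {
  constructor(maxDocs) {
    if (!Number.isInteger(maxDocs) || maxDocs < 1) {
      throw new RangeError("maxDocs must be a positive integer");
    }
    this.maxDocs = maxDocs;
    this.docs = new Array(maxDocs); // fixed-size storage, never grows
    this.next = 0;                  // slot the next insert will write to
    this.count = 0;                 // number of documents currently stored
  }

  // Inserting into a full collection overwrites the oldest document.
  insert(doc) {
    this.docs[this.next] = doc;
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Returns the documents in insertion order, oldest first.
  find() {
    const start = this.count < this.maxDocs ? 0 : this.next;
    const result = [];
    for (let i = 0; i < this.count; i++) {
      result.push(this.docs[(start + i) % this.maxDocs]);
    }
    return result;
  }
}

// A log that keeps only the three most recent entries.
const log = new CappedCollection(3);
["a", "b", "c", "d"].forEach((msg) => log.insert({ msg }));
const logMessages = log.find().map((d) => d.msg);
console.log(logMessages); // the oldest entry "a" has been overwritten
```

Because storage is fixed and documents are kept in insertion order, reads are sequential and old entries disappear without an explicit delete, which is why this shape suits logging and caching.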
Sharding: the process of splitting up data and storing different portions of it on different machines
Autosharding
Cluster handles splitting up data and rebalancing automatically
Single collections are broken into smaller chunks
Sharding is transparent for the application
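To make the idea of chunks concrete, here is a rough sketch of how a key could be routed to the machine holding its chunk. It is plain JavaScript, not MongoDB code: the chunk table, the shard names, the ranges and the function `shardFor` are all assumptions made up for illustration. In a real cluster this lookup happens inside the cluster, so the application never sees it.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max)
// and lives on exactly one shard. Names and ranges are invented.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Finds the shard whose chunk range contains the given shard-key value.
function shardFor(userId) {
  const chunk = chunks.find((c) => userId >= c.min && userId < c.max);
  if (!chunk) throw new RangeError(`no chunk covers key ${userId}`);
  return chunk.shard;
}

// Route a few example keys; boundaries belong to the upper chunk.
const routed = [6, 1000, 4999, 7500].map(shardFor);
console.log(routed);
```

Rebalancing then amounts to changing which shard a chunk is assigned to, or splitting a chunk's range in two, without the application changing its queries.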
New secondaries initialize automatically
No need to restore backups or copy data files
Clients can be provided with a list of node IPs, not only one server
Client can still connect if one server is down
A replica set can contain different MongoDB versions
Individual nodes can be taken down for software updates and added back into the set
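The benefit of giving clients a list of nodes can be sketched as follows. This is a simplified plain-JavaScript illustration, not driver code: `connectToAny`, the `isReachable` callback and the IP addresses are hypothetical, and a real driver also discovers the current primary.

```javascript
// Hypothetical helper: tries each seed address in order and returns the
// first one that is reachable. Throws if every node is down.
function connectToAny(seeds, isReachable) {
  for (const address of seeds) {
    if (isReachable(address)) return address;
  }
  throw new Error("no node in the seed list is reachable");
}

// Invented example addresses; the first node is assumed to be down.
const seeds = ["10.0.0.1:27017", "10.0.0.2:27017", "10.0.0.3:27017"];
const down = new Set(["10.0.0.1:27017"]);

const connectedTo = connectToAny(seeds, (addr) => !down.has(addr));
console.log(connectedTo); // falls through to the second node
```

With only a single configured address, the same outage would leave the client unable to connect at all.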
MongoDB uses JSON/BSON extensively
MongoDB also provides a fully-featured JavaScript interpreter
JavaScript can be executed on the server
  Can also be stored on the server (system.js)
  Is used in some database commands (e.g. Map/Reduce)
In 2009 the MongoDB developers switched from Google V8 to SpiderMonkey [1]
  Extended with MongoDB-specific functions
[1] http://blog.mongodb.org/post/101474817/migrating-to-spider-monkey
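As an illustration of JavaScript in Map/Reduce, the sketch below counts comments per user. The `map` and `reduce` functions follow the shape MongoDB expects (`map` sees the document as `this` and calls `emit`), but everything else is an assumption for the sake of a runnable example: the sample documents are invented, and `runMapReduce` is a hypothetical local stand-in for the server, not a MongoDB function.

```javascript
// On the server, emit() is provided by MongoDB; here the local runner sets it.
let emit;

// Invented sample documents, shaped like the comment documents shown earlier.
const comments = [
  { userId: 6, comment: "Awesome" },
  { userId: 7, comment: "Well done" },
  { userId: 6, comment: "Thanks" },
];

// map: called once per document, with `this` bound to that document.
function map() {
  emit(this.userId, 1);
}

// reduce: combines all values that were emitted for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical in-memory runner that mimics what the server does:
// group emitted values by key, then reduce each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduceFn(key, values) });
  }
  return results;
}

const commentCounts = runMapReduce(comments, map, reduce);
console.log(commentCounts);
```

Running the functions locally against a handful of documents like this is also one way to check the logic before handing it to the server.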
Official Java driver has a complicated, verbose syntax
In 2010 Scott Hernandez started the Morphia project
  Tries to be for the MongoDB Java driver what JPA is for JDBC
  A type-safe Java library for MongoDB
  Runs on top of the official driver
Documents are based on the JSON format
  We already work with JSON data structures in our apps
Indexable JSON keys
  Heavily used at Doodle
Better API than CouchDB
  Not everything has to be done using Map/Reduce
Lessons learned
Generally really fast reads and quite fast writes
Replication is super easy to set up
Only really usable in 64-bit mode
MongoDB and Morphia are not documented well
  In-depth documentation is hard to find or not available
Map/Reduce only seems to be easy...
  Hard to debug
Lessons learned: Don't forget your indexes
Non-indexed random accesses are slow, just as with any traditional RDBMS
10 DB connections are sufficient for a busy site like Doodle
  But the limit is reached pretty quickly if queries take too long due to missing indexes
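Why a missing index hurts can be shown with a small comparison. This is a plain-JavaScript sketch under stated assumptions, not MongoDB code: the data set is generated, `findByScan` and `findByIndex` are hypothetical helpers, and a `Map` stands in for the index (real MongoDB indexes are B-trees).

```javascript
// Generated sample data: 100,000 documents spread over 1,000 users.
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({ _id: i, userId: i % 1000 });
}

// Without an index: every document has to be examined.
function findByScan(userId) {
  let examined = 0;
  const hits = [];
  for (const d of docs) {
    examined++;
    if (d.userId === userId) hits.push(d);
  }
  return { hits, examined };
}

// With an index: built once, maps each userId to its documents.
const userIdIndex = new Map();
for (const d of docs) {
  if (!userIdIndex.has(d.userId)) userIdIndex.set(d.userId, []);
  userIdIndex.get(d.userId).push(d);
}

// Only the matching documents are touched.
function findByIndex(userId) {
  const hits = userIdIndex.get(userId) || [];
  return { hits, examined: hits.length };
}

const scanResult = findByScan(42);
const indexResult = findByIndex(42);
console.log(scanResult.examined, indexResult.examined);
```

Both lookups return the same documents, but the scan touches every document in the collection. A query that slow holds its connection for longer, which is how a small connection pool gets exhausted.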
MongoDB supports database profiling [1] to help debug queries
  Similar to the MySQL slow query log
[1] http://www.mongodb.org/display/DOCS/Database+Profiler
When documents are inserted and removed randomly, the data files remain partially occupied (fragmentation), and thus the disk space cannot be freed
MongoDB does not defragment the data files automatically
Workaround:
  Remove a secondary from the replica set
  Delete its data files and add it back into the set
Alternative: db.repairDatabase(); [1] (offline)
[1] http://www.mongodb.org/display/DOCS/Durability+and+Repair
THANK YOU