Case Study: Using MongoDB For An E-Commerce Platform

Version 1.0: July 19, 2011
Author: Hennie Grobler (hegrobler@gmail.com)
Overview
  Scope
  Sources
System Definition
  Use Cases
  Constraints and assumptions
Define the Schema
  Identify System Operations
  Identify Entities and Fields
MongoDb Best Practices and Considerations
  Entity Relationships
  Size of Data
  Indexing
    Adding indexes
      Filter Criteria
      Sorting
    Considerations
    Query Optimization
  Sharding
    Automatic Sharding
    Sharding Key
    Considerations
      Using the _id (or date based data) as the shard key
      Read / Write Ratio
      Related Data
      Unique Keys
      Result Order
Bringing it all together
  Entities
    Product
    Category
    User
    Shopping Cart
  Actions
    Search for product based on SKU
    Search for products by product name
    Search for products by category identifier
    Increment / decrement stock item
    Add / Edit products
    Create Shopping Cart
      Problem
      Define the correct shard key
      Split read and write data
    Add / Remove products to / from shopping cart
    Pay for cart by credit card
    Search for all categories
    Search for products less than reorder threshold
    Search for sub-categories by category identifier
    Search total product value
    Search cart total per date
    Discard Shopping Cart
Infrastructure
  Deployment
    Mongo Processes
    Replica Sets[12]
  Operating System
  RAM
  Network
Next Steps
References
Overview
MongoDb has garnered much attention over the last couple of years. It is said to be fast and reliable, and to automate some processes that are usually very time consuming and error prone. Adoption seems to be growing steadily as it is used in more and more high-transaction-volume systems such as Foursquare, Bit.ly and Sourceforge. MongoDb seemed like the 'way to go', but then reports of downtime surfaced, as was the case with Foursquare (MongoDB Auto-sharding and Foursquare Downtime[21]), and I realised that it is not a 'quick fix' that can be applied to all scenarios. Financial systems seemed to be the least suitable type of application for a MongoDb back-end. I am still not 100% convinced that MongoDb can be used for all types of financial systems, especially not banking systems, but I believe that it may be suitable for most e-commerce systems.
I found the following to be the most obvious issues when starting a MongoDb implementation:
Schema Design: The schema designs used for MongoDb and MySql implementations are vastly different. Because developers are generally used to designing for relational databases, they are prone to making some bad design decisions.
Sharding: MongoDb has many built-in features that reduce the operational procedures that must be in place, but not understanding how these features work could cause serious system problems.
Experience: MongoDb is a relatively new technology compared to relational counterparts like MySql, which means that there are relatively few experienced MongoDb developers and administrators in the field.
This document tries to address these issues by providing an end-to-end overview of an imaginary e-commerce system built on MongoDb, instead of the numerous disjointed examples found on the internet.
Scope
The document covers the creation of the data schema for the e-commerce system. It also provides an overview of the infrastructure and some of the operational procedures that must be in place to get started with a MongoDb implementation. It does not, however, discuss the actual e-commerce website implementation.
We will assume that the system has a limited set of functionality, as defined in subsequent sections. This bounds the use case and avoids an overly complex design that could be confusing and hide some of the lessons that can be taken away from it.
Sources
This document is based on theoretical knowledge of the topic: all statements, conclusions and examples in it are based on information found on the MongoDb site, other use cases and various blogs that are freely available on the internet. All sources are noted at the end of the document. It is recommended that these additional resources also be studied in order to get the maximum benefit from this document.
System Definition
Relational database design, as most of us have been taught it, assumes that there is only one correct design for a given problem. The approach is normally to analyse the data, identify all the prominent entities that the data represents, create a table for each and then create the appropriate relationships between the tables. Once all of the normalization (and sometimes de-normalization) rules have been applied, the design is done. With MongoDb databases this process differs slightly, as the data schema cannot be designed without first evaluating what the system will do with the data.
Use Cases
The system will be limited to the following use cases.
A user can:
1. register on the site
2. log in on the site with username (email) and password
3. view products from a specific category
4. search the product list based on the name of the product
5. view a specific product
6. add n different products to a shopping cart
7. remove products from a shopping cart
8. discard a shopping cart
9. pay for a shopping cart by credit card
The system must:
10. track product stock levels
An accountant can view the following reports:
11. total daily, monthly and yearly income earned from online sales, ordered by date
12. total value of stock on hand
An inventory clerk can:
13. add / edit products
14. set the stock reorder threshold per product (the level at which an order must be placed, otherwise the shop will run out of stock)
Constraints and assumptions:

Email addresses are unique.
Product identifiers are unique.
A user must be logged in to be able to make a payment.
A shopping cart is limited to 500 line items. Each line item is a unique product: if a product that already exists in the cart is added again, the quantity of the original line item increases by the quantity of the new one.
A category can be a sub-category of another category.
Passwords saved to the database must be a cryptographic hash of the password combined with a salt value (a random value).
The stock level of a product is only increased when new stock is added to the inventory. It is only decreased once an item has been added to a cart and that cart has been successfully paid.
The system supports user roles (User, Accountant, Inventory Clerk), where each role has access to different functionality.
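The password constraint can be illustrated with Node.js's built-in crypto module. This is a minimal sketch, assuming a Node.js application layer. It uses scrypt, a deliberately slow key-derivation function, rather than a single plain hash. The names hashPassword and verifyPassword are hypothetical helpers. The two returned values map onto the password and password_salt fields of the user document shown later.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a salted hash for storage in a user document.
function hashPassword(password) {
  // A fresh random salt per user means identical passwords produce different hashes.
  const salt = crypto.randomBytes(16).toString("hex");
  // scrypt is slow by design, which makes brute-forcing stored hashes costly.
  const hash = crypto.scryptSync(password, salt, 64).toString("hex");
  return { password: hash, password_salt: salt };
}

// Hypothetical helper: recompute the hash with the stored salt and compare.
function verifyPassword(candidate, user) {
  const expected = Buffer.from(user.password, "hex");
  const actual = crypto.scryptSync(candidate, user.password_salt, expected.length);
  // timingSafeEqual avoids leaking how many leading bytes matched.
  return crypto.timingSafeEqual(expected, actual);
}
```

Verification needs the stored salt before it can hash the candidate password. This is why the user is looked up by email alone rather than by email and password.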
Define the Schema

This case study will use the following steps to identify the final data schema:
1. Identify the operations that the system needs to support, based on the system functionality
2. Identify the entities that the operations 'interact' with
3. Identify the meta-data of the entities
4. View how the entities are used in the system in relation to one another
5. Bring it all together by using the findings from the first four steps and applying some best-practice rules to them
Identify System Operations

The following actions were identified from the previously defined functionality. They are ordered by how likely they are in a typical usage scenario; the order is for demonstration only and may vary depending on the actual implementation. The table also shows which system function (use case number) each action relates to, the type of operation, and the potential entities and fields that were identified.
Action | System Function | Type | Subject(s) / Fields
1. Search for product based on SKU | 5 | Read | Product (SKU)
2. Search for all categories | 3 | Read | Category
3. Search for products by category identifier | 3 | Read | Product / Category (id)
4. Search for sub-categories by category identifier | 3 | Read | Category (id, parent_id)
5. Search for products by product name | 4 | Read | Product (name)
6. Create shopping cart | 6 | Write | Cart
7. Add / remove products to shopping cart | 6, 7 | Write | Cart (line items)
8. Pay for cart by credit card | 9 | Write | Cart, Payment (credit card info)
9. Increment / decrement stock item | 10 | Write | Product (items_in_stock)
10. Find user by email (not by email and password, as the salt must be returned to calculate the correct password hash) | 2 | Read | User (email, password, salt)
11. Save new / existing user (similar to Add / remove products to shopping cart, so it is not discussed further) | 1 | Write | User
12. Add / Edit products | 13 | Write | Product
13. Search for products less than reorder threshold | 14 | Read | Product (reorder_threshold)
14. Search cart total per date (ordered) | 11 | Read | Cart (date, total)
15. Search total product value | 12 | Read | Product (cost_price)
16. Discard shopping cart | 8 | Delete | Cart
Identify Entities and Fields

The previous section identified the different system entities and some of their fields. We will now expand on this by reviewing the constraints and assumptions. We will also add some attributes that a real system would probably require, to make this example more complete.
Entity | Fields
Product | name, SKU, cost_price, selling_price, items_in_stock, reorder_threshold
Category | id, parent_id, list of products
Cart | date, total
Cart Line Item | product info, quantity
Payment | credit card info
User | firstname, lastname, email, password, salt, shipping address, role (user, accountant, inventory clerk)
MongoDb Best Practices and Considerations
Entity Relationships
Each of the entities would most probably be modelled as an individual table in a relational database, but this is not necessarily the case with a MongoDb database. One of the biggest factors in deciding how the data is modelled is how the entities are accessed in relation to one another. For example, if an invoice and its line items are always accessed together, it would be better for performance to model them as one document. Alternatively, if line items are regularly accessed individually, it would probably be better to model them as separate documents. Based on the current use case, we will therefore model the Shopping Cart and Cart Line Items as one document.
Size of Data
The maximum size of a document in MongoDb is currently limited to 8 MB. A maximum size of 32 MB has been proposed, and this will probably increase even further in future. It may sound like a good idea to store very large objects in a document, but the whole document must travel across the network between the database server and the application server every time it is accessed. In cases where only part of a document is used each time it is retrieved, it is less resource intensive to split the document into smaller documents.
Indexing
Adding indexes to your collections could significantly increase the query performance as MongoDb can quickly navigate the index to find the relevant document by key instead of scanning each document in the collection. The following shows a simplified depiction of how the system is able to navigate the index to find the relevant information (in this case the user with the surname of Straub) without having to scan each and every document in the collection.
[Figure: simplified index tree of surnames (King, Harris, Rice, Bachman, Graham, Koontz, Straub) navigated to find Straub]
MongoDb automatically creates an index on the _id field, but additional indexes can be added as required.
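The index navigation shown in the figure can be mimicked with a sorted array and a binary search. This is a conceptual sketch only: MongoDb's real indexes are on-disk B-trees. The surnameIndex data and the findBySurname helper are hypothetical.

```javascript
// A toy "index": entries sorted by key, each pointing at a document id.
const surnameIndex = [
  { key: "Bachman", docId: 1 },
  { key: "Graham", docId: 2 },
  { key: "Harris", docId: 3 },
  { key: "King", docId: 4 },
  { key: "Koontz", docId: 5 },
  { key: "Rice", docId: 6 },
  { key: "Straub", docId: 7 },
];

// Binary search: halve the candidate range on each comparison instead of scanning every entry.
function findBySurname(index, surname) {
  let lo = 0;
  let hi = index.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    const key = index[mid].key;
    if (key === surname) return { docId: index[mid].docId, comparisons };
    if (key < surname) lo = mid + 1;
    else hi = mid - 1;
  }
  return { docId: null, comparisons };
}
```

Finding Straub takes three comparisons here, whereas scanning could take up to seven. The number of comparisons grows only with log2 of the entry count, so the gap widens quickly as the collection grows.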
Adding indexes
Filter Criteria
The fields that indexes are applied to depend on the queries that are run. In our use case the system will 'Search for product based on SKU', so we can define an index on the SKU field of the product document.
Sorting
Based on the 'Search cart total per date (ordered)' system action we would also need to add an index on the date as the query is sorted by date. Adding an index on the field that is sorted on enables MongoDb to sort the data without having to open each document.
Considerations
The following must be taken into consideration when applying indexes:
Additional Overhead: Values are added to or removed from an index whenever documents are added to or removed from the collection. This does not pose a problem in systems that mostly read. In write-heavy systems, however, it may incur significant overhead, as the index must be continuously updated.
Initial Index Blocking: While an index is first being built, no queries can be run against the database unless the {background:true} option[9] is used.
Case Sensitive: MongoDb indexes are case sensitive.
Indexes per Collection: There is a limit of 40 indexes per collection. In most cases this is more than sufficient.
Index Key Size: The maximum key length that can currently be indexed is 800 bytes.
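Because indexes are case sensitive, a common workaround is to store a normalised copy of a searchable field and index that copy instead. The sketch below assumes a hypothetical name_lower field, which is not part of the schema in this document. It also checks values against the 800-byte key limit quoted above. The byte count is only an approximation, since MongoDb measures the BSON-encoded key.

```javascript
// Limit quoted in the text; MongoDb applies it to the BSON-encoded key, so this is approximate.
const MAX_INDEX_KEY_BYTES = 800;

// Hypothetical helper: add a lower-cased copy of the product name for case-insensitive lookups.
function withSearchFields(product) {
  const nameLower = product.name.toLowerCase();
  if (Buffer.byteLength(nameLower, "utf8") > MAX_INDEX_KEY_BYTES) {
    throw new Error("name too long to index: " + product.name.slice(0, 20) + "...");
  }
  return { ...product, name_lower: nameLower };
}

// Queries are normalised the same way, so "IPAD", "ipad" and "Ipad" hit the same index entry.
function nameQuery(searchText) {
  return { name_lower: searchText.toLowerCase() };
}
```

The cost is an extra field, and an extra index to maintain on every write, which ties back to the overhead consideration above.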
Query Optimization
As with applying indexes on a relational database, you sometimes get unexpected results so it is good practice to verify that the query uses the intended index and that using the index actually results in better performance. This can be done by examining the query execution plan by issuing the explain()[10] command.
Sharding
Automatic Sharding
MongoDb supports automatic sharding[1], where data is automatically spread across multiple servers in order to distribute the transaction load. The system accomplishes this by dividing the data into chunks[2], each holding a contiguous range of shard key values, and distributing the chunks across multiple servers. By default each chunk can grow to a maximum of 200 MB, but this can be overridden to be larger. Once a chunk reaches approximately 50%-75% of the maximum size (100 MB to 150 MB), MongoDb will create a snapshot of the chunk and copy the snapshot data to a new chunk. Writes can still be made to the original chunk while this copy is in progress. Once the copy is complete, the changes made to the original chunk in the meantime are applied to the new chunk before it is made available.
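The split step can be modelled in a few lines. This is a simplified sketch under three assumptions. First, a chunk is a contiguous range of shard key values. Second, a document count stands in for the chunk's size in MB. Third, the split happens at the median key. MongoDb's actual split points and thresholds are chosen by the server, and makeChunk and splitChunk are hypothetical names.

```javascript
// A chunk covers the half-open key range [min, max); null means unbounded on that side.
function makeChunk(min, max, keys) {
  return { min, max, keys: [...keys].sort() };
}

// Split a chunk at its median key once it holds more than maxDocs documents.
// Returns the original chunk unchanged (in an array) if no split is needed.
function splitChunk(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk];
  const splitKey = chunk.keys[Math.floor(chunk.keys.length / 2)];
  const left = makeChunk(chunk.min, splitKey, chunk.keys.filter((k) => k < splitKey));
  const right = makeChunk(splitKey, chunk.max, chunk.keys.filter((k) => k >= splitKey));
  return [left, right];
}
```

In this model a split only moves a boundary. Transferring one half's documents to another server is a separate step.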
Sharding Key
MongoDb uses a key, called the shard key, to decide which chunk a document is allocated to. The shard key can be based on the _id field, which by default holds a BSON ObjectId (see BSON ObjectId Specification[3]), or on any other user-defined value. For example, a shard key for user documents could be based on the user's last name. With that in mind, imagine that we have three chunks with user data: the first chunk contains all the users with a surname starting with B to H, the second Ki to Ko and the third R to S.
Chunk 1 (B to H): Bachman, Richard; Harris, Thomas
Chunk 2 (Ki to Ko): King, Stephen; Koontz, Dean
Chunk 3 (R to S): Rice, Anne; Straub, Peter
If a user with a last name of Barker is added, it will be written to the first chunk, whereas a user with a last name of Smith will be written to the last chunk.
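Routing a document to a chunk by shard key is a range lookup. The sketch below assumes contiguous boundaries. In practice chunk ranges cover the entire key space, so the gaps between the B to H, Ki to Ko and R to S ranges in the example are absorbed by neighbouring chunks. The boundary values and the chunkFor helper are hypothetical.

```javascript
// Lower bound of each chunk, ascending; chunk i covers [bounds[i], bounds[i + 1]).
// "" acts as "no lower bound" for the first chunk; the last chunk has no upper bound.
const chunkLowerBounds = ["", "Ki", "R"];

// Return the index of the chunk whose range contains the given last name.
function chunkFor(lastName) {
  let chunk = 0;
  for (let i = 1; i < chunkLowerBounds.length; i++) {
    if (lastName >= chunkLowerBounds[i]) chunk = i;
  }
  return chunk;
}
```

Barker lands in chunk 0 and Smith in chunk 2, matching the example. A range-based key keeps alphabetically adjacent users together, which helps range queries. It also means that a popular range concentrates load on one chunk.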
Considerations
Deciding on the correct shard key may be one of the most significant design decisions that are made during the design process as it could have a major impact, positive or negative, on system performance. The following are some considerations to note.
Using the _id (or date based data) as the shard key
MongoDb automatically adds an _id attribute to each document (if not overridden by application code) and populates it with a unique value (see BSON ObjectId Specification [3]). The ObjectId consists of several values that are concatenated to form a (relatively) unique value, the first of which is based on the current date and time. This can be an advantage, as data is automatically stored in date order. That improves the performance of queries that select data by date range or need to order results by date. This fact can also be exploited in other ways. For example, most drivers support extracting the creation date and time from the _id, which means that storing a 'created at' value in the document is not required. On the other hand, according to the MongoDb website it can also have implications for scalability. Because new values always sort after existing ones, newly inserted documents are all written to the same server until the data chunks are migrated to other servers. This issue can be mitigated by adding some uniqueness to the key and pre-splitting chunks[7].
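As an example of extracting the creation time: the first four bytes (eight hex characters) of an ObjectId are a big-endian count of seconds since the Unix epoch. The sketch below decodes that prefix in plain JavaScript rather than through a driver. objectIdTime is a hypothetical helper; drivers offer an equivalent, such as getTimestamp() on an ObjectId in the mongo shell.

```javascript
// Decode the creation time embedded in a 24-character hex ObjectId string.
function objectIdTime(hexId) {
  if (!/^[0-9a-f]{24}$/i.test(hexId)) throw new Error("not an ObjectId: " + hexId);
  // The first 4 bytes hold seconds since 1970-01-01T00:00:00Z.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}
```

Applied to the Ipad product's _id from the 'Entities' section, this gives 2011-07-11T14:30:45Z. Because this prefix only ever increases, new documents always sort after existing ones. This is the scalability concern described above.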
Read / Write Ratio

The read / write ratio that the system will experience must also be carefully considered. If the system performs many reads, performance is better when the whole query can be satisfied from one shard, and preferably from one document. Alternatively, if the system performs many writes, it is better to define the shard keys so that the writes are distributed between multiple servers, spreading the workload. This can be achieved by adding more uniqueness to the shard key.
If the system experiences an exceptionally high volume of writes, the way that the MongoDb balancer handles the splitting of chunks could also become an issue, as described in the 'MongoDB PreSplitting for Faster Data Loading and Importing'[8] article.
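One way to add uniqueness to a monotonically increasing key is to prefix it with a small bucket number derived from a hash, so that consecutive inserts land in different key ranges. This is a sketch under stated assumptions: the bucket count of 4, the md5 hash and the compound {bucket, _id} key layout are illustrative choices, not something this case study prescribes. bucketFor and shardKeyFor are hypothetical names.

```javascript
const crypto = require("crypto");

const BUCKETS = 4; // hypothetical; more buckets spread writes further but scatter range queries

// Derive a stable bucket number (0..BUCKETS-1) from a document id.
function bucketFor(id) {
  const digest = crypto.createHash("md5").update(String(id)).digest();
  return digest.readUInt32BE(0) % BUCKETS;
}

// Build the compound shard key value for a new document.
function shardKeyFor(id) {
  return { bucket: bucketFor(id), _id: id };
}
```

The trade-off is that a query by date range must now visit every bucket. The approach therefore suits write-heavy data better than data that is mostly read by range.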
Related Data
Keeping related data close together will improve system performance as all the data can be retrieved from one chunk or shard. In a system with lots of user related content we may prefix the shard key with the user id. We could 'force' the system to store different documents containing user related information like personal data, uploaded media and purchase history close together by prefixing each document _id with the particular user id.
Unique Keys
Shard keys should normally be as unique (fine-grained) as possible, because MongoDb can only split a chunk if its key range can be divided into smaller parts. Depending on the system, performance issues may start appearing once chunks grow past the default 200 MB maximum size. For example, using State (e.g. Texas and Ohio) as the shard key for user related data may cause problems in future. MongoDb would have to write data for ALL users that live in a particular state to the same chunk, and because it cannot split that chunk, the chunk would grow very large. If the key is changed to include City, MongoDb can create a chunk for each State+City combination, which allows for much more granularity. However, each State+City chunk is potentially stored on a different server, and some cities have more users than others, so some servers will still experience higher loads than others.
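The effect of key granularity can be quantified by counting how many distinct key values each candidate key yields. That count is also the number of possible chunks. The sample users and the helper names below are hypothetical.

```javascript
// Hypothetical sample of user documents.
const users = [
  { state: "Texas", city: "Austin" },
  { state: "Texas", city: "Houston" },
  { state: "Texas", city: "Houston" },
  { state: "Ohio", city: "Columbus" },
  { state: "Ohio", city: "Dayton" },
];

// Count distinct values of a candidate shard key. A chunk cannot be split below a
// single key value, so this bounds how finely the data can be divided.
function distinctKeyValues(docs, fields) {
  return new Set(docs.map((d) => fields.map((f) => d[f]).join("|"))).size;
}

// Largest share of documents that would sit under one key value.
function largestKeyShare(docs, fields) {
  const counts = new Map();
  for (const d of docs) {
    const k = fields.map((f) => d[f]).join("|");
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return Math.max(...counts.values()) / docs.length;
}
```

Here State alone yields 2 key values, while State+City yields 4. Houston still holds 40% of the sample on its own, which shows that a finer key does not remove hot spots when some cities have many more users.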
Result Order
The order in which search results are returned to the client can also affect the selection of an appropriate shard key. Continuing with the State / City example, imagine that we defined a shard key of {state:1,city:1} on our data, and that the data returned by a query is stored on multiple servers. If the query returns data ordered by city, each server will need to compile the search results and then sort the data. The data is then returned from each server, and the results are merged into one by the mongos process (see the Deployment section). The extra sorting step is required because the index is not defined on the city field alone, but on the combination of State+City. If, on the other hand, the query sorts by state or state+city, each server can use the defined index. It compiles the data and streams it back in order to the mongos process without having to sort it first.
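When each shard returns its results already ordered by the index, the mongos process only has to merge the streams rather than sort everything again. The sketch below shows that merge step on in-memory arrays. mergeSorted is a hypothetical helper; a real mongos works on cursors rather than arrays.

```javascript
// Merge several arrays that are each already sorted by `compare` into one sorted array.
function mergeSorted(streams, compare) {
  const positions = streams.map(() => 0);
  const out = [];
  for (;;) {
    let best = -1;
    // Pick the stream whose next item sorts first.
    for (let s = 0; s < streams.length; s++) {
      if (positions[s] >= streams[s].length) continue;
      if (best === -1 || compare(streams[s][positions[s]], streams[best][positions[best]]) < 0) {
        best = s;
      }
    }
    if (best === -1) return out; // every stream is exhausted
    out.push(streams[best][positions[best]++]);
  }
}

// Order by state, then city, matching a {state: 1, city: 1} shard key.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
const byStateCity = (a, b) => cmp(a.state, b.state) || cmp(a.city, b.city);
```

Each output item costs one comparison per shard, and no shard's results are re-sorted. A sort on city alone cannot use this shortcut, because each shard's index order (state first) is not city order.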
Bringing it all together

After reviewing the system functionality as well as some of the best practices and considerations we are able to create our document schemas and define the queries that will be run against the system.
Entities
Based on the 'Identify Entities and Fields' section we can assume that the documents would resemble the following samples. The structure and content of these documents may change further as the different actions are considered in the following section.
Product
Each product document will have the following structure and will be allocated to the products collection. Category information will also be stored in the product document; this is discussed in detail in a subsequent section.
Collection: products
{
  "_id": ObjectId("4e1b091559a4f01109000000"),
  "name": "Ipad",
  "sku": "10001-23424-9098",
  "cost_price": 300,
  "selling_price": 320,
  "items_in_stock": 9,
  "reorder_threshold": 10
}
{
  "_id": ObjectId("4e1b08e159a4f01608000000"),
  "name": "Ipod Nano",
  "sku": "10001-23424-9098",
  "cost_price": 100,
  "selling_price": 120,
  "items_in_stock": 10,
  "reorder_threshold": 15
}
Category
Category documents will be allocated to the categories collection. They will not be sharded, as all the category documents together make up a relatively small amount of data. We will also override the default generated _id, as it is very long; the reason for this will be explained later on. Fortunately, categories will not be updated often, which means that the performance hit of using a custom incremental _id for categories is acceptable.
Collection: categories
{ "_id": "1", "name": "Electronics", "subcats": ["2", "3"] }
{ "_id": "2", "name": "Cellular", "parents": ["1"], "subcats": ["3"] }
{ "_id": "3", "name": "Nokia", "parents": ["1", "2"] }
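Because each category lists all of its ancestors in parents, finding every sub-category beneath a category is a single match on that array rather than a recursive walk. The in-memory sketch below mirrors that match, with ids stored as strings. With MongoDb the equivalent filter would be { parents: "1" } on the categories collection, which relies on matching values inside arrays. subCategoriesOf and topLevel are hypothetical names.

```javascript
// Category documents as in the categories collection, with ids stored as strings.
const categories = [
  { _id: "1", name: "Electronics", subcats: ["2", "3"] },
  { _id: "2", name: "Cellular", parents: ["1"], subcats: ["3"] },
  { _id: "3", name: "Nokia", parents: ["1", "2"] },
];

// All descendants of a category: every document whose ancestor list contains its id.
function subCategoriesOf(categoryId, docs) {
  return docs
    .filter((c) => Array.isArray(c.parents) && c.parents.includes(categoryId))
    .map((c) => c._id);
}

// Top-level categories have no parents array.
function topLevel(docs) {
  return docs.filter((c) => !c.parents || c.parents.length === 0).map((c) => c._id);
}
```

Storing the ids and references with the same type matters here: a string "1" and a number 1 are different values, so they would not match.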
User
User documents will be allocated to their own collection, called users.
Collection: users
{
  "_id": ObjectId("4e1bfba789a4f02207000000"),
  "firstname": "John",
  "lastname": "Doe",
  "email": "john@gmail.com",
  "password": "[hashed_text]",
  "password_salt": "[salt_text]",
  "shipping_address": {
    "address1": "33 Rainbow Road",
    "city": "Cape Town",
    "postal_code": "8000"
  },
  "role": "user"
}
Shopping Cart
The shopping cart, the products in the cart and the payment made for the cart will always be queried together, which means that the data can be stored as one document. Each line item becomes an array item in the document. Some of the product data is duplicated into the cart document. This prevents additional database lookups when previewing the cart, generating an invoice, or even reprinting an invoice a year after it was paid for. The payment details and some of the user details are also stored in the document.
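The 'one line item per product' rule and the 500 line-item limit from the constraints can be enforced in application code that maintains the line_items array and total before the cart is saved. This is a sketch assuming the cart fields used in this section (line_items, qty, selling_price, total). addToCart is a hypothetical helper, and the total is recomputed as the sum of selling_price times qty.

```javascript
const MAX_LINE_ITEMS = 500; // limit from the 'Constraints and assumptions' section

// Add qty units of a product to the cart, merging with an existing line item if present.
function addToCart(cart, product, qty) {
  const existing = cart.line_items.find((item) => item._id === product._id);
  if (existing) {
    existing.qty += qty; // same product: increase the original quantity
  } else {
    if (cart.line_items.length >= MAX_LINE_ITEMS) {
      throw new Error("cart is limited to " + MAX_LINE_ITEMS + " line items");
    }
    // Copy the product fields the cart needs, so invoices do not depend on the product document.
    const { _id, name, sku, cost_price, selling_price } = product;
    cart.line_items.push({ _id, name, sku, cost_price, selling_price, qty });
  }
  cart.total = cart.line_items.reduce((sum, item) => sum + item.selling_price * item.qty, 0);
  return cart;
}
```

Adding the Ipad twice and the Ipod Nano once yields two line items and a total of 2 times 320 plus 120, which is 760, as in the example cart.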
Collection: cart
{
  "_id": ObjectId("4e1bfba559a4f02207000000"),
  "line_items": [
    {
      "_id": "1_4e1b091559a4f01109000000",
      "cost_price": 300,
      "name": "Ipad",
      "selling_price": 320,
      "sku": "10001-23424-9098",
      "qty": 2
    },
    {
      "_id": ObjectId("4e1b08e159a4f01608000000"),
      "cost_price": 100,
      "name": "Ipod Nano",
      "selling_price": 120,
      "sku": "10001-23424-9098",
      "qty": 1
    }
  ],
  "payment": {
    "card_number": "[encrypted_text]",
    "expiry": "11/12",
    "card_holder": "Mr J Doe"
  },
  "sales_date": "2011-07-12 09:45:36",
  "total": 760,
  "user": {
    "id": ObjectId("4e1bfba789a4f02207000000"),
    "name": "John Doe",
    "email": "john@gmail.com",
    "shipping_address": {
      "address1": "33 Rainbow Road",
      "city": "Cape Town",
      "postal_code": "8000"
    },
    "role": "user"
  }
}
Actions
The actions are not in the order defined in the 'Identify System Operations' section, as some of the discussions build on previous ones. Note: all of the following examples refer to the document examples defined in the 'Entities' section unless otherwise specified.
Search for product based on SKU

Add an index on the SKU field of the product document
Search for products by product name

Add an index on the name field of the product document
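For reference, the indexes for these two actions, plus the category index discussed in the next action, could be created from application code as below. This sketch assumes the official MongoDB Node.js driver. Its Collection.createIndex(keys, options) is the current equivalent of the shell's ensureIndex from this document's era. ensureProductIndexes is a hypothetical helper that expects a driver Db instance.

```javascript
// Index specifications for the product searches: 1 = ascending order.
const productIndexes = [
  { key: { sku: 1 } },      // Search for product based on SKU
  { key: { name: 1 } },     // Search for products by product name
  { key: { category: 1 } }, // Search for products by category identifier
];

// Create each index on the products collection; resolves to the created index names.
// background: true avoided blocking other operations on older servers (newer versions ignore it).
async function ensureProductIndexes(db) {
  const products = db.collection("products");
  const names = [];
  for (const { key } of productIndexes) {
    names.push(await products.createIndex(key, { background: true }));
  }
  return names;
}
```

Creating an index that already exists with the same specification is a no-op, so a helper like this can safely run at application start-up.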
Search for products by category identifier

Based on one of the best practices it is better to combine all the information related to a specific entity into one document so that the system can satisfy the query without having to retrieve multiple documents. That would suggest that we save all of the products into the specific category document. In a normal e-commerce system we will have hundreds or thousands of products over time which will result in very large documents.
We could opt to model the product and category entities as separate documents which means that these documents should somehow reference each other. In our design we will add the category _id to the product document like this:
{ "_id": ObjectId("4e1b091559a4f01109000000"), "name": "Ipad", .... "category" : 10 }
We could then add an index on the category field in order to quickly find all products in a particular category. Alternatively, we could embed the whole category document in the category field if required. This approach would take more disk space because of the duplicated data. However, if products need to be displayed on the front end together with their category information, it could prevent an extra query to the database. This is only an option if the category information is relatively static. In cases where a product can belong to multiple categories, we could use an array of category ids.
{ "_id": ObjectId("223b091559a4f01109000000"), "name": "Nokia", .... "categories": ["1", "2"] }
Querying for a specific value in an array field is supported by MongoDb with the Multikey feature[13].
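For an equality query against an array field, a document matches when any element equals the queried value, so { categories: "2" } finds a product stored with categories ["1", "2"]. The sketch below imitates that rule in memory to make the semantics concrete. matchesField and the sample products are hypothetical, and the helper covers only equality against a scalar value, not MongoDb's full query language.

```javascript
// Equality match with multikey semantics: an array field matches if any element equals the value.
function matchesField(doc, field, value) {
  const fieldValue = doc[field];
  if (Array.isArray(fieldValue)) return fieldValue.includes(value);
  return fieldValue === value;
}

// Hypothetical product documents; the second belongs to two categories.
const products = [
  { name: "Ipad", categories: ["1"] },
  { name: "Nokia", categories: ["1", "2"] },
];

const inCategory2 = products.filter((p) => matchesField(p, "categories", "2"));
```

With a multikey index, MongoDb stores one index entry per array element, which is what makes this lookup fast instead of a collection scan.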
Increment / decrement stock item

The items_in_stock field is in essence a counter that is incremented or decremented when an item is added to stock or sold. In this case the document does not need to be returned to the client, because the system is able to increment / decrement the field in place. Updating a document[14] would normally take this form:
var prodCollection = db.products;
var product = prodCollection.findOne({_id: ObjectId("4e1b091559a4f01109000000")});
product.items_in_stock++;
prodCollection.save(product);
However, we can use a modifier [15] instead, which is much more efficient and applies the update atomically [16] on the document. We will most probably query for a product by _id, which automatically has an index defined on it. Use the following to increment items_in_stock without retrieving the whole document (note the $inc operator):
db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : 1}});
or the following to decrement the stock level:
db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : -1}});
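Why the modifier matters becomes clearer when two clients update the same product at nearly the same time. The sketch below replays two clients each selling one item: with read-modify-write both read 10 and both save 9, so one sale is lost, whereas two in-place decrements leave 8. The Map-based store and the findOne/save/inc helpers are hypothetical stand-ins for the collection and the server, not driver calls; only the ordering of the operations matters here.

```javascript
// Hypothetical in-memory stand-in for the products collection.
const store = new Map([["p1", { _id: "p1", items_in_stock: 10 }]]);

// Read-modify-write helpers: each client works on its own copy.
function findOne(id) {
  return { ...store.get(id) }; // a copy, as a driver would return
}
function save(doc) {
  store.set(doc._id, { ...doc }); // replaces the whole stored document
}

// $inc-style helper: the change is applied to the stored document itself.
function inc(id, field, amount) {
  store.get(id)[field] += amount;
}

// Two clients each sell one item, with their reads interleaved.
const clientA = findOne("p1");
const clientB = findOne("p1");
clientA.items_in_stock--;
clientB.items_in_stock--;
save(clientA);
save(clientB); // overwrites client A's write, so one sale is lost
const afterSave = store.get("p1").items_in_stock;

// The same two sales expressed as in-place increments.
store.set("p1", { _id: "p1", items_in_stock: 10 });
inc("p1", "items_in_stock", -1);
inc("p1", "items_in_stock", -1);
const afterInc = store.get("p1").items_in_stock;

console.log({ afterSave, afterInc }); // { afterSave: 9, afterInc: 8 }
```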
Add / Edit products

The most important aspect when adding or editing data is deciding on the shard key, as this determines which shard the data will be written to and how the data will be located during queries. Products will be added and edited far less often than other types of transactions occur, which means that the default _id should be sufficient as the shard key. However, considering that searching for products by category is a high-volume transaction, we could prepend the category _id to the product id so that all products in a category are grouped together, as shown in the following example:
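{ "_id": "1", "name": "Electronics" }
{ "_id": "1_4e1b091559a4f01109000000", "name": "Ipad", .... "category": "1" }
Note that a composite value like this is a plain string rather than an ObjectId, which must consist of exactly 24 hexadecimal characters.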
Another side effect of prepending the category, for systems where a product can only belong to one category, is that we potentially do not have to store the category as a separate field, as it can be derived from the product _id.
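A sketch of helpers an application might use for this scheme, assuming the category id never contains an underscore and the product part is the 24-character hex string of an ObjectId. The function names are hypothetical. Because all strings that share a prefix sort next to each other, the products of one category occupy a single contiguous range of _id values, which is what lets them sit in the same or adjacent chunks. Note that plain string order puts "10_..." before "1_...", which does not matter for grouping.

```javascript
// Build a product _id of the form "<categoryId>_<objectIdHex>".
// Assumes category ids contain no "_" (hypothetical convention).
function makeProductId(categoryId, objectIdHex) {
  if (String(categoryId).includes("_")) {
    throw new Error("category id must not contain '_'");
  }
  if (!/^[0-9a-f]{24}$/i.test(objectIdHex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  return `${categoryId}_${objectIdHex}`;
}

// Derive the category from the product _id instead of storing it separately.
function categoryFromProductId(productId) {
  const sep = productId.indexOf("_");
  if (sep < 0) throw new Error("not a composite product id");
  return productId.slice(0, sep);
}

const ids = [
  makeProductId("2", "4e1b091559a4f01109000001"),
  makeProductId("1", "4e1b091559a4f01109000000"),
  makeProductId("10", "4e1b091559a4f01109000002"),
  makeProductId("1", "223b091559a4f01109000000"),
].sort(); // plain string order, as for a range over string _id values

console.log(ids.map(categoryFromProductId)); // [ '10', '1', '1', '2' ]
```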
Create Shopping Cart

Problem
As with the product entity, careful consideration is required when deciding what the shopping cart shard key will consist of. In a high transaction volume environment there will be a tremendous number of writes as new shopping carts are created and items are added and removed. Once a cart is paid, it will mostly be read, for reporting purposes and the like. This makes indexing difficult: indexes allow fast retrieval of the data after payment but slow down writes while the purchase is in progress. Similarly, a date-related shard key would make the data easier to query, but it is dangerous because it could mean that all writes go to the same shard instead of being spread over many shards. Running regular data-intensive queries for reports could also hurt system performance and affect the user experience.
Define the correct shard key

In our case we will avoid using the default generated _id as the shard key, as it will cause excessive writes to one server at some times during the month (ObjectIds begin with a timestamp, so newly generated values keep increasing and fall into the same key range). A similar issue was described in the last paragraph of the 'Using the _id or date based data as the shard key' section. There are many different ways to generate a unique value that can be used for the _id of a document, and most combine a couple of values. In some systems it may be sufficient to concatenate the user id and the date. We could be even more inventive and include the name of the application server on which the transaction was generated, or the hexadecimal representation of the user's IP address [19] (e.g. IP 196.134.96.111 = hex C4 86 60 6F), to help make values unique. In our use case we will keep it simple and use a GUID [20] as the key. We could also pre-split chunk[7] data if necessary. This will ensure that write operations to the cart are distributed across many shards.
Line items can be added to and removed from the cart most efficiently with the $push and $pull[14] modifiers, which update the document in place. Because an index is automatically created on the _id field, finding carts by _id will also be fast. Lastly, once a cart is paid we can use the $set[14] modifier to add the payment details.
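The IP-to-hexadecimal conversion mentioned above, and one possible way of combining it with other values into a cart key, can be sketched as follows. The key layout (ipHex-userId-timestamp) and the function names are hypothetical; the case study itself settles on a GUID, which Node.js can generate with randomUUID from the built-in crypto module (available since Node.js 14.17).

```javascript
const { randomUUID } = require("crypto");

// Convert a dotted IPv4 address to hex, e.g. "196.134.96.111" -> "C486606F".
function ipv4ToHex(ip) {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`invalid IPv4 address: ${ip}`);
  return parts
    .map((p) => Number(p).toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

// Hypothetical composite key: IP hex + user id + millisecond timestamp.
function makeCartKey(ip, userId, date = new Date()) {
  return `${ipv4ToHex(ip)}-${userId}-${date.getTime()}`;
}

// The option chosen in the case study: a random GUID (UUID v4).
function makeCartGuid() {
  return randomUUID();
}

console.log(ipv4ToHex("196.134.96.111")); // C486606F
console.log(makeCartKey("196.134.96.111", "user42", new Date(0))); // C486606F-user42-0
console.log(makeCartGuid()); // a random 36-character UUID string
```

Random GUIDs spread new carts evenly over the key space, which is the property the shard key needs here; the composite key is only as well distributed as its leading component.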
Split read and write data

Data volumes in this collection will eventually grow very large and may affect performance, especially as reporting queries and other search queries will be run against the same database. We will therefore use two collections, one for 'active' carts and another for 'completed' carts. The active cart collection will have no additional indexes, to cater for the frequent updates, whereas the completed cart collection will have more indexes to cater for the different search queries. Moving data between collections adds overhead to the system, so we will split this processing into separate parts. We will assume that real-time (or as close to real-time as possible) reporting is required, which means that we cannot use a deferred job that moves the data during a low transaction volume period such as midnight to 3 AM.
There are various approaches, but we will use a more complex option in order to demonstrate some lesser-known MongoDb features. When a cart is paid and the payment details are saved, we will add an additional field called 'processed' using the $set[14] modifier. This field will have a sparse[5] index defined on it. Sparse indexes only include documents that contain the indexed field. A separate server process will query the database at intervals and retrieve all the documents that have a 'processed' field. Because of the sparse index this query will be very efficient, and it will have little effect on write queries, as only documents containing the 'processed' field are included in the index. The retrieved documents will be saved into the second collection and, once a document has been moved, the 'processed' field will be removed from the original. Because the field has been removed, the document will not be returned by subsequent 'data move' queries. Care needs to be taken to ensure that both updates happen in an atomic[16] fashion.
A third process will run during a low transaction volume period and remove from the first collection all documents that already exist in the second.
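The move process can be sketched as below, using an array and a Map as hypothetical stand-ins for the active and completed cart collections. In the MongoDB versions this case study targets, atomic operations apply to a single document, so the copy and the removal of 'processed' cannot be one operation across two collections. This sketch therefore makes the copy idempotent instead (an upsert keyed by _id), so that if the process stops after copying a cart but before removing its 'processed' field, simply re-running it is harmless. That is one assumed way of meeting the requirement, not the author's implementation.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const activeCarts = [
  { _id: "c1", items: ["ipad"], processed: true, total: 500 },
  { _id: "c2", items: ["nokia"] }, // still in progress
];
const completedCarts = new Map(); // keyed by _id, so a copy is an upsert

// Periodic job: move carts that carry a 'processed' field. In MongoDB this
// would be a query such as {processed: {$exists: true}} on the sparse index.
function moveProcessedCarts() {
  for (const cart of activeCarts.filter((c) => "processed" in c)) {
    const copy = { ...cart };
    delete copy.processed;
    completedCarts.set(copy._id, copy); // upsert: safe to repeat
    delete cart.processed; // like $unset, so the cart is not picked up again
  }
}

// Low-traffic job: remove active carts that have already been copied.
function purgeMovedCarts() {
  for (let i = activeCarts.length - 1; i >= 0; i--) {
    const cart = activeCarts[i];
    if (completedCarts.has(cart._id) && !("processed" in cart)) {
      activeCarts.splice(i, 1);
    }
  }
}

moveProcessedCarts();
moveProcessedCarts(); // re-running finds nothing new and changes nothing
purgeMovedCarts();
console.log(activeCarts.map((c) => c._id), [...completedCarts.keys()]); // [ 'c2' ] [ 'c1' ]
```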
Add / Remove products to / from shopping cart

See 'Create shopping cart' section.
Pay for cart by credit card

See 'Create shopping cart' section.
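The 'Create Shopping Cart' section relies on $push and $pull for line items and on $set for the payment details. The sketch below is a simplified in-memory model of what those three modifiers do to a cart document; the applyUpdate function and the cart shape (an items array of {sku, qty} objects and a payment object) are hypothetical, and real updates are of course applied by the server rather than by client code.

```javascript
// Simplified model of $push, $pull and $set on one document.
// $pull conditions are objects whose listed fields must all match.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    (doc[field] = doc[field] || []).push(value); // append one element
  }
  for (const [field, cond] of Object.entries(update.$pull || {})) {
    // Remove every element whose listed fields all equal those in cond.
    doc[field] = (doc[field] || []).filter(
      (el) => !Object.entries(cond).every(([k, v]) => el[k] === v)
    );
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value; // set or overwrite a field
  }
  return doc;
}

const cart = { _id: "6f1c2a", items: [] };
applyUpdate(cart, { $push: { items: { sku: "IPAD-16", qty: 1 } } });
applyUpdate(cart, { $push: { items: { sku: "NOKIA-N8", qty: 2 } } });
applyUpdate(cart, { $pull: { items: { sku: "IPAD-16" } } });
applyUpdate(cart, { $set: { payment: { method: "credit_card", paid: true } } });
console.log(JSON.stringify(cart));
```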
Search for all categories

Due to the static nature of the category data it would most probably be cached on the client application server instead of being queried for continuously. This means that nothing additional would be required for this query except possibly an index on the category name if the result of the 'Search all categories' query must be sorted by name.
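A minimal sketch of such an application-server cache, assuming a hypothetical loadCategories function that runs the database query (for example db.categories.find().sort({name: 1})) and an arbitrary time-to-live after which the list is reloaded. The loader is synchronous here for brevity; a real driver call would be asynchronous.

```javascript
// Cache the category list in the application server's memory and
// reload it only when the cached copy is older than ttlMs.
function makeCategoryCache(loadCategories, ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
  let cached = null;
  let loadedAt = 0;
  return function getCategories() {
    if (cached === null || now() - loadedAt > ttlMs) {
      cached = loadCategories(); // hypothetical database query
      loadedAt = now();
    }
    return cached;
  };
}

// Usage with a fake loader and a fake clock, counting database hits.
let loads = 0;
let clock = 0;
const getCategories = makeCategoryCache(
  () => { loads++; return [{ _id: "1", name: "Electronics" }]; },
  1000,
  () => clock
);
getCategories();
getCategories(); // served from the cache
clock = 1500;
getCategories(); // TTL expired, so the list is reloaded
console.log(loads); // 2
```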
Search for products less than reorder threshold

Finding documents where the value of a field is less than a constant can be done with the first query below, but MongoDb does not (yet) support using the $lt operator to compare against another field of the same document, as the second query shows:
> db.products.find({items_in_stock: {$lt:20}})
{ "_id" : ObjectId("4e1b091559a4f01109000000"), "items_in_stock" : 9, "name" : "Ipad", "reorder_threshold" : 10 }
> db.products.find({items_in_stock: {$lt:reorder_threshold}})
Mon Jul 11 14:43:28 ReferenceError: reorder_threshold is not defined (shell):0
We can make use of a map-reduce[17] function instead. In this use case the query accesses all the product documents in the collection, has no filter criteria and does not require sorting, which makes it a good candidate for map-reduce. The following example is adapted from the 'Finding Max And Min Values for a given Key' article[18]. Based on the example data (Entities section) the result is expected to look like this:
{ _id : "1_497ce4051ca9ca6d3efca323", value : { product : { name : "Ipod Nano", items_below_level : 5 } } }
{ _id : "1_678ce4051ca9ca6d3efca323", value : { product : { name : "Ipad", items_below_level : 1 } } }
Explaining map-reduce in detail is out of scope for this document, but in short the map function is applied to each document and the reduce function combines the values emitted for the same key. Our map function checks whether the number of items in stock for a product is below the set threshold and, if it is, emits the value. The reduce function would normally aggregate values (e.g. sums, counts and averages), but here each product emits at most one value, so the function simply returns it.
> map = function () {
    if (this.items_in_stock < this.reorder_threshold) {
      var x = {name:this.name, items_below_level:(this.reorder_threshold - this.items_in_stock)};
      emit(this._id, {product:x});
    }
  }
> reduce = function (key, values) { return values[0]; }
Running the mapReduce command will have the following output:

> db.products.mapReduce(map, reduce, {out:{inline : true}});
{ "result" : "tmp.mr.mapreduce_1310385961_11", "timeMillis" : 5, "counts" : { "input" : 2, "emit" : 2, "output" : 2 }, "ok" : 1, }
> db.tmp.mr.mapreduce_1310385961_11.find()
{ "_id" : ObjectId("4e1add3b59a4f0d213000000"), "value" : { "product" : { "name" : "Ipod Nano", "items_below_level" : 5 } } }
{ "_id" : ObjectId("4e1add5c59a4f04906000000"), "value" : { "product" : { "name" : "Ipad", "items_below_level" : 1 } } }
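The map and reduce functions above can be tried outside MongoDB with a small stand-in for mapReduce. The hypothetical runMapReduce harness calls map with each document bound to 'this', collects the emit calls, groups them by key and, as MongoDB does, skips reduce for keys that received a single value. The product documents are also hypothetical: the Ipad figures come from the shell session earlier in this section, the Ipod Nano figures were chosen to give the result expected above, and a third product above its threshold shows that nothing is emitted for it.

```javascript
// Minimal in-memory stand-in for db.collection.mapReduce (hypothetical).
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // 'this' is the document
  } finally {
    delete globalThis.emit;
  }
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Map: emit products whose stock is below the reorder threshold,
// with items_below_level = reorder_threshold - items_in_stock.
const map = function () {
  if (this.items_in_stock < this.reorder_threshold) {
    const x = { name: this.name, items_below_level: this.reorder_threshold - this.items_in_stock };
    emit(this._id, { product: x });
  }
};
// Reduce: each product emits at most once, so return the single value.
const reduce = function (key, values) {
  return values[0];
};

const products = [
  { _id: "1_497ce4051ca9ca6d3efca323", name: "Ipod Nano", items_in_stock: 5, reorder_threshold: 10 },
  { _id: "1_678ce4051ca9ca6d3efca323", name: "Ipad", items_in_stock: 9, reorder_threshold: 10 },
  { _id: "1_789ce4051ca9ca6d3efca323", name: "Kindle", items_in_stock: 30, reorder_threshold: 10 },
];
const result = runMapReduce(products, map, reduce);
console.log(JSON.stringify(result));
```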
Search for sub-categories by category identifier

As mentioned under 'Search for all categories', the category / sub-category hierarchy will most probably be retrieved, calculated and cached in some form or another on the client side which means that we do not need to make any changes to accommodate this query. In a scenario where caching is not possible, we could use a map-reduce [17] function to return the appropriate category hierarchy. In our use case assume that categories have sub-categories and that sub-categories can have their own sub-categories as shown in the example data. We can use the following map-reduce functions to retrieve the data in the appropriate category hierarchy.
> map = function () {
    var key = {id:this._id, name:this.name};
    if (!this.subcats) {
      var value = {subcats:['none']};
      emit(key, value);
    } else {
      for (var i = 0; i < this.subcats.length; i++) {
        var value = {subcats:[this.subcats[i]]};
        emit(key, value);
      }
    }
  }
> reduce = function (key, values) {
    var result = {subcats:[]};
    for (var i = 0; i < values.length; i++) {
      result.subcats = values[i].subcats.concat(result.subcats);
    }
    result.subcats = result.subcats.sort();
    return result;
  }
> db.categories.mapReduce(map, reduce,{out:{inline : true}});
{ "result" : "tmp.mr.mapreduce_1310454989_43", "timeMillis" : 2, "counts" : { "input" : 3, "emit" : 4, "output" : 3 }, "ok" : 1, }
> db.tmp.mr.mapreduce_1310454989_43.find()
{ "_id" : { "id" : "1", "name" : "Electronics" }, "value" : { "subcats" : [ 2, 3 ] } }
{ "_id" : { "id" : "2", "name" : "Cellular" }, "value" : { "subcats" : [ 3 ] } }
{ "_id" : { "id" : "3", "name" : "Nokia" }, "value" : { "subcats" : [ "none" ] } }
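The same kind of in-memory harness (hypothetical, not MongoDB's implementation) can replay the category hierarchy example. The three category documents are inferred from the output above rather than copied from the Entities section, so treat them as an assumption. One detail worth knowing: Array.prototype.sort without a comparator compares elements as strings, which is fine for single-digit ids but would order 10 before 2.

```javascript
// Minimal in-memory stand-in for mapReduce (hypothetical).
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values), // no reduce for single values
  }));
}

// Map: one emit per sub-category, or a single 'none' marker.
const map = function () {
  const key = { id: this._id, name: this.name };
  if (!this.subcats) {
    emit(key, { subcats: ["none"] });
  } else {
    for (let i = 0; i < this.subcats.length; i++) {
      emit(key, { subcats: [this.subcats[i]] });
    }
  }
};
// Reduce: concatenate the sub-category lists, then sort (string comparison).
const reduce = function (key, values) {
  const result = { subcats: [] };
  for (let i = 0; i < values.length; i++) {
    result.subcats = values[i].subcats.concat(result.subcats);
  }
  result.subcats = result.subcats.sort();
  return result;
};

const categories = [
  { _id: "1", name: "Electronics", subcats: [2, 3] },
  { _id: "2", name: "Cellular", subcats: [3] },
  { _id: "3", name: "Nokia" },
];
const hierarchy = runMapReduce(categories, map, reduce);
console.log(JSON.stringify(hierarchy));
```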
Search total product value

As mentioned, aggregating multiple values is another use for map-reduce [17]. In this use case we need to find the total cost value of all the products in stock, i.e. items_in_stock multiplied by cost_price, summed over all products. The following shows how this can be achieved:
> map = function () { emit("sub_total", this.items_in_stock * this.cost_price); }
> reduce = function (key, values) {
    var grand_total = 0;
    for (var i = 0; i < values.length; i++) { grand_total += values[i]; }
    return grand_total;
  }
> db.products.mapReduce(map, reduce, {out:{inline : true}});
{ "result" : "tmp.mr.mapreduce_1310392963_13", "timeMillis" : 3, "counts" : { "input" : 2, "emit" : 2, "output" : 1 }, "ok" : 1, }
> db.tmp.mr.mapreduce_1310392963_13.find()
{ "_id" : "sub_total", "value" : 4200 }
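A quick check of the map and reduce functions above on hypothetical data (not the case study's example data). MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values, so the function must give the same answer however the values are grouped. Summation satisfies that, as the last lines of the sketch show.

```javascript
// Map and reduce functions from the text; emit is supplied below.
const map = function () {
  emit("sub_total", this.items_in_stock * this.cost_price);
};
const reduce = function (key, values) {
  let grand_total = 0;
  for (let i = 0; i < values.length; i++) {
    grand_total += values[i];
  }
  return grand_total;
};

// Hypothetical products.
const products = [
  { name: "Ipad", items_in_stock: 9, cost_price: 100 },
  { name: "Ipod Nano", items_in_stock: 5, cost_price: 40 },
  { name: "Cable", items_in_stock: 10, cost_price: 2 },
];

// Collect the emitted values for the single key "sub_total".
const emitted = [];
globalThis.emit = (key, value) => emitted.push(value);
products.forEach((p) => map.call(p));
delete globalThis.emit;

const total = reduce("sub_total", emitted); // 900 + 200 + 20
// Re-reduce: a partial result combined with the remaining values.
const partial = reduce("sub_total", emitted.slice(0, 2));
const rereduced = reduce("sub_total", [partial, ...emitted.slice(2)]);
console.log(total, rereduced); // 1120 1120
```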
Search cart total per date

As with relational databases, it is sometimes worthwhile to store aggregated data. To satisfy this query we will add an extra field called 'total' to the cart document and populate it with the cart total during the data move step described in the 'Create Shopping Cart' section. We will then use a map-reduce query to group the totals by month.
> map = function () {
    var millis = Date.parse(this.sales_date.substr(0,10));
    var sales_dt = new Date(millis);
    // getMonth() is zero-based, so the key "2011-6" below is month index 6, i.e. July 2011
    var key = sales_dt.getFullYear().toString() + '-' + sales_dt.getMonth().toString();
    var value = this.total;
    emit(key, value);
  }
> reduce = function (key, values) {
    var result = 0;
    for (var i = 0; i < values.length; i++) { result += values[i]; }
    return result;
  }
> db.cart.mapReduce(map, reduce,{out:{inline : true}});
{ "result" : "tmp.mr.mapreduce_1310471886_57", "timeMillis" : 4, "counts" : { "input" : 2, "emit" : 2, "output" : 1 }, "ok" : 1, }
> db.tmp.mr.mapreduce_1310471886_57.find()
{ "_id" : "2011-6", "value" : 1080 }
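Two details of the date handling above are easy to trip over: getMonth() is zero-based, and converting the date via Date.parse and local time can, depending on how the engine interprets the string and on the server's time zone, move a sale on the first of a month into the previous month. If sales_date is stored as an ISO 8601 string (which the substr(0,10) call suggests but the text does not state), a variant of the map function can take the year and month straight from the string. The harness and the carts below are hypothetical.

```javascript
// Variant map: group by the "YYYY-MM" prefix of an ISO 8601 sales_date.
const map = function () {
  emit(this.sales_date.substr(0, 7), this.total);
};
// Reduce from the text: sum the totals for a month.
const reduce = function (key, values) {
  let result = 0;
  for (let i = 0; i < values.length; i++) {
    result += values[i];
  }
  return result;
};

// Minimal grouping harness standing in for mapReduce (hypothetical).
function runMapReduce(docs) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc));
  delete globalThis.emit;
  const out = {};
  for (const [key, values] of groups) {
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

const carts = [
  { sales_date: "2011-07-11T14:43:28Z", total: 100 },
  { sales_date: "2011-07-12T09:00:00Z", total: 250 },
  { sales_date: "2011-08-01T00:30:00Z", total: 75 },
];
const totals = runMapReduce(carts);
console.log(totals); // { '2011-07': 350, '2011-08': 75 }
```

Keys in this form ("2011-07") also sort chronologically as plain strings, which the zero-based, unpadded "2011-6" form does not.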
Discard Shopping Cart

We do not have to make any changes to accommodate this query, as we will simply find the document by one of the indexed fields and then remove it from the collection. This will not happen often, so the overhead of maintaining indexes is not a concern.
Infrastructure
Deployment
Based on the MongoDb documentation[11] we will start with a setup as shown in the following diagram. This setup ensures that queries are distributed across multiple shards which improves performance, it ensures that there are three replicas of the data available (each of the servers in the replica set[12]) and it allows for disaster recovery scenarios by replicating to servers in another data centre.
Mongo Processes
The bulk of MongoDb processing is handled by the processes depicted in the diagram. Mongod is the main database process; it performs the actual querying and editing of the data contained in the database. Mongos, on the other hand, is only a routing service. A client application communicates with the mongos process, which in turn queries the configuration store (config mongod in the diagram) to find out which shard(s) to communicate with. It then routes the query to the appropriate shard(s) and, where applicable, merges the results from the different shards before returning the combined result to the client application. This means the client application only needs to be aware of one process and does not need intimate knowledge of all the mongod processes. Note that mongos processes can be deployed in many different configurations: on all of the servers, on only some of them, or on separate servers with no mongod processes. There may be a performance boost if the service is installed on each server, as it can then communicate over the localhost interface.
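To make the routing step concrete, here is a simplified model of what mongos does with the chunk metadata it reads from the config servers: a query that includes the shard key goes to the one shard owning the chunk range that contains the key, while a query without it goes to every shard and the results are combined. The chunk table, shard names and numeric keys are hypothetical, and the merge is simplified to concatenate-then-sort.

```javascript
// Hypothetical chunk table as mongos might cache it from the config servers:
// each chunk covers [min, max) of the shard key and lives on one shard.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Targeted query: the shard key value identifies exactly one chunk.
function shardForKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

// Scatter-gather: without the shard key every shard must be asked.
function allShards() {
  return [...new Set(chunks.map((c) => c.shard))];
}

// Combine per-shard results (simplified: concatenate, then sort).
function mergeResults(perShardResults, compare) {
  return perShardResults.flat().sort(compare);
}

console.log(shardForKey(150)); // shard1
console.log(allShards()); // [ 'shard0', 'shard1', 'shard2' ]
console.log(mergeResults([[3, 1], [2]], (a, b) => a - b)); // [ 1, 2, 3 ]
```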
Replica Sets[12]
A replica set consists of two or more servers running the mongod process. One member of the set is elected primary and services all write requests (and, by default, all read requests). If the primary fails or becomes unavailable, the remaining members automatically elect one of the secondaries as the new primary, which then starts serving requests.
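Automatic failover relies on an election in which the new primary needs the votes of a strict majority of the set's voting members. The small calculation below, which counts voting members only and ignores priorities, shows why a three-member set can lose one member and still elect a primary, whereas a two-member set cannot fail over on its own.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// How many members can fail while the rest can still elect a primary.
function tolerableFailures(members) {
  return members - majority(members);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${tolerableFailures(n)} failure(s)`);
}
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```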
Operating System
MongoDb uses memory-mapped files to manage data, which means that the database size is limited to about 2 GB on 32-bit operating systems. Use a 64-bit operating system to support larger databases.
RAM
MongoDb uses memory-mapped files to manage data, which allows it to map data in memory as it appears on the hard disk. MongoDb will keep data in memory once it is queried for the first time (if possible) and use the in-memory data for subsequent queries, which is more efficient than reading from disk. Having a lot of memory available could speed up queries significantly, as the whole database could potentially be loaded into memory.
Network
Setting up replication and backups will increase network traffic, which could affect query performance. Adding an extra network card and creating a separate network over which the servers communicate with the replication and backup servers could reduce this network 'noise'.
Next Steps
References
1. Sharding: http://www.mongodb.org/display/DOCS/Sharding+Introduction
2. Chunk: http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroductionChunks
3. BSON Object: http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
4. Choosing a Shard Key: http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
5. Indexing: http://www.mongodb.org/display/DOCS/Indexes
6. MongoTips: http://mongotips.com/b/a-few-objectid-tricks/
7. Splitting Chunks: http://www.mongodb.org/display/DOCS/Splitting+Chunks
8. MongoDB Pre-Splitting for Faster Data Loading and Importing: http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
9. Indexing as a Background Operation: http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation
10. Explain: http://www.mongodb.org/display/DOCS/Explain
11. Simple Initial Sharding Architecture: http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
12. Replica Sets: http://www.mongodb.org/display/DOCS/Replica+Sets
13. Multikeys: http://www.mongodb.org/display/DOCS/Multikeys
14. Update: http://www.mongodb.org/display/DOCS/Updating#Updating-update%28%29
15. Modifiers: http://www.mongodb.org/display/DOCS/Updating#Updating-ModifierOperations
16. Atomic Operations: http://www.mongodb.org/display/DOCS/Atomic+Operations
17. Map Reduce Basics: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
18. Finding Max And Min Values for a given Key: http://cookbook.mongodb.org/patterns/finding_max_and_min_values_for_a_key/
19. Calculate the hex value of an IP address: http://www.pocketnes.org/hexa.html
20. GUID: http://en.wikipedia.org/wiki/Globally_unique_identifier
21. MongoDB Auto-sharding and Foursquare Downtime: http://nosql.mypopescu.com/post/1251523059/mongodb-auto-sharding-and-foursquare-downtime