
Documentation

Release 1.0.2

March 14, 2014

CONTENTS

1 Getting Started
  1.1 Introduction to Cloudant
  1.2 Prerequisites and Basics
  1.3 Create Read Update Delete (CRUD)
  1.4 Introduction to Querying

2 API Reference
  2.1 API Basics
  2.2 Authentication Methods
  2.3 Authorization Settings
  2.4 Databases
  2.5 Documents
  2.6 Design Documents
  2.7 Miscellaneous
  2.8 Local (non-replicating) Documents

3 Using Cloudant With...
  3.1 Python
  3.2 Node.js
  3.3 .NET / Mono

4 Guides
  4.1 The CAP Theorem
  4.2 MapReduce
  4.3 Document Versioning and MVCC
  4.4 CouchApps and Tiers of Application Architecture
  4.5 Replication
  4.6 Back up your data
  4.7 How to monitor indexing and replication tasks
  4.8 Data that Moves: Switching Clusters
  4.9 Transactions in Cloudant

CHAPTER ONE

GETTING STARTED

The Getting Started guide is a collection of newly written tutorials, links to currently available documentation for Cloudant and Apache CouchDB, and links to relevant API references. In addition, we draw on material from the Cloudant Blog, stackoverflow.com, and other external sites. The guide is organized to help you learn to use Cloudant from the ground up, with no prior knowledge assumed. It leads you from the basics of HTTP and JSON through to our most advanced features, including examples and discussions of application design.

1.1 Introduction to Cloudant


What is Cloudant?
Cloudant is a hosted and managed distributed database-as-a-service, based on a horizontally scalable version of Apache CouchDB and other tools. A Cloudant database is a schemaless JSON document store sharded across a set of nodes (a la Dynamo). It features incrementally updated MapReduce Views, secondary key indexing, built-in full-text search (based on Lucene), geo-spatial indexing, and multi-master replication (which supports global data distribution), and it communicates over an HTTP RESTful API. Cloudant hosts and completely manages your database, including 24-hour support, at a fraction of the cost of a full-time administrator.
These Getting Started documents will guide you through the process of learning exactly what that means and how to use it.

1.1.1 Conceptual Introduction


We break down the concepts of the opening paragraph above and briefly explain each idea, giving you a rough overview of the technology.
Please note: when learning to use Cloudant and other document stores, it may be useful to avoid trying to map the concepts of relational database management systems (RDBMS) to the concepts of distributed document stores that use MapReduce. These systems are significantly different, and in our experience it is easier to accept the document store ideas at face value without trying to find the identical concepts in relational systems. For example, it may be easier to understand MapReduce Views if you don't try to figure out how they relate to relational JOINs. MapReduce and JOINs are distinct, fundamental operations on the data in their respective systems. They work in entirely different ways, and developers use them in different ways. (Although you can mimic a JOIN-like operation with MapReduce, which we will get to.) Once you have become familiar with Cloudant, it is easier to compare its concepts to RDBMS concepts and to see the impact on the designs of applications built with each kind of database system. So, clear your mind, accept the paradigm shift, and let it all sink in.
Without further ado...
Database Management System
The database management system is just that: a system that manages database-level operations, including the distribution of data across the nodes in your cluster, the execution of MapReduce Views, load balancing of client requests, data consistency, and I/O operations. For the databases in your Cloudant-hosted account, you will not need to do anything related to the system on which those databases reside. Your interaction with your Cloudant database is restricted to making HTTP requests to add data, to define MapReduce Views, search index functions, and other special server-side functions tailored to your application, and then to retrieve the results.
Document Store
For most novices, the word database brings a number of things to mind. People learn about databases through academic coursework and pop culture (movies, TV, country music, etc.). In addition, databases had been relatively static for the past 30 years or so (before Google's MapReduce and Amazon's Dynamo paradigm shifts). So, when novices hear the word database, they often think of tables, which are the typical representations of relations in a relational database system. However, this is not at all what a Cloudant database looks like.
A Cloudant database is a collection of JSON documents. In a sense, a JSON document store is almost completely orthogonal to a relational table. A JSON document is a set of key-value pairs, whereas a relation is a set of tuples.
Key-Value

A key-value pair is simply the name of something (key) and its value. In a JSON document it looks something
like this
{
"name":"Adam"
}

JSON documents

Imagine that you have a relation with a schema that has three columns: name, age, and gender. Each entry in the relation would be a tuple that looks something like (Adam, 26, M) or (Sue, 32, F). A JSON document contains both the column name, as the key, and the value. The first tuple would be stored in a JSON document like this
{
"name": "Adam",
"age": 26,
"gender": "M"
}

A value can be a string, a number, an array, or another object (a set of key-value pairs). Here's an example JSON document
{
"name": "adam",
"age": 26,
"car": {
"make": "ford",
"model": "mustang",
"year": 1965
},
"numbers": [
7,
21,
"goats"
],
"colors": [
"blue",
"green",
{"cheese":"caprifeuille"}
],
"friends": [
{
"name": "sean",

Chapter 1. Getting Started

Documentation, Release 1.0.2

"phone": 5551212
},
{
"name": "kara",
"email": "kara@internet.com"
}
]
}

A document store provides a number of advantages over the relational model. JSON documents support a nested, richer data format that allows for flexibility and maps better to many of today's applications. You can find more information about JSON in our documentation, in the Apache CouchDB docs, and at http://json.org/, which includes libraries and tools for handling JSON in various languages.
JSON documents in Cloudant

For each JSON document in a Cloudant database, there are two special key-value pairs that are required: "_id" and "_rev". The value of the "_id" key is how the database system identifies each document, and it must be unique within each database. The "_rev" is used to implement multi-version concurrency control (MVCC). Each time you upload a document to the database for a specific "_id", the "_rev" value increments. This is used to detect conflicts if a particular document is simultaneously updated by multiple clients on different nodes in the cluster. MVCC is not a version control system, so don't even try to use it as one.
A document in Cloudant will look something like this
{
"_id":"bbc9e6125aca5cffb1cf65aefeb105ec",
"_rev":"1-4c6114c65e295552ab1019e2b046b10e",
"name":"Adam",
"age":26,
"gender":"M"
}

Additionally, there are other special keys for each document: _attachments, _conflicts and _deleted, which will be discussed in detail later. In general, however, you may not give any top-level key a name that begins with an underscore, such as "_foo":"bar". Attempts to do so will return an error.
Schemaless
Schemaless means that the database management system does not enforce any schema on your data and does not depend on any schema, other than the special keys that start with an underscore. Each document in a database can have a completely unique schema, or they can all share the same schema. It's up to your design. Additionally, the schema of your documents can be changed at any time and for any subset of your documents as needed.
For example, you could later change the document above to
{
"_id":"bbc9e6125aca5cffb1cf65aefeb105ec",
"_rev":"2-a561c4119b2c2f34aa2219ff3504710c",
"phone":"12062959632",
"name":"Adam",
"age":26,
"height":185,
"mass":77.1
}

without having to notify the system of the change or modify any other documents.
Schemaless does not refer to the schema of your application's data. Whenever you build an application, you should probably create (and enforce, if you wish) your own schema for the data you wish to store.


HTTP RESTful API


An HTTP RESTful API means that you will make HTTP requests to a logically organized set of URI endpoints
that allow you to insert data, read data, make queries and monitor the database. An HTTP REST API makes
using Cloudant completely agnostic to the programming language or tool that you are using to interact with the
database. Any tool or language that speaks HTTP can be used to manage data on Cloudant.
Horizontally Scalable
Horizontal scaling means that your data are spread out among multiple shards on multiple nodes in a cluster and
that the performance of the cluster improves with more shards and nodes. Cloudant clusters are not organized in
a master-slave architecture. Each node in the cluster is equivalent to the others and can be used for both read and
write operations. Also, because multiple copies of your data are stored on multiple nodes, your data is safe should
any particular node fail. As a user of Cloudant, you do not have to worry about sharding your data across the
cluster, load balancing and managing each node individually. This is done for you as part of the service.
Primary Index
In order to discuss secondary indexes, we first mention the primary index. An index is a database file that holds your data and is designed to improve the speed of data retrieval. In particular, for Cloudant, the primary index is generated based on each document's unique "_id". The index is automatically generated for you when you upload data to your database. You can retrieve all of your documents via the primary index by making an HTTP GET request to the _all_docs endpoint. This returns an alphanumerically sorted list of each document's "_id". Optional parameters allow you to do things like reverse the order of the output and splice out particular subsets.
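For example, a minimal Python sketch of querying the primary index, assuming a hypothetical database named mydb and the placeholder credentials used throughout this guide:
import requests
import json

auth = (username, password)  # placeholder credentials, as elsewhere in this guide
# Ask the primary index for the first five document "_id" values, in reverse order.
get_url = "https://{0}.cloudant.com/mydb/_all_docs".format(auth[0])
r = requests.get(get_url, auth=auth, params={"limit": 5, "descending": "true"})
print json.dumps(r.json(), indent=1)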
Incremental MapReduce and Secondary Indexes
MapReduce, first introduced by Google, is a data analysis paradigm that allows you to perform arbitrary calculations on data distributed among multiple nodes and then aggregate those results for requesting client applications. It is a powerful way to perform analytical calculations on your data when your data cannot be contained on a single machine. In Cloudant, this is called a MapReduce View, since it is a persistent, pre-built calculation made with your data.
A MapReduce View, as implied by the name, is a two-step procedure. The Map function emits a new set of key-value pairs based upon the contents of each document. The Reduce function then makes subsequent calculations using the set of values associated with the range of keys specified in the query. Typically, the reduce function will count, sum, or perform a statistical calculation on numerical values. You can create as many MapReduce Views as needed for any particular database in your application (provided you don't exceed the total amount of disk space available on a node).
MapReduce Views store the emitted key-value pairs and reduce calculations in a secondary index. You can then
query that secondary index to retrieve a set of key-value pairs (ordered by the key) or reduce results. MapReduce
Views and Search indexes are the primary ways to analyze and retrieve your data.
One way for a beginner to wrap their head around a JSON document store with MapReduce is to think of the database first as a store for JSON documents (potentially with different sets of schemas). You can then use various MapReduce Views to sort your documents based on their content rather than their "_id". The reduce performs a calculation on the set of key-values emitted by the map to return a single result. This will become clear in the MapReduce tutorial.
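As a rough sketch of what this looks like in practice (the database name mydb, the design document name example, and the view name by_type below are only illustrative; creating and querying Views is covered properly in the Querying section):
import requests
import json

auth = (username, password)  # placeholder credentials
base_url = "https://{0}.cloudant.com/mydb".format(auth[0])
headers = {'Content-type': 'application/json'}

# A design document holding one View: the map emits each document's "type"
# field as the key, and the built-in _count reduce tallies documents per type.
ddoc = {
    "_id": "_design/example",
    "views": {
        "by_type": {
            "map": "function(doc){ if(doc.type) emit(doc.type, 1); }",
            "reduce": "_count"
        }
    }
}
r = requests.put(base_url + "/_design/example",
                 auth=auth, headers=headers, data=json.dumps(ddoc))
print r.json()

# Query the secondary index built from the View, grouped by key.
r = requests.get(base_url + "/_design/example/_view/by_type",
                 auth=auth, params={"group": "true"})
print json.dumps(r.json(), indent=1)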
In Cloudant, the MapReduce View results are incrementally updated when documents are updated or added to the database. Systems like Hadoop will perform a MapReduce job on your entire data set and return the results to you; if you then update or add new documents and ask for the MapReduce results again, Hadoop will rerun the job over your entire data set. Additionally, in Cloudant the MapReduce View results are calculated for you automatically (unlike vanilla Apache CouchDB). Thus, it is extremely fast to retrieve the pre-defined MapReduce results in Cloudant.


Cloudant Search
With Cloudant Search, built on Lucene, you can index the values of any set of keys found in your documents and
then query that index for exact word matches, numerical matches or fuzzy matches. Cloudant Search allows you to
do more ad-hoc queries over your data than can be done with the primary and secondary indexes. We recommend
using Cloudant Search if your particular use case is to find documents based on multidimensional queries.
Geo-Spatial
Geo-spatial queries allow you to query your database documents based upon a restricted geographical area. By adding a GeoJSON object to your documents, you may query the database to retrieve documents that fall within an arbitrarily shaped geographic region. This could be used for querying data associated with physical locations on Earth, or to create heat-maps for events in a 3D game world. This feature is currently in beta, however; please contact us if you're interested in joining our early-adoption program.
Replication
Replication is very similar to replaying the activities of one database onto another database. One can replicate any Cloudant database to another Cloudant database or to other Apache CouchDB-like databases that support the replication protocol, including mobile platform libraries. Replication can be unidirectional or bidirectional to create a multi-master system.
Global Data Distribution
Cloudant has partnerships with a number of providers that own data centers around the world: Rackspace, Amazon, IBM/Softlayer, and Microsoft Azure. Cloudant can put your data in any location where these companies have data centers, which lets you put your data closer to your users. Additionally, your data can be distributed to multiple data centers throughout the world as needed via replication.
Other Features
In addition to the features mentioned above, Cloudant's database management system provides
arbitrary data attachments to individual JSON documents
document validation functions to enforce schema, user requirements and some types of business logic
list and show functions to manipulate documents and secondary index results server-side before they are returned to the client
chained MapReduce to send results to another database to run further MapReduce jobs
a _changes feed to observe all changes made to a database
a _db_updates feed to observe updates to all of your account's databases (early-adoption beta only)

1.2 Prerequisites and Basics


The Basics section presents the nuts and bolts of using Cloudant. We'll point you toward the available tools and give you a basic overview of how to interact with the database via the HTTP RESTful API.


1.2.1 Installing Cloudant


Installing Cloudant is simple. You don't.
Sign up for an account and your data lives securely and safely on our servers, hosted by our various partners. You can even move your data around to any of the datacenters where we have a presence.
When you initially start with Cloudant, you'll probably be running on one of our free multi-tenant three-node clusters. If you want to move your account from our Chicago data center to San Francisco or to Amsterdam, we'll move it for you. As your needs start to scale, you can graduate to your own private three-node dedicated cluster. We build the cluster for you and move your data. Later, as you need to scale and add nodes to your system, we do that for you too.
Throughout this entire process, from the initial beginnings on a multi-tenant three-node cluster up to your own 400-node dedicated cluster on bare metal, interaction with your Cloudant database remains exactly the same: HTTP calls. We manage all system-related tasks for you, and even help with your data design, which lets you focus on building your application.

1.2.2 Tools in our Examples


The examples in the Getting Started section, so far, use cURL and Python (2.7). Python examples tend to be
quite readable even by those not familiar with the language. See the Using Cloudant With... section for more
information and examples with other languages.
Python Setup
When running the Python examples, it's good to keep a clean environment so that you can isolate yourself from the rest of your Python projects. To do this, we recommend the virtualenv environment manager (https://pypi.python.org/pypi/virtualenv).
mkdir /path/to/learn_cloudant
pip install virtualenv
virtualenv venv
source venv/bin/activate

Once the virtual environment is activated, you may then install subsequent libraries in isolation from the rest of your system. When you return later to continue learning, you'll need to source venv/bin/activate from the learn_cloudant directory to access any libraries installed specifically for this environment.
Alternatively, one could use virtualenvwrapper.
The Python library we use throughout the Getting Started guides is requests.
requests: http://docs.python-requests.org/en/latest/index.html
pip install requests

cURL
cURL is a shell-based communication tool that supports various networking transfer protocols, including HTTP.
It is usually installed by default on Mac OS X and most Linux distributions or may be installed through a package
manager.
http://curl.haxx.se
jq
We use jq to pretty-print, slice, and transform the JSON returned by cURL HTTP calls to the database.
http://stedolan.github.io/jq/


1.2.3 Making Calls to the Cloudant Database with HTTP


This section is aimed at the novice who is new to making HTTP calls in order to interact with an API. For a more
technical description, see our API Basics section.
In general, the HTTP specification defines a protocol for sending a request to a server and a protocol for the results returned by the server. HTTP requests are structured much like a sentence; they are composed of verbs and nouns (direct and indirect objects). Interaction with a Cloudant database is done via the HTTP verbs GET, PUT, POST, DELETE, COPY and HEAD. The direct and indirect objects of the HTTP request are the API methods and any data that you may include.
The first task is to figure out how to make those HTTP requests with your programming tool of choice and handle
the response. Afterwards, you then learn the different methods available in the Cloudant API. We will guide you
through this process.
The HTTP API methods are often called endpoints. You can think of them like methods or functions in a
programming language. Some endpoints are called with data passed in as an argument, and some endpoints
support an optional array of parameters. Some endpoints take no parameters whatsoever. All endpoints return
some type of HTTP response that tells you the success or failure of the method and will return any data that you
requested.
An HTTP request that you make with Cloudant typically looks like
HTTP VERB <headers> <data> https://username.cloudant.com/<API endpoint>?<parameters>

where the headers, data, and parameters are optional, depending on the HTTP verb and API endpoint. In an HTTP
request, the parameters are typically formed as param=value and are separated by an ampersand (&). The
header will almost always be Content-Type: application/json and the data will almost always be in
JSON format. One exception to this is when you upload attachments to a document in your database. In that case,
your header should match the MIME type of the data.
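As a hedged illustration of that exception, here is a sketch of uploading an attachment with Python; the file name photo.png and the placeholder <doc._id> and <doc._rev> values are only examples, and the attachment endpoint itself is described later in the API Reference.
import requests

auth = (username, password)  # placeholder credentials
# PUT a PNG attachment onto an existing document. The Content-Type header
# carries the MIME type of the data rather than application/json, and the
# current _rev of the document is passed as a query parameter.
url = "https://{0}.cloudant.com/db/<doc._id>/photo.png".format(auth[0])
with open("photo.png", "rb") as f:
    r = requests.put(
        url,
        auth=auth,
        headers={'Content-Type': 'image/png'},
        params={"rev": "<doc._rev>"},
        data=f
    )
print r.json()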
Some HTTP request examples are shown below using cURL and Python. Additionally, the link to the API reference page for each of these commands is provided, which will help the novice to understand how to read the
API.
Example HTTP Requests
GET Your Cluster Information

API Reference: GET /


HTTP GET https://username.cloudant.com/
curl -X GET -u username https://username.cloudant.com
import requests
response = requests.get(
"http://username.cloudant.com/",
auth=(username, password)
)
print response.json()

GET All Databases for a user

API Reference: GET /_all_dbs


HTTP GET https://username.cloudant.com/_all_dbs
curl -X GET -u username https://username.cloudant.com/_all_dbs


import requests
response = requests.get(
"http://username.cloudant.com/_all_dbs",
auth=(username, password)
)
print response.json()

Create a New Database with PUT

API Reference: PUT /db


Change db to the name of your new database.
HTTP PUT https://username.cloudant.com/db
curl -X PUT -u username https://username.cloudant.com/db
import requests
response = requests.put(
"http://username.cloudant.com/db",
auth=(username, password)
)
print response.json()

Create a New Document with POST

API Reference: POST /db


Change db to the name of your new database.

HTTP POST Header=Content-Type: application/json data={"foo":"bar"} https://username.cloudant.com/db

curl -X POST -H "Content-Type: application/json" -d '{"foo":"bar"}' -u username https://username.cloudant.com/db


import requests
import json
doc = {'foo': 'bar'}
response = requests.post(
    "http://username.cloudant.com/db",
    data=json.dumps(doc),
    auth=(username, password),
    headers={'Content-type': 'application/json'}
)
print response.json()

DELETE a Document

API Reference: DELETE /db/doc


Change db to the name of your new database.
HTTP DELETE https://username.cloudant.com/db/<doc._id>?rev=<doc._rev>
curl -X DELETE -u username https://username.cloudant.com/db/<doc._id>?rev=<doc._rev>
import requests
response = requests.delete(
    "http://username.cloudant.com/db/<doc._id>",
    auth=(username, password),
    params={"rev": <doc._rev>}
)
print response.json()

We've left out the specific values here because oftentimes the _id looks something like 3a4d992b78c7a7361b0a50ef963c1a1e and the _rev like 1-4c6114c65e295552ab1019e2b046b10e, which would make the examples slightly less readable.
Also note that the Python requests library will form the proper URI for you from the params object.

1.2.4 External Libraries


First, we suggest that you look through the CRUD examples in the Hängematte. These examples do not use any Cloudant or Apache CouchDB-specific external libraries. They use just the basic libraries (or modules) to make HTTP requests and manipulate JSON, which are typically easy to use.
Without discussion or comments on each library, here is our compiled list of developed libraries for various languages.
A word of warning about using external libraries: we do not officially support any of these, and they may become unmaintained or fall out of sync with the Cloudant API. Using HTTP and JSON libraries will allow you to always use the entire Cloudant API.
Having written that, however, external libraries can be extremely useful because they can save you tons of time and hide the tedious bits of making HTTP requests. Additionally, you can choose to do both! You can use an external library and also make direct HTTP requests when necessary.
Also, we have examples for using Cloudant with specific libraries in a few popular languages. Be sure to check these out and come back from time to time, as we plan to add more.
This Getting Started guide, however, will focus on the HTTP API without using a fancy-pants library, in order to be as instructional and general as possible. With that understanding, it's then much easier to know how to use Cloudant in conjunction with different libraries.

1.3 Create Read Update Delete (CRUD)


CRUD is our starting point for learning how to use Cloudant, and the usual starting point for learning any persistent data store.
Here are some relevant external documents you may want to read along with, or after reading through, the material presented here. However, reading just this document should be completely sufficient for understanding CRUD operations.
Cloudant For Developers page on CRUD
Cloudant API Reference on Document methods
Apache CouchDB Core API Introduction
Apache CouchDB API Reference on Document methods
Die Hängematte: CRUD examples in many languages

1.3.1 Single Documents


First, you'll need a database for your documents. In these examples we'll create a users database to hold the information for each user of our application.
API Reference: PUT /db


curl -X PUT -u username https://username.cloudant.com/users

With Python, you'd do the following


import requests
import json
#basic authentication
auth = (username, password)
post_url = "http://{0}.cloudant.com/users".format(auth[0])
#r is a Response instance
r = requests.put(post_url, auth=auth)
print json.dumps(r.json(), indent=1)

Both methods should return {"ok":true}.


To visually verify the success of this, you can sign in to your account on cloudant.com and see your database in
the dashboard.
Create
Create a new document on your new database.
API Reference: POST /db and PUT /db/doc
For simplicity, we'll let the database generate the "_id" value. If you want to set the "_id", just include it in the document that you write to the database.

curl -X POST -H "Content-Type: application/json" -d '{"first_name":"bob", "high_score": 550, "level": 4}' -u username https://username.cloudant.com/users


import requests
import json
#the database JSON document maps to a Python dictionary
doc = {
    #'_id': 'bob456',  #uncomment this line if you want to specify the _id
    'first_name': 'bob',
    'high_score': 550,
    'level': 4
}
auth = (username, password)
headers = {'Content-type': 'application/json'}
post_url = "http://{0}.cloudant.com/users".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(doc)
)
print json.dumps(r.json(), indent=1)

The Response object returned by the requests module contains the HTTP response code (r.status_code) and
data (r.text), which can be decoded as JSON via r.json().
Additionally, one can use PUT /db/doc to create a new document. The difference is that you must specify the "_id" of the document in the URI. (Gotcha: when using PUT /db/doc, if the "_id" key is specified in the document, it will be ignored and the "_id" in the URI will be used.)

curl -X PUT -H "Content-Type: application/json" -d '{"first_name":"bob", "high_score": 850, "level": 6}' -u username https://username.cloudant.com/users/bob456


import requests
import json
#the database JSON document maps to a Python dictionary
doc = {
    'first_name': 'bob',
    'high_score': 850,
    'level': 6
}
auth = (username, password)
headers = {'Content-type': 'application/json'}
post_url = "http://{0}.cloudant.com/users/bob456".format(auth[0])
r = requests.put(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(doc)
)
print json.dumps(r.json(), indent=1)

Read
Documents can be retrieved by knowing the value of the "_id" and constructing an HTTP GET request to the
proper URI.
API Reference: GET /db/doc
In the example above, the value of "_id" assigned to the document was 0562df6ffcc4301b1e2c5d1061214489. Yours, of course, will be different. The URI for this document is
https://username.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489
If you used the "bob456" for the document "_id", then it would be
https://username.cloudant.com/users/bob456
Here is an example curl command to retrieve this document
curl -X GET -u username https://username.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489 | jq

and equivalent code in Python


import requests
import json
auth = (username, password)
get_url = "http://{0}.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489".format(auth[0])
r = requests.get(get_url, auth=auth)
print json.dumps(r.json(), indent=1)

These examples should print the same document to screen. You will now notice that your document contains the
new keys, "_id" and "_rev".


You will often read things from the database other than an entire single document. For example, you'll very often want the results of one of your MapReduce Views. As mentioned in the section on making HTTP requests, you will still make an HTTP GET request, but to a different API endpoint that corresponds to your MapReduce View result. The code required to make the HTTP GET, of course, remains exactly the same. Using MapReduce will be further explained in the Querying section.
Update
Update an existing document on the database.
API Reference: POST /db and PUT /db/doc
At Cloudant, we encourage you to build systems based on immutable documents. That is, instead of updating a particular document, you POST new documents to the database. Then, you use an incrementally built MapReduce View to reconstruct the state of your application. See this guide on the subject.
However, sometimes your application design really does need to make updates to existing documents, and that is perfectly okay.
Making updates to a document is the same as creating that document on the database, except that you need to include the "_id" and "_rev". In fact, due to the schemaless nature of the Cloudant database, those are the only key-value pairs that you need. The "_rev" value has to match the latest value in the document found on the database. A mismatched "_rev" will result in an error.
For example
For example
# update_user.py
import requests
import json

auth = (username, password)
headers = {'Content-type': 'application/json'}
get_url = "http://{0}.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489".format(auth[0])
r = requests.get(get_url, auth=auth)

#update the document
doc = r.json()
doc['high_score'] = 623
doc['level'] = 5

#note the doc already has the _rev value, which, in this case, should match the value on the database
print doc

#with the doc updated with a new high_score and level, we POST it back to the database
post_url = "http://{0}.cloudant.com/users".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(doc)
)
print json.dumps(r.json(), indent=1)

You can also update documents with PUT /db/doc. Again, the URI should end with the "_id", and the "_rev" in the document must match the latest value on the server.
Another way to update documents is to write a specific update handler function in a _design document. This lets you update only specific keys in a document in your database via a POST request to the URI for that update handler (a kind of in-place update), which eliminates the need to upload the entire document to change a single key. An update handler can even be used to create new documents or modify incoming new documents. For example, you could use an update handler to add a time-stamp on the server side. We plan to cover this in the future, but for now we refer you to the CouchDB documentation on update handlers. Note that update handlers can only update one document at a time. We'll show you below how to create or update multiple documents in a single HTTP request.
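As a rough, hedged sketch of what such an update handler might look like (the design document name timestamps, the handler name addtimestamp, and the document bob456 are only examples; the handler signature follows the CouchDB convention referenced above):
import requests
import json

auth = (username, password)
headers = {'Content-type': 'application/json'}
base_url = "https://{0}.cloudant.com/users".format(auth[0])

# A design document containing one update handler, written in JavaScript.
# The handler stamps an "updated_at" field onto the stored document.
ddoc = {
    "_id": "_design/timestamps",
    "updates": {
        "addtimestamp": """
            function(doc, req) {
                if (!doc) { return [null, 'missing']; }
                doc.updated_at = new Date().toISOString();
                return [doc, 'updated'];
            }
        """
    }
}
r = requests.put(base_url + "/_design/timestamps",
                 auth=auth, headers=headers, data=json.dumps(ddoc))
print r.json()

# Call the handler against an existing document; the changed key is computed
# server-side, so the whole document does not need to be uploaded.
r = requests.put(base_url + "/_design/timestamps/_update/addtimestamp/bob456",
                 auth=auth)
print r.text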
Delete
Delete a single document on the database.
API Reference: DELETE /db/doc
There are two ways to delete a document. You can make an HTTP DELETE request to the /db/doc?rev=doc_rev endpoint, or you can add a "_deleted":true key-value pair to the document and POST that document to the database. Either method works for a single document, but the second method is the only way to delete multiple documents in a single HTTP request. The single-document delete call is shown here; the multiple-document delete request is shown below.
To please the gods of verbosity, here is an example curl command

curl -X DELETE -u username https://username.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489?rev=1-fe13604028dde04ab0444909e5ff38b3

and equivalent code in Python


import requests
import json
auth = (username, password)
url = "https://{0}.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489".format(auth[0])
payload = {"rev": "1-fe13604028dde04ab0444909e5ff38b3"}
r = requests.delete(url, auth=auth, params=payload)
print json.dumps(r.json(), indent=1)

PUT vs POST when Creating a New Document


It's easy to forget when to use PUT and when to use POST when creating a document on the database. You'll notice above that both can be used. Here are the differences.
Use PUT when the URI to which you are putting data contains the "_id" value.
Use POST when the URI does not contain the "_id" value. However, you can still specify an "_id" in the document when you POST to the database. If you don't include an "_id" value, the database will generate one for you.
Based on this distinction, it follows that you must use POST when inserting multiple documents in a single HTTP call, because with many documents it is not possible to specify a particular "_id" in the URI as is needed with a PUT.

1.3.2 Multiple Documents


This section will show you how to perform CRUD operations on more than one document at a time. Bulk operations can improve performance, since reading or writing multiple documents in a single HTTP call may be faster than doing those calls individually.
Create
Create multiple documents in a single HTTP request.


API Reference: POST /db/_bulk_docs


In this example, we'll explicitly set the document "_id" (even though it's not necessary).

curl -X POST -H "Content-Type: application/json" -d '{"docs":[{"first_name":"bob", "high_score": 550, "level": 4, "_id": "bob123"}, {"first_name":"alice", "high_score": 553, "level": 4, "_id": "alice678"}]}' -u username https://username.cloudant.com/users/_bulk_docs


import requests
import json
doc_bob = {
    '_id': 'bob123',
    'first_name': 'bob',
    'high_score': 550,
    'level': 4
}
doc_alice = {
    '_id': 'alice678',
    'first_name': 'alice',
    'high_score': 553,
    'level': 4
}
bulkdocs = {"docs": [doc_bob, doc_alice]}
auth = (username, password)
headers = {'Content-type': 'application/json'}
post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print json.dumps(r.json(), indent=1)

To facilitate the bulk operations in the examples below, the following Python script will generate a number of new documents in your users database. We will use these documents in subsequent examples.
import requests
import json
import random

usernames = ('alice', 'bob', 'cartman', 'mario', 'zelda', 'sawyer', 'daniel', 'sabine', 'luigi')
bulkdocs = {"docs": []}
for aname in usernames:
    bulkdocs["docs"].append(
        {
            "_id": aname,
            "first_name": aname,
            "high_score": 500 + 100*random.random(),
            "level": 4 + int(3*random.random())
        }
    )
auth = (username, password)
headers = {'Content-type': 'application/json'}
post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print json.dumps(r.json(), indent=1)

Read
Read multiple documents in a single HTTP request.
API Reference: GET /db/_all_docs
To read multiple documents, one queries either the primary index via /db/_all_docs or a secondary index created by a MapReduce View. Since we haven't introduced MapReduce Views and secondary indexes yet, we'll just show how to use the /db/_all_docs endpoint along with a few options.
A first example is GET /db/_all_docs?limit=10. We use the limit=10 option to keep the output to just ten keys (in case you decided to modify the script above and added a zillion docs).
curl -u username https://username.cloudant.com/users/_all_docs?limit=10 | jq .

and in Python
import requests
import json
auth = (username, password)
get_url = "http://{0}.cloudant.com/users/_all_docs?limit=10".format(auth[0])
r = requests.get(get_url, auth=auth)
print json.dumps(r.json(), indent=1)

The return from this API endpoint returns the following JSON structure
{
"total_rows": N,
"offset": M,
"rows": [
{"key": <doc._id>, "value": {"rev":<doc._rev>}, "id":<doc._id>},
...
]
}

where N and M are integers. The total_rows value is the total number of rows in the index (or MapReduce View result), and the offset tells you which row the rows array starts on relative to all rows in the index. Each element of "rows" is an object that contains at least three key-value pairs, with keys "key", "value", and "id". The values of "key" and "value" depend on the index that was queried. In the case of /db/_all_docs, the value of "key" is always the document's "_id", and the value of "value" is always a JSON object with a "rev":<doc._rev> pair. The value of "id" is the "_id" of the document from which the "key" and "value" were derived. Yes, the "_id" of each document appears twice in each row, but this is particular to the /db/_all_docs endpoint. Yes, this can seem confusing! Take a look at the data returned by your requests to make sure this is clear to you.
There are a number of parameters that can be used with the /db/_all_docs endpoint. However, we'll just cover two of them here and save the rest for later.


The include_docs option

We can ask to get all of the documents in the query response with the parameter include_docs=true.
curl -u username 'https://username.cloudant.com/users/_all_docs?limit=10&include_docs=true' | jq .

and in Python (note that I can use the params argument to requests.get)
import requests
import json
auth = (username, password)
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {"include_docs":"true", "limit":10} #make sure to use "true" not True
r = requests.get(get_url, auth=auth, params=params)
print json.dumps(r.json(), indent=1)

With include_docs=true, each element of "rows" will have a fourth key-value pair; the key is "doc" and its value is the entire JSON document.
{
"total_rows": N,
"offset": M,
"rows": [
{"key": <doc._id>, "value": {"rev":<doc._rev>}, "id":<doc._id>, "doc":<doc>},
...
]
}

The startkey and endkey options

With the startkey and endkey options we can splice out a subset of the results. That is, the "rows" array will only contain the rows where the values of "key" are in the inclusive range [startkey, endkey] (unless, of course, you use the inclusive_end=false option). In this example, we get all documents with an _id that begins with the letter s or S. Notice that this cURL command requires single quotes around the URL because of the double quotes around the values of startkey and endkey.

curl -u username 'https://username.cloudant.com/users/_all_docs?limit=10&include_docs=true&startkey="s"&endkey="t"&inclusive_end=false'

and in Python
import requests
import json
auth = (username, password)
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {
    "include_docs": "true",  #this is a boolean in the URI string
    "limit": 10,
    "startkey": "\"s\"",  #you need to escape the quotes
    "endkey": "\"t\"",
    "inclusive_end": "false"
}
r = requests.get(get_url, auth=auth, params=params)
print json.dumps(r.json(), indent=1)


Feel free to play around with the startkey and endkey values to get a feel for this feature. These options will be useful for future queries to MapReduce View results; you'll find that many of the options for /db/_all_docs are the same for queries to secondary indexes.
Update
Update multiple documents in a single HTTP request.
API Reference: POST /db/_bulk_docs
As previously mentioned, updating documents already in the database is nearly the same as creating new documents, except that you need to specify the "_id" and current "_rev" key-value pairs for each document. The following example grabs a set of documents from the primary index, makes some changes to the documents, then uses /db/_bulk_docs to write them back to the database.
import requests
import json
import random

auth = (username, password)
headers = {'Content-type': 'application/json'}
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {
    "include_docs": "true",
    "limit": 20
}
r = requests.get(get_url, auth=auth, params=params)
bulkdocs = {"docs": []}
for row in r.json()["rows"]:
    doc = row["doc"]
    doc["high_score"] = doc.get("high_score", 0) + 100*random.random()
    doc["level"] = doc.get("level", 0) + int(2*random.random())
    bulkdocs["docs"].append(doc)
post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print json.dumps(r.json(), indent=1)

Delete
Delete multiple documents with a single HTTP request.
API Reference: POST /db/_bulk_docs
Are you surprised by the API Reference? In order to delete multiple documents with a single HTTP call, you'll need to add a "_deleted":true key-value pair to each document and then POST those documents back to the database in a single HTTP request. You'll notice in the example script below that each document is not downloaded in full. Instead, we use the "_id" and current "_rev" to build a minimal new document containing a "_deleted":true key-value pair.
import requests
import json

auth = (username, password)
headers = {'Content-type': 'application/json'}
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
r = requests.get(get_url, auth=auth)
bulkdocs = {"docs": []}
for row in r.json()["rows"]:
    doc = {
        "_id": row["id"],
        "_rev": row["value"]["rev"],
        "_deleted": True
    }
    bulkdocs["docs"].append(doc)
post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print json.dumps(r.json(), indent=1)

1.3.3 Design Choice regarding "_id"


Even though at this stage much about using the database has not been presented, there is already a design pattern that can be discussed. It concerns the "_id" key. You have a choice: either use your own unique identifier, or let the database management system assign one for you.
If you have a sure-fire way of creating a unique "_id" for each document, and an alphanumeric sorting of those "_id" values can be used in some way, then you should probably make this choice, since the primary index is built for you automatically without any extra code. Additionally, requests to the primary index (/db/_all_docs) often complete faster than requests to secondary or search indexes.
In some cases it can be advantageous to record the document type in the "_id". To do this, we recommend prepending the "_id" with a few letters to indicate the type. This lets you use the /db/_all_docs endpoint to select documents based on type without having to write a MapReduce View.
For example, let's say we want to have documents that hold information for users and documents that hold information for content. Suppose that we can generate unique "_id" values for each of those subsets and we want to store both sets of documents in the same database. We would probably create "_id" values that look like:
"_id" : "user:12345"

and
"_id" : "content:12345"

The results from a request to /db/_all_docs are then already sorted by type. The request
GET /db/_all_docs?startkey="user:"&endkey="user:\ufff0"&include_docs=true

will return all the user documents. (The \ufff0 is a very large special unicode character useful for setting a
range. See the String Ranges section here.)
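The same request from Python might look like the following sketch; the db database name is a placeholder, and the keys are JSON-encoded before being placed in the query string:
import requests
import json

auth = (username, password)
get_url = "https://{0}.cloudant.com/db/_all_docs".format(auth[0])
params = {
    "startkey": json.dumps("user:"),        # keys in the query string are JSON strings
    "endkey": json.dumps(u"user:\ufff0"),   # \ufff0 is the high unicode sentinel described above
    "include_docs": "true"
}
# Returns only the rows whose "_id" begins with "user:"
r = requests.get(get_url, auth=auth, params=params)
print json.dumps(r.json(), indent=1)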


1.4 Introduction to Querying


In Cloudant, querying means making an HTTP request, usually with some set of parameters, to extract data from an index. On a given database, you will have a primary index, and very likely a search index and a secondary index as well. The primary index, as previously mentioned, is constructed for you automatically and you can query it immediately. The search and secondary indexes are constructed for you based on search index and MapReduce View functions that you must first create. So, querying these indexes is a two-step process: you first must create the function that defines the index to be queried. Both the search and MapReduce View functions are defined in special documents stored in your database, called _design documents. You will specifically tailor them for your application, and we will show you examples.
This section leads you through the documentation for making queries to your primary index, building and querying MapReduce View functions and search index functions, and the tools to help you write those functions to a _design document. When you are finished, you will be familiar with the majority of the nuts and bolts of using Cloudant.

1.4.1 Primary Index


The primary index, accessed via GET /db/_all_docs, returns an alphanumerically sorted list of the document "_id" values found in the database named db.
Here are the documents to read to understand how to query the primary index.
Basic use of /db/_all_docs was covered in the CRUD section on reading multiple documents.
A more interactive demo of /db/_all_docs is available in our Cloudant For Developers section.
Read through and be familiar with the optional parameters described in the Cloudant API Reference.
Notice the POST /db/_all_docs endpoint as well, which lets you retrieve a specific set of "_id" values.
Common Pitfalls
There are two specific points to stress.
Inclusive and Exclusive Range Query

Notice that the startkey and endkey options specify a range of keys that is inclusive. That is, if startkey="a" and endkey="b" are specified, documents with "_id":"a" or "_id":"b" will be included in the results. You can make the range exclusive, however, by using the inclusive_end=false option to exclude an "_id" equal to the endkey value and the skip=1 option to exclude an "_id" equal to the startkey value. (Note, this skip=1 trick will not necessarily work with secondary indexes, since those keys are not required to be unique.)
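For example, an exclusive version of the range request just described (assuming documents with "_id" values "a" and "b" exist) would look like
GET /db/_all_docs?startkey="a"&endkey="b"&skip=1&inclusive_end=false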
Reversing the Order of Results

New developers often make a mistake when using the descending=true option together with one or both of the startkey="<value>" and endkey="<value>" options. The results found in the "rows" array are ordered alphanumerically by the value of "key"; in the case of /db/_all_docs, they are ordered alphanumerically by the document "_id". Using descending=true reverses this order. However, the startkey and endkey parameters are applied after the descending option is applied. So, when using descending=true you must swap the values of startkey and endkey in order to get the same rows you would get with descending=false.
For example, let's say you would like all of the documents with values of their "_id" between a123 and a456, inclusive. You could make the request
GET /db/_all_docs?startkey="a123"&endkey="a456"


You can get the same results, but returned in reverse order, by using the descending=true option. The request should be
GET /db/_all_docs?endkey="a123"&startkey="a456"&descending=true

This reversed-order mistake can also be made when querying a secondary index, since the startkey, endkey, and descending options work in the same way there.
Design Choice regarding "_id"
For time-series data, one trick is to insert the timestamp of the data into the document "_id", assuming that you can guarantee that the "_id" will be unique. This lets you sort your documents by time with the primary index. Also, using a combination of the descending, limit=1, and startkey=<T> options, you can obtain the document just before or just after some particular time, T.
For example, let's suppose we are inserting documents into the database with an "_id" of the form timedata:<ISO timestamp>. Here are a few examples
"_id":"timedata:2010-09-30T01:33+00:00",
"_id":"timedata:2010-10-05T14:18+00:00",
"_id":"timedata:2011-01-16T14:26+00:00",
"_id":"timedata:2011-04-04T08:20+00:00",
"_id":"timedata:2011-05-15T21:57+00:00",
"_id":"timedata:2011-07-21T08:49+00:00",

A query to retrieve the last document inserted to the database before a particular date (let's say April 1, 2011) would be

GET /db/_all_docs?descending=true&limit=1&include_docs=true&startkey="timedata:2011-04-01T00:00+00:00"

This would return the document with "_id":"timedata:2011-01-16T14:26+00:00".


If the document just after April 1, 2011 was desired, one would set descending=false (or just remove it
since false is the default).

1.4.2 MapReduce Views and Secondary Index


One way for a beginner to wrap their head around a JSON document store with MapReduce is to think of the database first as a store for JSON documents (potentially with different sets of schemas), sorted by "_id" in the primary index. You can then use MapReduce Views to sort your documents based on their content rather than their "_id". The reduce can then perform a subsequent calculation on the set of key-values emitted by the map to return a single result. The results of the MapReduce View are stored in secondary indexes that you can query.
Here is some documentation on Cloudant MapReduce Views that we recommend you read:
Cloudant For-Developers MapReduce View Guide
Cloudant MapReduce Guide
Cloudant Querying a View
Apache CouchDB Intro to MapReduce Views
In addition, you may also find this article about building a voting application with Twilio, node.js and Cloudant instructional; it includes a nice discussion of their data model design and MapReduce Views.
A Tutorial
While the Cloudant and Apache CouchDB documents linked above provide examples, we'll provide another full example here using data similar to those presented in the CRUD section. For instructional purposes, we'll build the MapReduce functions using the browser (and the Futon interface) before we show you how to create _design documents programmatically. However, in production you will really want to use a _design document management tool that lets you write your MapReduce, Search, and other functions on a local machine and upload them to the database for you. Additionally, this lets you use a version control system to manage your _design document code.
1. Simulate Game Results

We're going to create a new database (called gameresults) and store documents in that database that simulate recording the results of each game played by our users. In this case, we'll only simulate having 11 unique users, to keep it simple. Each user plays the game a random number of times with a randomly generated score. The following Python code will generate this simulation (hopefully it's not too difficult for you to translate this into the language that you're using to learn Cloudant).
import requests
import json
import random
import time

username = 'username'
gameresultsdb = 'gameresults'
auth = (username, password)
headers = {'Content-type': 'application/json'}

def randomDate(start, end):
    prop = random.random()
    format = '%Y-%m-%dT%H:%M+00:00'
    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))
    ptime = stime + prop * (etime - stime)
    return time.strftime(format, time.localtime(ptime))

def generateDoc():
    usernames = ('alice', 'bob', 'cartman', 'daniel', 'sawyer', 'zelda', 'sabine', 'luigi', 'mario')
    dateplayed = randomDate('2010-1-1T00:00+00:00', '2013-11-01T00:00+00:00')
    randomlevel = int(10*random.random() + 1)
    randomscore = int(sum([100*random.random() for i in range(randomlevel)]))
    return {
        "playername": random.choice(usernames),
        "score": randomscore,
        "level": randomlevel,
        "date_played": dateplayed
    }

def bulk_insert(ddocs):
    post_url = "https://{0}.cloudant.com/{1}/_bulk_docs".format(username, gameresultsdb)
    r = requests.post(
        post_url,
        auth=auth,
        headers=headers,
        data=json.dumps(ddocs)
    )
    if r.status_code != 201:
        raise Exception('Bulk Insert. bad status code: %d. %s' % (r.status_code, r.text))
    print 'POST /db/_bulk_docs returned {0}'.format(r.status_code)
    return {"docs": []}

def create_db(dbname):
    db_url = "https://{0}.cloudant.com/{1}".format(username, dbname)
    r = requests.get(db_url, auth=auth, headers=headers)
    if r.status_code == 200:  #this db already exists
        return
    r = requests.put(
        db_url,
        auth=auth,
        headers=headers
    )
    if r.status_code != 201:
        raise Exception('Create DB. bad status code: %d. %s' % (r.status_code, r.text))

def run():
    create_db(gameresultsdb)
    num_games = 1000
    checkpoint = 100
    bulkdocs = {"docs": []}
    for i in range(num_games):
        bulkdocs["docs"].append(generateDoc())
        if len(bulkdocs['docs']) == checkpoint:
            print 'Insert {0} documents'.format(checkpoint)
            bulkdocs = bulk_insert(bulkdocs)  #upload in stages at each checkpoint
    #upload any remaining docs
    if len(bulkdocs['docs']) != 0:
        print 'Insert {0} documents'.format(len(bulkdocs['docs']))
        bulk_insert(bulkdocs)

if __name__ == '__main__':
    run()

As you can see, the documents in our database look something like
{
"_id":<db generated id>,
"_rev":"1-abc...",
"playername":"sabine",
"score":623,
"level":6,
"date_played":2010-09-30T01:33+00:00
}

Well now create a few MapReduce functions that will let us sort and analyze the documents using the different
keys.
2. Count Number of Plays

This first example might not be the most useful, but it is, at least, instructional. It simply counts the number of times each user played the game. We're going to use a web browser to write and save our MapReduce function into a _design document. Sign in to your Cloudant account, navigate to your Dashboard, click on your gameresults database, then click on the View in Futon link near the top. The URL should be https://cloudant.com/futon/database.html?<username>%2Fgameresults.
Next, from the View pull-down menu, select Temporary View and enter the following code into the Map
function.
function(doc){
if(doc.playername)
emit(doc.playername, 1);
}

First, we want to emphasize the use of duck-typing: in the Map function, we check that each document passed in has a key called playername before we emit its value as the key. This Map function emits the key-value pair of doc.playername and 1. Click on the Save As button and set the name of the _design document to playstats and the View to byplayername.

In the browser you should now see the last ten results of this Map function. The key will be zelda since this is
the last name in alphanumeric order of all possible names. If you click on the key in the browser, it will take you
to the document from which that key originated.
Now, let's use the built-in _sum Reduce function to count. (We could also use _count, but since we emitted 1
for each value, the results will be the same.) Type in _sum for the Reduce function and click Save. After you
click on the reduce checkbox, you should see in your browser a list of the playername values and the number
of times that player played a game (since each document represents the result of a played game).

To get these same results with an HTTP call from curl (we'll leave out the Python request since you should know how to do this) we need to set group=true when querying the View.

curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername?group=true


3. Total Score

This next MapReduce View will let you simultaneously determine the number of games played and give you the
total number of points accumulated by each user. In addition, it will let you find the number of points scored for
each level the user reached. Save the following Map and Reduce functions in a new View in the same _design
document and call it byplayername_level.
//map.js
function(doc){
if(doc.playername && !isNaN(doc.level) && !isNaN(doc.score))
emit([doc.playername, doc.level], doc.score);
}
//reduce.js
_stats

In this example we have emitted a complex key, which lets us use different group_level options in the query.
We are also using the _stats reduce function, which calculates the sum, count, min, max and sumsqr
of the values.
Count Number of Plays  First, let's find the number of times each player has played the game, with curl:

curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername_level?group_level=1

with Python
import requests
import json

username = "username"   # your Cloudant account name
auth = (username, "password")

get_url = "https://{0}.cloudant.com/gameresults/_design/playstats/_view/byplayername_level?group_level=1".format(username)
r = requests.get(get_url, auth=auth)
for row in r.json()["rows"]:
    print row["key"][0], row["value"]["count"]

Get Total Score  To get the total score, just look at the sum calculation in the value. In the code above, change count to sum.
Get Average Score  To get the average score, change the jq filter to .value.sum/.value.count (or print the equivalent ratio in your Python script). Similarly, you can estimate the standard deviation of the scores using sumsqr, which is the sum of the squares of each doc.score.
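For example, here is a small Python sketch (reusing the placeholder credentials from the scripts above) that derives the average and an estimate of the standard deviation from the _stats output; the variance is estimated as E[x^2] - (E[x])^2 using only sum, count and sumsqr:

import math
import requests

auth = ("username", "password")  # placeholder credentials
view_url = ("https://username.cloudant.com/gameresults/_design/playstats"
            "/_view/byplayername_level?group_level=1")

r = requests.get(view_url, auth=auth)
for row in r.json()["rows"]:
    stats = row["value"]   # {"sum": ..., "count": ..., "min": ..., "max": ..., "sumsqr": ...}
    mean = float(stats["sum"]) / stats["count"]
    # variance estimated as E[x^2] - (E[x])^2
    variance = float(stats["sumsqr"]) / stats["count"] - mean ** 2
    print "{0}: average score {1:.1f}, std dev {2:.1f}".format(
        row["key"][0], mean, math.sqrt(max(variance, 0.0)))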
4. Total Score for Each Level

Using the group_level=2 option, we can gain some more granularity in the top scores by grouping by both
the playername and the level. Here's the curl command

curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername_level?group_level=2

For the Python script, you just need to change the query to group_level=2 and modify the for-loop to display the results, since you'll now get two elements in the key.
for row in r.json()["rows"]:
    print row["key"][0], row["key"][1], row["value"]["sum"]


5. Reduce with Value as Array

Instead of just a single number, one can emit an array of numbers in the Map function value and still use the built-in _sum and _stats reduce functions. This feature, so far, is not well documented, so let's fix that.
The _sum and _stats reduce functions work on numbers. Of course, emitting a string in the value will cause problems when trying to calculate the sum! The reduce function will fail on the server side. However, when the value emitted by the Map function is an array of numbers, _sum and _stats will make those same calculations for each element of that array.
The only modification to the Map function is to emit [doc.score, doc.level] as the value.
//map.js
function(doc){
if(doc.playername && !isNaN(doc.level) && !isNaN(doc.score))
emit([doc.playername, doc.level], [doc.score, doc.level]);
}
//reduce.js
_stats

Save this MapReduce View in the playstats design document as byplayername_level_withscore_level.

Now, you get the _stats Reduce calculations made on both elements of the value array, with which you can
calculate the average level attained by each player. For example,

curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername_level_withscore_level?group_level=1

Note that we're back to using group_level=1.
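As an illustration, here is a short Python sketch (placeholder credentials again) that computes each player's average score and average level from this View. The exact shape of an array-valued _stats result is not spelled out above, so the sketch handles both plausible layouts: a list of per-element stats objects, or a single object whose fields are arrays.

import requests

auth = ("username", "password")  # placeholder credentials
view_url = ("https://username.cloudant.com/gameresults/_design/playstats"
            "/_view/byplayername_level_withscore_level?group_level=1")

r = requests.get(view_url, auth=auth)
for row in r.json()["rows"]:
    value = row["value"]
    if isinstance(value, list):
        # one stats object per array element: [score stats, level stats]
        score_stats, level_stats = value
    else:
        # one stats object whose fields (sum, count, ...) are arrays
        score_stats = dict((k, v[0]) for k, v in value.items())
        level_stats = dict((k, v[1]) for k, v in value.items())
    print "{0}: average score {1:.1f}, average level {2:.2f}".format(
        row["key"][0],
        float(score_stats["sum"]) / score_stats["count"],
        float(level_stats["sum"]) / level_stats["count"])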


6. The _design Document

As a final step in this tutorial, which leads into the next section in the Getting Started series, we'll take a look at the actual JSON _design document that was created above. Since it's an ordinary document in the database, we can view it with the usual tools. You can also navigate to that document in the Futon interface. Under the Views pulldown menu in Futon, you can select Design Documents and then click on the document. It should look similar to this:

As you can see, the Views are stored in the key views. The views value contains a key for each MapReduce
View, with the name of the key being the name of the View. For each MapReduce View key, there are two
key-value pairs, one for the map and one for the reduce.


In addition to MapReduce Views, search index functions, list functions, show functions, update functions and a
single validate_doc_update function are stored in the _design document in a similar fashion.
MapReduce View Names
One thing to think about when creating MapReduce Views in your _design documents is to give each one a proper name. The name shouldn't be so long as to be unreadable, but it helps if it contains some description of the results. In the tutorial above we roughly used the naming pattern by<key>_with<value>. We used by because the View lets us sort by the emitted keys. This naming pattern tells us the keys and values emitted by the Map function. One could add some Reduce function information to the naming convention: by<key>_with<value>_reduce<function>, where function could be _sum, _stats, _count, or maybe even custom in the case where you write your own Reduce. A final naming suggestion, which is a little more verbose but hopefully not too cumbersome, is
by_<key[0]-key[1]-key[2]...>_with_<value[0]-value[1]-value[2]...>_reduce_<function>

With this pattern, the View functions in our tutorial would have been
by_playername_with_1_reduce_sum
by_playername-level_with_score_reduce_stats
by_playername-level_with_score-level_reduce_stats

The naming convention is, of course, entirely up to you.


Common Pitfalls
MapReduce or Search?

We often see developers using MapReduce Views just for their map functions in order to retrieve full documents. This pattern has been documented, especially by the Apache CouchDB community, since Apache CouchDB does not contain any other way of indexing your documents. However, with Cloudant you can build search indexes, which are often the more appropriate tool when you want to retrieve documents based on a multidimensional query (a query involving multiple keys) and are not interested in the sorted order or a reduce calculation.
Query defaults: reduce=true and group=false

By default, the reduce=true and group=false options are set when you query the results of a MapReduce
View. If you query the first View we built above (playstats/byplayername) without options with curl
curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername

you'll get the following results, which may not be what you expect
{
"rows":[
{"key":null,"value":1000}
]
}

To get the results that you saw in the browser, set group=true.
View Collation - the order of your results

The order of the keys returned by a query to a Cloudant MapReduce index is the same as that of Apache CouchDB.
A very good explanation of the order is found on the Apache CouchDB View Collation document.


Oftentimes, new users are surprised by this ordering, which can lead to confusion when a particular query doesn't return any results. So, when you're developing and testing new Views and you get an unexpectedly empty result, check that the key-related query options aren't excluding the rows you expect.
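For example, here is a hedged Python sketch that restricts the byplayername_level View to a single player with startkey/endkey. Complex keys must be JSON-encoded in the query string; the trailing {} in the endkey is a conventional high sentinel that, by the collation rules below, sorts after any level number.

import json
import requests

auth = ("username", "password")  # placeholder credentials
view_url = ("https://username.cloudant.com/gameresults/_design/playstats"
            "/_view/byplayername_level")

# Arrays collate element by element, so ["sabine"] sorts before ["sabine", 1]
# and ["sabine", {}] sorts after ["sabine", <any number>].
params = {
    "startkey": json.dumps(["sabine"]),
    "endkey": json.dumps(["sabine", {}]),
    "reduce": "false",
}
r = requests.get(view_url, auth=auth, params=params)
for row in r.json()["rows"]:
    print row["key"], row["value"]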
For convenience, we've cribbed two key figures from the Apache CouchDB View Collation page. This first one
gives a high-level overview of the order that keys will be sorted by when querying an index.
// special values sort before all other types
null
false
true
// then numbers
1
2
3.0
4
// then text, case sensitive
"a"
"A"
"aa"
"b"
"B"
"ba"
"bb"
// then arrays. compared element by element until different.
// Longer arrays sort after their prefixes
["a"]
["b"]
["b","c"]
["b","c", "a"]
["b","d"]
["b","d", "e"]
// then object, compares each key value in the list until different.
// larger objects sort after their subset objects.
{a:1}
{a:2}
{b:1}
{b:2}
{b:2, a:1} // Member order does matter for collation.
// CouchDB preserves member order
// but doesn't require that clients will.
// this test might fail if used with a js engine
// that doesn't preserve order
{b:2, c:2}

The second figure is the collation sequence for 7-bit ASCII characters, which can come in handy.
^ _ - , ; : ! ? . " ( ) [ ] { } @ * / \ & # % + < = > | ~ $
0 1 2 3 4 5 6 7 8 9 a A b B c C d D e E f F g G h H i I j J k K
l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z

1.4.3 _design Documents


Server-side functions in Cloudant, such as MapReduce View functions, search index functions, update handlers, document validation functions, and list and show functions, are all defined in special documents on each database, called _design documents. A significant portion of the work building your application's interface to the database
will be implemented in these _design documents. You will want to create, typically, a few _design documents
for each database in order to organize these functions.

The _design documents are ordinary database JSON documents with two exceptions: the "_id" of each
document begins with _design/ and there is a particular schema that must be used to hold function definitions.
One can construct a _design document by hand by writing a JSON document and uploading it to the database
in the normal way. However, the organized structure of a _design document and the awkwardness of writing an
entire JavaScript function inside a string have led to the construction of tools to help build _design documents.
Furthermore, by developing the _design documents locally, you can then use a proper version control system
to manage your code development.
There are a handful of applications out there that help you to do this, such as Couchapp, Erica, couchapp.js and
others.
You'll notice that the names of these tools, or their documentation, mention that they will help you build a Couchapp. We may eventually cover Couchapps, but they are nothing more than HTML/CSS/JavaScript files served directly from the _attachments key of a _design document. So, we can use these tools simply to create _design documents on your database even if we have no files in the _attachments key to serve.
For example, the Couchapp and Erica tools basically work in the same way. They map files and folders on your local machine into a hierarchical JSON document and upload that JSON document to your database. Each file and folder name becomes a key of the document. For files, the value of the key is the content of the file represented as a string. For folders, the value is another JSON object that represents the content inside the folder. Some tools designate special file names that are not uploaded in the JSON document to the database but are instead used for configuration purposes.
The basic steps to use these tools are
1. Create a folder on your local machine to hold your design document.
2. Inside that folder create the necessary top-level files (such as _id and .couchapprc).
3. Give your design document a name by setting the value in the _id file.
4. Create a folder called views to hold your MapReduce Views.
5. Inside views, create folders for each MapReduce View name.
6. Inside each MapReduce View folder, create two files, called map.js and reduce.js.
7. The contents of map.js and reduce.js are, of course, the javascript functions.
8. A similar pattern follows for the Search index function where the top-level directory would be called indexes (the sub-directories map to the function name, and the index.js file defines the function).
9. Within the top-level folder, execute the tool's command to push the content of the folder into a _design document on your database.
The schema of a _design document is roughly
{
    "_id": "<name of design doc>",
    "language": "javascript",   // optional and historical
    "views": {
        "<view name>": {
            "map": "<function definition>",
            "reduce": "<function definition>"
        },
        "<another view name>": {
            ...
        },
        ...
        "<optional commonjs module group name for map functions>": {
            "<module name.js>": "<module definition>"
        }
    },
    "indexes": {
        "<search index name>": {
            "index": "<function definition>",
            "analyzer": "<analyzer definition>"   // optional
        },
        "<search index name>": {
            ...
        }
    },
    "lists": {
        "<list function name>": "<function definition>",
        ...
    },
    "shows": {
        "<show function name>": "<function definition>",
        ...
    },
    "updates": {
        "<update handler function name>": "<function definition>",
        ...
    },
    "validate_doc_update": "<function definition>",
    "<optional commonjs module group name>": {
        "<module name.js>": "<module definition>"
    }
}

In this rough schema definition, the things in angle brackets (< >) are, of course, defined by you. Also, all keys/functions in a _design document are optional: you don't need to define search index functions, list functions, etc. if you don't need them.
You can find more information about CommonJS usage on this Apache CouchDB Wiki page.
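The schema can also be exercised directly from a script; here is a minimal Python sketch (placeholder credentials, and a hypothetical design document name) that PUTs a hand-built design document following the schema above. Note that the JavaScript functions are stored as plain strings.

import json
import requests

auth = ("username", "password")  # placeholder credentials
db_url = "https://username.cloudant.com/gameresults"

design_doc = {
    "_id": "_design/examplestats",   # hypothetical design document name
    "language": "javascript",
    "views": {
        "byplayername": {
            "map": "function(doc){ if(doc.playername) emit(doc.playername, 1); }",
            "reduce": "_sum"
        }
    }
}

r = requests.put(
    db_url + "/" + design_doc["_id"],
    auth=auth,
    headers={"Content-Type": "application/json"},
    data=json.dumps(design_doc)
)
print "{0} {1}".format(r.status_code, r.text)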
We highly recommend that you use one of the Couchapp tools to define and deploy your _design documents. For example, there is no equivalent in Futon for writing search index functions the way we wrote MapReduce Views in the previous tutorial; the only other way would be to write the function inside the value of a key in a JSON document, which is hard on the eyes. Also, don't forget that you can review your _design documents in the database via Futon, which will give you that warm fuzzy feeling when you see that your local function definitions were uploaded successfully, follow the schema outlined above, and are emitting results.

1.4.4 Search Index


Cloudant Search allows you to index the content of any set of keys in your documents and then use a Lucene-like query API on those indexes to retrieve them. This allows you to run more ad-hoc and fuzzy queries over your data. If you were planning to use a MapReduce View primarily to find your documents, rather than to statistically analyze their contents, you should probably be using a search index instead.
The Cloudant For-Developers section on Search indexes is essential reading and provides a complete guide to
building a Search index along with examples. It is enough to get you started.
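As a taste of what is involved (see the linked guide for the authoritative details), the sketch below defines one search index in a design document and then queries it with a Lucene-style query string; the design document and index names are illustrative only.

import json
import requests

auth = ("username", "password")  # placeholder credentials
db_url = "https://username.cloudant.com/gameresults"

# A search index lives under the "indexes" key of a design document;
# the index function calls index(fieldname, value) for each searchable field.
design_doc = {
    "_id": "_design/search",   # hypothetical design document name
    "indexes": {
        "players": {
            "index": "function(doc){ if(doc.playername) index(\"playername\", doc.playername); }"
        }
    }
}
requests.put(db_url + "/_design/search", auth=auth,
             headers={"Content-Type": "application/json"},
             data=json.dumps(design_doc))

# Query the index.
r = requests.get(db_url + "/_design/search/_search/players",
                 auth=auth, params={"q": "playername:sabine"})
print r.json()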

CHAPTER

TWO

API REFERENCE

Cloudant's database API is based on HTTP. If you know CouchDB, you should feel right at home, as Cloudant's API is very similar. To access your data on Cloudant, you connect to username.cloudant.com via HTTP or HTTPS. For most requests, you will need to supply your user name or an API key and a password. See Authentication Methods for details. Cloudant uses the JSON format for all documents in the database as well as for any metadata. Thus, the request or response body of any HTTP request - unless specified otherwise - has to be a valid JSON document. A good place to start reading about the API and its basic building blocks is the API Basics section.
This documentation is forked from the Apache CouchDB API Reference because of the capabilities Cloudant adds to the API. If you notice any problems with these docs, please let us know at support@cloudant.com.

2.1 API Basics


The Cloudant API is the primary method of accessing and changing data on Cloudant. Requests are made over HTTP and are used to retrieve information from the database, store new data, and perform views and formatting of the information stored within the documents. Since Cloudant uses open, well-documented standards as the basis of its API, it is easy to access from a large number of programming languages. Have a look at this repository for examples of accessing Cloudant in many programming languages.
Requests to the API can be categorised by the different areas of the Cloudant system that you are accessing and by the HTTP method used to send the request. Different methods imply different operations; for example, retrieval of information from the database is typically handled by the GET operation, while updates are handled by either a POST or PUT request. There are some differences between the information that must be supplied for the different methods. For a guide to the basic HTTP methods and request structure, see Request Format and Responses.
For nearly all operations, the submitted data, and the returned data structure, is defined within a JavaScript Object
Notation (JSON) object. Basic information on the content and data types for JSON are provided in JSON Basics.
Errors when accessing the Cloudant API are reported using standard HTTP Status Codes. A guide to the generic codes returned by Cloudant is provided in HTTP Status Codes.
When accessing specific areas of the Cloudant API, specific information and examples on the HTTP methods and
request, JSON structures, and error codes are provided.

2.1.1 Request Format and Responses


Cloudant supports the following HTTP request methods:
GET
Request the specified item. As with normal HTTP requests, the format of the URL defines what is returned. With Cloudant this can include static items, database documents, and configuration and statistical
information. In most cases the information is returned in the form of a JSON document.
HEAD
The HEAD method is used to get the HTTP header of a GET request without the body of the response.


POST
Upload data. Within Cloudants API, POST is used to set values, including uploading documents, setting
document values, and starting certain administration commands.
PUT
Used to put a specified resource. In Cloudants API, PUT is used to create new objects, including databases,
documents, views and design documents.
DELETE
Deletes the specified resource, including documents, views, and design documents.
COPY
A special method that can be used to copy documents and objects.
If you use an unsupported HTTP request type with a URL that does not support the specified type, a 405 error will
be returned, listing the supported HTTP methods. For example:
{
"error":"method_not_allowed",
"reason":"Only GET,HEAD allowed"
}

If the client (such as some web browsers) does not support using these HTTP methods, POST can be used instead
with the X-HTTP-Method-Override request header set to the actual HTTP method.
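For example, a client limited to GET and POST could sketch a logical DELETE like this (the document URL and revision are hypothetical placeholders):

import requests

auth = ("username", "password")  # placeholder credentials
doc_url = "https://username.cloudant.com/gameresults/some-doc-id"   # hypothetical document

# Send a POST, but ask the server to treat it as a DELETE.
r = requests.post(
    doc_url,
    auth=auth,
    params={"rev": "1-abc123"},   # hypothetical current revision
    headers={"X-HTTP-Method-Override": "DELETE"}
)
print r.status_code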

2.1.2 HTTP Headers


Because Cloudant uses HTTP for all external communication, you need to ensure that the correct HTTP headers
are supplied (and processed on retrieval) so that you get the right format and encoding. Different environments
and clients will be more or less strict on the effect of these HTTP headers (especially when not present). Where
possible you should be as specific as possible.
Request Headers
Content-type
Specifies the content type of the information being supplied within the request. The specification uses
MIME type specifications. For the majority of requests this will be JSON (application/json). For
some settings the MIME type will be plain text. When uploading attachments it should be the corresponding
MIME type for the attachment or binary (application/octet-stream).
The use of the Content-type on a request is highly recommended.
Accept
Specifies the list of accepted data types to be returned by the server (i.e. that are accepted/understandable by the client). The format should be a list of one or more MIME types, separated by commas.
For the majority of requests the definition should be for JSON data (application/json). For attachments you can either specify the MIME type explicitly, or use */* to specify that all file types are supported.
If the Accept header is not supplied, then the */* MIME type is assumed (i.e. client accepts all formats).
The use of Accept in queries to Cloudant is not required, but is highly recommended as it helps to ensure
that the data returned can be processed by the client.
If you specify a data type using the Accept header, Cloudant will honor the specified type in the
Content-type header field returned. For example, if you explicitly request application/json
in the Accept of a request, the returned HTTP headers will use the value in the returned Content-type
field.
For example, when sending a request without an explicit Accept header, or when specifying */*:


GET /recipes HTTP/1.1


Host: username.cloudant.com
Accept: */*

The returned headers are:


Server: CouchDB/1.0.2 (Erlang OTP/R14B)
Date: Thu, 13 Jan 2011 13:39:34 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 227
Cache-Control: must-revalidate

Note that the returned content type is text/plain even though the information returned by the request is
in JSON format.
Explicitly specifying the Accept header:
GET /recipes HTTP/1.1
Host: username.cloudant.com
Accept: application/json

The headers returned include the application/json content type:


Server: CouchDB/1.0.2 (Erlang OTP/R14B)
Date: Thu, 13 Jan 2011 13:40:11 GMT
Content-Type: application/json
Content-Length: 227
Cache-Control: must-revalidate

If-None-Match
This header can optionally be sent to find out whether a document has been modified since it was last read
or updated. The value of the If-None-Match header should match the last Etag value received. If the
value matches the current revision of the document, the server sends a 304 Not Modified status code
and the response will not have a body. If not, you should get a normal 200 response, provided the document
still exists and no other errors occur.
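A minimal sketch of such a conditional GET with the Python requests library (the document URL is illustrative):

import requests

auth = ("username", "password")  # placeholder credentials
doc_url = "https://username.cloudant.com/gameresults/some-doc-id"   # hypothetical document

# First request: remember the ETag that came back with the document.
r = requests.get(doc_url, auth=auth)
etag = r.headers["ETag"]

# Later request: only fetch the body if the document has changed since then.
r = requests.get(doc_url, auth=auth, headers={"If-None-Match": etag})
if r.status_code == 304:
    print "Document unchanged; reuse the cached copy."
else:
    print "Document changed; new revision is {0}".format(r.json()["_rev"])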
Response Headers
Response headers are returned by the server when sending back content and include a number of different header fields, many of which are standard HTTP response headers and have no significance to how Cloudant operates. The response headers important to Cloudant are listed below.
The Cloudant design document API and the functions when returning HTML (for example as part of a show or
list) enable you to include custom HTTP headers through the headers field of the return object.
Content-type
Specifies the MIME type of the returned data. For most requests, the returned MIME type is text/plain.
All text is encoded in Unicode (UTF-8), and this is explicitly stated in the returned Content-type, as
text/plain;charset=utf-8.
Cache-control
The cache control HTTP response header provides a suggestion for client caching mechanisms on how to
treat the returned information. Cloudant typically returns must-revalidate, which indicates that
the information should be revalidated if possible. This is used to ensure that the dynamic nature of the
content is correctly updated.
Content-length
The length (in bytes) of the returned content.
Etag


The Etag HTTP header field is used to show the revision for a document or the response from a show
function. For documents, the value is identical to the revision of the document. The value can be used
with an If-None-Match request header to get a 304 Not Modified response if the revision is still
current.
ETags cannot currently be used with views or lists, since the ETags returned from those requests are just
random numbers that change on every request.

2.1.3 JSON Basics


The majority of requests and responses to and from Cloudant use the JavaScript Object Notation (JSON) for
formatting the content and structure of the data and responses.
JSON is used because it is the simplest and easiest to use solution for working with data within a web browser, as
JSON structures can be evaluated and used as JavaScript objects within the web browser environment. JSON also
integrates with the server-side JavaScript used within Cloudant. JSON documents are always UTF-8 encoded.
Warning: Care should be taken when comparing strings in JSON documents retrieved from Cloudant. Unicode normalization might have been applied, so that a string stored and then retrieved is not identical on a
binary level. To avoid this problem, always normalize strings before comparing them.
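As an illustration, here is a small Python sketch using the standard unicodedata module; the two strings look identical but differ at the byte level until they are normalized:

# -*- coding: utf-8 -*-
import unicodedata

# The same visible string can arrive in different Unicode forms:
# u"\u00e9" is the precomposed character, u"e\u0301" is "e" plus a combining accent.
stored = u"caf\u00e9"
retrieved = u"cafe\u0301"

print stored == retrieved                        # False: binary comparison fails
normalize = lambda s: unicodedata.normalize("NFC", s)
print normalize(stored) == normalize(retrieved)  # True: compare after normalizing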
JSON supports the same basic types as JavaScript; these are:
Number (either integer or floating-point).
String; this should be enclosed by double-quotes and supports Unicode characters and backslash escaping.
For example:
"A String"

Boolean - a true or false value. You can use these values directly. For example:
{ "value": true}

Array - a list of values enclosed in square brackets. For example:


["one", "two", "three"]

Object - a set of key/value pairs (i.e. an associative array, or hash). The key must be a string, but the value
can be any of the supported JSON values. For example:
{
"servings" : 4,
"subtitle" : "Easy to make in advance, and then cook when ready",
"cooktime" : 60,
"title" : "Chicken Coriander"
}

In Cloudant databases, the JSON object is used to represent a variety of structures, including all documents
in a database.
Parsing JSON into a JavaScript object is supported through the JSON.parse() function in JavaScript, or
through various libraries that will perform the parsing of the content into a JavaScript object for you. Libraries for
parsing and generating JSON are available in all major programming languages.
Warning: Care should be taken to ensure that your JSON structures are valid; invalid structures will cause Cloudant to return an HTTP status code of 400 (bad request).


2.1.4 HTTP Status Codes


With the interface to Cloudant working through HTTP, error codes and statuses are reported using a combination
of the HTTP status code number, and corresponding data in the body of the response data.
A list of the error codes returned by Cloudant and generic descriptions of the related errors are provided below. The
meaning of different status codes for specific request types are provided in the corresponding API call reference.
200 - OK
Request completed successfully.
201 - Created
Resource created successfully.
202 - Accepted
Request has been accepted, but the corresponding operation may not have completed. This is used for
background operations, such as database compaction or for bulk operations where some updates might have
led to a conflict.
304 - Not Modified
The content requested has not been modified. This is used with the ETag system to identify the version of
information returned.
400 - Bad Request
Bad request structure. The error can indicate an error with the request URL, path or headers. Differences in
the supplied MD5 hash and content also trigger this error, as this may indicate message corruption.
401 - Unauthorized
The item requested was not available using the supplied authorization, or authorization was not supplied.
403 - Forbidden
The requested item or operation is forbidden.
404 - Not Found
The requested resource could not be found. The content will include further information, as a JSON object,
if available. The structure will contain two keys, error and reason. For example:
{"error":"not_found","reason":"no_db_file"}

405 - Resource Not Allowed


A request was made using an invalid HTTP request type for the URL requested. For example, you have
requested a PUT when a POST is required. Errors of this type can also be triggered by invalid URL strings.
406 - Not Acceptable
The requested content type is not supported by the server.
409 - Conflict
Request resulted in an update conflict.
412 - Precondition Failed
The request headers from the client and the capabilities of the server do not match.
415 - Bad Content Type
The content types supported, and the content type of the information being requested or submitted indicate
that the content type is not supported.
416 - Requested Range Not Satisfiable
The range specified in the request header cannot be satisfied by the server.


417 - Expectation Failed


When sending documents in bulk, the bulk load operation failed.
500 - Internal Server Error
The request was invalid, either because the supplied JSON was invalid, or invalid information was supplied
as part of the request.

2.2 Authentication Methods


Most requests require the credentials of a Cloudant account. There are two ways to provide those credentials.
They can either be provided using HTTP Basic Auth or as an HTTP cookie named AuthSession. The cookie can
be obtained by performing a POST request to /_session. With the cookie set, information about the logged in
user can be retrieved with a GET request and with a DELETE request you can end the session. Further details are
provided below.
GET /_session
    Returns cookie-based login user information.
    Headers: AuthSession cookie returned by POST request.
POST /_session
    Do cookie-based user login.
    Headers: Content-Type: application/x-www-form-urlencoded. Form parameters: name, password.
DELETE /_session
    Logout cookie-based user.
    Headers: AuthSession cookie returned by POST request.

Here is an example of a POST request to obtain the authentication cookie.


POST /_session HTTP/1.1
Content-Length: 32
Content-Type: application/x-www-form-urlencoded
Accept: */*
name=YourUserName&password=YourPassword

And this is the corresponding reply with the Set-Cookie header.

200 OK
Cache-Control: must-revalidate
Content-Length: 42
Content-Type: text/plain; charset=UTF-8
Date: Mon, 04 Mar 2013 14:06:11 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
Set-Cookie: AuthSession="a2ltc3RlYmVsOjUxMzRBQTUzOtiY2_IDUIdsTJEVNEjObAbyhrgz"; Expires=Tue, 05 Ma
x-couch-request-id: a638431d
{
"ok": true,
"name": "kimstebel",
"roles": []
}

Once you have obtained the cookie, you can make a GET request to obtain the username and its roles:

GET /_session HTTP/1.1
Cookie: AuthSession="a2ltc3RlYmVsOjUxMzRBQTUzOtiY2_IDUIdsTJEVNEjObAbyhrgz"
Accept: application/json

The body of the reply looks like this:


{
"ok": true,
"info": {
"authentication_db": "_users",
"authentication_handlers": ["cookie", "default"]
},
"userCtx": {
"name": null,
"roles": []
}
}

To log out, you have to send a DELETE request to the same URL and submit the cookie in the request.

DELETE /_session HTTP/1.1
Cookie: AuthSession="a2ltc3RlYmVsOjUxMzRBQTUzOtiY2_IDUIdsTJEVNEjObAbyhrgz"
Accept: application/json

This will result in the following response.

200 OK
Cache-Control: must-revalidate
Content-Length: 12
Content-Type: application/json
Date: Mon, 04 Mar 2013 14:06:12 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
Set-Cookie: AuthSession=""; Expires=Fri, 02 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/; HttpOnly; Ve
x-couch-request-id: e02e0333
{
"ok": true
}
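Putting the three calls together, here is a sketch of the full cookie life cycle with the Python requests library; a Session object stores the AuthSession cookie between requests (account name and credentials are placeholders):

import requests

base = "https://username.cloudant.com"   # placeholder account
s = requests.Session()

# Log in: the AuthSession cookie from the response is kept by the session.
r = s.post(base + "/_session",
           data={"name": "YourUserName", "password": "YourPassword"})
print r.json()

# Use the cookie to ask who we are logged in as.
print s.get(base + "/_session").json()

# Log out, invalidating the cookie.
print s.delete(base + "/_session").json()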

2.3 Authorization Settings


Cloudant's API allows you to read and modify the permissions of each user. Users are either identified by their
Cloudant username or by their API key. You can also set permissions for unauthenticated users.
A list of the available methods and URL paths is provided below. Note that the root URL is https://cloudant.com/
rather than https://username.cloudant.com/:
Method   Path                                         Description                               Parameters
POST     https://cloudant.com/api/set_permissions     Set permissions for a user and database   database, username, roles[]
POST     https://cloudant.com/api/generate_api_key    Generate a random API key                 none

2.3.1 Setting permissions


Method: POST https://cloudant.com/api/set_permissions
Request Body: Contains parameters as url-encoded form fields
Response: JSON document indicating success or failure
Roles permitted: admin


Query Arguments

database (required, string)
    The database for which permissions are set. This has to be a string of the form accountname/databasename.
username (optional, string)
    The user name or API key for which permissions are set.
roles (required, string)
    The roles the user can have. This parameter can be passed multiple times, once for each role. Supported values:
    _admin: Gives the user all permissions, including setting permissions.
    _reader: Gives the user permission to read documents from the database.
    _writer: Gives the user permission to create and modify documents in the database.

Example Request
POST /api/set_permissions HTTP/1.1
Host: cloudant.com
Content-Length: 83
Content-Type: application/x-www-form-urlencoded
username=aUserNameOrApiKey&database=accountName/db&roles=_reader&roles=_writer

Example Response
{
"ok": true
}

2.3.2 Generating an API Key


Method: POST https://cloudant.com/api/generate_api_key
Request Body: Empty
Response: JSON document containing the generated key and password
Roles permitted: admin
Query Arguments: none
Structure of the JSON document returned
ok: true if request was successful
key: String containing the generated key
password: String containing the generated password
Example Request
POST /api/generate_api_key HTTP/1.1
Host: cloudant.com


Example Response
{
"password": "generatedPassword",
"ok": true,
"key": "generatedKey"
}
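A sketch that combines the two endpoints with Python: generate a key, then grant it read and write access to one database (account and database names are placeholders; the account credentials must have the admin role):

import requests

auth = ("username", "password")   # account credentials with admin role

# 1. Generate a new API key/password pair.
r = requests.post("https://cloudant.com/api/generate_api_key", auth=auth)
key_info = r.json()
print key_info                     # {"ok": true, "key": ..., "password": ...}

# 2. Grant the new key _reader and _writer on one database.
r = requests.post(
    "https://cloudant.com/api/set_permissions",
    auth=auth,
    data=[("database", "username/gameresults"),
          ("username", key_info["key"]),
          ("roles", "_reader"),
          ("roles", "_writer")]    # repeated roles fields, one per role
)
print r.json()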

2.4 Databases
The database level endpoints provide an interface to entire databases within Cloudant. These are database level
rather than document level requests.
A list of the available methods and URL paths is provided below:
Method   Path                 Description
GET      /_all_dbs            Returns a list of all databases
GET      /db                  Returns database information
PUT      /db                  Create a new database
DELETE   /db                  Delete an existing database
GET      /db/_all_docs        Returns a built-in view of all documents in this database
POST     /db/_all_docs        Returns certain rows from the built-in view of all documents
POST     /db/_bulk_docs       Insert multiple documents in to the database in a single request
GET      /_db_updates         Returns information about databases that have been updated
GET      /db/_changes         Returns information about documents that have been updated in a database
GET      /db/_shards          Returns information about the shards in a database or the shard a document belongs to
POST     /db/_missing_revs    Given a list of document revisions, returns the document revisions that do not exist in the database
POST     /db/_revs_diff       Given a list of document revisions, returns differences between the given revisions and ones that are in the database
GET      /db/_revs_limit      Gets the limit of historical revisions to store for a single document in the database
PUT      /db/_revs_limit      Sets the limit of historical revisions to store for a single document in the database
GET      /db/_security        Returns the special security object for the database
PUT      /db/_security        Sets the special security object for the database
POST     /db/_view_cleanup    Removes view files that are not used by any design document

2.4.1 Retrieving a list of all databases


Method: GET /_all_dbs
Request: None
Response: JSON list of DBs
Roles permitted: _reader
Return Codes

Code   Description
200    Request completed successfully

Returns a list of all the databases. For example:


GET http://username.cloudant.com/_all_dbs
Accept: application/json

The return is a JSON array:


[
"_users",
"contacts",
"docs",
"invoices",
"locations"
]

2.4.2 Operations on entire databases


For all the database requests, the database name within the URL path should be the database name that you wish
to perform the operation on. For example, to obtain the meta information for the database recipes, you would
use the HTTP request:
GET /recipes

For clarity, the form below is used in the URL paths:


GET /db

Where db is the name of any database.


Retrieving information about a database
Method: GET /db
Request: None
Response: Information about the database in JSON format
roles permitted: _reader, _admin
Return Codes

Code   Description
200    The database exists and information about it is returned.
404    The database could not be found. If further information is available, it will be returned as a JSON object.

Gets information about the specified database. For example, to retrieve the information for the database recipes:
GET /db HTTP/1.1
Accept: application/json

The JSON response contains meta information about the database. A sample of the JSON returned for an empty
database is provided below:
{

"update_seq": "0-g1AAAADneJzLYWBgYMlgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRIsv_
"db_name": "db",
"purge_seq": 0,
"other": {
"data_size": 0
},
"doc_del_count": 0,
"doc_count": 0,
"disk_size": 316,

40

Chapter 2. API Reference

Documentation, Release 1.0.2

"disk_format_version": 5,
"compact_running": false,
"instance_start_time": "0"
}

The elements of the returned structure are shown in the table below:
Field                  Description
compact_running        Set to true if the database compaction routine is operating on this database.
db_name                The name of the database.
disk_format_version    The version of the physical format used for the data when it is stored on disk.
disk_size              Size in bytes of the data as stored on the disk. View indexes are not included in the calculation.
doc_count              A count of the documents in the specified database.
doc_del_count          Number of deleted documents.
instance_start_time    Always 0.
purge_seq              The number of purge operations on the database.
update_seq             An opaque string describing the state of the database. It should not be relied on for counting the number of updates.
other                  JSON object containing a data_size field.
Creating a database
Method: PUT /db
Request: None
Response: JSON success statement
roles permitted: _admin
Return Codes

Code   Description
201    Database created successfully
202    The database has been successfully created on some nodes, but the number of nodes is less than the write quorum.
403    Invalid database name.
412    Database already exists.

Creates a new database. The database name must be composed of one or more of the following characters:
Lowercase characters (a-z)
Name must begin with a lowercase letter
Digits (0-9)
Any of the characters _, $, (, ), +, -, and /.
Trying to create a database that does not meet these requirements will return an error quoting these restrictions.
To create the database recipes:
PUT /db HTTP/1.1
Accept: application/json

The returned content contains the JSON status:


{
"ok": true
}

Anything else should be treated as an error, and the problem should be taken from the HTTP response code.
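Here is a sketch that checks a candidate name against these rules on the client side before issuing the PUT; the regular expression simply mirrors the list above (placeholder credentials, and names containing / must be URL-encoded as %2F in the request path):

import re
import requests

auth = ("username", "password")   # placeholder credentials
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")

def create_database(name):
    if not VALID_DB_NAME.match(name):
        raise ValueError("invalid database name: {0}".format(name))
    r = requests.put("https://username.cloudant.com/" + name, auth=auth)
    if r.status_code in (201, 202):
        return r.json()            # {"ok": true}
    elif r.status_code == 412:
        raise RuntimeError("database already exists")
    else:
        raise RuntimeError("unexpected response: {0} {1}".format(r.status_code, r.text))

print create_database("recipes")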


Deleting a database
Method: DELETE /db
Request: None
Response: JSON success statement
roles permitted: _admin
Return Codes

Code   Description
200    Database has been deleted
404    The database could not be found. If further information is available, it will be returned as a JSON object.

Deletes the specified database, and all the documents and attachments contained within it.
To delete the database recipes you would send the request:
DELETE /db HTTP/1.1
Accept: application/json

If successful, the returned JSON will indicate success


{
"ok": true
}

2.4.3 Retrieving multiple documents in one request


GET /db/_all_docs
Method: GET /db/_all_docs
Request: None
Response: JSON object containing document information, ordered by the document ID
roles permitted: _reader
Query Arguments

Argument        Description                                                           Optional   Type      Default
descending      Return the documents in descending by key order                       yes        boolean   false
endkey          Stop returning records when the specified key is reached              yes        string
include_docs    Include the full content of the documents in the return               yes        boolean   false
inclusive_end   Include rows whose key equals the endkey                              yes        boolean   true
key             Return only documents that match the specified key                    yes        string
limit           Limit the number of the returned documents to the specified number    yes        numeric
skip            Skip this number of records before starting to return the results     yes        numeric   0
startkey        Return records starting with the specified key                        yes        string

Returns a JSON structure of all of the documents in a given database. The information is returned as a JSON structure containing meta information about the return structure, and a list of the documents with their basic contents, consisting of the ID, revision and key. The key is generated from the document ID.


Field        Description                                   Type
offset       Offset where the document list started        numeric
rows         Array of document objects                     array
total_rows   Number of documents in the database/view      numeric
update_seq   Current update sequence for the database      string

By default the information returned contains only the document ID and revision. For example, the request:
GET /test/_all_docs HTTP/1.1
Accept: application/json

Returns the following structure:


{
"total_rows": 3,
"offset": 0,
"rows": [{
"id": "5a049246-179f-42ad-87ac-8f080426c17c",
"key": "5a049246-179f-42ad-87ac-8f080426c17c",
"value": {
"rev": "2-9d5401898196997853b5ac4163857a29"
}
}, {
"id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"key": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"value": {
"rev": "2-ff7b85665c4c297838963c80ecf481a3"
}
}, {
"id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"key": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"value": {
"rev": "2-cbdef49ef3ddc127eff86350844a6108"
}
}]
}

The information is returned in the form of a temporary view of all the database documents, with the returned
key consisting of the ID of the document. The remainder of the interface is therefore identical to the View query
arguments and their behavior.
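For example, here is a sketch that pages through a database 100 documents at a time with include_docs, limit and skip (placeholder credentials; for very large databases, paging by startkey is usually more efficient than large skip values):

import requests

auth = ("username", "password")   # placeholder credentials
url = "https://username.cloudant.com/test/_all_docs"

params = {"include_docs": "true", "limit": 100, "skip": 0}
while True:
    rows = requests.get(url, auth=auth, params=params).json()["rows"]
    if not rows:
        break
    for row in rows:
        print row["id"], row["doc"]["_rev"]
    params["skip"] += len(rows)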
POST /db/_all_docs
Method: POST /db/_all_docs
Request: JSON of the document IDs you want included
Response: JSON of the returned view
roles permitted: _admin, _reader
The POST to _all_docs allows you to specify multiple keys to be selected from the database. This enables you to request multiple documents in a single request, in place of multiple Retrieving a document requests.
The request body should contain the list of keys to be returned as an array in a keys field. For example:
POST /recipes/_all_docs
User-Agent: MyApp/0.1 libwww-perl/5.837
{
"keys" : [
"Zingylemontart",
"Yogurtraita"
]
}


The return JSON is the all documents structure, but with only the selected keys in the output:
{
"total_rows" : 2666,
"rows" : [
{
"value" : {
"rev" : "1-a3544d296de19e6f5b932ea77d886942"
},
"id" : "Zingylemontart",
"key" : "Zingylemontart"
},
{
"value" : {
"rev" : "1-91635098bfe7d40197a1b98d7ee085fc"
},
"id" : "Yogurtraita",
"key" : "Yogurtraita"
}
],
"offset" : 0
}
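The same request issued with the Python requests library (placeholder credentials); adding include_docs=true as a query parameter would return the full documents as well:

import json
import requests

auth = ("username", "password")   # placeholder credentials
url = "https://username.cloudant.com/recipes/_all_docs"

payload = {"keys": ["Zingylemontart", "Yogurtraita"]}
r = requests.post(url,
                  auth=auth,
                  headers={"Content-Type": "application/json"},
                  data=json.dumps(payload))
for row in r.json()["rows"]:
    print "{0} -> {1}".format(row["key"], row["value"]["rev"])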

2.4.4 Creating or updating multiple documents


Method: POST /db/_bulk_docs
Request: JSON of the docs and updates to be applied
Response: JSON success statement
roles permitted: _admin, _writer
Return Codes

Code   Description
201    All documents have been created or updated.
202    For at least one document, the write quorum (specified by w) has not been met.

The bulk document API allows you to create and update multiple documents at the same time within a single
request. The basic operation is similar to creating or updating a single document, except that you batch the
document structure and information. When creating new documents the document ID is optional. For updating
existing documents, you must provide the document ID, revision information, and new document values.
For both inserts and updates the basic structure of the JSON document in the request is the same:
Request Body

Field   Description                Type               Optional
docs    Bulk Documents Document    array of objects   no

Object in docs array

Field      Description                              Type      Optional
_id        Document ID                              string    yes, but mandatory for updates
_rev       Document revision                        string    yes, but mandatory for updates
_deleted   Whether the document should be deleted   boolean   yes


Inserting Documents in Bulk


To insert documents in bulk into a database you need to supply a JSON structure with the array of documents that
you want to add to the database. You can either include a document ID for each document, or allow the document
ID to be automatically generated.
For example, the following inserts three new documents with the supplied document IDs. If you omit the document
ID, it will be generated:
{
"docs": [{
"name": "Nicholas",
"age": 45,
"gender": "male",
"_id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"_attachments": {
}
}, {
"name": "Taylor",
"age": 50,
"gender": "male",
"_id": "5a049246-179f-42ad-87ac-8f080426c17c",
"_attachments": {
}
}, {
"name": "Owen",
"age": 51,
"gender": "male",
"_id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"_attachments": {
}
}]
}

The return code from a bulk insertion will be 201, with the content of the returned structure indicating specific success or failure messages on a per-document basis.
The return structure from the example above contains a list of the documents created, with their document and revision IDs:
201 Created
Cache-Control: must-revalidate
Content-Length: 269
Content-Type: application/json
Date: Mon, 04 Mar 2013 14:06:20 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
x-couch-request-id: e8ff64d5
[{
"id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"rev": "1-54dd23d6a630d0d75c2c5d4ef894454e"
}, {
"id": "5a049246-179f-42ad-87ac-8f080426c17c",
"rev": "1-0cde94a828df5cdc0943a10f3f36e7e5"
}, {
"id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"rev": "1-a2b6e5dac4e0447e7049c8c540b309d6"
}]

The content and structure of the returned JSON will depend on the transaction semantics being used for the bulk
update; see Bulk Documents Transaction Semantics for more information. Conflicts and validation errors when
updating documents in bulk must be handled separately; see Bulk Document Validation and Conflict Errors.
Updating Documents in Bulk

The bulk document update procedure is similar to the insertion procedure, except that you must specify the document ID and current revision for every document in the bulk update JSON string.
For example, you could send the following request:
POST /test/_bulk_docs HTTP/1.1
Accept: application/json
{
"docs": [{
"name": "Nicholas",
"age": 45,
"gender": "female",
"_id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"_attachments": {
},
"_rev": "1-54dd23d6a630d0d75c2c5d4ef894454e"
}, {
"name": "Taylor",
"age": 50,
"gender": "female",
"_id": "5a049246-179f-42ad-87ac-8f080426c17c",
"_attachments": {
},
"_rev": "1-0cde94a828df5cdc0943a10f3f36e7e5"
}, {
"name": "Owen",
"age": 51,
"gender": "female",
"_id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"_attachments": {
},
"_rev": "1-a2b6e5dac4e0447e7049c8c540b309d6"
}]
}

The return structure is the JSON of the updated documents, with the new revision and ID information:
[{
"id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"rev": "2-ff7b85665c4c297838963c80ecf481a3"
}, {
"id": "5a049246-179f-42ad-87ac-8f080426c17c",
"rev": "2-9d5401898196997853b5ac4163857a29"
}, {
"id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"rev": "2-cbdef49ef3ddc127eff86350844a6108"
}]

You can optionally delete documents during a bulk update by adding the _deleted field with a value of true
to each document ID/revision combination within the submitted JSON structure.
The return code from a bulk update will be 201, with the content of the returned structure indicating specific success or failure messages on a per-document basis.
The content and structure of the returned JSON will depend on the transaction semantics being used for the bulk
update; see Bulk Documents Transaction Semantics for more information. Conflicts and validation errors when
updating documents in bulk must be handled separately; see Bulk Document Validation and Conflict Errors.
Bulk Documents Transaction Semantics

Cloudant will only guarantee that some of the documents will be saved if your request yields a 202
response. The response will contain the list of documents successfully inserted or updated during the
process.
The response structure will indicate whether the document was updated by supplying the new _rev
parameter indicating a new document revision was created. If the update failed, then you will get an
error of type conflict. For example:
[
{
"id" : "FishStew",
"error" : "conflict",
"reason" : "Document update conflict."
},
{
"id" : "LambStew",
"error" : "conflict",
"reason" : "Document update conflict."
},
{
"id" : "7f7638c86173eb440b8890839ff35433",
"error" : "conflict",
"reason" : "Document update conflict."
}
]

In this case no new revision has been created and you will need to submit the document update with
the correct revision tag, to update the document.
Bulk Document Validation and Conflict Errors

The JSON returned by the _bulk_docs operation consists of an array of JSON structures, one for each document
in the original submission. The returned JSON structure should be examined to ensure that all of the documents
submitted in the original request were successfully added to the database.
The structure of the returned information is:
Field          Description               Type
docs [array]   Bulk Documents Document   array of objects

Fields of objects in docs array

Field    Description                         Type
id       Document ID                         string
error    Error type                          string
reason   Error string with extended reason   string

When a document (or document revision) is not correctly committed to the database because of an error, you
should check the error field to determine error type and course of action. Errors will be one of the following
types:
conflict
The document as submitted is in conflict. If you used the default bulk transaction mode then the new revision
will not have been created and you will need to re-submit the document to the database.
Conflict resolution of documents added using the bulk docs interface is identical to the resolution procedures
used when resolving conflict errors during replication.


forbidden
Entries with this error type indicate that the validation routine applied to the document during submission
has returned an error.
For example, if your validation routine includes the following:
throw({forbidden: 'invalid recipe ingredient'});

The error returned will be:


{
"id" : "7f7638c86173eb440b8890839ff35433",
"error" : "forbidden",
"reason" : "invalid recipe ingredient"
}
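Here is a sketch of the per-document error handling described above: inspect each entry of the response, remember the new revisions of the documents that succeeded, and collect the conflicted ones so they can be re-fetched and resubmitted (placeholder credentials and database):

import json
import requests

auth = ("username", "password")   # placeholder credentials
db_url = "https://username.cloudant.com/test"

def bulk_update(docs):
    r = requests.post(db_url + "/_bulk_docs",
                      auth=auth,
                      headers={"Content-Type": "application/json"},
                      data=json.dumps({"docs": docs}))
    conflicts = []
    # the response array corresponds positionally to the submitted docs array
    for doc, result in zip(docs, r.json()):
        if "error" in result:
            if result["error"] == "conflict":
                conflicts.append(doc)     # fetch the current _rev and resubmit
            else:
                print "{0} rejected: {1}".format(result.get("id"), result.get("reason"))
        else:
            doc["_rev"] = result["rev"]   # success: remember the new revision
    return conflicts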

2.4.5 Obtaining a list of updated databases


Method: GET /_db_updates
Request body: None
Response body: one or more JSON documents depending on feed parameter
Roles permitted: _admin
Please note that this feature is not enabled yet for most customers!
Query Arguments

feed (optional, string; default: normal)
    Type of feed. Supported values: continuous (continuous, non-polling mode), longpoll (long polling mode), normal (the default polling mode).
heartbeat (optional, numeric; default: 60000)
    Time in milliseconds after which an empty line is sent during longpoll or continuous if there have been no changes.
limit (optional, numeric; default: none)
    Maximum number of results to return.
since (optional, string; default: 0)
    Start the results from changes immediately after the specified sequence number. If since is 0 (the default), the request will return all changes from activation of the feature.
timeout (optional, numeric)
    Number of milliseconds to wait for data in a longpoll or continuous feed before terminating the response. If both heartbeat and timeout are supplied, heartbeat supersedes timeout.
descending (optional, boolean; default: false)
    Whether results should be returned in descending order, i.e. the latest event first. By default, the oldest event is returned first.

Response Headers
Independent of the feed parameter, the changes feed always uses Transfer-Encoding: chunked for all its responses. This means that the response does not have a Content-Length header. Instead, the body contains
the sizes of each chunk (there will be one chunk for each update). Most HTTP client libraries are able to decode these responses so that there is no difference from the perspective of the application. However, some libraries might require manual processing of the chunks. See RFC 2616 for more information.
Description
Obtains a list of changes to databases. Changes can be either updates to the database, creation, or deletion of
a database. This can be used to monitor for updates and modifications to the database for post processing or
synchronization across databases. It is most useful in applications that use many small databases, so that the
application does not need to keep changes feeds open for each database. The feed is not guaranteed to return
changes in the correct order and might contain changes more than once. In rare cases, changes might even be
skipped. There are three kinds of feeds: polling, long polling, and continuous. All requests are polling requests by default. You can select any feed type explicitly using the feed query argument.
Polling

If you do not set the feed parameter, you will get all changes that have occurred until now. Once they have been sent, the HTTP connection will be closed and another request can be made later to get more changes. This type of request returns a single JSON document containing information about updates to databases. For example, the query
GET /_db_updates

will get all of the changes to all databases. You can request a starting point using the since query parameter and specifying the sequence number. You will need to record the latest sequence number in your client and then use it as the value of the since parameter when making another request.
Longpoll

With long polling, the request to the server will remain open until a change is made on the database, at which point the changes will be reported and the connection will close. The long poll is useful when you want to monitor for changes for a specific purpose without wanting to monitor continuously.
Because the wait for a change can be significant, you can set a timeout before the connection is automatically closed (the timeout parameter). You can also set a heartbeat interval (using the heartbeat query parameter), which lets the server send a newline to keep the connection open.
The following example request...

GET /_db_updates?feed=longpoll&since=672-g1AAAAHeeJzLYWBg4MhgTmFQSElKzi9KdUhJMtIrSS0uqTQwMNNLzskvT

yields this response:


HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Type: text/plain; charset=UTF-8
Date: Thu, 05 Dec 2013 10:08:09 GMT
Server: CouchDB/1.0.2 (Erlang OTP/R14B)
Transfer-Encoding: chunked
X-Couch-Request-ID: c4c0a8ed
{

"results": [{
"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "673-g1AAAAJAeJyN0EtuwjAQgGGXViq3KCzYsIjsxI9k1UgcpPV4jBBKEwnCghXcpL1JuUl7k-BHVAkWaTZjyR
}],
"last_seq": "673-g1AAAAJAeJyN0EtuwjAQgGGXViq3KCzYsIjsxI9k1UgcpPV4jBBKEwnCghXcpL1JuUl7k-BHVAkWaTZ
}


Continuous

Continuous sends all new changes back to the client immediately, without closing the connection. In continuous
mode the format of the changes is slightly different to accommodate the continuous nature while ensuring that the
JSON output is still valid for each change notification.
As with the longpoll feed type you can set both the timeout and heartbeat intervals to ensure that the connection
is kept open for new changes and updates.
The return structure for normal and longpoll modes is a JSON array of changes objects, and the last update
sequence number.
results: Array of changes
dbname: name of the database that changed
type: type of change, created, updated, or deleted
seq: sequence number of the change. Sequence numbers are non-contiguous.
last_seq: sequence number of the last change
In continuous mode, the server sends a CRLF (carriage-return, linefeed) delimited line for each change. Each
line contains the JSON object with the same structure as the object inside the results array of other feed types.
The following example request...

GET /_db_updates?feed=continuous&since=665-g1AAAAHeeJzLYWBg4MhgTmFQSElKzi9KdUhJMtErSS0uqTQwMNNLzsk

yields this response:


HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Type: text/plain; charset=UTF-8
Date: Thu, 05 Dec 2013 10:08:08 GMT
Server: CouchDB/1.0.2 (Erlang OTP/R14B)
Transfer-Encoding: chunked
X-Couch-Request-ID: fe9a807b
{

"dbname": "documentationchangescontinuous1documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "created",
"account": "testy006-admin",
"seq": "666-g1AAAAJAeJyN0EkKwjAUgOE4gN5CxZWbksakSVcWL6IZEakVtC5c6U30JnoTvUnNIEJdVDcvEMLHy58DAPqr
}
{

"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "667-g1AAAAJAeJyN0EkKwjAUgOE4gN5CBTduShqTJl1ZvIhmRKRW0LpwpTfRm-hN9CY1gwh1Ud28QAgfL38OAOiv
}
{

"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "created",
"account": "testy006-admin",
"seq": "668-g1AAAAJAeJyN0EuqwjAUgOH4AN2FCk6c1DQmTTqyuBHNE5FaQevAke5Ed6I70Z3UPEToHfQ6OYEQPk7-HADQ
}
{

"dbname": "documentationchangescontinuous1documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "669-g1AAAAJAeJyN0EuqwjAUgOH4AN2FCo4c1DQmTTqyuBHNE5FaQevAke5Ed6I70Z3UPEToHfQ6OYEQPk7-HADQ
}

{

"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "670-g1AAAAJAeJyN0EuqwjAUgOH4AN2FOnDioKYxadKRxY1onojUCloHjnQnuhPdie6k5iFC76DXyQmE8HHy5wCA
}
{

"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "671-g1AAAAJAeJyN0EuqwjAUgOH4gOsu1IETByWNSZOOLG7k3jwRqRW0DhzpTnQnuhPdSc1DLtZB7eQEwuEj-XMA
}
{

"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "672-g1AAAAJAeJyN0DsKwjAYwPH4AL2FOrg4lDQmTTpZvIjmiUitoHVw0pvoTfQmepOahwh1qC5fIIQfX_45AKC_
}
{

"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "673-g1AAAAJAeJyN0DsKwjAYwPH4AL2FOrg4lDQmTTpZ8CCaJyK1gtbBSW-iN9Gb6E1qHiLUobp8gRB-fPnnAID}
{

"dbname": "documentationchanges2documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "674-g1AAAAJAeJyN0EsOATEYwPF6JNwCCxuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{

"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "675-g1AAAAJAeJyN0EsOATEYwPF6JNwCGwuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{

"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "deleted",
"account": "testy006-admin",
"seq": "676-g1AAAAJAeJyN0EsOATEYwPF6JNwCGwuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{

"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "677-g1AAAAJAeJyN0EsOATEYwPF6JNwCK4nFpFPttLNCHIQ-IzJGwlhYcRNuwk24yehDJGMxbL4mTfPL138GAOiu
}
{

"dbname": "documentationchanges2documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "deleted",
"account": "testy006-admin",
"seq": "678-g1AAAAJAeJyN0EsOATEYwPF6JNwCKwvJpFPttLNCHIQ-IzJGwlhYcRNuwk24yehDJGMxbL4mTfPL138GAOiu
}
{

"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "679-g1AAAAJAeJyN0EkKwjAUgOE4gN5C3QlCSWPSpCsVD6IZEakVtC5c6U30JnoTvUnNIEJdVDcvEMLHy58BALqr
}


2.4.6 Obtaining a list of changes


Method: GET /db/_changes
Request: None
Response: JSON success statement
Roles permitted: _admin, _reader
Query Arguments

doc_ids: List of document IDs to use to filter updates. Optional. Type: array of strings.
feed: Type of feed. Optional. Type: string. Default: normal. Supported values: continuous (continuous mode), longpoll (long polling mode), normal (polling mode).
filter: Name of filter function from a design document to get updates. Optional. Type: string.
heartbeat: Time in milliseconds after which an empty line is sent during longpoll or continuous if there have been no changes. Optional. Type: numeric. Default: 60000.
include_docs: Include the document with the result. Optional. Type: boolean. Default: false.
limit: Maximum number of rows to return. Optional. Type: numeric. Default: none.
since: Start the results from changes immediately after the specified sequence number. If since is 0 (the default), the request will return all changes from the creation of the database. Optional. Type: string. Default: 0.
descending: Return the changes in descending (by seq) order. Optional. Type: boolean. Default: false.
timeout: Number of milliseconds to wait for data in a longpoll or continuous feed before terminating the response. If both heartbeat and timeout are supplied, heartbeat supersedes timeout. Optional. Type: numeric.

Obtains a list of the changes made to the database. This can be used to monitor for updates and modifications to
the database for post processing or synchronization. The _changes feed is not guaranteed to return changes in
the correct order. There are three different types of supported changes feeds: poll, longpoll, and continuous. All
requests are poll requests by default. You can select any feed type explicitly using the feed query argument.
Polling

With polling you can request the changes that have occurred since a specific sequence number. This returns a
JSON structure containing the changed document information. When you perform a poll change request, only the
changes since the specific sequence number are returned. For example, the query
GET /recipes/_changes
Content-Type: application/json

returns all of the changes in the database. You can request a starting point using the since query argument and
specifying the sequence number. You will need to record the latest sequence number in your client and then use
it as the value of the since parameter when making another request.


Longpoll

With long polling the request to the server will remain open until a change is made on the database, when the
changes will be reported, and then the connection will close. The long poll is useful when you want to monitor
for changes for a specific purpose without monitoring continuously for changes.
Because the wait for a change can be significant you can set a timeout before the connection is automatically
closed (the timeout argument). You can also set a heartbeat interval (using the heartbeat query argument),
which sends a newline to keep the connection open.
The return structure for normal and longpoll modes is a JSON array of changes objects, and the last update
sequence number. The response is structured as follows:
last_seq: Last change sequence string
pending: Number of changes after the ones in this response
results: Array of changes made to a database
changes: Array of changes, field-by-field, for this document
id: Document ID
seq: Update sequence string
Example request and response

GET /db/_changes?feed=longpoll&since=0-g1AAAAI9eJyV0F8KgjAcwPFRQd0iu4BsTZ17ypvU_iJiCrUeeqqb1E3qJnU
{

"results": [{
"seq": "1-g1AAAAI9eJyV0EsKwjAUBdD4Ad2FdQMlMW3TjOxONF9KqS1oHDjSnehOdCe6k5oQsNZBqZP3HiEcLrcEAMzz
"id": "foo",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}],
"last_seq": "1-g1AAAAI9eJyV0EsKwjAUBdD4Ad2FdQMlMW3TjOxONF9KqS1oHDjSnehOdCe6k5oQsNZBqZP3HiEcLrcEA
"pending": 0
}
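A minimal Python sketch of a longpoll loop that records last_seq between requests (requests library; the account URL, credentials, and database name db are placeholders):

import requests

# Placeholder account, database, and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

since = 0  # or the last_seq recorded from a previous response
while True:
    resp = requests.get(
        BASE + "/db/_changes",
        params={"feed": "longpoll", "since": since, "timeout": 60000},
        auth=AUTH,
    )
    resp.raise_for_status()
    body = resp.json()
    for row in body["results"]:
        print(row["id"], [c["rev"] for c in row["changes"]])
    since = body["last_seq"]  # resume from here on the next request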

Continuous

Continuous sends all new changes back to the client immediately, without closing the connection. In continuous
mode the format of the changes is slightly different to accommodate the continuous nature while ensuring that the
JSON output is still valid for each change notification.
As with the longpoll feed type you can set both the timeout and heartbeat intervals to ensure that the connection
is kept open for new changes and updates.
In continuous mode, the server sends a CRLF (carriage-return, linefeed) delimited line for each change. Each
line contains the JSON object.
Example request and response

GET /db/_changes?feed=continuous&since=0-g1AAAAI7eJyN0EEOgjAQBdBGTfQWcgLSVtriSm6iTDuEGIRE68KV3kRvo
{

"seq": "1-g1AAAAI7eJyN0EsOgjAQBuD6SPQWcgLSIm1xJTdRph1CCEKiuHClN9Gb6E30JlisCXaDbGYmk8mXyV8QQubZRB
"id": "2documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}
{


"seq": "2-g1AAAAI7eJyN0E0OgjAQBeD6k-gt5ASkRdriSm6iTDuEEIREceFKb6I30ZvoTbBYE-wG2cxMmubLyysIIfNsoo
"id": "1documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}
{

"seq": "3-g1AAAAI7eJyN0EsOgjAQBuD6SPQWcgLSIqW4kpso0w4hBCFRXLjSm-hN9CZ6EyyUBLtBNjOTyeTL5M8JIct0po
"id": "1documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "2-eec205a9d413992850a6e32678485900"
}],
"deleted": true
}
{

"seq": "4-g1AAAAI7eJyN0EEOgjAQBdAGTfQWcgLSIm1xJTdRph1CCEKiuHClN9Gb6E30JlisCXaDbGYmTfPy80tCyDyfaO
"id": "2documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "2-eec205a9d413992850a6e32678485900"
}],
"deleted": true
}

You can also request the full contents of each document change (instead of just the change notification) by using
the include_docs parameter.
Filtering
You can filter the contents of the changes feed in a number of ways. The most basic way is to specify one or more
document IDs to the query. This causes the returned structure value to only contain changes for the specified IDs.
Note that the value of this query argument should be a JSON formatted array.
You can also filter the _changes feed by defining a filter function within a design document. The specification
for the filter is the same as for replication filters. You specify the name of the filter function to the filter
parameter, specifying the design document name and filter name. For example:
GET /db/_changes?filter=design_doc/filtername

The _changes feed can be used to watch changes to specific document IDs or the list of _design documents
in a database. If the filter parameter is set to _doc_ids, a list of document IDs can be passed in the doc_ids
parameter as a JSON array.
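A minimal Python sketch of filtering the feed to a fixed set of document IDs (requests library; the account URL, credentials, database name, and document IDs are placeholders):

import json
import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

resp = requests.get(
    BASE + "/db/_changes",
    params={
        "filter": "_doc_ids",
        # doc_ids must be a JSON formatted array
        "doc_ids": json.dumps(["FishStew", "LambStew"]),
    },
    auth=AUTH,
)
resp.raise_for_status()
for row in resp.json()["results"]:
    print(row["id"], row["changes"])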

2.4.7 Retrieving information about shards in a database


Method: GET
Path: /db/_shards
Response: JSON document describing shards of the database
Roles permitted: _admin
Response structure
shards: Object describing the shards in the database. The field names of the object are hash value ranges.
range: Array of node names (strings), which have this shard.


Example request and response


GET /db/_shards HTTP/1.1
Accept: application/json

{
"shards": {
"00000000-1fffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"20000000-3fffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"40000000-5fffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"60000000-7fffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"80000000-9fffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"a0000000-bfffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"c0000000-dfffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...],
"e0000000-ffffffff": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", ...]
}
}

2.4.8 Retrieving the shard a document belongs to


Method: GET
Path: /db/_shards/id
Response: JSON document
Roles permitted: _admin
Response structure
range: The hash range of the shard
nodes: Array of node names of this shard
Example request and response
GET /db/_shards/foo HTTP/1.1
Accept: application/json
{

"range": "80000000-9fffffff",
"nodes": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", "dbcore@db3.te
}

2.4.9 Cleaning up cached view output


Method: POST /db/_view_cleanup
Response: JSON success statement
Roles permitted: _admin
Cleans up the cached view output on disk for a given view. For example:
POST /recipes/_view_cleanup
Content-Type: application/json

If the request is successful, a basic status message is returned:


{
"ok" : true
}

2.4.10 Retrieving missing revisions


Method: POST /db/_missing_revs
Request: JSON list of document revisions
Response: JSON of missing revisions

2.4.11 Retrieving differences between revisions


Method: POST /db/_revs_diff
Request: JSON list of document revisions
Response: JSON list of differences from supplied document/revision list

2.4.12 The database security document


Retrieving the security document
Method: GET /db/_security
Request: None
Response: JSON of the security object
Gets the current security object from the specified database. The security object consists of two compulsory
elements, admins and readers, which are used to specify the list of users and/or roles that have admin and
reader rights to the database respectively. Any additional fields in the security object are optional. The entire
security object is made available to validation and other internal functions so that the database can control and
limit functionality.
To get the existing security object, send a GET request to /db/_security. The returned security object looks like this:
{
"admins" : {
"roles" : [],
"names" : [
"mc",
"slp"
]
},
"readers" : {
"roles" : [],
"names" : [
"tim",
"brian"
]
}
}

Security object structure is:


admins: Roles/Users with admin privileges
roles [array]: List of roles with parent privilege
names [array]: List of users with parent privilege


readers: Roles/Users with reader privileges


roles [array]: List of roles with parent privilege
names [array]: List of users with parent privilege
Note: If the security object for a database has never been set, then the value returned will be empty.

Creating or updating the security document


Method: PUT /db/_security
Request: JSON specifying the admin and user security for the database
Response: JSON status message
Sets the security object for the given database. For example, to set the security object for the recipes database:
PUT http://username.cloudant.com/recipes/_security
Content-Type: application/json
{
"admins" : {
"roles" : [],
"names" : [
"mc",
"slp"
]
},
"readers" : {
"roles" : [],
"names" : [
"tim",
"brian"
]
}
}

If the setting was successful, a JSON status object will be returned:


{
"ok" : true
}
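The same update can be made from a script. A minimal Python sketch (requests library; the account URL, credentials, and user names are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

security = {
    "admins": {"names": ["mc", "slp"], "roles": []},
    "readers": {"names": ["tim", "brian"], "roles": []},
}
resp = requests.put(BASE + "/recipes/_security", json=security, auth=AUTH)
resp.raise_for_status()
print(resp.json())  # {"ok": true}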

2.4.13 The revisions limit


Retrieving the revisions limit
Method: GET /db/_revs_limit
Request: None
Response: The current revision limit setting
Roles permitted: _admin, _reader
Gets the current revs_limit (revision limit) setting.
For example to get the current limit:
GET /recipes/_revs_limit
Content-Type: application/json

The returned information is the current setting as a numerical scalar:


1000

Setting the revisions limit


Method: PUT /db/_revs_limit
Request: A scalar integer of the revision limit setting
Response: Confirmation of setting of the revision limit
Roles permitted: _admin, _writer
Sets the maximum number of document revisions that will be tracked even after compaction has occurred. You
can set the revision limit on a database by using PUT with a scalar integer of the limit that you want to set as the
request body.
For example to set the revs limit to 100 for the recipes database:
PUT /recipes/_revs_limit
Content-Type: application/json
100

If the setting was successful, a JSON status object will be returned:


{
"ok" : true
}

2.5 Documents
The document endpoints can be used to create, read, update and delete documents within a database.
A list of the available methods and URL paths is provided below:
Method   Path                  Description
POST     /db                   Create a new document
GET      /db/doc               Returns the latest revision of the document
HEAD     /db/doc               Returns bare information in the HTTP Headers for the document
PUT      /db/doc               Inserts a new document, or new version of an existing document
DELETE   /db/doc               Deletes the document
COPY     /db/doc               Copies the document
GET      /db/doc/attachment    Gets the attachment of a document
PUT      /db/doc/attachment    Adds an attachment of a document
DELETE   /db/doc/attachment    Deletes an attachment of a document

2.5.1 CRUD operations on documents


Creating a new document
Method: POST /db
Request: JSON of the new document
Response: JSON with the committed document information
Roles permitted: _writer


Query Arguments

batch: Allow document store request to be batched with others. Optional. Type: string. Supported values: ok (enable batching).

Return Codes

Code  Description
201   Document has been created successfully
409   Conflict - a document with the specified document ID already exists

Response Headers

Field  Description
ETag   Revision of the document. Same as the _rev field.

Create a new document in the specified database, using the supplied JSON document structure. If the JSON
structure includes the _id field, then the document will be created with the specified document ID. If the _id
field is not specified, a new unique ID will be generated.
For example, you can generate a new document with a generated UUID using the following request:
POST /recipes/
Content-Type: application/json
{
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew"
}

The returned JSON will specify the automatically generated ID and revision information:
{
"id" : "64575eef70ab90a2b8d55fc09e00440d",
"ok" : true,
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}

The document id is guaranteed to be unique per database.
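A minimal Python sketch of the same request (requests library; the account URL and credentials are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

doc = {"servings": 4, "subtitle": "Delicious with fresh bread", "title": "Fish Stew"}
resp = requests.post(BASE + "/recipes", json=doc, auth=AUTH)
resp.raise_for_status()
result = resp.json()
print(result["id"], result["rev"])  # server-generated ID and first revision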


Specifying the Document ID

The document ID can be specified by including the _id field in the JSON of the submitted record. The following
request will create the same document with the ID FishStew:
POST /recipes/
Content-Type: application/json
{
"_id" : "FishStew",
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew"
}

The fields of the submitted document are described under Including Attachments below.

In either case, the returned JSON will specify the document ID, revision ID, and status message:


{
"id" : "FishStew",
"ok" : true,
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}

If a document with the given id already exists, a 409 conflict response will be returned.
Batch Mode Writes

You can write documents to the database at a higher rate by using the batch option. This collects document writes
together in memory (on a user-by-user basis) before they are committed to disk. This increases the risk of the
documents not being stored in the event of a failure, since the documents are not written to disk immediately.
To use the batched mode, append the batch=ok query argument to the URL of the PUT or POST request. The
server will respond with a 202 HTTP response code immediately.
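A minimal Python sketch of a batched write (requests library; the account URL, credentials, and document content are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

resp = requests.post(
    BASE + "/recipes",
    params={"batch": "ok"},        # ask the server to batch this write
    json={"title": "Lamb Stew"},
    auth=AUTH,
)
print(resp.status_code)  # expect 202 Accepted rather than 201 Created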
Including Attachments

You can include one or more attachments with a given document by incorporating the attachment information
within the JSON of the document. This provides a simpler alternative to loading documents with attachments than
making a separate call (see Creating or updating an attachment).
_id (optional): Document ID
_rev (optional): Revision ID (when updating an existing document)
_attachments (optional): Document Attachment
filename: Attachment information
* content_type: MIME Content type string
* data: File attachment content, Base64 encoded
The filename will be the attachment name. For example, when sending the JSON structure below:
{
"_id" : "FishStew",
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew",
"_attachments" : {
"styling.css" : {
"content_type" : "text/css",
"data" : "cCB7IGZvbnQtc2l6ZTogMTJwdDsgfQo="
}
}
}

The attachment styling.css can be accessed using /recipes/FishStew/styling.css. For more
information on attachments, see Attachments.
The document data embedded into the structure must be encoded using base64.
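A minimal Python sketch of building such a document, base64 encoding the attachment data before submission (requests library; the account URL, credentials, and attachment content are placeholders):

import base64
import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

css = b"p { font-size: 12pt; }\n"
doc = {
    "_id": "FishStew",
    "title": "Fish Stew",
    "_attachments": {
        "styling.css": {
            "content_type": "text/css",
            # attachment data must be base64 encoded
            "data": base64.b64encode(css).decode("ascii"),
        }
    },
}
resp = requests.post(BASE + "/recipes", json=doc, auth=AUTH)
print(resp.json())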
Retrieving a document
Method: GET /db/doc
Request: None
Response: Returns the JSON for the document


Roles permitted: _reader


HTTP Headers

You can use the If-None-Match header to retrieve the document only if it has been modified. See HTTP
basics.
Query Arguments

conflicts: Returns the conflict tree for the document. Optional. Type: boolean. Default: false. Supported values: true (includes conflicting revisions).
rev: Specify the revision to return. Optional. Type: string.
revs: Return a list of the revisions for the document. Optional. Type: boolean. Default: false. Supported values: true (includes the revisions).
revs_info: Return a list of detailed revision information for the document. Optional. Type: boolean. Default: false.

Return Codes

Code  Description
200   Document retrieved
304   See HTTP basics
400   The format of the request or revision was invalid
404   The specified document or revision cannot be found, or has been deleted

Returns the specified doc from the specified db. For example, to retrieve the document with the id DocID you
would send the following request:
GET /db/DocID HTTP/1.1
Accept: application/json

The returned JSON is the JSON of the document, including the document ID and revision number:
{
"_id": "DocID",
"_rev": "1-2b458b0705e3007bce80b0499a1199e7",
"name": "Anna",
"age": 89,
"gender": "female"
}

Unless you request a specific revision, the latest revision of the document will always be returned.
Attachments

If the document includes attachments, then the returned structure will contain a summary of the attachments
associated with the document, but not the attachment data itself.
The JSON for the returned document will include the _attachments field, with one or more attachment definitions. For example:
{
"_id": "DocID",
"_rev": "2-f29c836d0bedc4b4b95cfaa6d99e95df",
"name": "Anna",

"age": 89,
"gender": "female",
"_attachments": {
"my attachment": {
"content_type": "application/json; charset=UTF-8",
"revpos": 2,
"digest": "md5-37IZysiyWLRWx31J/1WQHw==",
"length": 12,
"stub": true
}
}
}

The format of the returned JSON is shown in the table below:


_id (optional): Document ID
_rev (optional): Revision ID (when updating an existing document)
_attachments (optional): Document Attachment
filename: Attachment information
* content_type: MIME Content type string
* length: Length (bytes) of the attachment data
* revpos: Revision where this attachment exists
* digest: MD5 checksum of the attachment
* stub: Indicates whether the attachment is a stub
Getting a List of Revisions

You can obtain a list of the revisions for a given document by adding the revs=true parameter to the request
URL. For example:
GET /recipes/FishStew?revs=true
Accept: application/json

The returned JSON structure includes the original document, including a _revisions structure that includes
the revision information:
{
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"_id" : "FishStew",
"title" : "Irish Fish Stew",
"_revisions" : {
"ids" : [
"a1a9b39ee3cc39181b796a69cb48521c",
"7c4740b4dcf26683e941d6641c00c39d",
"9c65296036141e575d32ba9c034dd3ee"
],
"start" : 3
},
"_rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
}

_id (optional): Document ID


_rev (optional): Revision ID (when updating an existing document)
_revisions: Document Revisions


ids [array]: Array of valid revision IDs, in reverse order (latest first)
start: Prefix number for the latest revision
Obtaining an Extended Revision History

You can get additional information about the revisions for a given document by supplying the revs_info argument to the query:
GET /recipes/FishStew?revs_info=true
Accept: application/json

This returns extended revision information, including the availability and status of each revision:
{
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"_id" : "FishStew",
"_revs_info" : [
{
"status" : "available",
"rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
},
{
"status" : "available",
"rev" : "2-7c4740b4dcf26683e941d6641c00c39d"
},
{
"status" : "available",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
],
"title" : "Irish Fish Stew",
"_rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
}

_id (optional): Document ID


_rev (optional): Revision ID (when updating an existing document)
_revs_info [array]: Document Extended Revision Info
rev: Full revision string
status: Status of the revision
Obtaining a Specific Revision

To get a specific revision, add the rev argument to the request, and specify the full revision number:
GET /recipes/FishStew?rev=2-7c4740b4dcf26683e941d6641c00c39d
Accept: application/json

The specified revision of the document will be returned, including a _rev field specifying the revision that was
requested:
{
"_id" : "FishStew",
"_rev" : "2-7c4740b4dcf26683e941d6641c00c39d",
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"title" : "Fish Stew"
}


Retrieving conflicting revisions

To get a list of conflicting revisions, set the conflicts argument to true.


GET /recipes/FishStew?conflicts=true
Accept: application/json

If there are conflicts, the returned document will include a _conflicts field specifying the revisions that are in
conflict.
{
"_id" : "FishStew",
"_rev" : "2-7c4740b4dcf26683e941d6641c00c39d",
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"title" : "Fish Stew",
"_conflicts": ["2-65db2a11b5172bf928e3bcf59f728970","2-5bc3c6319edf62d4c624277fdd0ae191"]
}

Overriding the default read quorum

As in the case of updates there is an r query-string parameter that sets the quorum for reads. When a document is
read, requests are issued to all N copies of the partition hosting the document and the client receives a response
when r matching success responses are received. The default quorum is the simple majority of N, which is the
recommended choice for most applications.
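A minimal Python sketch of a read with an explicit read quorum (requests library; the account URL, credentials, and the value r=2 are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

# Require two matching copies before the read returns.
resp = requests.get(BASE + "/recipes/FishStew", params={"r": 2}, auth=AUTH)
print(resp.json())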
Retrieving revision and size of a document
Method: HEAD /db/doc
Request: None
Response: None
Roles permitted: _reader
Returns the HTTP Headers containing a minimal amount of information about the specified document. The HEAD
method supports the same query arguments and returns the same status codes as the GET method, but only the
header information (including document size, and the revision as an ETag), is returned. For example, a simple
HEAD request:
HEAD /recipes/FishStew
Content-Type: application/json

Returns the following HTTP Headers:


HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R13B)
Etag: "7-a19a1a5ecd946dad70e85233ba039ab2"
Date: Fri, 05 Nov 2010 14:54:43 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 136
Cache-Control: must-revalidate

The Etag header shows the current revision for the requested document, and the Content-Length specifies
the length of the data, if the document were requested in full.
If you add any of the query arguments supported by the GET method, the resulting
HTTP Headers will correspond to what would be returned for the equivalent GET request. Note that the
current revision is not returned when the revs_info argument is used. For example:


HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R13B)
Date: Fri, 05 Nov 2010 14:57:16 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 609
Cache-Control: must-revalidate

Creating or updating a document


Method: PUT /db/doc
Request: JSON of the new document, or updated version of the existing document
Response: JSON of the document ID and revision
Roles permitted: _writer
Query Arguments

batch: Allow document store request to be batched with others. Optional. Type: string. Supported values: ok (enable batching).

HTTP Headers

If-Match: Current revision of the document for validation. Optional.

Return Codes

Code  Description
201   Document has been created successfully
202   Document accepted for writing (batch mode)

The PUT method creates a new named document, or creates a new revision of the existing document. Unlike the
POST method, you must specify the document ID in the request URL.
For example, to create the document DocID, you would send the following request:
PUT /db/DocID HTTP/1.1
Accept: application/json
{
"name": "Hannah",
"age": 120,
"gender": "female",
"_id": "DocID",
"_attachments": {
}
}

The return type is JSON of the status, document ID, and revision number:
{
"ok": true,
"id": "DocID",
"rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}


Updating an Existing Document

To update an existing document you must specify the current revision number within the rev parameter. For
example:
PUT /db/DocID?rev=1-764b9b11845fd0b73cfa0e61acc74ecf HTTP/1.1
Accept: application/json
{
"name": "Hannah",
"age": 40,
"gender": "female",
"_id": "DocID",
"_attachments": {
},
"_rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}

Alternatively, you can supply the current revision number in the If-Match HTTP header of the request. For
example:
PUT /test/DocID
If-Match: 1-61029d20ba39869b1fc879227f5d9f2b
Content-Type: application/json
{
"name": "Hannah",
"age": 40,
"gender": "female",
"_id": "DocID",
"_attachments": {
},
"_rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}

The JSON returned will include the updated revision number:


{
"ok": true,
"id": "DocID",
"rev": "2-a537656346d6aa02353e1d31f07b16c4"
}
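A minimal Python sketch of the read-modify-write cycle, including a basic check for an update conflict (requests library; the account URL, credentials, and field change are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")
url = BASE + "/recipes/FishStew"

# Read the latest revision, modify it, and write it back with _rev included.
doc = requests.get(url, auth=AUTH).json()
doc["servings"] = 6
resp = requests.put(url, json=doc, auth=AUTH)
if resp.status_code == 409:
    print("conflict: re-read the document and retry with the newer revision")
else:
    resp.raise_for_status()
    print(resp.json()["rev"])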

Overriding the default write quorum

The w query-string parameter on updates overrides the default write quorum for the database. When the N copies
of each document are written, the client will receive a response after w of them have been committed successfully
(the operations to commit the remaining copies will continue in the background). w defaults to the simple majority
of N, which is the recommended choice for most applications.
See also

For information on batched writes, which can provide improved performance, see Batch Mode Writes.
Deleting a document
Method: DELETE /db/doc
Request: None
Response: JSON of the deleted revision
Roles permitted: _writer
Query Arguments

rev: Current revision of the document for validation. Optional. Type: string.

HTTP Headers

If-Match: Current revision of the document for validation. Optional.

Return Codes

Code  Description
409   Revision is missing, invalid or not the latest

Deletes the specified document from the database. You must supply (one of) the current revision(s), either by
using the rev parameter...
DELETE /test/DocID?rev=3-a1a9b39ee3cc39181b796a69cb48521c

... or with ETags using the If-Match header:


DELETE /test/DocID
If-Match: 3-a1a9b39ee3cc39181b796a69cb48521c

The returned JSON contains the document ID, revision and status:
{
"id" : "DocID",
"ok" : true,
"rev" : "4-2719fd41187c60762ff584761b714cfb"
}

Note: Note that deletion of a record increments the revision number. The use of a revision for deletion of the
record allows replication of the database to correctly track the deletion in synchronized copies.
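A minimal Python sketch that looks up the current revision and then deletes the document (requests library; the account URL and credentials are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")
url = BASE + "/recipes/FishStew"

rev = requests.get(url, auth=AUTH).json()["_rev"]   # current revision
resp = requests.delete(url, params={"rev": rev}, auth=AUTH)
print(resp.json())  # includes the new (deleted) revision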

Copying a document
Method: COPY /db/doc
Request: None
Response: JSON of the new document and revision
Roles permitted: _writer
Query Arguments

rev: Revision to copy from. Optional. Type: string.

HTTP Headers

Destination: Destination document (and optional revision). Required.

Return Codes

Code  Description
201   Document has been copied and created successfully
409   Revision is missing, invalid or not the latest

The COPY command (which is non-standard HTTP) copies an existing document to a new or existing document.
The source document is specified on the request line, with the Destination HTTP Header of the request
specifying the target document.
Copying a Document to a new document

You can copy the latest version of a document to a new document by specifying the current document and target
document:
COPY /test/DocID
Content-Type: application/json
Destination: NewDocId

The above request copies the document DocID to the new document NewDocId. The response is the ID and
revision of the new document.
{
"id" : "NewDocId",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
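Because COPY is non-standard, many HTTP clients need the method passed explicitly. A minimal Python sketch (requests library; the account URL, credentials, and document names are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

resp = requests.request(
    "COPY",                              # non-standard HTTP method
    BASE + "/test/DocID",
    headers={"Destination": "NewDocId"},
    auth=AUTH,
)
print(resp.json())  # id and rev of the new document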

Copying from a Specific Revision

To copy from a specific version, add the rev argument to the query string:
COPY /test/DocID?rev=5-acfd32d233f07cea4b4f37daaacc0082
Content-Type: application/json
Destination: NewDocID

The new document will be created using the information in the specified revision of the source document.
Copying to an Existing Document

To copy to an existing document, you must specify the current revision string for the target document, adding the
rev parameter to the Destination HTTP Header string. For example:
COPY /test/DocID
Content-Type: application/json
Destination: ExistingDocID?rev=1-9c65296036141e575d32ba9c034dd3ee

The return value will be the new revision of the copied document:
{
"id" : "ExistingDocID",
"rev" : "2-55b6a1b251902a2c249b667dab1c6692"
}


2.5.2 Attachments
Retrieving an attachment
Method: GET /db/doc/attachment
Request: None
Response: Returns the attachment data
Roles permitted: _reader
Returns the file attachment attachment associated with the document doc. The raw data of the associated
attachment is returned (just as if you were accessing a static file). The returned HTTP Content-Type will be
the same as the content type set when the document attachment was submitted into the database.
HTTP Range Requests

HTTP allows you to specify byte ranges for requests. This allows the implementation of resumable downloads
and skippable audio and video streams alike. This is available for all attachments inside Cloudant. To request a
range of bytes from an attachment, submit a Range header with your request:
GET /db/doc/attachment HTTP/1.1
Host: username.cloudant.com
Range: bytes=0-12

The response will return a status code 206 and specify the number of bytes sent in the Content-Length header
as well as the range in the Content-Range header.
206 Partial Content
Content-Type: application/octet-stream
Content-Range: bytes 0-12/30
Content-Length: 13
Accept-Ranges: bytes

HTTP supports many ways to specify single and even multiple byte ranges. Read all about it in RFC 2616.
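A minimal Python sketch of a ranged attachment download (requests library; the account URL, credentials, and paths are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

resp = requests.get(
    BASE + "/db/doc/attachment",
    headers={"Range": "bytes=0-12"},
    auth=AUTH,
)
print(resp.status_code)                    # expect 206 Partial Content
print(resp.headers.get("Content-Range"))   # e.g. bytes 0-12/30
print(resp.content)                        # the first 13 bytes of the attachment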
Creating or updating an attachment
Method: PUT /db/doc/attachment
Request: Raw document data
Response: JSON document status
Roles permitted: _writer
Query Arguments

rev: Current document revision. Required. Type: string.

HTTP Headers

Content-Length: Length (bytes) of the attachment being uploaded. Required.
Content-Type: MIME type for the uploaded attachment. Required.
If-Match: Current revision of the document for validation. Optional.

Return Codes

Code  Description
201   Attachment has been accepted

Upload the supplied content as an attachment to the specified document (doc). The attachment name provided
must be a URL encoded string. You must also supply either the rev query argument or the If-Match HTTP
header for validation, and the Content-Type HTTP header (to set the attachment content type). The content type is used
when the attachment is requested, as the corresponding content-type in the returned document header.
For example, you could upload a simple text document using the following request:
PUT /recipes/FishStew/basic?rev=8-a94cb7e50ded1e06f943be5bfbddf8ca
Content-Length: 10
Content-Type: text/plain
Roast it

Or by using the If-Match HTTP header:


PUT /recipes/FishStew/basic
If-Match: 8-a94cb7e50ded1e06f943be5bfbddf8ca
Content-Length: 10
Content-Type: text/plain
Roast it

The returned JSON contains the new document information:


{
"id" : "FishStew",
"ok" : true,
"rev" : "9-247bb19a41bfd9bfdaf5ee6e2e05be74"
}

Note: Uploading an attachment updates the corresponding document revision. Revisions are tracked for the
parent document, not individual attachments.
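A minimal Python sketch of uploading an attachment from a local file (requests library; the account URL, credentials, revision, and file name are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

with open("basic.txt", "rb") as f:         # placeholder local file
    resp = requests.put(
        BASE + "/recipes/FishStew/basic",
        params={"rev": "8-a94cb7e50ded1e06f943be5bfbddf8ca"},
        headers={"Content-Type": "text/plain"},
        data=f,
        auth=AUTH,
    )
print(resp.json())  # new revision of the parent document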

Updating an Existing Attachment

Uploading an attachment using an existing attachment name will update the corresponding stored content of the
database. Since you must supply the revision information to add an attachment to a document, this serves as
validation to update the existing attachment.
Creating a document with an inline attachment
Inline attachments are just like any other attachment, except that their data is included in the document itself via
Base 64 encoding when the document is created or updated.
{
"_id":"attachment_doc",
"_attachments": {
"foo.txt": {
"content_type":"text/plain",
"data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
}
}
}


Deleting an attachment
Method: DELETE /db/doc/attachment
Request: None
Response: JSON status
Roles permitted: _writer
Query Arguments

rev: Current document revision. Required. Type: string.

HTTP Headers

If-Match: Current revision of the document for validation. Optional.

Return Codes

Code  Description
200   Attachment deleted successfully
409   Supplied revision is incorrect or missing

Deletes the attachment attachment from the specified document doc. You must supply the rev argument with the current
revision to delete the attachment.
For example, to delete the attachment my attachment from the document DocID:
DELETE /db/DocID/my+attachment?rev=2-f29c836d0bedc4b4b95cfaa6d99e95df HTTP/1.1
Accept: application/json

The returned JSON contains the updated revision information:


{
"ok": true,
"id": "DocID",
"rev": "3-aedfb06537c1d77a087eb295571f7fc9"
}

2.6 Design Documents


Design documents provide the main interface for building an application with Cloudant. The design document
defines the views and indexers used to extract information from the database. Design documents are created in
the same way as you create other database documents, but the content and definition of the documents is different.
Design documents are named using an ID defined with the design document URL path, and this URL can then be
used to access the database contents.
Views and lists operate together to provide automated (and formatted) output from your database. Indexers are
used with Cloudant's Lucene-based search functions.


2.6.1 Retrieving a design document


Since design documents are just ordinary documents, there is nothing special about retrieving them. The URL path
used for design documents is /db/_design/design-doc, where design-doc is the name of the design
document and db is the name of the database.
See Retrieving a document for information about retrieving documents.

2.6.2 Creating or updating a design document


Method: PUT /db/_design/design-doc
Request: JSON of the design document
Response: JSON status
Roles permitted: _writer
Upload the specified design document, design-doc, to the specified database. Design documents are ordinary
documents defining views and indexers in the format summarised in the following table.
_id: Design Document ID
_rev: Design Document Revision
views (optional): View
viewname (one for each view): View Definition
* map: Map Function for the view
* reduce (optional): Reduce Function for the view
* dbcopy (optional): Database name to store view results in
indexes (optional): Indexes
index name (one for each index): Index definition
* analyzer: Object describing the analyzer to be used or an object with the following fields:
name: Name of the analyzer. Valid values are standard, email, keyword, simple,
whitespace, classic, perfield.
stopwords (optional): An array of stop words. Stop words are words that should not be
indexed. If this array is specified, it overrides the default list of stop words. The default list
of stop words depends on the analyzer. The list of stop words for the standard analyzer is:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no,
not, of, on, or, such, that, the, their, then, there, these, they, this,
to, was, will, with.
default (for the per field analyzer): default language to use if there is no language specified
for the field
fields (for the per field analyzer): An object specifying which language to use to analyze
each field of the index. Field names in the object correspond to field names in the index (i.e.
the first parameter of the index function). The values of the fields are the languages to be
used, e.g. english.
* index: Function that handles the indexing
shows (optional): Show functions
function name (one for each function): Function definition
lists (optional): List functions
function name (one for each function): Function definition


General notes on functions in design documents


Functions in design documents are run on multiple nodes for each document and might be run several times. To
avoid inconsistencies, they need to be idempotent, meaning they need to behave identically when run multiple
times and/or on different nodes. In particular, you should avoid using functions that generate random numbers or
return the current time.
Map functions
The function contained in the map field is a Javascript function that is called for each document in the database.
The map function takes the document as an argument and optionally calls the emit function one or more times
to emit pairs of keys and values. The simplest example of a map function is this:
function(doc) {
emit(doc._id, doc);
}

The result will be that the view contains every document with the key being the id of the document, effectively
creating a copy of the database.
If the object passed to emit has an _id field, a view query with include_docs set to true will contain the
document with the given ID.
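A minimal Python sketch of storing a map function in a design document and then querying the resulting view (requests library; the account URL, credentials, design document name, and view name are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

design = {
    "_id": "_design/titles",
    "views": {
        "by_title": {
            # the map function is stored as a string of JavaScript
            "map": "function(doc) { if (doc.title) { emit(doc.title, 1); } }",
            "reduce": "_count",
        }
    },
}
requests.put(BASE + "/recipes/_design/titles", json=design, auth=AUTH).raise_for_status()

# Query the view, grouping by key to count documents per title.
rows = requests.get(
    BASE + "/recipes/_design/titles/_view/by_title",
    params={"group": "true"},
    auth=AUTH,
).json()["rows"]
print(rows)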
Reduce functions
If a view has a reduce function, it is used to produce aggregate results for that view. A reduce function is passed
a set of intermediate values and combines them to a single value. Reduce functions must accept, as input, results
emitted by its corresponding map function as well as results returned by the reduce function itself. The latter
case is referred to as a rereduce.
Here is an example of a reduce function:
function (key, values, rereduce) {
return sum(values);
}

Reduce functions are passed three arguments in the order key, values, and rereduce.
Reduce functions must handle two cases:
1. When rereduce is false:
key will be an array whose elements are arrays of the form [key,id], where key is a key emitted by the
map function and id is that of the document from which the key was generated.
values will be an array of the values emitted for the respective elements in keys
i.e.
reduce([ [key1,id1], [key2,id2], [key3,id3] ],
[value1,value2,value3], false)
2. When rereduce is true:
key will be null.
values will be an array of values returned by previous calls to the reduce function.
i.e. reduce(null, [intermediate1,intermediate2,intermediate3], true)
Reduce functions should return a single value, suitable for both the value field of the final view and as a member
of the values array passed to the reduce function.
Often, reduce functions can be written to handle rereduce calls without any extra code, like the summation function
above. In that case, the rereduce argument can be ignored.


Built-in reduce functions

For performance reasons, a few simple reduce functions are built in. To use one of the built-in functions, put its
name into the reduce field of the view object in your design document.
Function  Description
_sum      Produces the sum of all values for a key; values must be numeric
_count    Produces the row count for a given key; values can be any valid JSON
_stats    Produces a JSON structure containing sum, count, min, max and sum squared; values must be numeric

Dbcopy
If the dbcopy field of a view is set, the view contents will be written to a database of that name. If dbcopy is
set, the view must also have a reduce function. For every key/value pair created by a reduce query with group
set to true, a document will be created in the dbcopy database. If the database does not exist, it will be created.
The documents created have the following fields:
Field     Description
key       The key of the view result. This can be a string or an array.
value     The value calculated by the reduce function.
_id       The ID is a hash of the key.
salt      This value is an implementation detail used internally.
partials  This value is an implementation detail used internally.

For more information on writing views, see Querying a view.


Index functions
The function contained in the index field is a Javascript function that is called for each document in the database. It
takes the document as a parameter, extracts some data from it and then calls the index function to index that data.
The index function takes three parameters, where the third parameter is optional. The first parameter is the name of
the index. If the special value "default" is used, the data is stored in the default index, which is queried if no
index name is specified in the search. The second parameter is the data to be indexed. The third parameter is an
object that can contain the fields store and index. If the store field contains the value yes, the value will
be returned in search results, otherwise, it will only be indexed. The index field can have the following values
describing whether and how the data is indexed:
analyzed: Index the tokens produced by running the field's value through an analyzer.
analyzed_no_norms: Index the tokens produced by running the field's value through an analyzer, and
also separately disable the storing of norms.
no: Do not index the field value.
not_analyzed: Index the field's value without using an analyzer. This is necessary if the field will be
used for sorting.
not_analyzed_no_norms: Index the field's value without an analyzer, and also disable the indexing
of norms.
Here is an example of a simple index function.
function(doc) {
if (doc.foo) {
index("default", doc.foo);
}
}

This index function indexes only a single field in the document. You can, however, compute the value to be indexed
from several fields or index only part of a field (rather than its entire value).
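A minimal Python sketch of storing such an index function in a design document and querying it, assuming the _search query endpoint described later under Searching for documents using Lucene queries (requests library; the account URL, credentials, design document name, index name, and query string are placeholders):

import requests

# Placeholder account and credentials -- replace with your own.
BASE = "https://username.cloudant.com"
AUTH = ("username", "password")

design = {
    "_id": "_design/searches",
    "indexes": {
        "by_foo": {
            "analyzer": {"name": "standard"},
            # the index function is stored as a string of JavaScript
            "index": "function(doc) { if (doc.foo) { index('default', doc.foo, {'store': 'yes'}); } }",
        }
    },
}
requests.put(BASE + "/db/_design/searches", json=design, auth=AUTH).raise_for_status()

# Query the index with a Lucene query string.
resp = requests.get(
    BASE + "/db/_design/searches/_search/by_foo",
    params={"q": "foo:bar"},
    auth=AUTH,
)
print(resp.json())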


The index function also accepts a third, options parameter: a JavaScript object with the following possible
fields and defaults:
boost: Analogous to the boost query string parameter, but applied at index time rather than query time. Type: float. Default: 1.0 (no boosting).
index: Whether (and how) the data is indexed. The options available are explained in the Lucene documentation. Supported values: analyzed, analyzed_no_norms, no, not_analyzed, not_analyzed_no_norms. Default: analyzed.
store: If true, the value will be returned in the search result; if false, the value will not be returned in the search result. Supported values: true, false. Default: false.
For more information on indexing and searching, see Searching for documents using Lucene queries.
Show functions
Show functions can be used to render a document in a different format or extract only some information from a
larger document. Some show functions don't deal with documents at all and just return information about the user
making the request or other request parameters. Show functions take two arguments: the document identified by
the doc-id part of the URL (if specified) and an object describing the HTTP request. The return value of a show
function is either a string containing any data to be returned in the HTTP response or a Javascript object with fields
for the headers and the body of the HTTP response.
Example of a simple show function
function(doc, req) {
return "<person name=\"" + doc.name + "\" birthday=\"" + doc.birthday + "\" />";
}

The request object

The request object passed to the show function describes the http request and has the following fields:
info: An object containing information about the database.
id: ID of the object being shown or null if there is no object.
method: The HTTP method used, e.g. GET.
path: An array of strings describing the path of the request URL.
query: An object that contains a field for each query parameter.
headers: An object that contains a field for each header of the HTTP request.
peer: The IP address making the request.
cookie: An object that contains a field for each cookie submitted with the HTTP request.
body: The body of the HTTP request.
form: An object containing a field for each form field of the request, if the request has the
x-www-form-urlencoded content type.
userCtx: An object describing the identity and permissions of the user making the request.
db: database name
name: user name
roles: An array of strings for each role the user has, e.g. ["_admin", "_reader", "_writer"]
Here is an example for a request object:
{

"info": {
"update_seq": "31-g1AAAADneJzLYWBgYMlgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRI
"db_name": "dbname",
"purge_seq": 0,
"other": {
"data_size": 209
},
"doc_del_count": 0,
"doc_count": 2,
"disk_size": 1368408,
"disk_format_version": 5,
"compact_running": false,
"instance_start_time": "0"
},
"uuid": "d2b979d10234eaedc505a090968a4e7e",
"id": "74b2be56045bed0c8c9d24b939000dbe",
"method": "GET",
"path": [
"dbname",
"_design",
"designdocname",
"_show",
"showfunctionname",
"74b2be56045bed0c8c9d24b939000dbe"
],
"query": {
"foo": "bar"
},
"headers": {
"Accept": "text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8",
"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
"Accept-Encoding": "gzip,deflate,sdch",
"Accept-Language": "en-US,en;q=0.8,de-DE;q=0.6,de;q=0.4",
"Connection": "close",
"Host": "username.cloudant.com",
"User-Agent": "Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.22 (KHTML, like Gecko) Ubuntu
"X-Forwarded-For": "109.69.82.183"
},
"body": "undefined",
"peer": "109.69.82.183",
"form": {
},
"cookie": {
"foo": "bar"
},
"userCtx": {
"db": "dbname",
"name": "username",
"roles": [
"_admin",
"_reader",
"_writer"
]
}
}


Return values

Show functions can either return a string or an object with the headers and body of the HTTP response. The object
returned should have the following fields:
body: A String containing the body of the HTTP response
headers: An object with fields for each HTTP header of the response
Example show function returning a response object
function(doc, req) {
return {
body: "<h1>" + req.query.header + "</h1>" +
"<ul><li>" + doc.first + "</li>" +
"<li>" + doc.second + "</li></ul>",
headers: { "Content-Type": "text/html" }
};
}

List functions
List functions are a lot like show functions, but instead of taking just one object as their input, they are applied to
all data returned from a view. Like the name suggests, they can be used to create lists of objects in various formats
(xml, html, csv).
List functions take two parameters: The first one is usually called head and contains information about the
number of rows returned from the view. The second parameter is identical to the request parameter described
under The request object.
Head parameter

The first parameter to a list function contains the following fields:


total_rows: Total number of rows
offset: Number of rows skipped due to skip query parameter.
Available functions

There are three functions available for use in list functions.


The start function is used to set the HTTP status code and the header information to be sent in the HTTP
response. It can be used as follows:
start({code: 200, headers: {"content-type": "text/html"}});

The send function is used to send content in the body of the response.
send("hello");
send("bye");

The getRow function returns the next row from the view data or null if there are no more rows. The object
returned has the following fields:
id: The ID of the document associated with this row.
key: The key emitted by the view.
value: The data emitted by the view.


This is an example of a row object returned by get_row.


{
"id": "de698c77-b38f-44af-9d89-455de7310b58",
"key": 0,
"value": {
"_id": "de698c77-b38f-44af-9d89-455de7310b58",
"_rev": "1-ad1680946839206b088da5d9ac01e4ef",
"foo": 0,
"bar": "foo"
}
}

Example list function

This example function creates an unordered HTML list from the foo fields of the view values.
function(head, req) {
start({code: 200, headers: {"Content-Type": "text/html"}});
var row;
send("<ul>");
while (row = getRow()) {
send("<li>" + row.value.foo + "</li>");
}
send("</ul>");
}

Rewrite rules
A design document can contain rules for URL rewriting as an array in the rewrites field. Requests that match
the rewrite rules must have a URL path that starts with /db/_design/doc/_rewrite.
"rewrites": [
{
"from": "/",
"to": "index.html",
"method": "GET",
"query": {}
},{
"from": "/foo/:var",
"to": "/foo",
"method": "GET",
"query": {"v": "var"}
}
]

Each rule is a JSON object with 4 fields.

from: A path relative to /db/_design/doc/_rewrite used to match URLs to rewrite rules. Path
elements that start with a : are treated as variables and match any string that does not contain a /. A
* can only appear at the end of the string and matches any string - including slashes.
to: The path (relative to /db/_design/doc/ and not including the query part of the URL) that will
be the result of the rewriting step. Variables captured in from can be used in to. * can also be used
and will contain everything captured by the pattern in from.
method: The HTTP method that should be matched on.
query: The query part of the resulting URL. This is a JSON object containing the key/value pairs of the
query.


Examples

Rule: {"from": "/a/b", "to": "/some/"}
URL: /db/_design/doc/_rewrite/a/b?k=v
Rewrite to: /db/_design/doc/some/?k=v
Tokens: k=v

Rule: {"from": "/a/b", "to": "/some/:var"}
URL: /db/_design/doc/_rewrite/a/b
Rewrite to: /db/_design/doc/some/b?var=b
Tokens: var=b

Rule: {"from": "/a", "to": "/some/*"}
URL: /db/_design/doc/_rewrite/a
Rewrite to: /db/_design/doc/some

Rule: {"from": "/a/*", "to": "/some/*"}
URL: /db/_design/doc/_rewrite/a/b/c
Rewrite to: /db/_design/doc/some/b/c

Rule: {"from": "/a", "to": "/some/*"}
URL: /db/_design/doc/_rewrite/a
Rewrite to: /db/_design/doc/some

Rule: {"from": "/a/:foo/*", "to": "/some/:foo/*"}
URL: /db/_design/doc/_rewrite/a/b/c
Rewrite to: /db/_design/doc/some/b/c?foo=b
Tokens: foo=b

Rule: {"from": "/a/:foo", "to": "/some", "query": { "k": ":foo" }}
URL: /db/_design/doc/_rewrite/a/b
Rewrite to: /db/_design/doc/some/?k=b&foo=b
Tokens: foo=b

Rule: {"from": "/a", "to": "/some/:foo"}
URL: /db/_design/doc/_rewrite/a?foo=b
Rewrite to: /db/_design/doc/some/b?foo=b
Tokens: foo=b


2.6.3 Deleting a design document


Method: DELETE /db/_design/design-doc
Request: None
Response: JSON of deleted design document
Roles permitted: _writer
Query Arguments
Argument
rev

Description
Current revision of the document for validation

Optional
yes

Type
string

HTTP Headers
Header
If-Match

Description
Current revision of the document for validation

Optional
yes

Delete an existing design document. Deleting a design document also deletes all of the associated view indexes,
and recovers the corresponding space on disk for the indexes in question.
To delete, you must specify the current revision of the design document using the rev query argument.
For example:
DELETE /recipes/_design/recipes?rev=2-ac58d589b37d01c00f45a4418c5a15a8
Content-Type: application/json

The response contains the deleted document ID and revision:

{
"id" : "recipe/_design/recipes",
"ok" : true,
"rev" : "3-7a05370bff53186cb5d403f861aca154"
}


2.6.4 Copying a design document


Method: COPY /db/_design/design-doc
Request: None
Response: JSON of the new document and revision
Roles permitted: _writer
Query Arguments:
Argument: rev
* Description: Revision to copy from
* Optional: yes
* Type: string
HTTP Headers
Header: Destination
* Description: Destination document (and optional revision)
* Optional: no
The COPY command (non-standard HTTP) copies an existing design document to a new or existing document.
The source design document is specified on the request line, with the Destination HTTP Header of the request
specifying the target document.
Copying a Design Document
To copy the latest version of a design document to a new document you specify the base document and target
document:
COPY /recipes/_design/recipes
Content-Type: application/json
Destination: /recipes/_design/recipelist

The above request copies the design document recipes to the new design document recipelist. The
response is the ID and revision of the new document.
{
"id" : "recipes/_design/recipelist",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}

Note: Copying a design document does not automatically reconstruct the view indexes. These will be recreated,
as with other views, the first time the new view is accessed.

Copying from a Specific Revision


To copy from a specific version, add the rev argument to the query string:
COPY /recipes/_design/recipes?rev=1-e23b9e942c19e9fb10ff1fde2e50e0f5
Content-Type: application/json
Destination: recipes/_design/recipelist

The new design document will be created using the specified revision of the source document.

Copying to an Existing Design Document


To copy to an existing document, you must specify the current revision string for the target document, using the
rev parameter to the Destination HTTP Header string. For example:
COPY /recipes/_design/recipes
Content-Type: application/json
Destination: recipes/_design/recipelist?rev=1-9c65296036141e575d32ba9c034dd3ee

The return value will be the new revision of the copied document:
{
"id" : "recipes/_design/recipes",
"rev" : "2-55b6a1b251902a2c249b667dab1c6692"
}
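Because COPY is a non-standard HTTP method, not every client exposes it directly; with the Python requests library it can be sent through requests.request(). This is only a sketch, and the account, credentials, and revisions are placeholders:

import requests

response = requests.request(
    "COPY",
    "https://ACCOUNT.cloudant.com/recipes/_design/recipes",
    headers={"Destination": "recipes/_design/recipelist?rev=1-9c65296036141e575d32ba9c034dd3ee"},
    auth=("USERNAME", "PASSWORD"),
)
print(response.json())  # ID and new revision of the copied document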

2.6.5 Retrieving information about a design document


Method: GET /db/_design/design-doc/_info
Request: None
Response: JSON of the design document information
Roles permitted: _reader
Obtains information about a given design document, including the index, index size and current status of the
design document and associated index information.
For example, to get the information for the recipes design document:
GET /recipes/_design/recipes/_info
Content-Type: application/json

This returns the following JSON structure:


{
"name" : "recipes"
"view_index" : {
"compact_running" : false,
"updater_running" : false,
"language" : "javascript",
"purge_seq" : 10,
"waiting_commit" : false,
"waiting_clients" : 0,
"signature" : "fc65594ee76087a3b8c726caf5b40687",
"update_seq" : 375031,
"disk_size" : 16491
}
}

The individual fields in the returned JSON structure are detailed below:
name: Name/ID of Design Document
view_index: View Index
compact_running: Indicates whether a compaction routine is currently running on the view
disk_size: Size in bytes of the view as stored on disk
language: Language for the defined views
purge_seq: The purge sequence that has been processed
signature: MD5 signature of the views for the design document

update_seq: The update sequence of the corresponding database that has been indexed
updater_running: Indicates if the view is currently being updated
waiting_clients: Number of clients waiting on views from this design document
waiting_commit: Indicates if there are outstanding commits to the underlying database that need to be processed
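A small sketch of how this endpoint might be polled from Python with the requests library, for example to check whether the indexer is still running; the account and credentials are placeholders:

import requests

info = requests.get(
    "https://ACCOUNT.cloudant.com/recipes/_design/recipes/_info",
    auth=("USERNAME", "PASSWORD"),
).json()

view_index = info["view_index"]
print("updater_running: %s" % view_index["updater_running"])
print("disk_size: %s bytes" % view_index["disk_size"])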

2.6.6 Querying a view


Method: GET /db/_design/design-doc/_view/view-name
Request: None
Response: JSON of the documents returned by the view
Roles permitted: _reader

Query Arguments
Argument: descending
* Description: Return the documents in descending order by key
* Optional: yes
* Type: boolean
* Default: false
Argument: endkey
* Description: Stop returning records when the specified key is reached
* Optional: yes
* Type: string or JSON array
Argument: endkey_docid
* Description: Stop returning records when the specified document ID is reached
* Optional: yes
* Type: string
Argument: group
* Description: Group the results using the reduce function to a group or single row
* Optional: yes
* Type: boolean
* Default: false
Argument: group_level
* Description: Only applicable if the view uses complex keys, i.e. keys that are JSON arrays. Groups reduce results for the specified number of array fields.
* Optional: yes
* Type: numeric
Argument: include_docs
* Description: Include the full content of the documents in the response
* Optional: yes
* Type: boolean
* Default: false
Argument: inclusive_end
* Description: Include rows with the specified endkey
* Optional: yes
* Type: boolean
* Default: true
Argument: key
* Description: Return only documents that match the specified key. Note that keys are JSON values and must be URL-encoded.
* Optional: yes
* Type: string or JSON array
Argument: limit
* Description: Limit the number of the returned documents to the specified number
* Optional: yes
* Type: numeric
Argument: reduce
* Description: Use the reduce function
* Optional: yes
* Type: boolean
* Default: true
Argument: skip
* Description: Skip this number of rows from the start
* Optional: yes
* Type: numeric
* Default: 0
Argument: stale
* Description: Allow the results from a stale view to be used. This makes the request return immediately, even if the view has not been completely built yet. If this parameter is not given, a response will be returned only after the view has been built.
* Optional: yes
* Type: string
* Default: false
* Supported Values: ok: Allow stale views; update_after: Allow stale views, but update them immediately after the request
Argument: startkey
* Description: Return records starting with the specified key
* Optional: yes
* Type: string or JSON array
Argument: startkey_docid
* Description: Return records starting with the specified document ID
* Optional: yes
* Type: string
Argument: update_seq
* Description: Include the update sequence in the generated results
* Optional: yes
* Type: boolean
* Default: false

Executes the specified view-name from the specified design-doc design document.
Querying Views and Indexes
The definition of a view within a design document also creates an index based on the key information defined
within each view. The production and use of the index significantly increases the speed of access and searching or
selecting documents from the view.
However, the index is not updated when new documents are added or modified in the database. Instead, the index
is generated or updated, either when the view is first accessed, or when the view is accessed after a document has
been updated. In each case, the index is updated before the view query is executed against the database.
View indexes are updated incrementally in the following situations:
A new document has been added to the database.

A document has been deleted from the database.


A document in the database has been updated.
View indexes are rebuilt entirely when the view definition changes. To achieve this, a fingerprint of the view
definition is created when the design document is updated. If the fingerprint changes, then the view indexes are
entirely rebuilt. This ensures that changes to the view definitions are reflected in the view indexes.
Note: View index rebuilds occur when one view from the same view group (i.e. all the views defined within a
single design document) needs to be rebuilt. For example, if you have a design document with three views, and
you update the document, all three view indexes within the design document will be rebuilt.
Because the view is updated when it has been queried, it can result in a delay in returned information when the
view is accessed, especially if there are a large number of documents in the database and the view index does not
exist. There are a number of ways to mitigate, but not completely eliminate, these issues. These include:
Create the view definition (and associated design documents) on your database before allowing insertion
or updates to the documents. If this is allowed while the view is being accessed, the index can be updated
incrementally.
Manually force a view request from the database. You can do this either before users are allowed to use the
view, or you can access the view manually after documents are added or updated.
Use /db/_changes to monitor for changes to the database and then access the view to force the corresponding view index to be updated. See Obtaining a list of changes for more information.
None of these can completely eliminate the need for the indexes to be rebuilt or updated when the view is accessed,
but they may lessen the impact of the index update on the user experience.
Another alternative is to allow users to access a stale version of the view index, rather than forcing the index to
be updated and displaying the updated results. Using a stale view may not return the latest information, but will
return the results of the view query using an existing version of the index.
For example, to access the existing stale view by_recipe in the recipes design document:
/recipes/_design/recipes/_view/by_recipe?stale=ok

Accessing a stale view:


Does not trigger a rebuild of the view indexes, even if there have been changes since the last access.
Returns the current version of the view index, if a current version exists.
Returns an empty result set if the given view index does not exist.
As an alternative, you can use the update_after value for the stale parameter. This causes the view to be returned
as a stale view, but triggers the update process after the view information has been returned to the client.
In addition to using stale views, you can also make use of the update_seq query argument. Using this query
argument generates the view information including the update sequence of the database from which the view was
generated. The returned value can be compared to the current update sequence exposed in the database information
(returned by Retrieving information about a database).
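For example, a hedged sketch in Python that requests a stale view with update_seq=true and compares that sequence with the database information; the account, credentials, database, and view names are placeholders:

import requests

auth = ("USERNAME", "PASSWORD")
base = "https://ACCOUNT.cloudant.com/recipes"

view = requests.get(
    base + "/_design/recipes/_view/by_title",
    params={"stale": "ok", "update_seq": "true", "limit": 1},
    auth=auth,
).json()
db_info = requests.get(base, auth=auth).json()

print("view built at sequence: %s" % view.get("update_seq"))
print("database now at sequence: %s" % db_info["update_seq"])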
Sorting Returned Rows
Each element within the returned array is sorted using native UTF-8 sorting according to the contents of the key
portion of the emitted content. The basic order of output is as follows:
null
false
true
Numbers
Text (case sensitive, lowercase first)
Arrays (according to the values of each element, in order)


Objects (according to the values of keys, in key order)
You can reverse the order of the returned view information by using the descending query value set to true.
For example, retrieving the list of recipes using the by_title view (limited to 5 records):
{
"offset" : 0,
"rows" : [
{
"id" : "3-tiersalmonspinachandavocadoterrine",
"key" : "3-tier salmon, spinach and avocado terrine",
"value" : [
null,
"3-tier salmon, spinach and avocado terrine"
]
},
{
"id" : "Aberffrawcake",
"key" : "Aberffraw cake",
"value" : [
null,
"Aberffraw cake"
]
},
{
"id" : "Adukiandorangecasserole-microwave",
"key" : "Aduki and orange casserole - microwave",
"value" : [
null,
"Aduki and orange casserole - microwave"
]
},
{
"id" : "Aioli-garlicmayonnaise",
"key" : "Aioli - garlic mayonnaise",
"value" : [
null,
"Aioli - garlic mayonnaise"
]
},
{
"id" : "Alabamapeanutchicken",
"key" : "Alabama peanut chicken",
"value" : [
null,
"Alabama peanut chicken"
]
}
],
"total_rows" : 2667
}

Requesting the same in descending order will reverse the entire view content. For example the request
GET /recipes/_design/recipes/_view/by_title?limit=5&descending=true
Accept: application/json
Content-Type: application/json

Returns the last 5 records from the view:


{
"offset" : 0,
"rows" : [

{
"id" : "Zucchiniinagrodolcesweet-sourcourgettes",
"key" : "Zucchini in agrodolce (sweet-sour courgettes)",
"value" : [
null,
"Zucchini in agrodolce (sweet-sour courgettes)"
]
},
{
"id" : "Zingylemontart",
"key" : "Zingy lemon tart",
"value" : [
null,
"Zingy lemon tart"
]
},
{
"id" : "Zestyseafoodavocado",
"key" : "Zesty seafood avocado",
"value" : [
null,
"Zesty seafood avocado"
]
},
{
"id" : "Zabaglione",
"key" : "Zabaglione",
"value" : [
null,
"Zabaglione"
]
},
{
"id" : "Yogurtraita",
"key" : "Yogurt raita",
"value" : [
null,
"Yogurt raita"
]
}
],
"total_rows" : 2667
}

The sorting direction is applied before the filtering is applied using the startkey and endkey query arguments.
For example the following query:
GET /recipes/_design/recipes/_view/by_ingredient?startkey=%22carrots%22&endkey=%22egg%22
Accept: application/json
Content-Type: application/json

Will operate correctly when listing all the matching entries between carrots and egg. If the order of output is
reversed with the descending query argument, the view request will return no entries:

GET /recipes/_design/recipes/_view/by_ingredient?descending=true&startkey=%22carrots%22&endkey=%22egg%22
Accept: application/json
Content-Type: application/json

The returned result is empty:


{
"total_rows" : 26453,
"rows" : [],

"offset" : 21882
}

The results will be empty because the entries in the view are reversed before the key filter is applied, and therefore
the endkey of egg will be seen before the startkey of carrots, resulting in an empty list.
Instead, you should reverse the values supplied to the startkey and endkey parameters to match the descending sorting applied to the keys. Changing the previous example to:

GET /recipes/_design/recipes/_view/by_ingredient?descending=true&startkey=%22egg%22&endkey=%22carrots%22
Accept: application/json
Content-Type: application/json

Specifying Start and End Values


The startkey and endkey query arguments can be used to specify the range of values to be displayed when
querying the view.
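Because keys are JSON values, a client should serialize them before placing them in the query string. A minimal sketch with the Python requests library (the account, credentials, and view names are placeholders; requests performs the URL encoding):

import json
import requests

response = requests.get(
    "https://ACCOUNT.cloudant.com/recipes/_design/recipes/_view/by_ingredient",
    params={
        "startkey": json.dumps("carrots"),  # JSON-encode the key values
        "endkey": json.dumps("egg"),
    },
    auth=("USERNAME", "PASSWORD"),
)
print(response.json()["total_rows"])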

2.6.7 Querying a view using a list of keys


Method: POST /db/_design/design-doc/_view/view-name
Request: List of keys to be returned from specified view
Response: JSON of the documents returned by the view
Roles permitted: _reader

Query Arguments
Argument: descending
* Description: Return the documents in descending order by key
* Optional: yes
* Type: boolean
* Default: false
Argument: endkey
* Description: Stop returning records when the specified key is reached
* Optional: yes
* Type: string or JSON array
Argument: endkey_docid
* Description: Stop returning records when the specified document ID is reached
* Optional: yes
* Type: string
Argument: group
* Description: Group the results using the reduce function to a group or single row
* Optional: yes
* Type: boolean
* Default: false
Argument: group_level
* Description: Only applicable if the view uses complex keys, i.e. keys that are JSON arrays. Groups reduce results for the specified number of array fields.
* Optional: yes
* Type: numeric
Argument: include_docs
* Description: Include the full content of the documents in the response
* Optional: yes
* Type: boolean
* Default: false
Argument: inclusive_end
* Description: Include rows with the specified endkey
* Optional: yes
* Type: boolean
* Default: true
Argument: key
* Description: Return only documents that match the specified key. Note that keys are JSON values and must be URL-encoded.
* Optional: yes
* Type: string or JSON array
Argument: limit
* Description: Limit the number of the returned documents to the specified number
* Optional: yes
* Type: numeric
Argument: reduce
* Description: Use the reduce function
* Optional: yes
* Type: boolean
* Default: true
Argument: skip
* Description: Skip this number of rows from the start
* Optional: yes
* Type: numeric
* Default: 0
Argument: stale
* Description: Allow the results from a stale view to be used. This makes the request return immediately, even if the view has not been completely built yet.
* Optional: yes
* Type: string
* Default: false
* Supported Values: ok: Allow stale views; update_after: Allow stale views, but update them immediately after the request
Argument: startkey
* Description: Return records starting with the specified key
* Optional: yes
* Type: string or JSON array
Argument: startkey_docid
* Description: Return records starting with the specified document ID
* Optional: yes
* Type: string
Argument: update_seq
* Description: Include the update sequence in the generated results
* Optional: yes
* Type: boolean
* Default: false

Executes the specified view-name from the specified design-doc design document. Unlike the GET method
for accessing views, the POST method supports the specification of explicit keys to be retrieved from the view
results. The remainder of the POST view functionality is identical to the Querying a view API.
For example, the request below will return all the recipes where the key for the view matches either claret or
clear apple juice:
POST /recipes/_design/recipes/_view/by_ingredient
Content-Type: application/json
{
"keys" : [
"claret",
"clear apple juice"
]
}

The returned view data contains the standard view information, but only where the keys match.

{
"total_rows" : 26484,
"rows" : [
{
"value" : [
"Scotch collops"
],
"id" : "Scotchcollops",
"key" : "claret"
},
{
"value" : [
"Stand pie"
],
"id" : "Standpie",
"key" : "clear apple juice"
}
],
"offset" : 6324
}

Multi-document Fetching
By combining the POST method to a given view with the include_docs=true query argument you can
obtain multiple documents from a database. The result is more efficient than using multiple Retrieving a document
requests.
For example, sending the following request for ingredients matching claret and clear apple juice:
POST /recipes/_design/recipes/_view/by_ingredient?include_docs=true
Content-Type: application/json
{
"keys" : [
"claret",
"clear apple juice"
]
}

Returns the full document for each recipe:


{
"offset" : 6324,
"rows" : [
{
"doc" : {
"_id" : "Scotchcollops",
"_rev" : "1-bcbdf724f8544c89697a1cbc4b9f0178",
"cooktime" : "8",
"ingredients" : [
{
"ingredient" : "onion",
"ingredtext" : "onion, peeled and chopped",
"meastext" : "1"
},
...
],
"keywords" : [
"cook method.hob, oven, grill@hob",
"diet@wheat-free",
"diet@peanut-free",
"special collections@classic recipe",

"cuisine@british traditional",
"diet@corn-free",
"diet@citrus-free",
"special collections@very easy",
"diet@shellfish-free",
"main ingredient@meat",
"occasion@christmas",
"meal type@main",
"diet@egg-free",
"diet@gluten-free"

],
"preptime" : "10",
"servings" : "4",
"subtitle" : "This recipe comes from an old recipe book of 1683 called The Gentlewoma
"title" : "Scotch collops",
"totaltime" : "18"
},
"id" : "Scotchcollops",
"key" : "claret",
"value" : [
"Scotch collops"
]
},
{

...

"doc" : {
"_id" : "Standpie",
"_rev" : "1-bff6edf3ca2474a243023f2dad432a5a",
"cooktime" : "92",
"ingredients" : [
],
"keywords" : [
"diet@dairy-free",
"diet@peanut-free",
"special collections@classic recipe",
"cuisine@british traditional",
"diet@corn-free",
"diet@citrus-free",
"occasion@buffet party",
"diet@shellfish-free",
"occasion@picnic",
"special collections@lunchbox",
"main ingredient@meat",
"convenience@serve with salad for complete meal",
"meal type@main",
"cook method.hob, oven, grill@hob / oven",
"diet@cow dairy-free"
],
"preptime" : "30",
"servings" : "6",
"subtitle" : "Serve this pie with pickled vegetables and potato salad.",
"title" : "Stand pie",
"totaltime" : "437"
},
"id" : "Standpie",
"key" : "clear apple juice",
"value" : [
"Stand pie"
]

}
],
"total_rows" : 26484
}

2.6.8 Sending several queries to a view


Method: POST /db/_design/design-doc/_view/view-name
Request: A JSON document containing an array of query objects
Response: A JSON document containing an array of response objects - one per query
Roles permitted: _reader
This is an example of a request body:
{
"queries": [{
}, {
"startkey": 1,
"limit": 2
}]
}

The JSON object contains only the queries field, which holds an array of query objects. Each query object can
have fields for the parameters of a query. The field names and their meaning are the same as the query parameters
of a regular view request.
Here is an example of a response:
{
"results": [{
"total_rows": 3,
"offset": 0,
"rows": [{
"id": "8fbb1250-6908-42e0-8862-aef60dc430a2",
"key": 0,
"value": {
"_id": "8fbb1250-6908-42e0-8862-aef60dc430a2",
"_rev": "1-ad1680946839206b088da5d9ac01e4ef",
"foo": 0,
"bar": "foo"
}
}, {
"id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"key": 1,
"value": {
"_id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"_rev": "1-abb9a4fc9f0f339efbf667ace66ee6a0",
"foo": 1,
"bar": "bar"
}
}, {
"id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"key": 2,
"value": {
"_id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"_rev": "1-d075a71f2d47af7d4f64e4a367160e2a",
"foo": 2,
"bar": "baz"
}
}]
}, {
"total_rows": 3,
"offset": 1,
"rows": [{
"id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"key": 1,

"value": {
"_id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"_rev": "1-abb9a4fc9f0f339efbf667ace66ee6a0",
"foo": 1,
"bar": "bar"
}
}, {
"id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"key": 2,
"value": {
"_id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"_rev": "1-d075a71f2d47af7d4f64e4a367160e2a",
"foo": 2,
"bar": "baz"
}
}]
}]
}

The JSON object contains only the results field, which holds an array of result objects - one for each query.
Each result object contains the same fields as the response to a regular view request.
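A hedged sketch of issuing the request above from Python with the requests library; the account and credentials are placeholders, and the database, design document, and view names are the same placeholders used throughout this section:

import json
import requests

body = {"queries": [{}, {"startkey": 1, "limit": 2}]}
response = requests.post(
    "https://ACCOUNT.cloudant.com/db/_design/design-doc/_view/view-name",
    data=json.dumps(body),
    headers={"Content-Type": "application/json"},
    auth=("USERNAME", "PASSWORD"),
)
for result in response.json()["results"]:
    # one result object per query object, in the same order
    print("%d rows of %d total" % (len(result["rows"]), result["total_rows"]))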

2.6.9 Querying show functions


Method: GET /db/_design/design-doc/_show/function-name/doc-id
Request: None
Response: Content returned by the show function
Roles permitted: _reader
Query Arguments: Any arguments will be passed to the show function. The second parameter of the show
function is a request object containing a query field. This field holds an object with a field for each query
parameter.
Executes the specified show-name from the specified design-doc design document with the document specified by doc-id passed to it. The /doc-id part of the URL is optional. The response is completely determined
by the show function.
Example request
GET /db/_design/my+shows/_show/asHtml/1559befe-b13f-494c-ab44-49842666ad09?h=heading HTTP/1.1
Accept: application/json
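Show functions are stored as JavaScript strings in the design document. The sketch below, using the Python requests library, uploads a hypothetical design document with a simple asHtml show function and then calls it; none of the names come from this guide, and the PUT will be rejected if the design document already exists without supplying its current revision:

import json
import requests

auth = ("USERNAME", "PASSWORD")
base = "https://ACCOUNT.cloudant.com/db"

ddoc = {
    "shows": {
        # the show function body is plain JavaScript kept as a string
        "asHtml": "function (doc, req) {"
                  "  return '<h1>' + (doc ? doc.title : 'no document') + '</h1>';"
                  "}"
    }
}
requests.put(base + "/_design/example", data=json.dumps(ddoc),
             headers={"Content-Type": "application/json"}, auth=auth)

response = requests.get(base + "/_design/example/_show/asHtml/DOCUMENT_ID", auth=auth)
print(response.text)  # content is whatever the show function returned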

2.6.10 Querying list functions


Method: GET /db/_design/design-doc/_list/function-name/view-name
Request Body: None
Response Body: Content returned by the list function
Roles permitted: _reader
Query Arguments: Query arguments are the same as those for querying a view. Any other arguments will
be passed to the list function. The second parameter of the list function is a request object containing a
query field. This field holds an object with a field for each query parameter.
Executes the specified function from the specified design document with the data from the specified view passed
to it. The response is completely determined by the list function.

Example request
GET /db/_design/my+lists/_list/asHtml/myview HTTP/1.1
Accept: application/json
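As a sketch, view query arguments and any custom parameters can be passed together; in the Python example below the format parameter is purely illustrative and would be read by the list function from req.query, while the design document and view names are the placeholders from the example request above:

import requests

response = requests.get(
    "https://ACCOUNT.cloudant.com/db/_design/my+lists/_list/asHtml/myview",
    params={"limit": 10, "format": "summary"},  # "format" is a made-up parameter for the list function
    auth=("USERNAME", "PASSWORD"),
)
print(response.text)  # the response body is entirely determined by the list function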

2.6.11 Searching for documents using Lucene queries


Method: GET /db/_design/design-doc/_search/search-name
Request Body: None
Response Body: Returns the result of the search
Roles permitted: _reader

Query Arguments
Argument: query
* Description: A Lucene query.
* Optional: no
* Type: string or number
Argument: bookmark
* Description: A bookmark that was received from a previous search. This allows you to page through the results. If there are no more results after the bookmark, you will get a response with an empty rows array and the same bookmark. That way you can determine that you have reached the end of the result list.
* Optional: yes
* Type: string
Argument: stale
* Description: Allow the results from a stale search index to be used
* Optional: yes
* Type: string
Argument: limit
* Description: Limit the number of the returned documents to the specified number. In case of a grouped search, this parameter limits the number of documents per group.
* Optional: yes
* Type: numeric
Argument: include_docs
* Description: Include the full content of the documents in the response
* Optional: yes
* Type: boolean
* Default: false
Argument: sort
* Description: Specifies the sort order of the results. In a grouped search (i.e. when group_field is used), this specifies the sort order within a group. The default sort order is relevance.
* Optional: yes
* Type: JSON
* Supported Values: A JSON string of the form "fieldname<type>" or "-fieldname<type>" for descending order, where fieldname is the name of a string or number field and type is either number or string, or a JSON array of such strings. The type part is optional and defaults to number. Some examples are "foo", "-foo", "bar<string>", "-foo<number>" and ["-foo<number>", "bar<string>"]. String fields used for sorting must not be analyzed fields. The field(s) used for sorting must be indexed by the same indexer used for the search query.
Argument: group_field
* Description: Field by which to group search matches.
* Optional: yes
* Type: string
* Supported Values: A string containing the field name and optionally the type of the field (string or number) in angle brackets. If the type is not specified, it defaults to string. Examples are name<string>, which is equivalent to name, and age<number>.
Argument: group_limit
* Description: Maximum group count. This field can only be used if group_field is specified.
* Optional: yes
* Type: numeric
Argument: group_sort
* Description: This field defines the order of the groups in a search using group_field. The default sort order is relevance.
* Optional: yes
* Type: JSON
* Supported Values: This field can have the same values as the sort field, so single fields as well as arrays of fields are supported.

This request searches for documents whose index fields match the Lucene query. Which fields of a document are
indexed and how is determined by the index functions in the design document. For more information, see Creating
or updating a design document.
Here is an example of an HTTP request:

GET /db/_design/my+searches/_search/bar?q=a*&sort=["foo<number>"] HTTP/1.1


Accept: application/json

Search Response
The response is a JSON document that has the following structure.
total_rows: Number of results that match the search query. This number can be higher than the number
of objects in the rows array.
bookmark: String to be submitted in the next query to page through results. If this response contained no
results, the bookmark will be the same as the one used to obtain this response.
rows: Array of objects describing a search result for ungrouped (i.e. without group_field) searches.
id: Document ID
order: Specifies the order with regard to the indexed fields
fields: Object containing other search indexes
groups: Array of group objects describing each group of the search result. This field is only present for
grouped searches.
rows: Array of objects in this group that match the search. The objects in the array have the same
fields as the objects in the rows array for ungrouped searches.
total_rows: Number of objects that match the search. This number can be higher than the number
of objects in the rows array.
by: The value of the grouping field for this group.
Here is the response corresponding to the request above:
{

"total_rows": 3,
"bookmark": "g1AAAACWeJzLYWBgYMpgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAalMcgCSSfX____
"rows": [{
"id": "dd828eb4-c3f1-470f-aeff-c375ef70e4ad",
"order": [0.0, 1],
"fields": {
"default": "aa",
"foo": 0.0
}
}, {
"id": "ea522cf1-eb8e-4477-aa92-d1fa459bb216",
"order": [1.0, 0],
"fields": {
"default": "ab",
"foo": 1.0
}
}, {
"id": "c838baed-d573-43ea-9c34-621cf0f13301",
"order": [2.0, 0],
"fields": {
"default": "ac",
"foo": 2.0
}
}]
}

This example shows a response to a search query with group_field set.


{
"total_rows": 5,

"groups": [{
"by": "group0",
"total_rows": 3,
"rows": [{
"id": "3497ff56-6d8c-435a-bcf3-704ac92252ff",
"order": [1.0, 0],
"fields": {
"default": "ac",
"bar": "ac",
"foo": "group0"
}
}, {
"id": "47d6a6cc-4533-42a3-87b7-91850fbadac8",
"order": [1.0, 0],
"fields": {
"default": "aa",
"bar": "aa",
"foo": "group0"
}
}, {
"id": "5f6f54e0-e947-412b-97e3-5942958b509d",
"order": [1.0, 1],
"fields": {
"default": "ab",
"bar": "ab",
"foo": "group0"
}
}]
}, {
"by": "group1",
"total_rows": 2,
"rows": [{
"id": "9a7e5990-d396-46d0-b642-420ce7178902",
"order": [1.0, 0],
"fields": {
"default": "ae",
"bar": "ae",
"foo": "group1"
}
}, {
"id": "fc91e465-fda9-44aa-a539-1a15c639d468",
"order": [1.0, 1],
"fields": {
"default": "ad",
"bar": "ad",
"foo": "group1"
}
}]
}]
}
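The bookmark field makes paging straightforward: resubmit the last bookmark with an otherwise identical query until an empty rows array comes back. A hedged Python sketch using the requests library, where the account, credentials, design document, and index names are placeholders:

import requests

auth = ("USERNAME", "PASSWORD")
url = "https://ACCOUNT.cloudant.com/db/_design/DDOC/_search/bar"

params = {"q": "a*", "limit": 200}
while True:
    page = requests.get(url, params=params, auth=auth).json()
    if not page["rows"]:
        break  # an empty rows array means the end of the result list
    for row in page["rows"]:
        print(row["id"])
    params["bookmark"] = page["bookmark"]  # ask for the next page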

2.7 Miscellaneous
These endpoints provide information about the state of the cluster and let you start replication tasks.
A list of the available methods and URL paths is provided below:

Method  Path            Description
GET     /               Get the welcome message and version information
GET     /_active_tasks  Obtain a list of the tasks running in the server
GET     /_membership    Obtain a list of nodes in the cluster
GET     /_all_dbs       Get a list of all the DBs
POST    /_replicate     Set or cancel replication
GET     /_uuids         Get generated UUIDs from the server

2.7.1 Retrieving information about the server


Method: GET
Path: /
Response: Welcome message and version
Accessing the root returns meta information about the server. The response is a JSON structure containing information about the server, including a welcome message and the version of the server. The server version describes
the CouchDB version the server is compatible with, whereas the cloudant_build is the build number of Cloudant's
CouchDB implementation.
{
"couchdb": "Welcome",
"version": "1.0.2",
"cloudant_build":"1138"
}

2.7.2 Retrieving a list of active tasks


Method: GET
Path: /_active_tasks
Response: List of running tasks, including the task type, name, status and process ID
Roles permitted: _admin
You can obtain a list of active tasks by using the /_active_tasks URL. The result is a JSON array of the
currently running tasks, with each task being described with a single object. For example:
[
{

"user": null,
"updated_on": 1363274088,
"type": "replication",
"target": "https://repl:*****@tsm.cloudant.com/user-3dglstqg8aq0uunzimv4uiimy/",
"docs_read": 0,
"doc_write_failures": 0,
"doc_id": "tsm-admin__to__user-3dglstqg8aq0uunzimv4uiimy",
"continuous": true,
"checkpointed_source_seq": "403-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTRyyrNSS3QS87JL01JzCvRy0
"changes_pending": 134,
"pid": "<0.1781.4101>",
"node": "dbcore@db11.julep.cloudant.net",
"docs_written": 0,
"missing_revisions_found": 0,
"replication_id": "d0cdbfee50a80fd43e83a9f62ea650ad+continuous",
"revisions_checked": 0,
"source": "https://repl:*****@tsm.cloudant.com/tsm-admin/",
"source_seq": "537-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTUyyrNSS3QS87JL01JzCvRy0styQGqY0pkSLL
"started_on": 1363274083
},

{
"user": "acceptly",
"updated_on": 1363273779,
"type": "indexer",
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.20723.4070>",
"changes_done": 189,
"database": "shards/00000000-3fffffff/acceptly/acceptly_my_chances_logs_live.1321035717",
"design_document": "_design/MyChancesLogCohortReport",
"started_on": 1363273094,
"total_changes": 26389
},
{
"user": "username",
"updated_on": 1371118433,
"type": "search_indexer",
"total_changes": 5466,
"node": "dbcore@db7.meritage.cloudant.net",
"pid": "<0.29569.7037>",
"changes_done": 4611,
"database": "shards/40000000-7fffffff/username/database_name",
"design_document": "_design/lucene",
"index": "search1",
"started_on": 1371118426
},
{
"view": 1,
"user": "acceptly",
"updated_on": 1363273504,
"type": "view_compaction",
"total_changes": 26095,
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.21218.4070>",
"changes_done": 20000,
"database": "shards/80000000-bfffffff/acceptly/acceptly_my_chances_logs_live.1321035717",
"design_document": "_design/MyChancesLogCohortReport",
"phase": "view",
"started_on": 1363273094
},
{
"updated_on": 1363274040,
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.29256.4053>",
"changes_done": 272195,
"database": "shards/00000000-3fffffff/heroku/app3245179/id_f21a08b7005e_logs.1346083461",
"started_on": 1363272496,
"total_changes": 272195,
"type": "database_compaction"
}
]

The returned structure includes the following fields for each task:
pid: Erlang Process ID
type: Operation Type
updated_on: Time when the last update was made to this task record. Updates are made by the job as
progress occurs. The value is in Unix time UTC.
started_on: Time when the task was started. The value is in Unix time UTC.
total_changes: Total number of documents to be processed by the task. The exact meaning depends on the
type of the task.

database: The database (and shard) on which the operation occurs


For the task type field, valid values include:
database_compaction
replication
view_compaction
indexer
search_indexer
You can find an example for each one above.
The meaning of the remaining fields depends on the type of the task.
Replication tasks
replication_id: Unique identifier of the replication that can be used to cancel the task
user: User who started the replication
changes_pending: Number of documents needing to be changed in the target database.
revisions_checked: Number of document revisions for which it was checked whether they are already in
the target database.
continuous: Whether the replication is continuous
docs_read: Documents read from the source database
Indexing tasks
design_document: The design document containing the view or index function(s)
total_changes: Total number of unindexed changes from when the MVCC snapshot is opened
changes_done: Number of document revisions processed by this task. A document can have 1 or more
revisions
Compaction tasks
total_changes: Number of documents in the database
changes_done: Number of documents compacted
phase: First the compactions happen for documents (phase ids), then views are compacted (phase views).
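A brief sketch of consuming this endpoint from Python with the requests library, filtering for replication tasks and printing two of the fields described above; the account and admin credentials are placeholders:

import requests

tasks = requests.get(
    "https://ACCOUNT.cloudant.com/_active_tasks",
    auth=("ADMIN_USERNAME", "ADMIN_PASSWORD"),
).json()

for task in tasks:
    if task["type"] == "replication":
        print("%s: %s changes pending" % (task.get("replication_id"),
                                          task.get("changes_pending")))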

2.7.3 Obtaining a list of nodes in a cluster


Method: GET
Path: /_membership
Response: JSON document listing cluster nodes and all nodes
Roles permitted: _admin
Response structure
cluster_nodes: Array of node names (strings) of the active nodes in the cluster
all_nodes: Array of node names (strings) of all nodes in the cluster
Example request and response


GET /_membership HTTP/1.1
Accept: application/json
{

"all_nodes": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", "dbcore@db


"cluster_nodes": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", "dbcor
}

2.7.4 Replicating a database


Method: POST
Path: /_replicate
Request: Replication specification
Response: TBD
Roles permitted: _admin
Return Codes
200: Replication request successfully completed
202: Continuous replication request has been accepted
404: Either the source or target DB is not found
500: JSON specification was invalid

Request, configure, or stop a replication operation.


The specification of the replication request is controlled through the JSON content of the request. The JSON
should be an object with the fields defining the source, target and other options. The fields of the JSON request
are shown in the table below:
cancel (optional): Cancels the replication
continuous (optional): Configure the replication to be continuous
create_target (optional): Creates the target database
doc_ids (optional): Array of document IDs to be synchronized
proxy (optional): Address of a proxy server through which replication should occur
source: Source database URL, including user name and password
target: Target database URL, including user name and password
Replication Operation
The aim of the replication is that at the end of the process, all active documents on the source database are also
in the destination database and all documents that were deleted in the source database are also deleted (if they
existed) on the destination database.
Replication can be described as either push or pull replication:
Pull replication is where the source is the remote database instance, and the destination is the local
database.
Pull replication is the most useful solution to use if your source database has a permanent IP address, and
your destination (local) database may have a dynamically assigned IP address (for example, through DHCP).
This is particularly important if you are replicating to a mobile or other device from a central server.
Push replication is where the source is a local database, and destination is a remote database.
For example, to request replication between a database on the server example.com, and a database on Cloudant
you might use the following request:
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db",
}

In all cases, the requested databases in the source and target specification must exist. If they do not, an error
will be returned within the JSON object:
{
"error" : "db_not_found"
"reason" : "could not open http://username.cloudant.com/ol1ka/",
}

You can create the target database (providing your user credentials allow it) by adding the create_target
field to the request object:
POST http://username.cloudant.com/_replicate
Content-Type: application/json
Accept: application/json
{
"create_target" : true
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db",
}

The create_target field is not destructive. If the database already exists, the replication proceeds as normal.
Single Replication
You can request replication of a database so that the two databases can be synchronized. By default, the replication
process occurs one time and synchronizes the two databases together. For example, you can request a single
synchronization between two databases by supplying the source and target fields within the request JSON
content.
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"source" : "http://user:pass@user.cloudant.com/recipes",
"target" : "http://user:pass@user.cloudant.com/recipes2",
}

In the above example, the databases recipes and recipes2 will be synchronized. The response will be a
JSON structure containing the success (or failure) of the synchronization process, and statistics about the process:
{
"ok" : true,
"history" : [
{
"docs_read" : 1000,
"session_id" : "52c2370f5027043d286daca4de247db0",
"recorded_seq" : 1000,

"end_last_seq" : 1000,
"doc_write_failures" : 0,
"start_time" : "Thu, 28 Oct 2010 10:24:13 GMT",
"start_last_seq" : 0,
"end_time" : "Thu, 28 Oct 2010 10:24:14 GMT",
"missing_checked" : 0,
"docs_written" : 1000,
"missing_found" : 1000
}
],
"session_id" : "52c2370f5027043d286daca4de247db0",
"source_last_seq" : 1000
}

The structure defines the replication status, as described in the table below:
history [array]: Replication History
doc_write_failures: Number of document write failures
docs_read: Number of documents read
docs_written: Number of documents written to target
end_last_seq: Last sequence number in changes stream
end_time: Date/Time replication operation completed
missing_checked: Number of missing documents checked
missing_found: Number of missing documents found
recorded_seq: Last recorded sequence number
session_id: Session ID for this replication operation
start_last_seq: First sequence number in changes stream
start_time: Date/Time replication operation started
ok: Replication status
session_id: Unique session ID
source_last_seq: Last sequence number read from source database
Continuous Replication
Synchronization of a database with the previously noted methods happens only once, at the time the replicate request is made. To have the target database permanently replicated from the source, you must set the continuous
field of the JSON object within the request to true.
With continuous replication, changes in the source database are replicated to the target database in perpetuity until
you specifically request that replication ceases.
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"continuous" : true
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db",
}

Changes will be replicated between the two databases as long as a network connection is available between the
two instances.

Note: To keep two databases synchronized with each other, you need to set replication in both directions; that is,
you must replicate from databasea to databaseb, and separately from databaseb to databasea.

Canceling Continuous Replication


You can cancel continuous replication by adding the cancel field to the JSON request object and setting the
value to true. Note that the structure of the request must be identical to the original for the cancellation request to
be honoured. For example, if you requested continuous replication, the cancellation request must also contain the
continuous field.
For example, the replication request:
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db",
"create_target" : true,
"continuous" : true
}

Must be canceled using the request:


POST /_replicate
Content-Type: application/json
Accept: application/json
{
"cancel" : true,
"continuous" : true
"create_target" : true,
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db",
}

Requesting cancellation of a replication that does not exist results in a 404 error.
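Putting the two requests together, a hedged Python sketch that starts a continuous replication and later cancels it with a structurally identical request; all URLs and credentials are placeholders:

import json
import requests

auth = ("USERNAME", "PASSWORD")
endpoint = "https://USERNAME.cloudant.com/_replicate"
headers = {"Content-Type": "application/json"}

spec = {
    "source": "https://user:pass@example.com/db",
    "target": "https://user:pass@USERNAME.cloudant.com/db",
    "create_target": True,
    "continuous": True,
}
requests.post(endpoint, data=json.dumps(spec), headers=headers, auth=auth)

# later: cancel by repeating the same structure with "cancel": true
spec["cancel"] = True
requests.post(endpoint, data=json.dumps(spec), headers=headers, auth=auth)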

2.7.5 Retrieving UUIDs


Method: GET
Path: /_uuids
Response: JSON document containing a list of UUIDs
Query Arguments
Argument: count
* Description: Number of UUIDs to return
* Optional: yes
* Type: numeric

Requests one or more Universally Unique Identifiers (UUIDs). The response is a JSON object providing a list of
UUIDs. For example:
{
"uuids" : [
"7e4b5a14b22ec1cf8e58b9cdd0000da3"
]
}

You can use the count argument to specify the number of UUIDs to be returned. For example:
GET /_uuids?count=5

Returns:
{
"uuids" : [
"c9df0cdf4442f993fc5570225b405a80",
"c9df0cdf4442f993fc5570225b405bd2",
"c9df0cdf4442f993fc5570225b405e42",
"c9df0cdf4442f993fc5570225b4061a0",
"c9df0cdf4442f993fc5570225b406a20"
]
}

2.8 Local (non-replicating) Documents


The Local (non-replicating) document interface allows you to create local documents that are not replicated to
other databases. These documents can be used to hold configuration or other information that is required specifically on the local server instance.
Local documents have the following limitations:
Local documents are not replicated to other databases.
The ID of the local document must be known for the document to be accessed. You cannot obtain a list of local
documents from the database.
Local documents are not output by views, or the _all_docs view.
Local documents can be used when you want to store configuration or other information for the current (local)
instance of a given database.
A list of the available methods and URL paths are provided below:
Method  Path                  Description
GET     /db/_local/local-doc  Returns the latest revision of the non-replicated document
PUT     /db/_local/local-doc  Inserts a new version of the non-replicated document
DELETE  /db/_local/local-doc  Deletes the non-replicated document
COPY    /db/_local/local-doc  Copies the non-replicated document

2.8.1 Retrieving a local document


Method: GET /db/_local/local-doc
Request: None
Response: JSON of the returned document
Roles permitted: _reader
Query Arguments:
Argument: rev
* Description: Specify the revision to return
* Optional: yes
* Type: string
* Supported Values:
true: Includes the revisions

Argument: revs
* Description: Return a list of the revisions for the document
* Optional: yes
* Type: boolean
Argument: revs_info
* Description: Return a list of detailed revision information for the document
* Optional: yes
* Type: boolean
* Supported Values
true: Includes the revisions
Return Codes:
400: The format of the request or revision was invalid
404: The specified document or revision cannot be found, or has been deleted
Gets the specified local document. The semantics are identical to accessing a standard document in the specified
database, except that the document is not replicated. See Retrieving a document.

2.8.2 Creating or updating a local document


Method: PUT /db/_local/local-doc
Request: JSON of the document
Response: JSON with the committed document information
Roles permitted: _writer
Return Codes:
201: Document has been created successfully
Stores the specified local document. The semantics are identical to storing a standard document in the specified
database, except that the document is not replicated. See Creating or updating a document.
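For example, a sketch of storing per-instance configuration in a local document using the Python requests library; the account, document ID, and fields are illustrative only:

import json
import requests

response = requests.put(
    "https://ACCOUNT.cloudant.com/db/_local/instance-config",
    data=json.dumps({"cache_ttl": 300, "feature_flags": {"beta_ui": False}}),
    headers={"Content-Type": "application/json"},
    auth=("USERNAME", "PASSWORD"),
)
print(response.status_code)  # 201 when the document is stored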

2.8.3 Deleting a local document


Method: DELETE /db/_local/local-doc
Request: None
Response: JSON with the deleted document information
Roles permitted: _writer
Query Arguments:
Argument: rev
* Description: Current revision of the document for validation
* Optional: yes
* Type: string
HTTP Headers
Header: If-Match
* Description: Current revision of the document for validation

* Optional: yes
Return Codes:
409: Supplied revision is incorrect or missing
Deletes the specified local document. The semantics are identical to deleting a standard document in the specified
database, except that the document is not replicated. See Deleting a document.

2.8.4 Copying a local document


Method: COPY /db/_local/local-doc
Request: None
Response: JSON of the copied document
Roles permitted: _writer
Query Arguments:
Argument: rev
* Description: Revision to copy from
* Optional: yes
* Type: string
HTTP Headers
Header: Destination
* Description: Destination document (and optional revision)
* Optional: no
Copies the specified local document. The semantics are identical to copying a standard document in the specified
database, except that the document is not replicated. See Copying a document.

CHAPTER THREE

USING CLOUDANT WITH...

Although you can access Cloudant directly over HTTP, plenty of language-specific tools and frameworks exist to
make it even easier.

3.1 Python
Python's readability and standardized coding practices make it effortless to read and write, and as a result it's great
for teaching. Academicians and scientists make heavy use of it for all manner of research and analysis, notably
the Natural Language ToolKit, SciPy, NumPy, Matplotlib, Numba, and PyTables.
Python is also superb for working with Cloudant. If you don't already have Python, oh golly, go get it.

3.1.1 Packages: pip and virtualenv


The Python community uses a package manager called pip to install, uninstall, and upload packages to an online
repository called PyPI. If you don't have pip, get pip.
Then, you can install packages as easily as this:
pip install [name]

That'll install to someplace on your PYTHONPATH, so your projects have access to the downloaded package. If
you want to isolate dependencies between projects, use virtualenv, which you can install like this:
sudo pip install virtualenv

Then, to create an isolated Python environment for your project, just do this:
# create a venv folder
virtualenv venv
# use its bin, etc., folders for Python business
source venv/bin/activate

You should see (venv) prepended to your terminal's command line. Now, any pip packages you install will go into
that venv folder.
For even more convenience creating virtual environments for your Python projects, check out virtualenvwrapper,
which extends virtualenv for more ease-of-use.

3.1.2 Libraries: CouchDB-Python, Couchdbkit, and Requests


Because Cloudant's API so resembles that of CouchDB's, you can use CouchDB client libraries like CouchDB-Python and Couchdbkit to work with Cloudant. Or, you can consume Cloudant's HTTP API directly using an
HTTP library like Requests.

CouchDB-Python
Cloudant's API resembles the way Python dict objects work, which makes CouchDB-Python wickedly intuitive
to use. To get the library, just pip it!
pip install couchdb

Using CouchDB-Python is wicked easy. Check it out:


# connecting to your Cloudant instance
import couchdb
couch = couchdb.Server("https://%s.cloudant.com" % USERNAME)
couch.resource.credentials = (USERNAME, PASSWORD)
# accessing a database
db = couch['test']
# or, creating one
db = couch.create('test')
# accessing a document
doc = db[DOCUMENT_ID]
# or, creating one
doc_id, doc_rev = db.save({
    'name': 'Mike Broberg',
    'title': 'Fun Captain',
    'superpower': 'More fun than a hallucinogenic trampoline'
})
doc = db[doc_id]

Query results are treated as iterators, like this:

# print all docs in the database
for doc in db:
    print doc
# or, do the same on a pre-defined index
# where DDOC/INDEX maps to _design/DDOC/_view/INDEX
# in the HTTP API
for doc in db.view('DDOC/INDEX'):
    print doc

Check out the documentation for even more goodies :D


Couchdbkit
Just like CouchDB-Python, Couchdbkit uses native Python objects to make interacting with Cloudant effortless,
but it provides a fuller API and more convenience methods. To install it, just pip it!
pip install couchdbkit

Its interface is very similar, too, which you can see in more detail here:
import couchdbkit
# connect to cloudant
server = couchdbkit.Server('https://USERNAME:PASSWORD@USERNAME.cloudant.com')
db = server.get_or_create_db('posts')

# save a document
db.save_doc({
    'author': 'Mike Broberg',
    'content': "In my younger and more vulnerable years, my father told me, Son, turn that racket..."
})

# save lots of docs :D


docs = [{...}, {...}, {...}]
db.bulk_save(docs)

Couchdbkit exposes a document class for adding simple schemas to your application, like so:
class Post(couchdbkit.Document):
    author = couchdbkit.StringProperty()
    content = couchdbkit.StringProperty()

# associate posts with a given database object
Post.set_db(db)

new_post = Post(
    author="Mike Broberg",
    content="In his spare time, Mike Broberg enjoys barbecuing meats of various origins."
)
# save the post to its associated database
new_post.save()

Performing queries returns iterators:


# print all docs in the database
for doc in db.view('_all_docs'):
    print doc
# or, do the same on a pre-defined index
# where DDOC/INDEX maps to _design/DDOC/_view/INDEX
# in the HTTP API
for doc in db.view('DDOC/INDEX'):
    print doc

For more detail than you could ever want, check out the API Documentation.
Requests
Because Cloudant's API is just an HTTP interface, you can interact with it from any HTTP library. Python has a
beautiful HTTP library called Requests, so if you don't want to deal with all the abstraction of a client library, use
that. Here's how:
First, pip!
pip install requests

Then, let's talk to our database:

import requests
account_url = 'https://USERNAME:PASSWORD@USERNAME.cloudant.com'
response = requests.get(account_url)
print response.status_code
# 200
print response.json()
# {
#   "couchdb": "Welcome",
#   "version": "1.0.2",
#   "cloudant_build": "1678"
# }

Hey, why not write something?

import json

db_url = account_url + '/posts'

doc = dict(
    author='Mike Broberg',
    content="I once did a triple backflip over seventeen buses using only a red Solo cup and a cop..."
)
headers = {'Content-type': 'application/json'}
response = requests.post(db_url, data=json.dumps(doc), headers=headers)
print response.status_code
# 201

Dandy, eh? Just hit Cloudant's API by URL directly. Don't forget to set headers and request bodies appropriately!
See the Requests documentation for more info.
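Querying a view works the same way; here's a hedged sketch where DDOC and INDEX are placeholders and keys are JSON-encoded with json.dumps:

import json
import requests

account_url = 'https://USERNAME:PASSWORD@USERNAME.cloudant.com'
view_url = account_url + '/posts/_design/DDOC/_view/INDEX'

response = requests.get(view_url, params={
    'limit': 10,
    'include_docs': 'true',
    'startkey': json.dumps('a')  # view keys are JSON values
})
for row in response.json().get('rows', []):
    print(row['id'])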

3.1.3 Frameworks: Flask and Django


Python is great for web development. Two frameworks stand out in the community for building robust web
applications, Flask and Django:
Flask
Flask is a micro-framework that makes it effortless to rapidly develop web apps. To get it, run pip install flask.
Then, check out the example code:
from flask import Flask
app = Flask(__name__)
@app.route("/")
def hello():
return "Hello World!"
if __name__ == "__main__":
app.run()

Run it with python [file] and boom, you're live. That's all it takes to build a website with Flask.
A rich community of addons, plugins, and integrations means you can add features and functionality quickly
without relying on monolithic software choices. Just google flask [thing] and you'll more than probably find
integrations that suit your needs.
Django
Django is to Python as Ruby on Rails is to Ruby: an enormous, web-focused project that attempts to make
intelligent design decisions about app structure, architecture, layout, and tooling so that you can focus on building
your app. Some of what it provides out of the box:
A SQL model layer
HTML templating
Automatic administrative interface
Elegant URL routing
Internationalization support
Authentication and authorization systems
Django has an enormous community of addons, plugins, integrations, and the like, so if you need more, just google
django [thing] and you'll likely find what you need. To get started with Django, check out their documentation.

3.1.4 CouchApps
The original CouchApp utility, couchapp, is written in Python. Though the original author has chosen to focus on
Erica instead, the python utility is still more feature-complete and works just fine. To install the utility, just pip it:
sudo pip install couchapp

Then, you can scaffold, push, and even pull CouchApps right from the command line:

# scaffold a couchapp
couchapp generate cloudant
# push it live
couchapp push cloudant https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE
# clone it!
couchapp clone https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE/_design/cloudant cloudant2
# oh look, there it is!
ls cloudant2

The python utility also supports the _docs folder, which couchapp will upload to your target database as JSON
documents rather than attachments. Very nifty for scaffolding data, or syncing projects.

As always, if you have any trouble, post your question to StackOverflow, ping us on IRC, or if you'd like to
discuss the matter in private, email us at support@cloudant.com.
Happy coding!

3.2 Node.js
Node.js is an open-source platform for writing JavaScript on the server. It's fast, asynchronous, and has a tremendous community that is only getting bigger. Because Cloudant indexes are in JavaScript, along with all client-side
code, writing JavaScript on the server means your head never needs to switch gears. But perhaps most importantly,
there are a ton of tools that make developing on Cloudant with Node.js effortless.
To get started with Node.js, download the binary for your operating system here.

3.2.1 Package Management


npm, or Node Package Manager, is a CouchApp that hosts packages for the Node.js community. It comes with
Node.js, and you use it like this:
npm install [package]

Because npmjs.org is a CouchApp, you can replicate it and host your own registry using Cloudant. Specifically,
replicate from https://registry.npmjs.org/ to your database, and bam, you have your own private
registry. Then, you can push custom or private libraries to your registry and download them just like you would
from npm.

3.2.2 App Frameworks


Express is a web framework around Node.js that simplifies things like middleware and URL routing. It even comes
with a starting template! To install, run npm install -g express; to scaffold a project, run express
wherever you want to get started. The getting started guide serves as a fantastic introduction to Node.js itself, too.
Express-Cloudant is an extended Express template for working with Cloudant. Like Express, it stays out of your
way, but comes with features for making life easy:
Built-in reverse proxy lets you query your Cloudant data from the client.
Custom API in routes/api.js using PouchDB, exposing your database in a more controlled fashion.
Uses Grunt to manage static assets and design documents.


Manages design documents in the ddocs folder as JavaScript rather than raw JSON.
The project NoSQL-Listener is built using Express-Cloudant as a base.

3.2.3 Libraries
Several JavaScript libraries make it effortless to work with Cloudant:
PouchDB
PouchDB is a JavaScript package that runs either in the browser or in a Node.js environment, and acts like its own
little Cloudant instance, so you can write data to it even if you're not online, and sync data between it and remote
Cloudant instances.
Creating a PouchDB instance that syncs with a Cloudant database:
var db = new PouchDB('dbname'),
    remote = 'https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE',
    opts = {
      continuous: true
    };
db.replicate.to(remote, opts);
db.replicate.from(remote, opts);

Writing, reading, etc.:


// create a document; log the response
db.post({
  name: 'Mike Broberg'
}, function (err, response) {
  console.log(err || response);
});
// read a document by ID; log the response
db.get(DOCUMENT_ID, function (err, response) {
  console.log(err || response);
});
// update a document; log the response
db.put({
  _id: DOCUMENT_ID,
  _rev: DOCUMENT_REV,
  name: 'Mike Broberg',
  title: 'Baseballer of the Ninth Circle'
}, function (err, response) {
  console.log(err || response);
});
// delete a document; log the response
db.remove({
  _id: DOCUMENT_ID,
  _rev: DOCUMENT_REV
}, function (err, response) {
  console.log(err || response);
});

You can even run ad-hoc queries on PouchDBs local dataset:

db.query({
  // write your map function in JavaScript
  map: function (doc) {
    if (doc.title) emit(doc.title, null);
  }
}, {
  // in this example, we won't use a reduce function
  reduce: false
}, function (err, response) {
  // log the error, or the response if no error
  console.log(err || response);
});

Nano
Nano tries to be as out-of-your-way as possible, so it's very lightweight to use:
// require nano, point it at our instance's root
var nano = require('nano')('https://garbados.cloudant.com');
// create a database
nano.db.create('example');
// create an alias for working with that database
var example = nano.db.use('example');
// fetch the primary index
example.list(function (err, body) {
  if (err) {
    // something went wrong!
    throw new Error(err);
  } else {
    // print all the documents in our database
    console.log(body);
  }
});

Nano has begun to support Cloudant-specific features like search, which makes it my library of choice for working
with Cloudant from Node.js.
Cradle
Cradle is a more full-bodied library than Nano, with features like caching, and convenience methods to get and
update documents. This usage example comes from its readme:
var assert = require('assert');
var cradle = require('cradle');
var db = new(cradle.Connection)().database('starwars');
db.get('vader', function (err, doc) {
  doc.name; // 'Darth Vader'
  assert.equal(doc.force, 'dark');
});
db.save('skywalker', {
  force: 'light',
  name: 'Luke Skywalker'
}, function (err, res) {
  if (err) {
    // Handle error
  } else {
    // Handle success
  }
});


3.2.4 CouchApps: node.couchapp.js


node.couchapp.js is a utility for writing CouchApps, like the Python CouchApp or Erlang Erica utility. I love it
because you can use it to write design docs as either a series of separate files across multiple folders, or as a single
JavaScript file like this:
var couchapp = require('couchapp')
  , path = require('path');
ddoc = {
  _id: '_design/app'
  , views: {}
  , lists: {}
  , shows: {}
};
module.exports = ddoc;
ddoc.views.byType = {
  map: function (doc) {
    emit(doc.type, null);
  },
  reduce: '_count'
};
ddoc.lists.people = function (head, req) {
  start({
    headers: {"Content-type": "text/html"}
  });
  send('<ul id="people">\n');
  while (row = getRow()) {
    send('\t<li class="person name">' + row.key + '</li>\n');
  }
  send('</ul>\n');
};
ddoc.shows.person = function (doc, req) {
  return {
    headers: {"Content-type": "text/html"},
    body: '<h1 id="person" class="name">' + doc.name + '</h1>\n'
  };
};
ddoc.validate_doc_update = function (newDoc, oldDoc, userCtx) {
  function require(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw({ forbidden: message });
  }
  if (newDoc.type == "person") {
    require("name");
  }
};
couchapp.loadAttachments(ddoc, path.join(__dirname, '_attachments'));

Nifty, eh?
Check out these CouchApps built with node.couchapp.js as examples:
Egg Chair: like Pinterest and Flickr, but without the terms and conditions.
Chaise Blog: A CouchApp blog, using two databases and filtered replication to share only what you want.


3.2.5 Workflow: Yeoman, Grunt, Bower


Yeoman, Grunt, and Bower are the backbone of a modern JavaScript workflow:
Yeoman scaffolds new projects
Grunt automates their build, test, and deploy processes
Bower retrieves client-side JavaScript libraries
All three of these tools have vibrant communities around them, generating plugins that do an increasing amount
of your work for you. If you are using Make in your JavaScript projects, stop, and learn Grunt. Future-you will be
thankful. Thanks to these tools, my workflow typically looks like this:
mkdir [app] && cd $_
yo [template]
# answer the template's questions
grunt

With only a few commands, my project is built, tested, and deployed. Here are some generators I use frequently
at Cloudant:
generator-reveal: Scaffolds reveal.js presentations and uploads them to Cloudant by running grunt couch.
generator-couchapp: Scaffolds a blank CouchApp that you can upload to Cloudant by running grunt.
And Grunt plugins:
grunt-couchapp: automates pushing CouchApps (using node.couchapp.js), along with creating and deleting
databases (using nano).
grunt-couch: like grunt-couchapp, but with an interface more like the classic Python CouchApp utility.

3.2.6 Gotchas: Callbacks


Just like the JavaScript executing in your browser, Node.js is single-threaded and event-driven, which makes it
wicked fast, but can feel strange coming from synchronous languages.
For example, if you make an HTTP request, Node.js doesn't wait for it to finish. It creates an event to fire once
you get a response, then continues executing your program while your request adventures about the interwebs. So,
rather than getting a return value with the HTTP response, you give your request a function, called a callback,
with instructions on how to handle the eventual response.
In Nano, for example, this code...
// get all our docs
db.list(function(err, body){
if (err) {
// something went wrong!
throw new Error(err);
} else {
// print all the documents in our database
console.log(body);
}
});
// print hello world :D
console.log('hello, world!');

... will yield this:


> hello, world!
> {"total_rows": 1, "offset": 0, "rows": [...]}

For more on asynchronous programming in Node.js, check out Control Flow in Node.


3.3 .NET / Mono


The Microsoft .NET framework is the primary development platform for Windows, Windows Phone, and Windows
Azure. The platform supports multiple languages, principally C#, VB.NET and F#. Additionally, there is an
open-source, cross-platform implementation in the form of Mono, sponsored by Xamarin, enabling iPhone and
Android application development using C#.
You can easily connect to Cloudant directly from your .NET website or application. It's also available as a plugin
on Windows Azure, AppHarbour and Heroku.

3.3.1 Libraries
MyCouch
MyCouch is a modern, asynchronous CouchDb / Cloudant client for .NET. It provides an extensible, thin wrapper
around the CouchDb HTTP API, allowing you to plug in your own serialisation layer. MyCouch is also the only
.NET client to currently have first-class support for Cloudant.
An example:
// connect to Cloudant
using (var client = new Client("https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE"))
{
    // get document by ID
    await client.Documents.Get("12345");
    // get document by ID (strongly typed POCO version)
    MyObject myObj = await client.Documents.Get<MyObject>("12345");
}

Installation: NuGet
Compatibility: .NET 4 and above, Windows Store
LoveSeat
LoveSeat is a popular and well established CouchDB / Cloudant C# client, architected with the intent to abstract
away just enough so that it's easy to use, but not enough so that you don't know what's going on. The API is
synchronous and doesn't yet support Cloudant-specific features such as Search or API key management.
An example:

// connect to Cloudant
var client = new CouchClient("username.cloudant.com", 443, username, password, true, Authentic
var db = client.GetDatabase("Northwind");
// get document by ID
Document myDoc = db.GetDocument("12345");
// get document by ID (strongly typed POCO version)
MyObject myObj = db.GetDocument<MyObject>("12345");

Installation: NuGet
Compatibility: .NET 3.5 and above, Mono 2.9

3.3.2 Tutorials
MyCouch says hello to Cloudant
In this tutorial, MyCouch developer Daniel Wertheim walks us through setting up MyCouch to talk to Cloudant
and perform basic CRUD operations.


Cloudant case study: Release Mobile


Learn from this case study how and why Release Mobile uses Cloudant and Windows Azure to build data-sharing
applications.
SQL - NoSQL webinar: Top 6 questions to ask when migrating your app
In this webinar, Max Thayer, Cloudant developer advocate, and Tom Fennelly, Integration Technologies Lead
at Cloudbees, discuss what you need to consider when moving from the world of relational databases to a NoSQL
document store. Hear about the key differences between relational databases and NoSQL document stores as well
as how to dodge the pitfalls of migrating from a relational database to NoSQL.

3.3.3 Resources
Windows Azure
Windows Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage
applications across a global network of Microsoft-managed datacenters. Cloudant has partnered with Microsoft
to provide a multi-tenant cluster on Azure (Lagoon 2 - currently in beta).
AppHarbour
AppHarbour is a fully-hosted .NET platform as a service. Cloudant is available as an add-on for the service, or
you can sign up to Cloudant directly.
FoxWeave
FoxWeave is a service that allows you to build a data import workflow, processing data as it streams from one data
store to another. For example, taking data from an existing SQL database and shipping it to Cloudant.


CHAPTER

FOUR

GUIDES

4.1 The CAP Theorem


The CAP (Consistency, Availability, and Partition tolerance) theorem states that a distributed computing system
can only exhibit two of the three following characteristics:
Consistency: all nodes see the same data at the same time.
Availability: every request receives a response indicating success or failure.
Partition Tolerance: the system continues to operate despite arbitrary message loss or failure of part of the
system.
For a good time, check out the formal proof of the CAP theorem at http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf.
A database can only exhibit two of these three for both theoretical and practical reasons. A database prioritizing
consistency and availability is simple: a single node storing a single copy of your data. But this is difficult to scale
as you must upgrade the node to get more performance, rather than leverage additional nodes. And, even a minor
system failure can shut down a single-node system, while any message loss will mean significant data loss. To
endure, the system must become more sophisticated.

4.1.1 Tradeoffs in Partition Tolerance


A database which prioritizes consistency and partition tolerance will commonly employ a master-slave setup,
where some node of the many in the system is elected leader. Only the leader can approve data writes, while
all secondary nodes replicate data from the leader in order to handle reads. If the leader loses connection to the
network, or can't communicate with a majority of the system's nodes, the majority elects a new leader. This
election process will differ between systems, and can be a source of significant problems.
Cloudant prioritizes availability and partition tolerance by employing a master-master setup, such that every node
can accept both writes and reads to its portion of your data. Multiple nodes contain copies of each portion of your
data, with each copying data between them, so that if a node becomes inaccessible, others can serve in its place
while the network heals. This way, the system will return your data in a timely manner despite arbitrary node
failure, while maintaining eventual consistency. The tradeoff in deprioritizing absolute consistency is that it will
take a moment for all nodes to see the same data, such that responses may contain old data while the new data
propagates through the system.

4.1.2 Changing our thinking


Maintaining one consistent view of our data is logical and easy to understand because a relational database does
this work for you. We expect Web services interacting with database systems to behave this way, but that doesn't
mean they should. Consistency isn't a given, and it takes a little work to change our approach.
In fact, consistency isn't necessarily essential for many enterprise cloud services. Large, heavily used systems
bring with them a high probability that a portion of the system may fail. A database engineered around this
assumption that prioritizes availability and eventual consistency is better suited to keeping your application online.


The consistency of application data can be addressed after the fact. As Seth Gilbert and Nancy Lynch of MIT
conclude in their proof of the CAP theorem, most real-world systems today are forced to settle for returning "most
of the data, most of the time."

4.1.3 Application availability vs. consistency in the enterprise


A look at popular Web services shows that people already expect high availability, and happily trade this for
eventually consistent data, often without realizing they are doing so.
Applications have been lying to users for years for the sake of availability. Consider ATMs: inconsistent banking
data is why it's still possible to overdraft money without realizing it. It is unrealistic to present a consistent view of
your account balance throughout the entire banking system if every node in the network needs to halt and record
this figure before continuing operations. It's better to make the system highly available.
The banking industry figured it out back in the 1980s, but many IT organizations are still worried about sacrificing
consistency for the sake of availability. Think about the number of support calls placed when your sales team can't
access their CRM app. Now consider if they would even notice when it takes a few seconds for a database update
to propagate throughout the application.
Availability trumps consistency more than you might expect. Online shopping cart systems, HTTP caches, and
DNS are a few more examples. Organizations must consider the cost of downtime: user frustration, productivity
loss, missed opportunities, etc.

4.1.4 From theory to implementation


Addressing high availability is vital for cloud applications. Otherwise, global database consistency will always be
a major bottleneck as you scale. Highly available applications need to maintain constant contact with their data,
even if that data isn't the most up-to-date. That's the concept of eventual consistency, and it's nothing to be scared
of. At large scale, sometimes it's better to serve answers that are not perfectly correct than to not serve them at all.
Database systems hide the complexities of availability vs. consistency in different ways, but they are always there.
The view that we take with Cloudant's database-as-a-service, along with CouchDB and other NoSQL databases,
is that it's better to expose developers to these complexities early in the design process. By doing the hard work
up front, there are no surprises because applications are ready to scale from day one.

4.2 MapReduce
MapReduce is an algorithm for slicing and dicing large datasets across distributed computing systems, such as
Cloudant clusters.
A MapReduce program has two parts:
a map function, which processes documents from your dataset into key-value pairs.
a reduce function, which combines the set returned by map or the results of previous reduce functions
into a single value per key.
Keep reading to see how it works at Cloudant, or head to our blog to read about the math and science behind
MapReduce with some foundational MapReduce literature.

4.2.1 Batch vs Incremental


There are numerous implementations of MapReduce, each with their own strengths and weaknesses. Systems like
Hadoop, unless augmented, optimize to perform MapReduce in batches, where a MapReduce program is executed
on the dataset as it stood at the time. If the dataset changes, then to get an updated value for the MapReduce
program, you'll have to run it all over again. This is a showstopper for queries you want realtime updates for, or
for datasets that change frequently.


For secondary indexes, Cloudant uses an implementation of MapReduce that works incrementally. When you
insert or update a document, rather than rerunning the program on the entire dataset, we recompute results only for
the documents that changed and for the reduce values those documents affect, so you can access MapReduce results in
only the time it takes to read them from disk, rather than the time it takes to compute them anew.

4.2.2 Secondary Indexes


In Cloudant, secondary indexes, sometimes called views, are MapReduce programs written in JavaScript.
Below, we'll discuss their component parts: map and reduce functions. For information on writing the design
documents that hold secondary indexes, see Creating or updating a design document.
Map
map functions emit a key and value. The key is used for sorting and grouping, while the value is consumed by the
reduce function to reduce the dataset to a single value. Let's take a look at a map function:
function (doc) {
  // if the doc is an event
  if (doc.type === "event") {
    // emit the event's location as a key
    // and emit its number of attendees as the value
    emit(doc.location, doc.attendees);
  }
}

This will let us sort and group events by location, and use a reduce value to, for example, sum the number of
attendees. The result of this map over a dataset might look something like this:
{
  "total_rows": 4,
  "offset": 0,
  "rows": [
    {
      // the document's ID
      "id": "eac6f1faf2cc8dd6fbbbb5205c001763",
      // the key we emitted
      "key": ["France", "Paris"],
      // the value we emitted
      "value": 67
    },
    {
      "id": "eac6f1faf2cc8dd6fbbbb5205c0021ce",
      "key": ["UK", "Bristol"],
      "value": 32
    },
    {
      "id": "ecfaf6648cec1f8f1f7c6b365c1115f4",
      "key": ["UK", "Bristol"],
      "value": 45
    },
    {
      "id": "986d02a1d491fe906856609e9935fa47",
      "key": ["USA", "Boston"],
      "value": 194
    }
  ]
}

Both keys and values can be any valid JSON data structure: strings, numbers, arrays, or objects.
Check out query options for all the options you can use to modify map results.


Reduce
If at all possible, don't use custom reduce functions! Use this section to learn how reduces work, but prefer
the built-in functions outlined in the next section. They are simpler, faster, and will save you time.
Let's say we wanted to sum up all the values a map function emitted. That operation would be done in the reduce
function.
Reduce functions are called with three parameters: keys, values and rereduce.
keys will be a list of keys as emitted by the map or, if rereduce is true, null.
values will be a list of values for each element in keys, or if rereduce is true, a list of results from previous
reduce functions.
rereduce will be true or false.
Here's an example that finds the largest numeric value within the dataset:
function (keys, values, rereduce) {
  // Return the maximum numeric value.
  var max = -Infinity;
  for (var i = 0; i < values.length; i++) {
    if (typeof values[i] === 'number') {
      max = Math.max(values[i], max);
    }
  }
  return max;
}

ReReduce

Reduce functions can be given either the results of map functions, or the results of reduce functions that already
ran. In that latter case, rereduce is true, because the reduce function is re-reducing the data. (Get it?)
This way, nodes reduce datasets more quickly by handling both map results and, once that's all been processed,
newly computed reduce values.
Here's a simple reduce function that counts values and handles rereduce:

function (keys, values, rereduce) {
  if (rereduce) {
    // values = [4, 5, 6], from previous jobs that counted 4, 5, and 6 documents respectively
    return sum(values);
  } else {
    // values = [{...}, {...}], indicating processed map results
    return values.length;
  }
}

For the mathematically inclined: operations which are both commutative and associative need not worry about
rereduce.
Built-in Reduces
Cloudant exposes several built-in reduce functions which, because they're written in Cloudant's native Erlang
rather than JavaScript, run much faster than custom functions.
_sum

Given an array of numeric values, _sum just, well, sums them up. Our Chained MapReduce example uses _sum
to report the best sales months and top sales reps. Here's an example view:


"map": "function(doc){
if (doc.rep){
emit({"rep": doc.rep}, doc.amount);
}
}",
"reduce": "_sum"

This yields sales by rep. Queried without options, the view will report the total sales for all reps. But, if you group
the results using group=true, you'll get sales by rep.
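As a rough sketch of how you might query that view from Node.js with Nano, assuming a hypothetical design document named sales containing the by_rep view shown above (the names and credentials are placeholders):

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com'),
    db = nano.db.use('DATABASE');

// "sales" and "by_rep" are placeholders for wherever you stored the map/reduce pair above
db.view('sales', 'by_rep', { group: true }, function (err, body) {
  if (err) return console.log(err);
  body.rows.forEach(function (row) {
    // one row per rep, with the summed sales amount as the value
    console.log(row.key, row.value);
  });
});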
_sum works for documents containing objects and arrays with numeric values inside of them, as long as the
structure of those documents is consistent. So, two documents like...
[
  {
    "x": 1,
    "y": 2,
    "z": 3
  },
  {
    "x": 4,
    "y": 5,
    "z": 6
  }
]

... sum to {"x": 5, "y": 7, "z": 9}.


_count

_count reports the number of docs emitted by the map function, regardless of the emitted values' types. Consider
this example:
"map": "function(doc){ if(doc.type === 'event'){ emit(doc.location, null); } }", "reduce": "_count"
If we grouped by key, this would tell us how many events happened at each location.
_stats

Like _sum on steroids, _stats produces a JSON structure containing the sum, count, min, max and sum squared.
Also like _sum, _stats only deals with numeric values and arrays of numbers; it'll get mighty angry if you
start passing it strings or objects. Consider how you might use _stats to get statistics about stock trades:
"map": "function(doc){
if(doc.type === "stock"){
emit([doc.stock_symbol, doc.created_at.hour], doc.value);
}
}",
"reduce": "_stats"

With group=true&group_level=1, which groups results on the first element of the key, you'll get stats per symbol across
all time. With group=true&group_level=2, you'll get stats for trades by stock symbol by hour. Nifty, eh?

4.2.3 Chained Indexes


For particularly complex queries, you may need to run a dataset through multiple transformations to get the
information you need. For that, Cloudant allows you to chain secondary indexes together, by inserting the results
of a map function into another database as documents. Let's see how we do that:


{
  "map": "function(doc){
    if(doc.type === 'event'){
      emit(doc.location, doc.attendees);
    }
  }",
  "dbcopy": "other_database"
}

This will populate other_database (or whatever database you indicate) with the results of that map function, like
this:
{
"id": "eac6f1faf2cc8dd6fbbbb5205c0021ce",
"key": ["UK", "Bristol"],
"value": 32
}

You can then write secondary indexes for other_database that manipulate the results accordingly, potentially
including secondary indexes that use dbcopy again to emit another transformation to another database.
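As a hedged illustration, assuming the copied documents in other_database keep the id/key/value shape shown above, a hypothetical second-stage design document could sum attendees per country (the first element of the copied key). The design document and view names are placeholders:

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com'),
    other = nano.db.use('other_database');

// hypothetical second-stage index over the copied key/value documents shown above
other.insert({
  _id: '_design/stage2',
  views: {
    attendees_by_country: {
      map: "function(doc){ if (doc.key && doc.value) { emit(doc.key[0], doc.value); } }",
      reduce: '_sum'
    }
  }
}, function (err, body) {
  console.log(err || body);
});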

4.3 Document Versioning and MVCC


Concurrent updates are a tricky subject in any kind of database. Let's have a look at how versioning works in a
Cloudant database and how you can use it to resolve conflicts between concurrent updates to the same document.

4.3.1 Revisions
In a Cloudant database, every document has a revision. The revision is stored in the _rev field of the document. As
a developer, you should treat it as an opaque string used internally by the database and not rely on it as a counter.
When you retrieve a document from the database, you can either retrieve the latest revision or you can ask for
a past revision by specifying the rev query parameter. However, past revisions will only be kept in the database
for a short time or if the revisions are in conflict. Otherwise, old revisions will be deleted regularly by a process
called compaction. Cloudant's revisions are thus not a good fit for implementing a version control system. For this
purpose, we recommend creating a new document per revision. When you update a document, you have to specify
the previous revision, and if the update is successful, the _rev field will be updated automatically. However, if
the revision you specified in your update request does not match the latest revision in the database, your request
will fail with HTTP status 409 (conflict). This technique is called multi-version concurrency control (MVCC);
it prevents concurrent updates from accidentally overwriting or reversing each other's changes, works well with
disconnected clients and does not require write locks. That said, as with any mechanism for dealing with concurrency,
it does have some tricky parts.
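As a rough sketch of what this looks like from Node.js with Nano (the document ID, field names, and credentials are illustrative placeholders, not part of the API):

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com'),
    db = nano.db.use('DATABASE');

// create the document; the response contains its first revision
db.insert({ _id: 'my_doc', price: 650 }, function (err, body) {
  if (err) throw err;
  var staleRev = body.rev; // e.g. "1-..."

  // this update references the current revision, so it succeeds and creates "2-..."
  db.insert({ _id: 'my_doc', _rev: staleRev, price: 600 }, function () {

    // this update still references the old revision,
    // so Cloudant rejects it with HTTP 409 (document update conflict)
    db.insert({ _id: 'my_doc', _rev: staleRev, price: 700 }, function (conflictErr) {
      console.log(conflictErr);
    });
  });
});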

4.3.2 Distributed databases and conflicts


Given our story so far, it seems impossible that we could have a conflict, because any update request has to
reference the latest version of the document. So how would we get a conflict? How would a document get two
different updates based on the same previous version? What we haven't taken into account is that Cloudant is
not one monolithic database but rather a distributed system of databases that needn't always be in sync with each
other. This is especially true if you are developing mobile or web applications that have to work without a constant
connection to the main database on Cloudant. When a document on such a disconnected database is updated while
the same document on Cloudant is also updated, this will lead to a conflict when the remote database is replicated
to Cloudant. While replication from local, disconnected databases is a common source of conflicts, it is not the
only one. Cloudant's own infrastructure is a distributed system, and updating your Cloudant database concurrently
(for example from multiple web servers) can - very rarely - also lead to conflicts. In short, no matter what kind of
application you have and how it works, conflicts can always happen.


4.3.3 How to find conflicts


To find out whether a document is in a conflict state, you can add the query parameter conflicts=true when you
retrieve the document. The returned document will then contain a _conflicts array with all conflicting revisions.
To find conflicts for multiple documents in a database, the best approach is to write a view. Here is a map function
that emits all conflicting revisions for every document that has a conflict:
function(doc) {
if (doc._conflicts) {
emit(null, [doc._rev].concat(doc._conflicts));
}
}

You can then regularly query this view and resolve conflicts as needed or query the view after each replication.
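Assuming you saved that map function as a view named conflicts in a hypothetical design document _design/maintenance, a minimal Nano sketch for polling it might look like this (all names and credentials are placeholders):

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com'),
    db = nano.db.use('DATABASE');

// "maintenance" and "conflicts" are placeholder names for wherever you saved the map function
db.view('maintenance', 'conflicts', function (err, body) {
  if (err) return console.log(err);
  body.rows.forEach(function (row) {
    // row.id is the conflicted document; row.value lists its conflicting revisions
    console.log(row.id, row.value);
  });
});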

4.3.4 How to resolve conflicts


Once you've found a conflict, you can resolve it in four steps:
1. Get the conflicting revisions.
2. Merge them in your application, or ask the user what they want to do.
3. Upload the new revision.
4. Delete the old revisions.
Let's look at an example of how this can be done. Suppose we have a database of products for an online shop.
The first version of a document might look like this:
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "1-7438df87b632b312c53a08361a7c3299",
"name": "Samsung Galaxy S4",
"description": "",
"price": 650
}

As the document doesn't have a description yet, someone might add one.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "2-61ae00e029d4f5edd2981841243ded13",
"name": "Samsung Galaxy S4",
"description": "Latest smartphone from Samsung",
"price": 650
}

At the same time, someone else - working with a replicated database - reduces the price.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "2-f796915a291b37254f6df8f6f3389121",
"name": "Samsung Galaxy S4",
"description": "",
"price": 600
}

Then the two databases are replicated, leading to a conflict.


1. Getting conflicting revisions
We get the document with conflicts=true like this...


http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?conflicts=true
...and get the following response:
{
"_id":"74b2be56045bed0c8c9d24b939000dbe",
"_rev":"2-f796915a291b37254f6df8f6f3389121",
"name":"Samsung Galaxy S4",
"description":"",
"price":600,
"_conflicts":["2-61ae00e029d4f5edd2981841243ded13"]
}

The version with the changed price has been chosen arbitrarily as the latest version of the document and the
conflict is noted in the _conflicts array. In most cases this array has only one element, but there can be many
conflicting revisions.
2. Merge the changes
Now your application needs to compare the revisions to see what has changed. To do that, it gets all the
versions from the database with the following URLs:
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-61ae00e029d4f5edd2981841243ded13
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=1-7438df87b632b312c53a08361a7c3299
Since the two changes are for different fields of the document, it is easy to merge them automatically (a minimal
merge sketch follows the list of strategies below).
Depending on your application and the nature of the changes, other conflict resolution strategies might be useful.
Some common strategies are:
time based: first or last edit
reporting conflicts to users and letting them decide on the best resolution
more sophisticated merging algorithms, e.g. 3-way merges of text fields
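The following is a minimal, hypothetical sketch of such an automatic merge in JavaScript. It performs a naive field-level three-way merge against the common ancestor (the 1- revision above) and makes no attempt to resolve the case where both sides changed the same field:

// base: the common ancestor (the 1- revision above)
// a, b: the two conflicting 2- revisions
function mergeDocs(base, a, b) {
  var merged = {};
  Object.keys(base).concat(Object.keys(a), Object.keys(b)).forEach(function (field) {
    if (field === '_rev' || field === '_conflicts') return; // let the database manage these
    if (a[field] !== base[field]) {
      merged[field] = a[field];      // field changed in a
    } else if (b[field] !== base[field]) {
      merged[field] = b[field];      // field changed in b
    } else {
      merged[field] = base[field];   // unchanged
    }
  });
  return merged;
}

// With the product documents above, this yields the new description from one
// revision and the reduced price from the other, matching step 3 below.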
3. Upload the new revision
We produce the following document and update the database with it.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "3-daaecd7213301a1ad5493186d6916755",
"name": "Samsung Galaxy S4",
"description": "Latest smartphone from Samsung",
"price": 600
}

4. Delete old revisions


To delete the old revisions, we send a DELETE request to the URLs with the revisions we want to delete.

DELETE http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-61ae00e029d4f5edd2981841243ded13

DELETE http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-f796915a291b37254f6df8f6f3389121

After that, the document is not in conflict any more and you can verify that by getting the document again with
the conflicts parameter set to true.


4.4 CouchApps and Tiers of Application Architecture


Because Cloudant can host raw file data, like images, and serve it over HTTP, it can theoretically host all
the static files necessary to run a website, just like a web server. Because these files would be hosted
on Cloudant, the client-side JavaScript could access Cloudant databases directly. An application built this way is said to
have a two-tier architecture, consisting of the client (typically a browser) and the database. In the CouchDB
community, this is called a CouchApp.
Most web apps have three tiers: the client, the server, and the database. Placing the server between the client
and the database can help with authentication, authorization, asset management, leveraging third-party web APIs,
providing particularly sophisticated endpoints, etc. This separation allows for added complexity without conflating
concerns, so your client can worry first and last about data presentation, while your database can focus on storing
and serving data.
CouchApps shine in their simplicity, but frequently a web app will need the power of a 3-tier architecture. When
is each appropriate?

4.4.1 A CouchApp is appropriate if...


Your server would have only provided an API to Cloudant anyway.
You're OK using Cloudant's cookie-based authentication.
You're OK using Cloudant's _users and _security databases to manage users and permissions.
You don't need to schedule cron jobs or other regular tasks.
To get started with CouchApps, read Managing applications on Cloudant.

4.4.2 A 3-tier application is appropriate if...


You need finer-grained permissions than the _security database allows.
You need an authentication method other than Basic auth or cookie authentication, such as OAuth or a
third-party login system.
You need to schedule tasks outside the client to run regularly.
You can write your server layer using whatever technologies work best for you. We keep a list of libraries for
working with Cloudant here: Developing on Cloudant.

4.5 Replication
Replication is an incremental, one-way process involving two databases, a source and a destination. At the end of
the replication process, all latest revisions of documents in the source database are also in the destination database
and all documents that were deleted from the source database are also deleted (if necessary) from the destination
database.
The replication process only copies the latest revision of a document, so all previous revisions that were only on
the source database are not copied to the destination database.

4.5.1 Replication using the /_replicate API


Replication can be triggered by sending a POST request to the /_replicate URL. Many of the concepts and
parameters are similar, but you are encouraged to use the Replicator Database instead of the old API documented
here.
The body of the POST request is a JSON document with the following fields:


source (required): Identifies the database to copy revisions from. Can be a database URL, or an object whose url
property contains the full URL of the database.
target (required): Identifies the database to copy revisions to. Same format and interpretation as source.
cancel (optional): Include this property with a value of true to cancel an existing replication between the specified
source and target.
continuous (optional): A value of true makes the replication continuous (see below for details).
create_target (optional): A value of true tells the replicator to create the target database if it doesn't exist.
doc_ids (optional): Array of document IDs; if given, only these documents will be replicated.
filter (optional): Name of a filter function that can choose which revisions get replicated.
proxy (optional): Proxy server URL.
query_params (optional): Object containing properties that are passed to the filter function.
use_checkpoints (optional): Whether to create checkpoints. Checkpoints greatly reduce the time and resources
needed for repeated replications. Setting this to false removes the requirement for write access to the source
database. Defaults to true.
The source and target fields indicate the databases that documents will be copied from and to, respectively.
Unlike CouchDB, you have to use the full URL of the database.
POST /_replicate HTTP/1.1
{
"source": "http://username.cloudant.com/example-database",
"target": "http://example.org/example-database"
}

The target database has to exist and is not implicitly created. Add "create_target":true to the JSON
object to create the target database (remote or local) prior to replication. The names of the source and target
databases do not have to be the same.
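If you are working from Node.js, Nano wraps this endpoint as nano.db.replicate. A minimal sketch, with placeholder URLs, might look like this:

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com');

// one-off replication, creating the target database first if it does not exist
nano.db.replicate(
  'https://USERNAME:PASSWORD@USERNAME.cloudant.com/example-database',
  'https://USERNAME:PASSWORD@example.org/example-database',
  { create_target: true },
  function (err, body) {
    console.log(err || body);
  }
);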
Canceling replication
A replication triggered by POSTing to /_replicate/ can be canceled by POSTing the exact same JSON object
but with the additional cancel property set to true.
POST /_replicate HTTP/1.1
{
"source": "https://username:password@username.cloudant.com/example-database",
"target": "https://username:password@example.org/example-database",
"cancel": true
}

Notice: the request which initiated the replication will fail with error 500 (shutdown).
The replication ID can be obtained from the original replication request (if it's a continuous replication) or from
/_active_tasks.
Example

First we start the replication.


$ curl -H 'Content-Type: application/json' -X POST http://username.cloudant.com/_replicate \
  -d '{
    "source": "https://username:password@example.com/foo",
    "target": "https://username:password@username.cloudant.com/bar",
    "create_target": true,
    "continuous": true
  }'


The reply contains an id.


{
"ok": true,
"_local_id": "0a81b645497e6270611ec3419767a584+continuous+create_target"
}

We use this id to cancel the replication.


$ curl -H 'Content-Type: application/json' -X POST http://username.cloudant.com/_replicate \
  -d '{
    "replication_id": "0a81b645497e6270611ec3419767a584+continuous+create_target",
    "cancel": true
  }'

The "ok":

true reply indicates that the replication was successfully canceled.

{
"ok": true,
"_local_id": "0a81b645497e6270611ec3419767a584+continuous+create_target"
}

Continuous replication
To make replication continuous, add a "continuous":true parameter to the JSON, for example:
$ curl -H 'Content-Type: application/json' -X POST http://username.cloudant.com/_replicate \
  -d '{
    "source": "http://username:password@example.com/foo",
    "target": "http://username:password@username.cloudant.com/bar",
    "continuous": true
  }'

Replications can be persisted, so that they survive server restarts. For more, see Replicator Database.
Filtered Replication
Sometimes you don't want to transfer all documents from source to target. You can include one or more filter
functions in a design document on the source and then tell the replicator to use them.
A filter function takes two arguments (the document to be replicated and the replication request) and returns
true or false. If the result is true, the document is replicated.
function(doc, req) {
return !!(doc.type && doc.type == "foo");
}

Filters live under the top-level filters key:


{
"_id": "_design/myddoc",
"filters": {
"myfilter": "function goes here"
}
}

Invoke them as follows:


{
"source": "http://username:password@example.org/example-database",
"target": "http://username:password@username.cloudant.com/example-database",
"filter": "myddoc/myfilter"
}


You can even pass arguments to them.


{
"source": "http://username:password@example.org/example-database",
"target": "http://username:password@username.cloudant.com/example-database",
"filter": "myddoc/myfilter",
"query_params": {
"key": "value"
}
}

Named Document Replication


Sometimes you only want to replicate some documents. For this simple case you do not need to write a filter
function. Simply add the list of keys in the doc_ids field.
{
"source": "http://username:password@example.org/example-database",
"target": "http://username:password@127.0.0.1:5984/example-database",
"doc_ids": ["foo", "bar", "baz]
}

Replicating through a proxy


Pass a proxy argument in the replication data to have replication go through an HTTP proxy:
POST /_replicate HTTP/1.1
{
"source": "http://username:password@username.cloudant.com/example-database",
"target": "http://username:password@example.org/example-database",
"proxy": "http://my-proxy.com:8888"
}

Authentication
The source and the target database may require authentication, and if checkpoints are used (on by default), even
the source will require write access. The easiest way to authenticate is to put a username and password into the
URL; the replicator will use these for HTTP Basic auth:
{
"source": "https://username:password@example.com/db",
"target": "https://username:password@username.cloudant.com/db"
}

Performance related options


These options can be set per replication by including them in the replication document.
worker_processes - The number of processes the replicator uses (per replication) to transfer documents from the source to the target database. Higher values can imply better throughput (due to more
parallelism of network and disk IO) at the expense of more memory and eventually CPU. Default value is
4.
worker_batch_size - Workers process batches with the size defined by this parameter (the size corresponds to the number of _changes feed rows). Larger batch sizes can offer better performance, while lower
values imply that checkpointing is done more frequently. Default value is 500.


http_connections - The maximum number of HTTP connections per replication. For push replications, the effective number of HTTP connections used is min(worker_processes + 1, http_connections). For
pull replications, the effective number of connections used corresponds to this parameters value. Default
value is 20.
connection_timeout - The maximum period of inactivity for a connection in milliseconds. If a connection is idle for this period of time, its current request will be retried. Default value is 30000 milliseconds
(30 seconds).
retries_per_request - The maximum number of retries per request. Before a retry, the replicator
will wait for a short period of time before repeating the request. This period of time doubles between each
consecutive retry attempt. This period of time never goes beyond 5 minutes and its minimum value (before
the first retry is attempted) is 0.25 seconds. The default value of this parameter is 10 attempts.
socket_options - A list of options to pass to the connection sockets. The available options can be found
in the documentation for the Erlang function setopts/2 of the inet module. Default value is [{keepalive,
true}, {nodelay, false}].
Example
POST /_replicate HTTP/1.1
{
"source": "https://username:password@example.com/example-database",
"target": "https://username:password@example.org/example-database",
"connection_timeout": 60000,
"retries_per_request": 20,
"http_connections": 30
}

4.5.2 Replicator Database


Introduction
The /_replicator database is a special database where you PUT/POST documents to trigger replications
and you DELETE to cancel ongoing replications. These documents have exactly the same content as the JSON
documents you can POST to /_replicate/. See Replication using the /_replicate API. Fields are source,
target, create_target, continuous, doc_ids, filter, query_params.
Replication documents can have a user defined _id. Design documents (and _local documents) added to the
replicator database are ignored.
Basics
Let's say you PUT the following document into _replicator:
{
"_id": "my_rep",
"source": "https://username:password@myserver.com:5984/foo",
"target": "https://username:password@username.cloudant.com/bar",
"create_target": true
}

As soon as the replication is triggered, the document will be updated with 3 new fields:
{
"_id": "my_rep",
"source": "https://username:password@myserver.com:5984/foo",
"target": "https://username:password@username.cloudant.com/bar",

"create_target": true,
"_replication_id": "c0ebe9256695ff083347cbf95f93e280",
"_replication_state": "triggered",
"_replication_state_time": "2011-06-07T16:54:35+01:00"
}

Note: special fields set by the replicator start with the prefix _replication_.
_replication_id: the ID internally assigned to the replication. This is the ID exposed by the output
from /_active_tasks/;
_replication_state: the current state of the replication;
_replication_state_time: an RFC3339 compliant timestamp that tells us when the current replication state (defined in _replication_state) was set.
When the replication finishes, it will update the _replication_state field (and _replication_state_time) with the
value "completed", so the document will look like:

{
"_id": "my_rep",
"source": "https://username:password@myserver.com:5984/foo",
"target": "https://username:password@username.cloudant.com/bar",
"create_target": true,
"_replication_id": "c0ebe9256695ff083347cbf95f93e280",
"_replication_state": "completed",
"_replication_state_time": "2011-06-07T16:56:21+01:00"
}

When an error happens during replication, the _replication_state field is set to "error".
There are only 3 possible values for the _replication_state field: "triggered", "completed" and
"error". Continuous replications never get their state to "completed".
Canceling replications
To cancel a replication simply DELETE the document which triggered the replication. Note that if the replication
is in an error state, the replicator will try it again and again, updating the replication document and thereby
changing the revision. You thus need to get the revision immediately before deleting the document or you might
get a document update conflict response.
Example
$ curl -X DELETE http://username.cloudant.com/_replicator/replication1?rev=...

Note: You need to DELETE the document that triggered the replication. DELETEing another document that
describes the same replication but did not trigger it will not cancel the replication.
The user_ctx property and delegations
Replication documents can have a custom user_ctx property. This property defines the user context under
which a replication runs. For the old way of triggering replications (POSTing to /_replicate/), this property
was not needed (in fact it didn't exist) - this is because at the moment of triggering the replication it has information
about the authenticated user. With the replicator database, since it's a regular database, the information about the
authenticated user is only present at the moment the replication document is written to the database - the replicator
database implementation is like a _changes feed consumer (with ?include_docs=true) that reacts to what
was written to the replicator database - in fact this feature could be implemented with an external script/program.
This implementation detail implies that for non admin users, a user_ctx property, containing the user's name and
a subset of his/her roles, must be defined in the replication document. This is ensured by the document update
validation function present in the default design document of the replicator database. This validation function also
ensures that a non admin user cannot set a user name in the user_ctx property that doesn't match his/her
own name (the same principle applies to the roles).
For admins, the user_ctx property is optional, and if it's missing it defaults to a user context with name null
and an empty list of roles - this means design documents will not be written to local targets. If writing design
documents to local targets is desired, then a user context with the role _admin must be set explicitly.
Also, for admins the user_ctx property can be used to trigger a replication on behalf of another user. This is
the user context that will be passed to local target database document validation functions.
Note: The user_ctx property only has effect for local endpoints.
Example delegated replication document:
{
"_id": "my_rep",
"source": "https://username:password@myserver.com:5984/foo",
"target": "https://username:password@username.cloudant.com/bar",
"continuous": true,
"user_ctx": {
"name": "joe",
"roles": ["erlanger", "researcher"]
}
}

As stated before, for admins the user_ctx property is optional, while for regular (non admin) users it's mandatory. When the roles property of user_ctx is missing, it defaults to the empty list [ ].
Monitoring progress
The active tasks API was enhanced to report additional information for replication tasks. Example:
$ curl http://username.cloudant.com/_active_tasks
[
{
"pid": "<0.1303.0>",
"replication_id": "e42a443f5d08375c8c7a1c3af60518fb+create_target",
"checkpointed_source_seq": 17333,
"continuous": false,
"doc_write_failures": 0,
"docs_read": 17833,
"docs_written": 17833,
"missing_revisions_found": 17833,
"progress": 3,
"revisions_checked": 17833,
"source": "http://username.cloudant.com/db/",
"source_seq": 551202,
"started_on": 1316229471,
"target": "test_db",
"type": "replication",
"updated_on": 1316230082
}
]

4.6 Back up your data


There are two kinds of people: those who've had a hard drive failure and those who haven't had one yet. Luckily,
Cloudant already takes care of such failures by replicating all data across three nodes. So why would you need
a backup? Because there is more than one way to lose or be unable to access data: if a data center gets hit by a
tsunami, having three nodes in that data center won't help much. Also, if a faulty application deletes or overwrites
data in the database, no amount of duplication will prevent that. For the first scenario, you need a cluster that spans
multiple geographic locations, which we offer to customers on our dedicated pricing plan, or you can replicate
your data to a cluster (dedicated or multi-tenant) in a different geographic location. The second scenario is what
this guide is about. In the case of a faulty application, you need a backup that preserves the state of the database
at certain points in time.

4.6.1 How to back up


Unfortunately, there is no obvious, out-of-the-box solution to this problem. One way to go about it is to replicate
the database to a dated backup database. This certainly works and is easy to do, but if the database is big and
you need backups for multiple points in time (e.g. 7 daily backups and 4 weekly ones), you will end up with a
lot of disk usage, because you will be storing a complete copy in each new backup database. The solution to this
problem is to do incremental backups, storing only the documents that have changed since the last backup. After
an initial full backup you start the replication process to another database with a since_seq parameter, telling
it where the last replication left off.
1. You find the ID of the checkpoint document for the last replication. It is stored in the _replication_id
field of the replication document in the _replicator database.
2. You open the checkpoint document at /<database>/_local/<_replication_id>, where
<_replication_id> is the ID you found in the previous step and <database> is the name of the
source or the target database. The document usually exists on both databases, but might only exist on one.
3. You look for the recorded_seq field of the first element of the history array.
4. You start a replication to a new database and set the since_seq field in the replication document to the
value of the recorded_seq field from the previous step.

4.6.2 How to restore


To restore a database from backup, you replicate each incremental backup to a new database starting with the
latest increment. You don't have to do it in this order, but replicating from the latest incremental backup first will
be faster, because updated documents will only have to be written to the target once.

4.6.3 An example
Let's say you have one database to back up, and you want to create a full backup on Monday and an incremental
one on Tuesday. You can use curl and jq to do this, but of course any other HTTP client will work.
You save your base URL and the content type in variables, so that you don't have to enter them again and again for
each request.
$ url='https://<username>:<password>@<username>.cloudant.com'
$ ct='Content-Type: application/json'

You create three databases, one original and two for backups.
$ curl -X PUT "${url}/original"
$ curl -X PUT "${url}/backup-monday"
$ curl -X PUT "${url}/backup-tuesday"

You create the _replicator database, if it does not exist yet.


$ curl -X PUT "${url}/_replicator"

On Monday, you back up your data for the first time, so you replicate everything from original to
backup-monday.


$ curl -X PUT "${url}/_replicator/backup-monday" -H "$ct" -d @- <<END


{
"_id": "backup-monday",
"source": "${url}/original",
"target": "${url}/backup-monday"
}
END

On Tuesday, things get more complicated. You first need to get the ID of the checkpoint document.
$ repl_id=$(curl "${url}/_replicator/backup-monday" | jq -r ._replication_id)

Once you have that, you use it to get the recorded_seq value.
$ recorded_seq=$(curl "${url}/original/_local/${repl_id}" | jq -r '.history[0].recorded_seq')

And with the recorded_seq you can start the incremental backup for Tuesday.
$ curl -X PUT "${url}/_replicator/backup-tuesday" -H "${ct}" -d @- <<END
{
"_id": "backup-tuesday",
"source": "${url}/original",
"target": "${url}/backup-tuesday",
"since_seq": "${recorded_seq}"
}
END

To restore from the backup, you replicate the initial full backup and any number of incremental backups to a new
database.
If you want to restore Monday's state, just replicate from the backup-monday database:
$ curl -X PUT "${url}/_replicator/restore-monday" -H "$ct" -d @- <<END
{
"_id": "restore-monday",
"source": "${url}/backup-monday",
"target": "${url}/restore",
"create-target": true
}
END

If you want to restore Tuesday's state, first replicate from backup-tuesday and then from backup-monday.
Using this order, documents that were updated on Tuesday will only have to be written to the target database once.
$ curl -X PUT "${url}/_replicator/restore-tuesday" -H "$ct" -d @- <<END
{
"_id": "restore-tuesday",
"source": "${url}/backup-tuesday",
"target": "${url}/restore",
"create-target": true
}
END
$ curl -X PUT "${url}/_replicator/restore-monday" -H "$ct" -d @- <<END
{
"_id": "restore-monday",
"source": "${url}/backup-monday",
"target": "${url}/restore"
}
END


4.6.4 Additional hints and suggestions


While the above outlines the basic procedure, each application will have its own requirements and thus its own
strategy for backups. Here are a few things you might want to keep in mind.
When to start backups
Replication jobs can significantly increase the load on a cluster. If you are backing up several databases, you might
want to start replication jobs at different times or at times when the cluster is usually less busy.
IO Priority
It is also possible to change the priority of backup jobs by setting the x-cloudant-io-priority field in the headers
object of the target and/or the source object of the replication document to "low". For example:
{
  "source": {
    "url": "https://user:pass@example.com/db",
    "headers": {
      "x-cloudant-io-priority": "low"
    }
  },
  "target": {
    "url": "https://user:pass@example.net/db",
    "headers": {
      "x-cloudant-io-priority": "low"
    }
  }
}

Design documents
If you back up design documents, indexes will be created on the backup destination. This slows down the backup
process and unnecessarily takes up disk space. So if you don't need indexes on the backup system, use a filter
function in all replications that filters out design documents. This can also be a good place to filter out other
documents that aren't needed anymore.
Backing up many databases
If your application uses one database per user or allows each user to create several databases, backup jobs will
need to be created for each new database. Make sure that the replication jobs don't all start at the same time.
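One simple, hypothetical way to stagger those jobs from a Node.js script with Nano; the database names, the ten-minute delay, and the credentials are placeholders:

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com'),
    replicator = nano.db.use('_replicator'),
    databases = ['db-one', 'db-two', 'db-three'], // placeholder database names
    delay = 10 * 60 * 1000;                       // stagger the jobs by ten minutes

databases.forEach(function (name, i) {
  setTimeout(function () {
    // one backup replication document per database, started i * 10 minutes apart
    replicator.insert({
      _id: 'backup-' + name,
      source: 'https://USERNAME:PASSWORD@USERNAME.cloudant.com/' + name,
      target: 'https://USERNAME:PASSWORD@USERNAME.cloudant.com/backup-' + name,
      create_target: true
    }, function (err, body) {
      console.log(err || body);
    });
  }, i * delay);
});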

4.6.5 Need help?

4.7 How to monitor indexing and replication tasks


Creating new indexes over lots of data or replicating a large database can take quite a while. So how can you see
whether your tasks are making progress and when they will be completed? The _active_tasks endpoint provides
information about all ongoing tasks. However, if you start a lot of tasks, some of them might be scheduled to run
later and will not show up under _active_tasks until they have been started.
In this guide we will talk about how to use the _active_tasks endpoint to monitor long-running tasks. We will use
curl to access the endpoint and jq (a command-line JSON processor) to process the JSON response.
Since this is a task-focused tutorial, it will only cover what is needed to accomplish this task. Please refer to the
API documentation for a complete reference.

4.7.1 curl and jq basics


To get all active tasks and format the output nicely, we call curl and pipe the output to jq:
curl https://username:password@username.cloudant.com/_active_tasks | jq .

jq lets you filter a list of documents by their field values, which makes it easy to get all replication documents or
just one particular view indexing task you are interested in. Have a look at the detailed manual to find out more!

4.7.2 How to monitor view builds and search indexes


View indexes are being rebuilt when a design document is updated. An update to just one of the views leads to
all the views in the document being rebuilt. However, search indexes are only rebuilt when their index function is
changed. For each search index that is being built and for each design document whose views are changed, one
task is created for each replica of each shard in a cluster. For example, if there are 24 shards with 3 replicas each
and you update 2 search indexes, 144 tasks will be run.
To find all view indexing tasks, you pipe the curl output to jq and let it filter the documents in the array by their
type field.

curl -s https://username:password@username.cloudant.com/_active_tasks | jq '.[] | select(.type=="indexer")'

The same works for search indexing tasks.


curl ... | jq '.[] | select(.type=="search_indexer")'

The output will be a list of JSON objects like this one:


{
"total_changes": 6435,
"started_on": 1371118332,
"user": "username",
"updated_on": 1371118334,
"type": "indexer",
"node": "dbcore@db6.meritage.cloudant.net",
"pid": "<0.16366.6103>",
"changes_done": 364,
"database": "shards/40000000-7fffffff/username/database",
"design_document": "_design/ngrams"
}

To estimate the time needed until the indexing task is complete, you can monitor the number of changes_done and
compare this value to total_changes. For instance, if changes_done increases by 250 per second and total_changes
is 1,000,000, the task will take about 66 minutes to complete. However, this is only an estimate. How long the
process will really take depends on:
The time it takes to process each document. For instance, a view might check the type of a document first
and only emit new index entries for one type.
The size of the documents
The current workload on the cluster
These factors combined can lead to your estimate being off by as much as 100%.
You can extract the changes_done field using jq like this:
curl ... | jq '.[] | select(.type=="search_indexer") | .changes_done'
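If you prefer to compute the estimate programmatically, here is a rough Node.js sketch with Nano that samples _active_tasks twice and derives a time-remaining figure for indexer tasks. Pairing the two samples by array position is a simplification, and the URL is a placeholder:

var nano = require('nano')('https://USERNAME:PASSWORD@USERNAME.cloudant.com');

// fetch the current indexer tasks from _active_tasks
function sampleIndexers(callback) {
  nano.request({ path: '_active_tasks' }, function (err, tasks) {
    if (err) return callback(err);
    callback(null, tasks.filter(function (t) { return t.type === 'indexer'; }));
  });
}

// sample twice, ten seconds apart, and print a rough time-remaining estimate
sampleIndexers(function (err, before) {
  if (err) throw err;
  setTimeout(function () {
    sampleIndexers(function (err, after) {
      if (err) throw err;
      after.forEach(function (task, i) {
        // naive pairing by position; assumes the same tasks are still running
        var previous = before[i];
        if (!previous) return;
        var rate = (task.changes_done - previous.changes_done) / 10; // changes per second
        if (rate <= 0) return console.log(task.design_document, 'no progress in this sample');
        var remaining = (task.total_changes - task.changes_done) / rate;
        console.log(task.design_document, 'roughly', Math.round(remaining), 'seconds left');
      });
    });
  }, 10 * 1000);
});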

4.7.3 How to monitor replication


To find all replication tasks, you pipe the curl output to jq and let it filter the documents in the array by their type
field.

curl ... | jq '.[] | select(.type=="replication")'

We recommend that you start a replication process by creating a document in the _replicator database and setting
its _id field. That makes it easier to select the information about this process from the active tasks:
curl ... | jq '.[] | select(.doc_id=="ID")'

Alternatively, you can select by replication_id:


curl ... | jq '.[] | select(.replication_id=="ID")'

The output will look like this:


{
  "started_on": 1371094220,
  "source_seq": "62960-sakdjflksdfjsdlkafjalskdfjlsakfjlasdkjksald",
  "source": "",
  "revisions_checked": 12,
  "continuous": true,
  "doc_id": null,
  "doc_write_failures": 0,
  "docs_read": 12,
  "target": "",
  "type": "replication",
  "updated_on": 1371118477,
  "user": "username",
  "checkpointed_source_seq": "61764-dskfjalsfjsalkfjssadjfhasdfkjhsdkfhsdkf",
  "changes_pending": 1196,
  "pid": "<0.9955.4120>",
  "node": "dbcore@db7.meritage.cloudant.net",
  "docs_written": 12,
  "missing_revisions_found": 12,
  "replication_id": "asfksdlfkjsadkfjsdalkfjas+continuous+create_target"
}

Is it stuck?
So what can you do with all this information? In the case of a one-off (i.e. non-continuous) replication
where the source database isn't updated much during the replication, the changes_pending value tells you
how many documents are still to be processed and is a good indicator of when the replication will be finished.
In the case of a continuous replication, you will be more interested in how the number of documents
processed changes over time and whether changes_pending increases. If changes_pending increases and
revisions_checked stays constant for a while, the replication is probably stalled. If changes_pending
increases, but revisions_checked also increases, this might indicate that the replication can't keep up with
the volume of data added to or updated in the database.
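As a rough way to automate this check, the following sketch (our own example, assuming the replication was started with the document ID my-replication) samples revisions_checked twice and warns if it has not moved:

URL='https://username:password@username.cloudant.com/_active_tasks'
FILTER='.[] | select(.doc_id=="my-replication") | .revisions_checked'

r1=$(curl -s "$URL" | jq "$FILTER")
sleep 30
r2=$(curl -s "$URL" | jq "$FILTER")

if [ "$r1" = "$r2" ]; then
  echo "revisions_checked has not changed in 30 seconds - the replication may be stalled"
fi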
What to do?
To resolve a stalled replication, it is sometimes necessary to cancel the replication process and start it again. If
that does not help, the replication might be stalled because the user accessing the source or target database does
not have write permissions. Note that replication makes use of checkpoints so that it doesn't have to repeat work
if it is triggered again; however, that means you need write permission on both the source and the target. If you
created the replication process by creating a document in the _replicator database, you can also check the status
of the replication there.
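For reference, here is a sketch of starting such a replication and checking on it later; the database names and the document ID my-replication are placeholders of our own choosing:

# create the replication document with a known _id
curl -X PUT 'https://username:password@username.cloudant.com/_replicator/my-replication' \
     -H 'Content-Type: application/json' \
     -d '{"source": "https://username:password@username.cloudant.com/source-db",
          "target": "https://username:password@username.cloudant.com/target-db",
          "continuous": true}'

# later, read back the state the replicator has written into the document
curl -s 'https://username:password@username.cloudant.com/_replicator/my-replication' | jq '._replication_state'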


4.8 Data that Moves: Switching Clusters


We host data all over the world so you can be close to your users wherever they are. Unless you're a dedicated
customer, your data is stored on one of eight clusters around the globe. You can change that cluster at your leisure
to be closer to your users. Here's how to switch clusters, in six seconds.
Or, if you prefer written directions, here they are:
1. Log into Cloudant and go to your dashboard.
2. Click Account.
3. Under Placement, you'll see a dropdown of clusters and their locations.
4. Select one, and click Submit.
That's it! One of our database elves will move your data shortly, no downtime involved.
If you want to know approximately where each multi-tenant cluster is, please enjoy this map of cluster locations.
As always, if you have any trouble, post your question to StackOverflow, ping us on IRC, or if you'd like to
discuss the matter in private, email us at support@cloudant.com.

4.9 Transactions in Cloudant


Say you've got a shopping app. There are items, accounts, purchases, etc., and at the end of the day, the books
must balance. If a user purchases something, you must appropriately charge the account and reflect that change
in inventory. If any of those steps fails but others succeed, your system is left out of whack. If you were updating
documents to reflect these changes, the previous versions of those documents might be lost, requiring you to take
particular precautions in your app layer to handle failure cases, thus bloating your code. Is there an easier way to
achieve consistency?
Yes. As Sam Bisbee put it, "Don't update documents." In the case of the shopping app, instead insert documents
like this:
{
  "type": "purchase",
  "item": "...",
  "account": "...",
  "quantity": 2,
  "unit_price": 99.99
}
{
  "type": "payment",
  "account": "...",
  "value": 199.98
}

item and account, then, are IDs for other objects in your database. To calculate a running total for an account,
we would use a view like this:
{
  views: {
    totals: {
      map: function(doc){
        if(doc.type === 'purchase'){
          emit(doc.account, doc.quantity * doc.unit_price);
        } else if(doc.type === 'payment'){
          emit(doc.account, -doc.value);
        }
      },
      reduce: '_sum'
    }
  }
}

Voila! Now calling this view with the group=true&key={account} options will give us a running balance
for a particular account. If you need to roll back a purchase or payment, just insert a document with values that
balance out the interaction you want to negate.
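As a concrete example, if the view above lived in a design document called _design/finance in a database called shop (both names are placeholders of our own), the balance query could look like this:

curl -s -G 'https://username:password@username.cloudant.com/shop/_design/finance/_view/totals' \
     --data-urlencode 'group=true' \
     --data-urlencode 'key="account-123"' | jq .

Note that the key has to be the JSON-encoded account ID, which is why it is wrapped in double quotes.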
This practice of logging events, and aggregating them to determine an object's state, is called event sourcing. Used
well, it provides SQL-like transactional atomicity even in a NoSQL database like Cloudant.

4.9.1 Event Sourcing


SQL databases often have transactional semantics that allow you to commit changes in an all-or-nothing fashion:
if any of the changes fail, the database rejects the whole package. Papers like ARIES lay out how this works,
and how to implement it to ensure ACID transactions. Although Cloudant lacks these semantics directly, you can
use a strategy called event sourcing to get dang close.
Event sourcing, in a nutshell, is the strategy we outlined above: reflect changes through document insertions rather
than updates, then use secondary indexes to reflect overall application state.
In event sourcing, the database's atomic unit is the document. If a document fails to write, it should never leave
the database in an inconsistent state. So, we break documents into interactions: rather than updating an account
document with its current balance, we calculate it and other dynamic values by aggregating the interactions the
account was involved in. As much as possible, represent objects as the sum of their interactions.

4.9.2 Using _uuids to group transactions


Say in our shopping app, you have a shopping cart where users can hold items before purchasing, and which they
ultimately purchase as a group. How can we group these purchases, while maintaining each purchase as a single
document? Use the _uuids endpoint!
https://{user}.cloudant.com/_uuids returns an array of unique IDs which have an approximately
negligible chance of overlapping with your document IDs. By default, it returns one ID, but you can set count in
your querystring to get more. For example, calling _uuids?count=3 yields this:
{
  "uuids": [
    "320afa89017426b994162ab004ce3383",
    "320afa89017426b994162ab004ce3b09",
    "320afa89017426b994162ab004ce4083"
  ]
}
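For example, one way to grab a single UUID and keep it around as the shared transaction_id for a checkout is a quick shell one-liner (a sketch; the variable name is our own):

TXN=$(curl -s 'https://username:password@username.cloudant.com/_uuids' | jq -r '.uuids[0]')
echo "$TXN"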

This way, when the user purchases everything in their cart, you can use _uuids to generate a shared transaction_id that allows you to retrieve them as a group later. For that, we might use a view like this:
{
  views: {
    transactions: {
      map: function(doc){
        if(doc.type === 'purchase'){
          emit(doc.transaction_id, null);
        }
      }
    }
  }
}


We can then use queries like _view/transactions?key={transaction_id}&include_docs=true
to retrieve every change associated with a transaction.
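A concrete query, again assuming a shop database and a _design/finance design document (placeholders of our own) and using one of the UUIDs shown earlier as the transaction ID, might look like this:

curl -s -G 'https://username:password@username.cloudant.com/shop/_design/finance/_view/transactions' \
     --data-urlencode 'key="320afa89017426b994162ab004ce3383"' \
     --data-urlencode 'include_docs=true' \
     | jq '.rows[].doc'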

4.9.3 Using dbcopy to map data into events


Say your database consists of data that simply doesn't lend itself to event sourcing. Perhaps you uploaded documents that have rows of events in them, and you'd like to migrate your data to better accommodate an event
sourcing strategy. To address this, we can use dbcopy to map our current data into events and then output them
to another database.
Say you've got documents like this:
{
  account_id: ...,
  balance: ...,
  transaction_history: [{
    date: ...,
    item: ...,
    quantity: ...,
    unit_price: 100
  },{
    date: ...,
    transaction_id: ...,
    destination_account: ...,
    change: 50
  }]
}

To map that into another database as a series of transaction events, try this:
{
  views: {
    events: {
      map: function(doc){
        for(var i in doc.transaction_history){
          var transaction = doc.transaction_history[i];
          emit({
            from: doc.account_id,
            to: transaction.destination_account,
            transaction_id: transaction.transaction_id,
            date: transaction.date
          }, transaction.change);
        }
      },
      dbcopy: 'events'
    }
  }
}

This will output the results of the map function into the events database, filling it with documents like this:
{
  key: {
    from: ...,
    to: ...,
    transaction_id: ...,
    date: ...
  },
  value: 100
}
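Once dbcopy has populated the events database, it behaves like any other database, so you can peek at the copied documents in the usual way (a sketch, assuming the target database is named events as above):

curl -s 'https://username:password@username.cloudant.com/events/_all_docs?include_docs=true&limit=5' | jq '.rows[].doc'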

And lo, from barren earth we have made a garden. Nifty, eh?


4.9.4 Summary
Although Cloudant's eventual consistency model makes satisfying ACID's consistency requirement difficult, you
can satisfy the rest of the requirements through how you structure your data. For event sourcing, keep these
guidelines in mind:
• The atomic unit is the document. The database should never find itself in an inconsistent state because a document failed to write.
• Use secondary indexes, not documents, to reflect overall application state.
• If you've got unruly data, use dbcopy to map it into a friendlier form and output it to another database.
If you have any trouble with any of this, post your question on StackOverflow, hit us up on IRC, or if you'd like
to speak more privately, send us a note at support@cloudant.com.
