Distributed Databases - YCSB++ Tutorial

YCSB++ Tutorial
Workshop in information security by Yosi Barad, Ainat Chervin and Ilia oshmiansky

Introduction:
The YCSB++ benchmark tool extends YCSB to support the Accumulo database and adds a
read-after-write measurement that tests for inconsistency between the different nodes of a database.
We will cover the following subjects regarding the YCSB++ benchmark tool:
Installation and configuration of YCSB++.
Example of usage: measuring inconsistent state in Cassandra (Acl) using YCSB++.
Example of usage: benchmarking Accumulo using YCSB++.

Installation and configuration of YCSB++:

1. Tool chain requirements for YCSB++ are: Java (1.6 or higher), HBase, Hadoop,
Zookeeper and ant.
In this tutorial we use Java version 1.6.0, Hadoop version 0.20.1, HBase version 0.90.2,
and Zookeeper version 3.3.3.
2. Download ant from here: http://ant.apache.org/ivy/download.cgi.
3. Install ant on your machine. You may find more information here:
http://ant.apache.org/manual/index.html.
4. Download YCSB++ files from here: https://github.com/MiloPolte/YCSB/zipball/master
5. Extract the YCSB++ files, e.g. to /specific/disk1/temp/YCSB++/.
6. Download Zookeeper from http://zookeeper.apache.org/.
7. Go to the conf folder in the zookeeper folder and create a file called zoo.cfg.
Insert the following lines:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
# Change dataDir to the place where you would like the zookeeper data files to be stored,
# e.g. dataDir=/specific/disk1/temp/zookeeper/conf/zookeeper
Save the file and close it.
8. Copy the zookeeper .jar file from the zookeeper directory to /specific/disk1/temp/YCSB++/lib.
9. Enter the "ant" command from the YCSB++ directory to build the package.

10. Download Hadoop from http://hadoop.apache.org/ and HBase from
http://hbase.apache.org/.
11. Build the hbase database layer:
Copy the hbase-0.90.2.jar file from the hbase directory to
/specific/disk1/temp/YCSB++/db/hbase/lib/.
Copy all the jar files from the hbase lib directory to:
/specific/disk1/temp/YCSB++/db/hbase/lib/.
Go to the YCSB++ directory and enter the "ant dbcompile-hbase" command.

12. Build hbase bulkloader:
Copy the hbase-0.90.2.jar file from hbase directory to
/specific/disk1/temp/YCSB++/bulkloader/hbase/lib/.
Copy all the jar files from hbase lib directory to:
Copy the hadoop-0.20.1-core.jar file from Hadoop directory to:
Copy all the jar files from Hadoop lib directory to:
Go to YCSB++ directory and enter "ant bulkcompile-hbase".

13. Later in this tutorial we show how to benchmark the Accumulo database, so we
have installed Accumulo and its prerequisites, Hadoop and Zookeeper, on the system.
If you would like to download and install Accumulo you may find it here:
http://accumulo.apache.org/downloads/.
Once you have obtained Accumulo on your machine, follow the readme file located in the
Accumulo directory to bring it up.
14. Copy all the jar files from the Accumulo lib directory to:
/specific/disk1/temp/YCSB++/db/accumulo/lib/.
15. Use the "ant dbcompile-accumulo" command from the YCSB++ directory to build the
Accumulo database layer.
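
Steps 11, 12 and 14 all copy jar files into a lib directory under the YCSB++ tree. The following is a minimal sketch of that copy step for a POSIX shell. The helper name copy_jars is hypothetical and is not part of YCSB++.

```shell
#!/bin/sh
# copy_jars SRC_DIR DEST_DIR
# Hypothetical helper (not part of YCSB++): copies every .jar file found
# directly in SRC_DIR into DEST_DIR and prints how many files were copied.
copy_jars() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    count=0
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue      # the glob matched nothing
        cp "$jar" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "$count"
}

# Example for step 14 (the source path is a placeholder for your install):
#   copy_jars /path/to/accumulo/lib /specific/disk1/temp/YCSB++/db/accumulo/lib
```

Only files ending in .jar are copied, so other files in the source lib directory are left alone.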

Example of usage: measure inconsistent state in Cassandra (Acl) nodes

YCSB++ uses two processes, a producer and a consumer, and synchronizes them via Zookeeper in
order to perform a consistency test among the nodes.
The producer process produces values and inserts them into one node of the database
(in our example, Cassandra Acl). Once values are inserted it notifies zookeeper, which
signals the consumer to start querying another node for the information.
The time that passes from the moment the producer inserts a value into one
node until the moment the consumer can read that value from another node is the
inconsistency window.
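
To make the definition concrete, here is a small sketch that computes the window per key from two timestamp logs. The "key timestamp_ms" log format and the function name lag_report are assumptions made for illustration. They are not the output format of YCSB++.

```shell
#!/bin/sh
# lag_report PRODUCER_LOG CONSUMER_LOG
# Both logs are hypothetical "key timestamp_ms" files (not YCSB++ output):
#   PRODUCER_LOG: when each key was inserted on the first node
#   CONSUMER_LOG: when each key first became readable on the second node
# Prints "key lag_ms" per key, then a summary line.
# Assumes PRODUCER_LOG is not empty.
lag_report() {
    awk '
        NR == FNR { inserted[$1] = $2; next }   # first file: producer log
        ($1 in inserted) {
            lag = $2 - inserted[$1]             # inconsistency window for this key
            if (lag < 0) lag = 0                # guard against clock skew
            printf "%s %d\n", $1, lag
            sum += lag; n++
            if (lag > max) max = lag
        }
        END { if (n) printf "keys=%d avg_ms=%.1f max_ms=%d\n", n, sum / n, max }
    ' "$1" "$2"
}
```

A key inserted at 1000 ms and first read from the other node at 1004 ms has a window of 4 ms. Keys that the consumer never observed are skipped.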

1. First bring Zookeeper up.
Go to the zookeeper directory and run the server with the command:
"bin/zkServer.sh start". (You may stop it at any time using "bin/zkServer.sh stop".)
2. Next bring up the cluster you would like to examine. Make sure all of the nodes are
running correctly (since we are trying to measure inconsistency and the
producer and consumer work with different nodes, at least 2 nodes are needed in the
cluster).
In our example we run 3 nodes of Cassandra Acl.
3. Create a keyspace named usertable (YCSB requires this specific keyspace) with replication
factor 3 in Cassandra, so that there will be a copy of each value on each node:
"create keyspace usertable with replication_factor = 3
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';"
4. Next create the column family data:
"create column family data;"
5. Once zookeeper and the Cassandra nodes are running, place the consumer on hold:
"java -cp /specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/lib/zookeeper-3.4.3.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/build/ycsb.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/db/cassandra/cassandra-binding-0.1.4.jar \
com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.CassandraClient10 -p hosts=172.17.136.200 \
-p coord-server=132.67.104.224:2181 -s -P ~/scratch/YCSB++/workloads/consumerWorkload \
-p cassandra.username=ilia -p coord-server-zkRoot=/ycsb110"

Notice that:
-p coord-server=132.67.104.224:2181 is the ip and port of the Zookeeper server used for coordination.
-p coord-server-zkRoot=/ycsb110 is the Zookeeper entry under which the information is stored.
-p cassandra.username=ilia is the username for Cassandra Acl.
-p hosts=172.17.136.200 is the ip of the Cassandra node which the consumer queries.

The consumer should now print a "going to wait" message.

6. Finally we can run the producer to start the benchmark:
java -cp /specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/zookeeper-3.4.3.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/build/ycsb.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/db/cassandra/cassandra-binding-0.1.4.jar \
com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.CassandraClient10 -p hosts=fermat-11 \
-p coord-server=132.67.104.224:2181 -p operationcount=100000 -s \
-P ~/scratch/YCSB++/workloads/producerWorkload \
-p cassandra.acl=yosi,dan,ilia -p coord-server-zkRoot=/ycsb110

Notice that:
-p coord-server=132.67.104.224:2181 is the ip and port of the Zookeeper server used for coordination.
-p coord-server-zkRoot=/ycsb110 is the Zookeeper entry under which the information is stored.
-p hosts=fermat-11 is the host name of the Cassandra node into which the producer inserts values.
-p operationcount=100000 is the number of operations to be executed.
-p cassandra.acl=yosi,dan,ilia is the Acl to be stored on the inserted values.

The consumer process reports the measured time lags. They represent the
inconsistent state of the keys and values between the nodes.
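
The consumer and the producer must agree on coord-server and coord-server-zkRoot, so it can help to keep the shared settings in one place. The sketch below only prints the two command lines and does not run them. The function name ycsb_cmd and the variable names are hypothetical, and it assumes both processes can use the same classpath (the tutorial's two commands take the zookeeper jar from different directories).

```shell
#!/bin/sh
# Settings shared by both processes (values taken from the commands above).
YCSB_HOME=/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++
WORKLOADS="$HOME/scratch/YCSB++/workloads"
COORD_SERVER=132.67.104.224:2181
ZK_ROOT=/ycsb110
CP="$YCSB_HOME/lib/zookeeper-3.4.3.jar:$YCSB_HOME/build/ycsb.jar:$YCSB_HOME/db/cassandra/cassandra-binding-0.1.4.jar"

# ycsb_cmd HOST WORKLOAD [EXTRA ARGS...]
# Hypothetical helper: prints (does not execute) the java command line.
ycsb_cmd() {
    host=$1
    workload=$2
    shift 2
    echo java -cp "$CP" com.yahoo.ycsb.Client \
        -db com.yahoo.ycsb.db.CassandraClient10 \
        -p hosts="$host" -p coord-server="$COORD_SERVER" \
        -p coord-server-zkRoot="$ZK_ROOT" -s \
        -P "$WORKLOADS/$workload" "$@"
}

# Consumer first (it waits), then the producer.
ycsb_cmd 172.17.136.200 consumerWorkload -p cassandra.username=ilia
ycsb_cmd fermat-11 producerWorkload -p operationcount=100000 -p cassandra.acl=yosi,dan,ilia
```

Printing the commands first makes it easy to check that both use the same Zookeeper address and root before starting either process.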

Example of usage: benchmark Accumulo using YCSB++

1. First we'll bring the Accumulo server up and open the client shell.
Once zookeeper and Hadoop are running correctly on the machine you may start the
Accumulo server: "bin/start-all.sh" (enter the command from the Accumulo directory).
Start the services in this order: Zookeeper, then Hadoop, then Accumulo.

2. You may check that Accumulo runs correctly through the monitor page:
http://localhost:50095

3. Use the following command to open the client shell:
bin/accumulo shell -u root
Then enter the password for your Accumulo instance. In our example we set the
instance name and password to accum/accum.
4. Next we'll create a new table called usertable: "createtable usertable"

5. Now we are ready to benchmark Accumulo using YCSB++.
In this example we'll use workloada from the YCSB++ core workloads, which is a
50/50 mix of reads and inserts against the database.
Before we start, make sure the workload file contains the needed property values.

Notice that:
accumulo.zookeper is the ip on which zookeeper runs.
accumulo.instanceName is the instance name you chose at Accumulo init.
accumulo.password is the password you chose at Accumulo init.
accumulo.columnFamily is the name of the table we created.

First we use the load command to prepare the workload: it inserts into the Accumulo
database the values that will later be read. Afterwards we'll use the run command to perform
the benchmark test of workloada.
Enter the following command in the command prompt (or terminal) from the
YCSB++ folder location:
`java -cp build/ycsb.jar:db/accumulo/lib/accumulo-core-1.4.0.jar:\
db/accumulo/lib/accumulo-core-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-core-1.4.0-sources.jar:\
db/accumulo/lib/accumulo-server-1.4.0.jar:db/accumulo/lib/accumulo-server-1.4.0-javadoc.jar:\
db/accumulo/lib/accumulo-server-1.4.0-sources.jar:db/accumulo/lib/accumulo-start-1.4.0.jar:\
db/accumulo/lib/accumulo-start-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-start-1.4.0-sources.jar:\
db/accumulo/lib/zookeeper-3.4.3.jar:db/accumulo/lib/hadoop-0.20.2-core.jar:\
db/accumulo/lib/cloudtrace-1.4.0.jar:db/accumulo/lib/cloudtrace-1.4.0-javadoc.jar:\
db/accumulo/lib/cloudtrace-1.4.0-sources.jar:db/accumulo/lib/commons-collections-3.2.jar:\
db/accumulo/lib/commons-configuration-1.5.jar:db/accumulo/lib/commons-io-1.4.jar:\
db/accumulo/lib/commons-jci-core-1.0.jar:db/accumulo/lib/commons-jci-fam-1.0.jar:\
db/accumulo/lib/commons-lang-2.4.jar:db/accumulo/lib/commons-logging-1.0.4.jar:\
db/accumulo/lib/commons-logging-api-1.0.4.jar:db/accumulo/lib/examples-simple-1.4.0.jar:\
db/accumulo/lib/examples-simple-1.4.0-javadoc.jar:db/accumulo/lib/examples-simple-1.4.0-sources.jar:\
db/accumulo/lib/jline-0.9.94.jar:db/accumulo/lib/libthrift-0.6.1.jar:\
db/accumulo/lib/log4j-1.2.16.jar:db/accumulo/lib/wikisearch-ingest-1.4.0-javadoc.jar:\
db/accumulo/lib/wikisearch-query-1.4.0-javadoc.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-log4j12-1.6.1.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-api-1.6.1.jar \
com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.AccumuloClientSecurity -p security.cell.entries=4 \
-p hosts=132.67.105.169 -p threadcount=1 -s \
-P /specific/scratches/parallel/yosibar1-2012-10-31/workloads/workloada -load >> workloada_res.txt`

Notice that we attached all the lib jar files of Accumulo, YCSB and zookeeper to the
command.

-p hosts=132.67.105.169 refers to the ip Accumulo listens on.
-p threadcount=1 refers to the number of threads initiated in the test.
-P workloads/workloada refers to the workload being used.
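
The long classpath in the command above lists every jar by hand, which is easy to get wrong. A minimal sketch of assembling it from the lib directories instead follows. The function name build_classpath is hypothetical and is not part of YCSB++.

```shell
#!/bin/sh
# build_classpath DIR...
# Hypothetical helper: prints every .jar found directly in the given
# directories, joined with ':' in the order the shell glob returns them.
build_classpath() {
    cp=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -e "$jar" ] || continue     # the glob matched nothing
            cp="${cp:+$cp:}$jar"          # add ':' only between entries
        done
    done
    echo "$cp"
}

# Possible use from the YCSB++ directory:
#   CP="build/ycsb.jar:$(build_classpath db/accumulo/lib \
#       /specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib)"
#   java -cp "$CP" com.yahoo.ycsb.Client ...
```

This picks up every jar in those directories, which may be more than the hand-written list contains. Java 6 and later also accept a wildcard classpath entry such as "db/accumulo/lib/*" (quoted so that the shell does not expand it), which may be simpler when no filtering is needed.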

Next we need to run the workload using the YCSB++ run command:
`java -cp build/ycsb.jar:db/accumulo/lib/accumulo-core-1.4.0.jar:\
db/accumulo/lib/accumulo-core-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-core-1.4.0-sources.jar:\
db/accumulo/lib/accumulo-server-1.4.0.jar:db/accumulo/lib/accumulo-server-1.4.0-javadoc.jar:\
db/accumulo/lib/accumulo-server-1.4.0-sources.jar:db/accumulo/lib/accumulo-start-1.4.0.jar:\
db/accumulo/lib/accumulo-start-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-start-1.4.0-sources.jar:\
db/accumulo/lib/zookeeper-3.4.3.jar:db/accumulo/lib/hadoop-0.20.2-core.jar:\
db/accumulo/lib/cloudtrace-1.4.0.jar:db/accumulo/lib/cloudtrace-1.4.0-javadoc.jar:\
db/accumulo/lib/cloudtrace-1.4.0-sources.jar:db/accumulo/lib/commons-collections-3.2.jar:\
db/accumulo/lib/commons-configuration-1.5.jar:db/accumulo/lib/commons-io-1.4.jar:\
db/accumulo/lib/commons-jci-core-1.0.jar:db/accumulo/lib/commons-jci-fam-1.0.jar:\
db/accumulo/lib/commons-lang-2.4.jar:db/accumulo/lib/commons-logging-1.0.4.jar:\
db/accumulo/lib/commons-logging-api-1.0.4.jar:db/accumulo/lib/examples-simple-1.4.0.jar:\
db/accumulo/lib/examples-simple-1.4.0-javadoc.jar:db/accumulo/lib/examples-simple-1.4.0-sources.jar:\
db/accumulo/lib/jline-0.9.94.jar:db/accumulo/lib/libthrift-0.6.1.jar:\
db/accumulo/lib/log4j-1.2.16.jar:db/accumulo/lib/wikisearch-ingest-1.4.0-javadoc.jar:\
db/accumulo/lib/wikisearch-query-1.4.0-javadoc.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-log4j12-1.6.1.jar:\
/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-api-1.6.1.jar \
com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.AccumuloClientSecurity -p security.cell.entries=4 \
-p hosts=132.67.105.169 -p threadcount=1 -s \
-P /specific/scratches/parallel/yosibar1-2012-10-31/workloads/workloada \
| grep "Throughput" >> workloada_res.txt`

This should create workloada_res.txt, which contains the information from the benchmark test.
In our example we used these tests to check Accumulo throughput as we increased
the number of entries in the access control list.
Therefore the file contains the information we gathered regarding the throughput
and the ACLs.
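
Because results are appended to workloada_res.txt on every run, it can be handy to pull the throughput numbers back out. The sketch below assumes the lines follow the standard YCSB form "[OVERALL], Throughput(ops/sec), <number>". The YCSB++ output is not shown in this tutorial, so treat that format as an assumption. The function name throughput_summary is hypothetical.

```shell
#!/bin/sh
# throughput_summary RESULT_FILE
# Hypothetical helper: prints the last comma-separated field of every line
# containing "Throughput", then the number of runs and their mean.
# Assumes lines like: [OVERALL], Throughput(ops/sec), 1234.5
throughput_summary() {
    awk -F', *' '
        /Throughput/ { print $NF; sum += $NF; n++ }
        END { if (n) printf "runs=%d mean=%.2f\n", n, sum / n }
    ' "$1"
}

# Possible use:
#   throughput_summary workloada_res.txt
```

Lines without the word "Throughput" are ignored, so the function also works on a file that holds the full, unfiltered output of a run.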

However, if you omit the "| grep "Throughput"" part from the command line, the benchmark
results will be reported in terms of throughput, latency and run time.
You may change the operation count to 50000 by editing the workloada file or by adding
-p operationcount=50000 to the command line.
Likewise you may change the number of threads that YCSB++ starts in the benchmark by
adding:
-p threadcount=100.
Finally, you may add any other property parameter to your workload by changing the
YCSB++ source code using the getProperty mechanism (check the java files and
Javadoc for more information). After you make your changes, build the source
code again using the "ant" command from the YCSB++ directory and pass the relevant
parameter as -p key=value on the YCSB++ command line.
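
Since any property can be overridden with -p key=value, a benchmark series over several values of one property is easy to script. The following sketch prints one run command per value rather than executing anything. The function name sweep is hypothetical, and $CP stands for the classpath used in the run command earlier.

```shell
#!/bin/sh
# sweep PROPERTY VALUE...
# Hypothetical helper: prints one YCSB++ run command per VALUE, overriding
# PROPERTY with -p. "$CP" is left unexpanded as a placeholder for the
# classpath from the run command.
sweep() {
    prop=$1
    shift
    for value in "$@"; do
        printf '%s\n' "java -cp \"\$CP\" com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.AccumuloClientSecurity -p hosts=132.67.105.169 -p $prop=$value -s -P workloads/workloada"
    done
}

# Print the commands for three thread counts.
sweep threadcount 1 10 100
```

The same helper could vary security.cell.entries to repeat the throughput-versus-ACL-size measurement described above.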
