
Hadoop Configuration to use EMC Isilon Storage

This article is for those who already know what Hadoop is and why it is used.
If you are not familiar with Hadoop, you may want to skip this post; we will have
another article covering the basics of Hadoop.
Because Hadoop is a flexible, open-source framework for large-scale distributed
computation, it can be deployed in many different ways. I would like to share my
recent deployment of Hadoop on Isilon scale-out NAS.
To give a very high-level introduction to the EMC Isilon scale-out NAS storage
platform: it combines modular hardware with unified software to harness unstructured
data. Powered by the distributed OneFS operating system, an EMC Isilon cluster
delivers a scalable pool of storage with a global namespace.
The OneFS file system can be configured for native support of the Hadoop Distributed
File System (HDFS) protocol, enabling your cluster to participate in a Hadoop system.
The HDFS service, which is enabled by default after you activate an HDFS license, can
be enabled or disabled by running the isi services command.


To enable the HDFS service, run the following command:
isi services isi_hdfs_d enable
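Since the same isi services command also turns the service off, disabling HDFS
should look like this (a sketch based on the enable command above):
isi services isi_hdfs_d disable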
An HDFS implementation adds HDFS to the list of protocols that can be used to access
the OneFS file system. Implementing HDFS on an Isilon cluster does not create a
separate HDFS file system. The cluster can continue to be accessed through NFS, SMB,
FTP, and HTTP.
The HDFS implementation from Isilon is a lightweight protocol layer between the
OneFS file system and HDFS clients. Unlike with a traditional HDFS implementation,
files are stored in the standard POSIX-compatible file system on an Isilon cluster. This
means files can be accessed by the standard protocols that OneFS supports, such as
NFS, SMB, FTP, and HTTP as well as HDFS.
Files that will be processed by Hadoop can be loaded by using standard Hadoop
methods, such as hadoop fs -put, or they can be copied by using an NFS or SMB mount
and accessed by HDFS as though they were loaded by Hadoop methods. Also, files
loaded by Hadoop methods can be read with an NFS or SMB mount.
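For example, assuming the HDFS root path is /ifs/hadoop and that directory is
NFS-mounted on a client at /mnt/isilon (both placeholder paths for illustration),
the same file is reachable both ways:
# Load a file with a standard Hadoop method
hadoop fs -put weblog.txt /user/user1/weblog.txt
# Read the same file back through the NFS mount of the HDFS root
cat /mnt/isilon/user/user1/weblog.txt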
The supported versions of Hadoop are as follows:
Apache Hadoop 0.20.203.0
Apache Hadoop 0.20.205
Cloudera (CDH3 Update 3)
Greenplum HD 1.1
To enable native HDFS support in OneFS, you must integrate the Isilon cluster with a
cluster of Hadoop compute nodes. This process requires configuration of the Isilon
cluster as well as each Hadoop compute node that needs access to the cluster.
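On each compute node, the usual change is to point the default file system URI at
the Isilon cluster instead of a dedicated NameNode. A minimal sketch for Hadoop
0.20.x is below; the hostname isilon.example.com is a placeholder for your cluster's
SmartConnect name or IP address, and 8020 is assumed as the HDFS port:
<!-- conf/core-site.xml on each Hadoop compute node -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://isilon.example.com:8020</value>
</property>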
Create a local user:
To access files on OneFS by using the HDFS protocol, you must first create a local
Hadoop user that maps to a user on a Hadoop client.
1. Open an SSH connection to any node in the cluster and log in by using the root
user account.
2. At the command prompt, run the isi auth users create command to create a local
user.
isi auth users create name=user1
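The name should match the account that runs Hadoop jobs on the compute nodes, so
that HDFS requests map to the correct OneFS identity. For example, if jobs run on
the clients as a (hypothetical) user named hadoop:
isi auth users create name=hadoop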
Configure the HDFS protocol
You can specify which HDFS distribution to use, and you can set the logging level, the
root path, the Hadoop block size, and the number of available worker threads. You
configure HDFS by running the isi hdfs command in the OneFS command-line
interface.
1. Open an SSH connection to any node in the cluster and log in by using the root
account.
2. To specify which distribution of the HDFS protocol to use, run the isi hdfs
command with the force-version option.
AUTO: Attempts to match the distribution that is being used by the Hadoop
compute node.
APACHE_0_20_203: Uses the Apache Hadoop 0.20.203 release.
APACHE_0_20_205: Uses the Apache Hadoop 0.20.205 release.
CLOUDERA_CDH3: Uses version 3 of Cloudera's distribution, which includes
Apache Hadoop.
GREENPLUM_HD_1_1: Uses the Greenplum HD 1.1 distribution.
For example, the following command forces OneFS to use version 0.20.203 of the
Apache Hadoop distribution:
isi hdfs force-version=APACHE_0_20_203
3. To set the default logging level for the Hadoop daemon across the cluster, run the isi
hdfs command with the log-level option.
EMERG: A panic condition. This is normally broadcast to all users.
ALERT: A condition that should be corrected immediately, such as a corrupted
system database.
CRIT: Critical conditions, such as hard device errors.
ERR: Errors.
WARNING: Warning messages.
NOTICE: Conditions that are not error conditions, but may need special
handling.
INFO: Informational messages.
DEBUG: Messages that contain information typically of use only when debugging
a program.
For example, the following command sets the log level to WARNING:
isi hdfs log-level=WARNING
4. To set the path on the cluster to present as the HDFS root directory, run the isi hdfs
command with the root-path option.
For example, the following command sets the root path to /ifs/hadoop:
isi hdfs root-path=/ifs/hadoop
5. To set the Hadoop block size, run the isi hdfs command with the block-size option.
Valid values are 4KB to 1GB. The default value is 64MB.
For example, the following command sets the block size to 32 MB:
isi hdfs block-size=32MB
6. To tune the number of worker threads that HDFS uses, run the isi hdfs command
with the num-threads option.
Valid values are 1 to 256 or auto, which is calculated as twice the number of cores. The
default value is auto.
For example, the following command specifies 8 worker threads:
isi hdfs num-threads=8
7. To allocate IP addresses from an IP address pool, run isi hdfs with the add-ip-pool
option.
For example, the following command allocates IP addresses from a pool named pool2,
which is in the subnet0 subnet:
isi hdfs add-ip-pool=subnet0:pool2
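Putting the steps together, a minimal configuration pass over an SSH session might
look like the following sketch, reusing the example values from above (adjust each
value for your environment):
# Force the protocol version to match the Hadoop compute cluster
isi hdfs force-version=APACHE_0_20_203
# Log warnings and above from the HDFS daemon
isi hdfs log-level=WARNING
# Present /ifs/hadoop as the HDFS root directory
isi hdfs root-path=/ifs/hadoop
# Keep the default 64 MB block size explicit
isi hdfs block-size=64MB
# Let OneFS size the worker-thread pool automatically
isi hdfs num-threads=auto
# Hand out client IP addresses from pool2 in subnet0
isi hdfs add-ip-pool=subnet0:pool2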
HDFS commands that can be used on Isilon OneFS:
Manages rack-local configuration
isi hdfs racks
Displays an HDFS rack object
isi hdfs racks view name
Modifies an HDFS rack object
isi hdfs racks modify
Lists the existing HDFS racks
isi hdfs racks list
Deletes an HDFS rack
isi hdfs racks delete name
Creates the HDFS rack
isi hdfs racks create name
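As a quick illustration, a rack object can be created, inspected, and removed as
follows. The rack name /rack0 is a placeholder; OneFS rack names begin with a
forward slash:
isi hdfs racks create /rack0
isi hdfs racks view /rack0
isi hdfs racks list
isi hdfs racks delete /rack0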
