
Multinode Cluster Installation Mode

Apache Hadoop v2.7.1


Linux Operating System (Ubuntu 12.04)

Environment Setup:
No. of nodes = 4 (1 Namenode, 3 Datanodes)
Hostnames:
Namenode – namenode
Datanodes – datanode1, datanode2, datanode3

Installation and Configuration:


In “namenode”
1. Create a new user “multinode” for this installation procedure.
~$ sudo adduser multinode

2. Edit the “/etc/hosts” file providing the IP addresses of the cluster nodes.
~$ sudo vim /etc/hosts
namenode-ip-address namenode
datanode1-ip-address datanode1
datanode2-ip-address datanode2
datanode3-ip-address datanode3

Comment out the line containing “localhost”.


After making the above-mentioned changes, save and close the file.
Note: Also make sure that all four nodes are reachable from one another over the network.
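For example, the finished file might look like the fragment below. The 192.168.1.x addresses are placeholders; substitute your cluster's actual IP addresses.

```
192.168.1.10 namenode
192.168.1.11 datanode1
192.168.1.12 datanode2
192.168.1.13 datanode3
```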

3. Switch to the newly created user account


~$ su - multinode

4. Download the latest stable Apache Hadoop tarball distribution (v2.7.1 is used here).

5. Download the Java 1.7 JDK tarball. Check the machine architecture, 32-bit (i386,
i586, i686) or 64-bit (x86_64), before downloading.
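If you are unsure of the architecture, it can be checked from the shell:

```shell
# Print the machine hardware name: i386/i586/i686 indicate a 32-bit
# system, x86_64 indicates a 64-bit system.
uname -m
```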

6. Assuming the downloaded tarballs are present under the user's home directory,
extract them:

~$ tar -xvf hadoop-2.7.1.tar.gz


~$ tar -xvf jdk-7u79-linux-x86_64.tar.gz

7. After extracting, set up the environment variables in ~/.bashrc file


~$ vi .bashrc
export JAVA_HOME=/home/multinode/jdk1.7.0_79
export HADOOP_PREFIX=/home/multinode/hadoop-2.7.1
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
After appending these lines, save and close the file.

8. For these variables to take effect in the current shell, source the file.
~$ source ~/.bashrc
Check whether the changes have been applied properly
~$ echo $JAVA_HOME
~$ hadoop version

9. Next, edit the Hadoop configuration files


~$ cd $HADOOP_CONF_DIR

~hadoop-2.7.1/etc/hadoop$ vi hadoop-env.sh
export JAVA_HOME=/home/multinode/jdk1.7.0_79

~hadoop-2.7.1/etc/hadoop$ vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:8020</value>
</property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ vi hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/multinode/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/multinode/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>namenode:50070</value>
</property>
</configuration>
~hadoop-2.7.1/etc/hadoop$ vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml

~hadoop-2.7.1/etc/hadoop$ vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Note: These configurations assume that the NameNode, ResourceManager and
JobHistoryServer daemons all run on the “namenode” host.

~hadoop-2.7.1/etc/hadoop$ vi slaves
datanode1
datanode2
datanode3

In “datanodes”:
Repeat steps 1 to 9 on all the datanodes.
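Alternatively, once passwordless SSH is in place (step 10), the finished configuration directory can be copied from the namenode to each datanode instead of editing every file by hand. A dry-run sketch that only prints the commands it would run, assuming the paths used above:

```shell
# Dry run: print the scp commands that would push the Hadoop config
# directory to each datanode. Remove the leading "echo" to actually
# copy, once passwordless SSH (step 10) has been set up.
for host in datanode1 datanode2 datanode3; do
  echo scp -r /home/multinode/hadoop-2.7.1/etc/hadoop "${host}:/home/multinode/hadoop-2.7.1/etc/"
done
```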

10. To enable passwordless SSH login from the namenode to all datanodes:
In “namenode”:
~$ ssh-keygen
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub namenode
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode1
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode2
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode3

This avoids password prompts when the daemons are started.
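To confirm the keys were copied correctly, logging in to each node should print the remote hostname without asking for a password. A dry-run sketch that only prints the verification commands:

```shell
# Dry run: print the ssh commands used to verify passwordless login.
# Remove the leading "echo" to run the actual checks.
for host in namenode datanode1 datanode2 datanode3; do
  echo ssh "$host" hostname
done
```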

11. Format the namenode before starting the daemons.


~$ hdfs namenode -format

This formats the dfs.namenode.name.dir location, creating the files and folders
required by the namenode.

Note: Steps 10 and 11 are one-time procedures.

12. Start the cluster


~$ start-dfs.sh
~$ start-yarn.sh
~$ mr-jobhistory-daemon.sh start historyserver
Alternatively, to start all the daemons at once (start-all.sh is deprecated in
Hadoop 2.x but still works)
~$ start-all.sh
~$ mr-jobhistory-daemon.sh start historyserver

13. To check the running daemons, use jps (Java process status). On the namenode,
expect NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer; on each
datanode, DataNode and NodeManager.
~$ jps

14. To Stop the cluster


~$ stop-yarn.sh
~$ stop-dfs.sh
~$ mr-jobhistory-daemon.sh stop historyserver
To stop all the daemons in one go
~$ stop-all.sh
~$ mr-jobhistory-daemon.sh stop historyserver

Note: To start or stop daemons individually:


~$ hadoop-daemon.sh <start | stop> <namenode | datanode>
~$ yarn-daemon.sh <start | stop> <resourcemanager | nodemanager>

To stop or start all datanodes


~$ hadoop-daemons.sh <start | stop> datanode

To stop or start all nodemanagers


~$ yarn-daemons.sh <start | stop> nodemanager
