
Multinode Cluster Installation Mode

Apache Hadoop v2.7.1


Linux Operating System (Ubuntu 12.04)

Environment Setup:
No. of nodes = 4 (1 Namenode, 3 Datanodes)
Hostnames:
Namenode – namenode
Datanodes – datanode1, datanode2, datanode3

Installation and Configuration:


In “namenode”
1. Create a new user “multinode” for this installation procedure.
~$ sudo adduser multinode

2. Edit the “/etc/hosts” file providing the IP addresses of the cluster nodes.
~$ sudo vim /etc/hosts
namenode-ip-address namenode
datanode1-ip-address datanode1
datanode2-ip-address datanode2
datanode3-ip-address datanode3

Comment out the line containing “localhost”.


After making the above-mentioned changes, save and close the file.
Note: Also make sure that all four nodes are reachable from one another over the network.
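For example, the finished file might look like the fragment below. The 192.168.1.x addresses are placeholders; substitute your cluster's actual IP addresses.

```
192.168.1.10 namenode
192.168.1.11 datanode1
192.168.1.12 datanode2
192.168.1.13 datanode3
```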

3. Switch to the newly created user account


~$ su - multinode

4. Download the latest stable Apache Hadoop tarball distribution (v2.7.1 is used here).

5. Download the Java 1.7 JDK tarball. Check the machine architecture, 32-bit (i386,
i586, i686) or 64-bit (x86_64), before downloading.
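If you are unsure of the architecture, it can be checked from the shell:

```shell
# Print the machine hardware name: i386/i586/i686 indicate a 32-bit
# system, x86_64 indicates a 64-bit system.
uname -m
```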

6. Assuming the downloaded tarballs are present under the user's home directory,
extract them:

~$ tar -xvf hadoop-2.7.1.tar.gz


~$ tar -xvf jdk-7u79-linux-x86_64.tar.gz

7. After extracting, set up the environment variables in ~/.bashrc file


~$ vi .bashrc
export JAVA_HOME=/home/multinode/jdk1.7.0_79
export HADOOP_PREFIX=/home/multinode/hadoop-2.7.1
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
After appending these lines, save and close the file.

8. For these variables to take effect in the current shell, source the file.
~$ source ~/.bashrc
Check whether the changes have been applied properly
~$ echo $JAVA_HOME
~$ hadoop version

9. Next, edit the Hadoop configuration files


~$ cd $HADOOP_CONF_DIR

~hadoop-2.7.1/etc/hadoop$ vi hadoop-env.sh
export JAVA_HOME=/home/multinode/jdk1.7.0_79

~hadoop-2.7.1/etc/hadoop$ vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:8020</value>
</property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ vi hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/multinode/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/multinode/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>namenode:50070</value>
</property>
</configuration>
~hadoop-2.7.1/etc/hadoop$ vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

~hadoop-2.7.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml

~hadoop-2.7.1/etc/hadoop$ vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Note: These configurations assume that the NameNode, ResourceManager and
JobHistoryServer daemons all run on the “namenode” host.

~hadoop-2.7.1/etc/hadoop$ vi slaves
datanode1
datanode2
datanode3

In “datanodes”:
Repeat steps 1 to 9 on all the datanodes.
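Alternatively, once passwordless SSH is in place (step 10), the finished configuration directory can be copied from the namenode to each datanode instead of editing every file by hand. A dry-run sketch that only prints the commands it would run, assuming the paths used above:

```shell
# Dry run: print the scp commands that would push the Hadoop config
# directory to each datanode. Remove the leading "echo" to actually
# copy, once passwordless SSH (step 10) has been set up.
for host in datanode1 datanode2 datanode3; do
  echo scp -r /home/multinode/hadoop-2.7.1/etc/hadoop "${host}:/home/multinode/hadoop-2.7.1/etc/"
done
```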

10. To enable passwordless SSH login from the namenode to all datanodes:
In “namenode”:
~$ ssh-keygen
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub namenode
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode1
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode2
~$ ssh-copy-id -i ~/.ssh/id_rsa.pub datanode3

This avoids password prompts when the daemons are started.
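To confirm the keys were copied correctly, logging in to each node should print the remote hostname without asking for a password. A dry-run sketch that only prints the verification commands:

```shell
# Dry run: print the ssh commands used to verify passwordless login.
# Remove the leading "echo" to run the actual checks.
for host in namenode datanode1 datanode2 datanode3; do
  echo ssh "$host" hostname
done
```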

11. Format the namenode before starting the daemons.


~$ hdfs namenode -format

This formats the dfs.namenode.name.dir location, creating the files and folders
required by the namenode.

Note: Steps 10 and 11 are one-time procedures.

12. Start the cluster


~$ start-dfs.sh
~$ start-yarn.sh
~$ mr-jobhistory-daemon.sh start historyserver
Alternatively, to start all the daemons at once (start-all.sh is deprecated in
Hadoop 2.x but still works)
~$ start-all.sh
~$ mr-jobhistory-daemon.sh start historyserver

13. To check the running daemons, use jps (Java process status). On the namenode,
expect NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer; on each
datanode, DataNode and NodeManager.
~$ jps

14. To Stop the cluster


~$ stop-yarn.sh
~$ stop-dfs.sh
~$ mr-jobhistory-daemon.sh stop historyserver
To stop all the daemons in one go
~$ stop-all.sh
~$ mr-jobhistory-daemon.sh stop historyserver

Note: To start or stop daemons individually:


~$ hadoop-daemon.sh <start | stop> <namenode | datanode>
~$ yarn-daemon.sh <start | stop> <resourcemanager | nodemanager>

To stop or start all datanodes


~$ hadoop-daemons.sh <start | stop> datanode

To stop or start all nodemanagers


~$ yarn-daemons.sh <start | stop> nodemanager
