Sun Java 6
3. Install sun-java6-jdk
$ sudo apt-get install sun-java6-jdk
This will add the user hduser and the group hadoop to your local machine.
Configuring SSH
1. su - hduser
2. ssh-keygen -t rsa -P ""
3. cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh shashwat
// If you are able to connect to shashwat without entering a password, SSH is configured correctly.
// Otherwise, delete the .ssh folder in the user's home folder and configure SSH again.
If the SSH connection fails, these general tips might help:
Enable debugging with ssh -vvv shashwat and investigate the error in detail.
Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.
Disabling IPv6
One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realized that there's no practical point in enabling IPv6 on a box that is not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary. To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
You have to reboot your machine in order to make the changes take effect. You can check whether IPv6 is enabled on your machine with the following command:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that's what we want).
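The same check can be done from a program, for instance as part of a pre-flight test before starting Hadoop. The following is a minimal sketch that only uses the JDK; the class name Ipv6Check and the helper isDisabled are made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop.
public class Ipv6Check {

    // Interprets the content of a disable_ipv6 file: "1" means IPv6 is disabled.
    public static boolean isDisabled(String fileContent) {
        return "1".equals(fileContent.trim());
    }

    public static void main(String[] args) throws IOException {
        Path flag = Paths.get("/proc/sys/net/ipv6/conf/all/disable_ipv6");
        if (!Files.exists(flag)) {
            // The file is absent on non-Linux systems or kernels without IPv6 support.
            System.out.println("No IPv6 setting found at " + flag);
            return;
        }
        String content = new String(Files.readAllBytes(flag), StandardCharsets.UTF_8);
        System.out.println(isDisabled(content) ? "IPv6 is disabled" : "IPv6 is enabled");
    }
}
```

The parsing is kept in its own method so that it can be exercised without access to /proc.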
Alternative
You can also disable IPv6 only for Hadoop as documented in HADOOP-3437. You can do so by adding the following line to conf/hadoop-env.sh:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Hadoop
Installation
You have to download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. Make sure to change the owner of all the files to the hduser user and hadoop group, for example:
$ cd /usr/local
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R hduser:hadoop hadoop
Hadoop-Configuration
hadoop-env.sh
The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK/JRE 6 directory. Change
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Add the following snippets between the <configuration> </configuration> tags in the respective configuration XML file.
In file conf/core-site.xml:
<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://shashwat:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
In file conf/mapred-site.xml:
<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>shashwat:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
In file conf/hdfs-site.xml:
<!-- In: conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
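To double-check what actually ended up in a file such as conf/core-site.xml, the property list can be read back programmatically. The following is a minimal sketch that uses only the JDK's XML parser rather than Hadoop's own Configuration class; the class name SiteXmlReader and its method are hypothetical, and the XML string in main stands in for the content of a real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop.
public class SiteXmlReader {

    // Returns the name/value pairs of all <property> elements, in file order.
    public static Map<String, String> readProperties(String xml) {
        Map<String, String> props = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
        } catch (Exception e) {
            // A missing closing tag is a common mistake when pasting snippets.
            throw new IllegalArgumentException("Not a well-formed configuration file", e);
        }
        return props;
    }

    // Text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>fs.default.name</name>"
                + "<value>hdfs://shashwat:54310</value></property>"
                + "<property><name>hadoop.tmp.dir</name>"
                + "<value>/app/hadoop/tmp</value></property>"
                + "</configuration>";
        System.out.println(readProperties(xml));
    }
}
```

A file that fails to parse here would most likely also be rejected by Hadoop at startup, so this kind of check can catch a broken paste early.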
org.apache.hadoop.mapred.JobTracker
org.apache.hadoop.hdfs.server.namenode.NameNode
org.apache.hadoop.mapred.TaskTracker
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.datanode.DataNode
If you do not see these 5 processes, check the logs in ~/work/hadoop/logs/*.{out,log} for messages that might give you a hint as to what went wrong.
Run some example map/reduce jobs
The Hadoop distribution comes with some example/test map/reduce jobs. Here we'll run them and make sure things are working end to end.
cd ~/work/hadoop
# Copy the input files into the distributed filesystem
# (there will be no output visible from the command):
bin/hadoop fs -put conf input
# Run some of the examples provided:
# (there will be a large amount of INFO statements as output)
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# Examine the output files:
bin/hadoop fs -cat output/part-00000
The resulting output should be something like:
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
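Checking for the five daemons by eye is easy to get wrong, because the name NameNode is also contained in SecondaryNameNode. As an illustration, the sketch below takes the text printed by jps (either short class names or the fully qualified names printed by jps -l) and reports which daemons are missing. The class DaemonCheck and the sample output in main are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
public class DaemonCheck {

    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker");

    // Returns the expected daemons that do not appear in the given jps output.
    public static List<String> missingDaemons(String jpsOutput) {
        List<String> running = new ArrayList<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            // Each jps line has the form "<pid> <class name>".
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue;
            }
            // Strip the package so that short and fully qualified names both match.
            String className = parts[1];
            running.add(className.substring(className.lastIndexOf('.') + 1));
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text only; in practice, pass in the real output of jps.
        String sample = "2101 NameNode\n2202 DataNode\n2303 SecondaryNameNode\n"
                + "2404 JobTracker\n2606 Jps";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

Names are compared exactly rather than by substring, which is what keeps a running SecondaryNameNode from hiding a missing NameNode.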
Configure HBase
The following config files all reside in ~/work/hbase/conf. As mentioned earlier, use an FQDN or a Bonjour name instead of shashwat if you need remote clients to access HBase. But if you don't use shashwat here, make sure you do the same in the Hadoop config.
hbase-env.sh
Add the following line below the commented-out JAVA_HOME line in hbase-env.sh:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
Add the following line below the commented-out HBASE_CLASSPATH= line:
export HBASE_CLASSPATH=${HOME}/work/hadoop/conf
hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://shashwat:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>
The host and port in hbase.rootdir must point at the same NameNode as fs.default.name in the Hadoop configuration. If you set fs.default.name to hdfs://shashwat:54310 there, use port 54310 here as well.
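Because hbase.rootdir is a location inside HDFS, its host and port normally need to agree with the fs.default.name value from Hadoop's core-site.xml. A mismatch is easy to miss when the two files are edited separately. The sketch below compares the two values with java.net.URI; the class RootDirCheck is hypothetical and the values in main are taken from this document.

```java
import java.net.URI;

// Hypothetical helper, not part of HBase.
public class RootDirCheck {

    // True if hbaseRootDir is an hdfs:// URI on the same host and port as fsDefaultName.
    public static boolean sameNameNode(String fsDefaultName, String hbaseRootDir) {
        URI fs = URI.create(fsDefaultName);
        URI root = URI.create(hbaseRootDir);
        return "hdfs".equals(root.getScheme())
                && fs.getHost() != null
                && fs.getHost().equalsIgnoreCase(root.getHost())
                && fs.getPort() == root.getPort();
    }

    public static void main(String[] args) {
        String fsDefaultName = "hdfs://shashwat:54310";     // from conf/core-site.xml
        String hbaseRootDir = "hdfs://shashwat:9000/hbase"; // from hbase-site.xml
        if (sameNameNode(fsDefaultName, hbaseRootDir)) {
            System.out.println("hbase.rootdir matches fs.default.name");
        } else {
            System.out.println("hbase.rootdir and fs.default.name point at different addresses");
        }
    }
}
```

With the two values shown in main, the ports differ (54310 and 9000), so the check reports a mismatch.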
hbase> # To see the schema for the "mylittletable" table you just created and its single "mylittlecolumnfamily", type
hbase> describe "mylittletable"
hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of 'v', do
hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
hbase> # To get the cell just added, do
hbase> get "mylittletable", "myrow"
hbase> # To scan your new table, do
hbase> scan "mylittletable"
You can stop HBase with the command:
~/work/hbase/bin/stop-hbase.sh
Once that has stopped, you can stop Hadoop:
~/work/hadoop/bin/stop-all.sh
Setting Up an HBase Client (Accessing HBase Remotely)
Add the following to hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://<domain address>:9000/hbase</value>
</property>
<property>
  <name>hbase.master</name>
  <value>shashwat:60000</value>
  <description>The host and port that the HBase master runs at.</description>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
  <description>The port the HBase RegionServer binds to.</description>
</property>
<!--<property>
  <name>hbase.master.port</name>
  <value>60000</value>
  <description>The host and port that the HBase master runs at.</description>
</property>-->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/home/shashwat/Hadoop/hbase0.90.4/temp</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>shashwat</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/<user>/zookeeper</value>
  <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
</property>
After adding the above text to hbase-site.xml, start HBase and check that the HMaster is running using the jps command in the shell. After this, the IP address and domain name of the HBase master should be added to the hosts file on each client machine that needs to connect to HBase remotely. Suppose hbase.master is running on 192.168.2.125 and the domain name is shashwat.
Windows: My computer-> c:-> windows->system32->drivers-
Open the file with admin permission and add the following line, where 192.168.2.125 is the server on which hbase.master is running:
192.168.2.125 shashwat
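Before running any HBase client code, it can save time to confirm from the client machine that the name shashwat resolves and that the relevant ports accept connections. The following is a minimal sketch using plain JDK sockets; the class PortCheck is hypothetical, and the ports are the ones configured in this document (2181 for ZooKeeper, 60000 for the master, 60020 for the region server).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of HBase.
public class PortCheck {

    // True if a TCP connection to host:port succeeds within the timeout.
    // An unresolvable host name counts as a failure.
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "shashwat";
        int[] ports = {2181, 60000, 60020};
        for (int port : ports) {
            String state = canConnect(host, port, 3000) ? "reachable" : "NOT reachable";
            System.out.println(host + ":" + port + " is " + state);
        }
    }
}
```

If all three ports are reported as not reachable, the hosts entry or a firewall is a more likely cause than the HBase configuration itself.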
Building the HBase client: code for accessing the HMaster from a client. The client code follows.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClient {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
            throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
        System.out.println("Hbase Demo Application ");

        // CONFIGURATION
        // clear() drops the defaults loaded by create(), so only the three
        // settings below are passed to the client.
        Configuration conf = HBaseConfiguration.create();
        conf.clear();
        conf.set("hbase.zookeeper.quorum", "shashwat");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.master", "shashwat:60000");

        // ENSURE RUNNING: throws an exception if HBase cannot be reached.
        HBaseAdmin.checkHBaseAvailable(conf);
        System.out.println("HBase is running!");

        HTable table = new HTable(conf, "date");
        try {
            System.out.println("Table obtained");
            System.out.println("Fetching data now.....");

            // Read row "-101", column family "cf", qualifier "cf1".
            Get g = new Get(Bytes.toBytes("-101"));
            Result r = table.get(g);
            byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("cf1"));

            // If we convert the value bytes, we should get back 'Some Value', the
            // value we inserted at this location.
            String valueStr = Bytes.toString(value);
            System.out.println("GET: " + valueStr);
        } finally {
            // Release the table's resources even if the read fails.
            table.close();
        }
    }
}
Note: If you find any mistake in this document, feel free to drop me a mail at:
Gmail: dwivedishashwat@gmail.com
Facebook: https://www.facebook.com/shriparv
Twitter: https://twitter.com/#!/shashwat_2010
Skype: shriparv
Visit my blogs at:
http://helpmetocode.blogspot.in/
http://writingishabit.blogspot.in/
http://realiq.blogspot.in/