
Oozie 4.0 installation

1. Download version 4.0 of Oozie -> wget http://mirror.symnds.com/software/Apache/oozie/4.0.0/oozie-4.0.0.tar.gz
2. Extract the tar.gz file -> tar -xzvf oozie-4.0.0.tar.gz
3. cd oozie-4.0.0
4. Install Maven -> sudo apt-get install maven
5. Replace the Hadoop version in every pom.xml -> find . -name pom.xml | xargs sed -ri 's/(2.2.0\SNAPSHOT)/2.2.0-cdh5.0.0-beta-2/'
   -> Add the following repository to pom.xml so that Oozie can be built with Hadoop 2.0.0-mr1-cdh4.4.0:

   <repository>
     <id>cloudera</id>
     <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
   </repository>

6. ./bin/mkdistro.sh -DskipTests

7. Copy the built distribution to ~/oozie:
   - cd distro/target/oozie-4.0.0-distro/
   - cp -r oozie-4.0.0/ ~/oozie

8. Set the environment:
   - export OOZIE_HOME=/home/ubuntu/oozie
   - cd $OOZIE_HOME
   - export HADOOP_HOME="/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39"

*10. cp -r /usr/local/hadoop/share/hadoop/common/*jar libext/
*11. cd to the libext folder and download the following file -> wget http://extjs.com/deploy/ext-2.2.zip

cd $OOZIE_HOME/oozie-server/lib
- wget http://extjs.com/deploy/ext-2.2.zip
- rm -rf ecj-3.7.2.jar
- wget http://repo1.maven.org/maven2/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.jar
- wget http://repo1.maven.org/maven2/tomcat/jasper-compiler-jdt/5.5.23/jasper-compiler-jdt-5.5.23.jar
- cp $HADOOP_HOME/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.4.0.jar .
- cp $HADOOP_HOME/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar .
- cp $HADOOP_HOME/lib/hadoop/lib/*.jar .
- cp $HADOOP_HOME/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar .
- cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.4.0.jar .

*12. cp $OOZIE_HOME/distro/target/oozie-4.0.0-distro/oozie-4.0.0/oozie.war $OOZIE_HOME/webapp/src/main/webapp/

13. nano $OOZIE_HOME/conf/oozie-site.xml

<property>
  <name>oozie.service.JPAService.create.db.schema</name>
  <value>true</value>
  <description>
    Creates Oozie DB.

    If set to true, it creates the DB schema if it does not exist. If the DB schema exists, it is a no-op.
    If set to false, it does not create the DB schema. If the DB schema does not exist, startup fails.
  </description>
</property>

14. Run the following command to create OOZIE DB:

$OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

Expected output:

setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection DONE
Check DB schema does not exist DONE
Check OOZIE_SYS table does not exist DONE
Create SQL schema DONE
Create OOZIE_SYS table DONE

Oozie DB has been created for Oozie version '4.0.0'

The SQL commands have been written to: oozie.sql

15. To enable the web console, install the ExtJS library. The Oozie WAR file also requires a few other JAR files, such as hadoop-core-<version>.jar and commons-configuration-<version>.jar.

-> cd $OOZIE_HOME

./bin/addtowar.sh -inputwar oozie.war -outputwar oozie1.war -jars ~/oozie-4.0.0/hadooplibs/target/oozie-4.0.0-hadooplibs/oozie-4.0.0/hadooplibs/hadooplib-1.1.1.oozie-4.0.0/*.jar -extjs $OOZIE_HOME/oozie-server/lib/ext-2.2.zip

-> cd ~/oozie-4.0.0/sharelib/target/oozie-sharelib-4.0.0
- sudo -u hdfs hadoop fs -put share share

16. Replace the original WAR with the rebuilt one and deploy it:
   - rm -rf oozie.war
   - mv oozie1.war oozie.war
   - cp oozie.war $OOZIE_HOME/oozie-server/webapps/

17. Start Oozie -> oozied.sh start

If running Oozie jobs fails with this error:

Error: E0501 : E0501: Could not perform authorization operation, User: ubuntu is not allowed to impersonate ubuntu

then, as a workaround, I started both Oozie and the workflow as the hdfs user.

1. http://www.ibm.com/developerworks/library/bd-ooziehadoop/

Apache Oozie is an open source project based on Java technology that simplifies the process of creating workflows and managing coordination among jobs. In principle, Oozie offers the ability to combine multiple jobs sequentially into one logical unit of work. One advantage of the Oozie framework is that it is fully integrated with the Apache Hadoop stack and supports Hadoop jobs for Apache MapReduce, Pig, Hive, and Sqoop. In addition, it can be used to schedule jobs specific to a system, such as Java programs. Therefore, using Oozie, Hadoop administrators are able to build complex data transformations that can combine the processing of different individual tasks and even sub-workflows. This ability allows for greater control over complex jobs and makes it easier to repeat those jobs at predetermined periods. In practice, there are different types of Oozie jobs:

- Oozie Workflow jobs: Represented as directed acyclic graphs that specify a sequence of actions to be executed.
- Oozie Coordinator jobs: Represent Oozie workflow jobs triggered by time and data availability.
- Oozie Bundle: Facilitates packaging multiple coordinator and workflow jobs, and makes it easier to manage the life cycle of those jobs.

2. http://archive.cloudera.com/cdh4/cdh/4/oozie/WebServicesAPI.html
3. http://archive.cloudera.com/cdh/3/oozie/DG_Examples.html#Java_API_Example
4. http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html
5. http://blog.cloudera.com/blog/2013/06/how-to-use-the-apache-oozie-rest-api/
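The three job types described above can usually be told apart by the last characters of the job ID that Oozie hands back: workflow IDs end in -W, coordinator IDs in -C and bundle IDs in -B. The sketch below only illustrates that naming convention. The class name OozieJobKind, the method fromJobId and the sample IDs are made up for this example and are not part of the Oozie client API.

```java
/** Minimal sketch: infer the Oozie job type from the suffix of a job ID. */
final class OozieJobKind {

    /** The job types described in these notes, plus a fallback. */
    enum Kind { WORKFLOW, COORDINATOR, BUNDLE, UNKNOWN }

    /** Maps the ID suffix (-W, -C, -B) to a job type; anything else is UNKNOWN. */
    static Kind fromJobId(String jobId) {
        if (jobId == null) {
            return Kind.UNKNOWN;
        }
        if (jobId.endsWith("-W")) {
            return Kind.WORKFLOW;
        }
        if (jobId.endsWith("-C")) {
            return Kind.COORDINATOR;
        }
        if (jobId.endsWith("-B")) {
            return Kind.BUNDLE;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical job IDs, shaped like the ones an Oozie server generates.
        String[] sampleIds = {
            "0000001-140219072347-oozie-oozi-W",
            "0000002-140219072347-oozie-oozi-C",
            "0000003-140219072347-oozie-oozi-B"
        };
        for (String id : sampleIds) {
            System.out.println(id + " -> " + fromJobId(id));
        }
    }
}
```

A check like this is handy when a script receives mixed job IDs and has to decide which kind of status call to make for each of them.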

Feb 19, 2014 7:23:47 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path:
Feb 19, 2014 7:23:47 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-11000
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 590 ms
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.36
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor oozie.xml

ERROR: Oozie could not be started

REASON: java.lang.NoClassDefFoundError: org/apache/hadoop/util/ReflectionUtils

Stacktrace:
-----------------------------------------------------------------
java.lang.NoClassDefFoundError: org/apache/hadoop/util/ReflectionUtils
    at org.apache.oozie.service.Services.setServiceInternal(Services.java:359)
    at org.apache.oozie.service.Services.<init>(Services.java:108)
    at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:38)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4206)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4705)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ReflectionUtils
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1680)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1526)
    ... 27 more

@RequestParam(value = "offset", required = false) Integer offset,

@RequestParam(value = "limit", required = false) Integer limit

<bean id="jacksonObjectMapper" class="org.codehaus.jackson.map.ObjectMapper" />
<bean id="mappingJacksonHttpMessageConverter"
      class="org.springframework.http.converter.json.MappingJacksonHttpMessageConverter">
  <property name="objectMapper" ref="jacksonObjectMapper" />
</bean>
<bean id="annotationMethodHandlerExceptionResolver"
      class="org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerExceptionResolver">
  <property name="messageConverters">
    <array>
      <ref bean="mappingJacksonHttpMessageConverter" />
    </array>
  </property>
</bean>

@RequestParam(value = "filter", required = false)

Run an Oozie job from Java code

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

// The class declaration was missing from these notes; the class name is a placeholder.
public class OozieJobRunner {

    public static void main(String[] args) {
        // Client bound to the Oozie server URL
        OozieClient wc = new OozieClient("http://host:11000/oozie");

        // Job configuration: application path plus the parameters the workflow expects
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://cluster/user/apps/merge-psplogs/merge-wf/workflow.xml");
        conf.setProperty("jobTracker", "jobtracker.bigdata.com:8021");
        conf.setProperty("nameNode", "hdfs://namenode.bigdata.com:8020");
        conf.setProperty("queueName", "jobtracker.bigdata.com:8021");
        conf.setProperty("appsRoot", "hdfs://namenode.bigdata.com:8020/user/workspace/apps");
        conf.setProperty("appLibLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/lib");
        conf.setProperty("rawlogsLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/");
        conf.setProperty("mergedlogsLoc", "jobtracker.bigdata.com:8021");

        try {
            // Submit and start the workflow job
            String jobId = wc.run(conf);
            System.out.println("Workflow job submitted");

            // Check the status every 10 seconds while the job is running
            while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                System.out.println("Workflow job running ...");
                Thread.sleep(10 * 1000);
            }
            System.out.println("Workflow job completed ...");
            System.out.println(wc.getJobInfo(jobId));
        } catch (Exception e) {
            // Report the cause instead of a bare "Errors" message
            System.out.println("Errors: " + e.getMessage());
        }
    }
}
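The loop above stops as soon as the status is anything other than RUNNING, so a job that is still in PREP or has been SUSPENDED is also reported as completed. A possible alternative is to poll until a terminal status (SUCCEEDED, KILLED or FAILED) is reached and to cap the number of polls. The sketch below shows only that polling logic and has no Oozie dependency. Its Status enum copies the constant names of WorkflowJob.Status, and the Supplier stands in for a call such as wc.getJobInfo(jobId).getStatus(). The names WorkflowPoller, waitForCompletion and TERMINAL are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

/** Minimal sketch: poll a workflow status until it reaches a terminal state. */
final class WorkflowPoller {

    /** Same constant names as org.apache.oozie.client.WorkflowJob.Status. */
    enum Status { PREP, RUNNING, SUCCEEDED, KILLED, FAILED, SUSPENDED }

    /** Statuses after which the workflow does not change any more. */
    static final Set<Status> TERMINAL =
            EnumSet.of(Status.SUCCEEDED, Status.KILLED, Status.FAILED);

    /**
     * Reads the status up to maxPolls times, sleeping sleepMillis between reads,
     * and returns the last status seen. Stops early on a terminal status or
     * when the thread is interrupted.
     */
    static Status waitForCompletion(Supplier<Status> statusSource,
                                    int maxPolls,
                                    long sleepMillis) {
        Status last = statusSource.get();
        for (int polls = 1; !TERMINAL.contains(last) && polls < maxPolls; polls++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up polling.
                Thread.currentThread().interrupt();
                return last;
            }
            last = statusSource.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Hypothetical status sequence standing in for a real Oozie server.
        Iterator<Status> fakeStatuses = Arrays.asList(
                Status.PREP, Status.RUNNING, Status.RUNNING, Status.SUCCEEDED).iterator();
        Status result = waitForCompletion(fakeStatuses::next, 10, 10L);
        System.out.println("Final status: " + result);
    }
}
```

With the real client, the supplier would wrap the getJobInfo call and translate its checked exception. The poll cap keeps the caller from waiting forever on a suspended job.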

/user/hue/oozie/workspaces/_hdfs_-oozie-26-1393496375.26

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>tomcat-maven-plugin</artifactId>
  <configuration>
    <server>tomcat</server>
    <url>http://localhost:8080/manager/html</url>
  </configuration>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <version>2.1.1</version>
  <configuration>
    <packagingExcludes>WEB-INF/lib/servlet-api*.jar,WEB-INF/lib/jsp-api*.jar</packagingExcludes>
  </configuration>
</plugin>
<plugin>
