0. Installation
1. Download version 4.0.0 of Oozie -> wget http://mirror.symnds.com/software/Apache/oozie/4.0.0/oozie-4.0.0.tar.gz
2. Extract the tar.gz file -> tar -xzvf oozie-4.0.0.tar.gz
3. cd oozie-4.0.0
4. Install Maven -> sudo apt-get install maven
5. Replace the Hadoop version in every pom.xml -> find . -name pom.xml | xargs sed -ri 's/2\.2\.0-SNAPSHOT/2.2.0-cdh5.0.0-beta-2/' -> then add the following repository to the pom file in order to be able to build Oozie with hadoop 2.0.0-mr1-cdh4.4.0
6. ./bin/mkdistro.sh -DskipTests
*10. cp -r /usr/local/hadoop/share/hadoop/common/*jar libext/
*11. cd into the libext folder and download the following file -> wget http://extjs.com/deploy/ext-2.2.zip
cd $OOZIE_HOME/oozie-server/lib
- wget http://extjs.com/deploy/ext-2.2.zip
- rm -rf ecj-3.7.2.jar
- wget http://repo1.maven.org/maven2/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.jar
- wget http://repo1.maven.org/maven2/tomcat/jasper-compiler-jdt/5.5.23/jasper-compiler-jdt-5.5.23.jar
- cp $HADOOP_HOME/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.4.0.jar .
- cp $HADOOP_HOME/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar .
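The version rewrite in step 5 can be sanity-checked on a throwaway file before touching the source tree. A sketch; the sample file and its path are hypothetical, but the sed invocation is the same as in step 5:

```shell
# Hypothetical sample pom.xml fragment, only to illustrate the rewrite.
cat > /tmp/pom-demo.xml <<'EOF'
<hadoop.version>2.2.0-SNAPSHOT</hadoop.version>
EOF

# Same substitution as in step 5, pointed at the sample file.
sed -ri 's/2\.2\.0-SNAPSHOT/2.2.0-cdh5.0.0-beta-2/' /tmp/pom-demo.xml

cat /tmp/pom-demo.xml   # -> <hadoop.version>2.2.0-cdh5.0.0-beta-2</hadoop.version>
```

Note that `-r` (extended regexps) is a GNU sed flag; on BSD sed use `-E` instead.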
If set to true, it creates the DB schema if it does not exist; if the schema already exists, this is a no-op. If set to false, it does not create the DB schema, and startup fails if the schema does not exist.
Validate DB Connection DONE
Check DB schema does not exist DONE
Check OOZIE_SYS table does not exist DONE
Create SQL schema DONE
Create OOZIE_SYS table DONE
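The checklist above is the standard output of Oozie's database setup tool; presumably it was produced by a run like the following (a sketch, executed from the Oozie home directory, with oozie-site.xml already pointing at the target database):

```shell
bin/ooziedb.sh create -sqlfile oozie.sql -run
```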
15. To enable the web console, we need to install the ExtJS library. The Oozie WAR file also requires a few other JARs, such as hadoop-core-<version>.jar and commons-configuration-<version>.jar.
-> cd $OOZIE_HOME
If you encounter this issue when running Oozie jobs -> Error: E0501: Could not perform authorization operation, User: ubuntu is not allowed to impersonate ubuntu -> to be able to run Oozie jobs, I started Oozie and the workflow as the hdfs user.
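An alternative to switching users is to whitelist the submitting user as a Hadoop proxy user on the NameNode. A sketch of the core-site.xml entries, assuming the user is ubuntu (the wildcard values are permissive placeholders and should be tightened in production; HDFS must be restarted after the change):

```xml
<!-- Sketch: allow user "ubuntu" to impersonate other users (e.g. via Oozie). -->
<property>
  <name>hadoop.proxyuser.ubuntu.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ubuntu.groups</name>
  <value>*</value>
</property>
```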
1. http://www.ibm.com/developerworks/library/bd-ooziehadoop/
Apache Oozie is an open source project based on Java technology that simplifies the process of creating workflows and managing coordination among jobs. In principle, Oozie offers the ability to combine multiple jobs sequentially into one logical unit of work. One advantage of the Oozie framework is that it is fully integrated with the Apache Hadoop stack and supports Hadoop jobs for Apache MapReduce, Pig, Hive, and Sqoop. In addition, it can be used to schedule jobs specific to a system, such as Java programs. Therefore, using Oozie, Hadoop administrators are able to build complex data transformations that can combine the processing of different individual tasks and even sub-workflows. This ability allows for greater control over complex jobs and makes it easier to repeat those jobs at predetermined periods. In practice, there are different types of Oozie jobs:
Oozie Workflow jobs - represented as directed acyclic graphs (DAGs) that specify a sequence of actions to be executed.
Oozie Coordinator jobs - Oozie workflow jobs triggered by time and data availability.
Oozie Bundle - facilitates packaging multiple coordinator and workflow jobs, and makes it easier to manage the life cycle of those jobs.
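The DAG structure of a workflow job can be illustrated with a minimal workflow.xml. This is only a sketch: the app name is invented, the mapper/reducer configuration is omitted, and the ${...} parameters are expected to come from the job configuration:

```xml
<!-- Sketch of a one-action Oozie workflow; names and parameters are placeholders. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="first-action"/>
    <action name="first-action">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${rawlogsLoc}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Control-flow nodes (start, kill, end) define the graph; each action routes to the next node on success (ok) or to a kill node on error.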
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path:
Feb 19, 2014 7:23:47 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-11000
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 590 ms
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.36
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor oozie.xml
Stacktrace:
----------------------------------------------------------------
java.lang.NoClassDefFoundError: org/apache/hadoop/util/ReflectionUtils
    at org.apache.oozie.service.Services.setServiceInternal(Services.java:359)
    at org.apache.oozie.service.Services.<init>(Services.java:108)
    at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:38)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4206)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4705)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ReflectionUtils
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1680)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1526)
    ... 27 more
<bean id="jacksonObjectMapper" class="org.codehaus.jackson.map.ObjectMapper" />

<bean id="mappingJacksonHttpMessageConverter"
      class="org.springframework.http.converter.json.MappingJacksonHttpMessageConverter">
    <property name="objectMapper" ref="jacksonObjectMapper" />
</bean>

<bean id="annotationMethodHandlerExceptionResolver"
      class="org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerExceptionResolver">
    <property name="messageConverters">
        <array>
            <ref bean="mappingJacksonHttpMessageConverter" />
        </array>
    </property>
</bean>
// Fragment of the workflow-submission client: wc is an OozieClient and conf
// its Properties configuration, both created earlier (not shown in these notes).
conf.setProperty("rawlogsLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/");
conf.setProperty("mergedlogsLoc", "jobtracker.bigdata.com:8021");
try {
    // Submit and start the workflow job.
    String jobId = wc.run(conf);
    System.out.println("Workflow job submitted");

    // Poll every 10 seconds until the job leaves the RUNNING state.
    while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
        System.out.println("Workflow job running ...");
        Thread.sleep(10 * 1000);
    }
    System.out.println("Workflow job completed ...");
    System.out.println(wc.getJobInfo(jobId));
} catch (Exception e) {
    System.out.println("Errors");
}
}
}
/user/hue/oozie/workspaces/_hdfs_-oozie-26-1393496375.26
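A workspace path like the one above is normally referenced from a job.properties file and submitted with the Oozie CLI (e.g. oozie job -oozie http://localhost:11000/oozie -config job.properties -run). A sketch of such a file, reusing the host names and workspace path that appear elsewhere in these notes; adjust all values to the actual cluster:

```
# job.properties sketch; host names and the application path are examples from these notes.
nameNode=hdfs://namenode.bigdata.com:8020
jobTracker=jobtracker.bigdata.com:8021
oozie.wf.application.path=${nameNode}/user/hue/oozie/workspaces/_hdfs_-oozie-26-1393496375.26
```

oozie.wf.application.path must point at the HDFS directory containing the workflow.xml to run.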
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>tomcat-maven-plugin</artifactId>
    <configuration>
        <server>tomcat</server>
        <url>http://localhost:8080/manager/html</url>
    </configuration>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <version>2.1.1</version>
    <configuration>
        <packagingExcludes>WEB-INF/lib/servlet-api*.jar,WEB-INF/lib/jsp-api*.jar</packagingExcludes>
    </configuration>
</plugin>
<plugin>