How to run spark-submit command using shell action in Oozie

Hi All,

I have created an HDP cluster on AWS and want to run a spark-submit command from an Oozie shell action.

The spark-submit command is simple: it reads its input from HDFS and writes its output to HDFS, and the .jar file is taken from the local filesystem of the Hadoop node.

The command runs fine from the command line: it reads the data and stores the output in the expected HDFS directory.

I also wrapped it in a script and ran that from the command line, which worked as well. The problem only appears when I run it through an Oozie workflow.

script.sh

#!/bin/bash
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
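
One thing worth checking, offered as a sketch rather than a confirmed diagnosis: an Oozie shell action runs on whichever cluster node YARN assigns, not necessarily the machine where the command was tested, so a jar that sits on one node's local disk may not be present there. The helper below fails fast with a readable message when a local file is missing. `require_local_file` is a hypothetical name, not part of Oozie or Spark.

```shell
#!/bin/bash
# Preflight check for script.sh (illustrative sketch; require_local_file is a
# hypothetical helper, not an Oozie or Spark command).

require_local_file() {
  local path="$1"
  local host="${HOSTNAME:-unknown}"
  local user
  user="$(id -un 2>/dev/null || echo unknown)"

  if [[ ! -r "$path" ]]; then
    # Report where and as whom the check ran, since both can differ
    # from the interactive session used for testing.
    echo "ERROR: $path is not readable on host $host as user $user" >&2
    return 1
  fi
  echo "OK: $path is readable on host $host as user $user"
}

# Intended use inside script.sh, before the spark-submit line:
#   require_local_file "<jar_file_path>" || exit 1
```

If the check fails inside the workflow but passes on the command line, the jar is only available locally on some nodes. Uploading it to HDFS and shipping it with an additional `<file>` element in the shell action is one common way around that.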

job.properties

nameNode=hdfs://<HOST>:8020
jobTracker=<HOST>:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shelloozie

workflow.xml

<workflow-app name="ShellAction" xmlns="uri:oozie:workflow:0.3">
  <start to="shell-node"/>
  <action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>script.sh</exec>
      <file>${nameNode}/user/oozie/shelloozie/script.sh#script.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

I checked the YARN log and it shows the following, but I don't understand what it is telling me.

LogType:stderr
Log Upload Time:Sat Jun 16 07:00:47 +0000 2018
LogLength:1721
Log Contents:
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Jun 16, 2018 7:00:24 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:25 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:26 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
End of LogType:stderr
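
The stderr shown above contains only Jersey INFO lines and log4j warnings, none of which is an error in itself. Below is a minimal sketch for stripping that noise from a saved log so that any remaining lines stand out. `strip_launcher_noise` is a hypothetical helper name, and the patterns are assumptions based only on the lines quoted above.

```shell
#!/bin/bash
# Filter out the routine launcher messages seen in the log above.
# strip_launcher_noise is a hypothetical helper; it reads a log on stdin
# and prints every line that is not one of the known-harmless patterns.

strip_launcher_noise() {
  grep -Ev \
    -e '^INFO: ' \
    -e '^log4j:WARN ' \
    -e '^[A-Z][a-z]{2} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M com\.sun\.jersey\.' \
    || [[ $? -eq 1 ]]   # grep exits 1 when nothing is left; treat that as success
}

# Intended use, assuming the aggregated log was fetched with the YARN CLI:
#   yarn logs -applicationId <application_id> | strip_launcher_noise
```

If nothing useful remains in stderr, the stdout section of the same aggregated log is the next place to look, since that is where the output of script.sh would normally end up.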

Please help me solve this.

Thank You,

Jay.

Re: How to run spark-submit command using shell action in Oozie

@JAy PaTel Try using the full path to spark-submit in your shell script:

/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>

HTH

Re: How to run spark-submit command using shell action in Oozie

@Felix Albani

Hi, I have actually tried both: using the full path to spark-submit, and changing into its directory and running it from there. Both gave the same error.

Anyway, I have updated my question. Please have a look.

Regards,

Jay.
