Created 06-16-2018 07:10 AM
Hi All,
I have created an HDP cluster on AWS. Now I want to run a spark-submit command using an Oozie shell action.
The spark-submit command is simple: it takes its input from HDFS and stores its output in HDFS, and the .jar file is taken from the local filesystem (not HDFS).
The spark-submit command runs fine from the command line. It reads the data and stores the output in a specific HDFS directory.
I also put the command in a script and ran it from the command line, and that worked too. The problem only appears when I run it through an Oozie workflow.
script.sh
#!/bin/bash
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
job.properties
nameNode=hdfs://<HOST>:8020
jobTracker=<HOST>:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shelloozie
workflow.xml
<workflow-app name="ShellAction" xmlns="uri:oozie:workflow:0.3">
<start to='shell-node' />
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>script.sh</exec>
<file>${nameNode}/user/oozie/shelloozie/script.sh#script.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
Anyway, I checked my YARN log and it shows the following. I don't understand what it is telling me.
LogType:stderr
Log Upload Time:Sat Jun 16 07:00:47 +0000 2018
LogLength:1721
Log Contents:
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Jun 16, 2018 7:00:24 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:25 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:26 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
End of LogType:stderr
Kindly help me to solve this.
Thank You,
Jay.
Created 06-16-2018 01:18 PM
@JAy PaTel Try using the full path to your spark-submit command in the shell script:
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
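If the full path alone does not change anything, it may help to have the script report more about its own environment. The stderr posted in the question appears to contain only INFO and log4j messages from the launcher's MRAppMaster, not the actual failure. The following is a minimal sketch, not a confirmed fix: `run_logged` is a hypothetical helper name, and the idea that the action runs as a different user, or on a node where the local jar is missing, is only an assumption worth checking.

```shell
#!/bin/bash
# Hypothetical helper (not part of Oozie or Spark): prints who and where the
# script is running, runs the given command with stderr folded into stdout,
# and reports the command's exit code before returning it.
run_logged() {
  local rc
  echo "whoami: $(whoami)"
  echo "cwd: $(pwd)"
  echo "running: $*"
  "$@" 2>&1
  rc=$?
  echo "exit code: $rc"
  return "$rc"
}

# Intended use inside script.sh, with the same placeholders as above:
# run_logged /usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
```

With this in place, the user, working directory, and any spark-submit error text should show up together in the stdout log of the container that ran the script, which is usually easier to read than the launcher's stderr.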
HTH
Created 06-18-2018 10:21 AM
Hi, I have actually tried both, i.e. with the full path to spark-submit and by navigating to its directory and executing it there, but I got the same error.
Anyway, I have updated my question. Please have a look at it.
Regards,
Jay.