Created 06-16-2018 07:10 AM
Hi All,
I have created an HDP cluster on AWS. Now I want to run a spark-submit command through an Oozie shell action.
The spark-submit command is simple: it takes its input from HDFS and stores its output in HDFS, and the .jar file is taken from the node's local filesystem.
My spark-submit command runs fine on the command line. It can read the data and store the output on HDFS in a specific directory.
I also put the command in a script and ran that on the command line, which worked as well. The problem only appears when I execute the script through an Oozie workflow.
script.sh
#!/bin/bash
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
job.properties
nameNode=hdfs://<HOST>:8020
jobTracker=<HOST>:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shelloozie
workflow.xml
<workflow-app name="ShellAction" xmlns="uri:oozie:workflow:0.3">
    <start to='shell-node' />
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>script.sh</exec>
            <file>${nameNode}/user/oozie/shelloozie/script.sh#script.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
I have checked my YARN log and it shows the following, but I don't understand what it is telling me.
LogType:stderr
Log Upload Time:Sat Jun 16 07:00:47 +0000 2018
LogLength:1721
Log Contents:
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Jun 16, 2018 7:00:24 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:25 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:26 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
End of LogType:stderr
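The stderr section above contains only INFO and log4j WARN lines from the MapReduce application master, so the script's own output probably sits in a different section of the aggregated log. A minimal sketch of a filter that pulls out one section, assuming the "LogType:<type>" and "End of LogType:<type>" markers look like the ones shown above (the function name extract_logtype is made up for illustration):

```shell
#!/bin/bash
# Print only the sections of an aggregated YARN log that belong to one LogType.
# Assumes sections start with "LogType:<type>" and end with
# "End of LogType:<type>", as in the log pasted above.
extract_logtype() {
  local type="$1"
  awk -v start="LogType:${type}" -v stop="End of LogType:${type}" '
    index($0, start) == 1 { printing = 1 }   # section header: start printing
    printing              { print }
    index($0, stop) == 1  { printing = 0 }   # section footer: stop printing
  '
}

# Hypothetical usage; <application_id> is a placeholder for the launcher job ID:
#   yarn logs -applicationId <application_id> | extract_logtype stdout
```

The filter reads from standard input, so it also works on a log that was saved to a file first.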
Kindly help me to solve this.
Thank You,
Jay.
Created 06-16-2018 01:18 PM
@JAy PaTel Try using the full path to the spark-submit command in your shell script:
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
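If it still fails under Oozie, a more verbose variant of script.sh may help narrow things down, since the shell action can run on any node of the cluster and possibly as a different user than on your command line. This is only a sketch: the function name run_spark_submit and the overridable SPARK_SUBMIT variable are made up for illustration, and the exit code 127 is just the shell's usual "command not found" convention.

```shell
#!/bin/bash
# Debugging sketch for script.sh; not a definitive implementation.
# SPARK_SUBMIT defaults to the HDP path used above and can be overridden.
SPARK_SUBMIT="${SPARK_SUBMIT:-/usr/hdp/current/spark2-client/bin/spark-submit}"

run_spark_submit() {
  local rc
  # Record where and as whom the action is actually running.
  echo "host=$(uname -n) user=$(id -un) cwd=$(pwd)"

  # Fail early with a clear message if the binary is missing on this node.
  if [ ! -x "$SPARK_SUBMIT" ]; then
    echo "spark-submit not found or not executable: $SPARK_SUBMIT"
    return 127
  fi

  # Merge stderr into stdout so both end up in the same log stream.
  "$SPARK_SUBMIT" "$@" 2>&1
  rc=$?
  echo "spark-submit exit code: $rc"
  return "$rc"
}

# Fill in the placeholders from the question and uncomment:
# run_spark_submit --class org.apache.<main> --master "local[2]" <jar_file_path> <HDFS_input_path> <HDFS_output_path>
```

The printed host, user, and exit code should then show up in the stdout log of the action's container.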
HTH
Created 06-18-2018 10:21 AM
Hi, I have actually tried both, i.e. using the full path to spark-submit and changing into its directory and executing it there, but I got the same error.
I have also updated my question. Please have a look at it.
Regards,
Jay.