spark in oozie is not working

Hi All,

I am using the HDP 2.3.4 sandbox.

I have created an Oozie job that submits a Spark job to YARN in cluster mode (--master yarn-cluster).

My workflow.xml looks like this:

<workflow-app name="sample" xmlns="uri:oozie:workflow:0.1">
    <start to="spark-action" />
    <action name="spark-action">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>${csvProcessingJobName}</name>
            <class>${csvProcessingJobClass}</class>
            <jar>${jarName}</jar>
            <arg>${csvProcessingArg1}</arg>
            <arg>${csvProcessingArg2}</arg>
            <arg>${csvProcessingArg3}</arg>
            <arg>${csvProcessingArg4}</arg>
        </spark>
        <ok to="end" />
        <error to="end" />
    </action>

    <end name="end" />
</workflow-app>

My job.properties:

############ GENERAL HDFS AND ORACLE DB CONNECTION PROPERTIES ############

jobTracker=sandbox.hortonworks.com:8050
nameNode=hdfs://sandbox.hortonworks.com:8020

############ BUNDLE PROPERTIES ############

bundleAppName=bundle
bundleKickOffTime=2016-05-04T07:00Z
oozie.bundle.application.path=${nameNode}/user/root/oozie/config/spark/bundle.xml
oozie.use.system.libpath=true

############ COORDINATOR PROPERTIES ############

coordinatorAppPath=${nameNode}/user/root/oozie/config/spark/coordinator.xml
coordinatorAppName=csv-processing-coordinator
coordinatorStartTime=2016-05-04T01:00Z
coordinatorEndTime=2016-05-05T01:00Z
coordinatorFrequency=1440
coordinatorTimeZone=UTC

############ WORKFLOW PROPERTIES ############
workflowAppPath=${nameNode}/user/root/oozie/config/spark/workflow.xml
workflowAppName=rdbms-to-hadoop-workflow

master=yarn-cluster
jarName=${nameNode}/user/root/oozie/config/spark/processor.jar


csvProcessingJobName=processing-job
csvProcessingJobClass=com.job.SparkJob
csvProcessingArg1=${nameNode}/user/root/abc.csv
csvProcessingArg2=tableName
csvProcessingArg3=/apps/hive/warehouse/tableName
csvProcessingArg4=parquet

But the Oozie launcher job fails with the error below (taken from the MapReduce job log in the JobHistory UI), and I am not sure what is causing it:

 Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Call From sandbox.hortonworks.com/192.168.0.105 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From sandbox.hortonworks.com/192.168.0.105 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.call(Client.java:1431)
	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
	at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
	at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
	at org.apache.spark.Logging$class.logInfo(Logging.scala:58)
	at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:62)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:128)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1065)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1125)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:612)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:710)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
	... 45 more

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://sandbox.hortonworks.com:8020/user/root/oozie-oozi/0000002-160815115226550-oozie-oozi-W/csv-processing-spark-action--spark/action-data.seq

Oozie Launcher ends

Any ideas?

Thanks

Accepted Solutions

Re: spark in oozie is not working

@Ankit A

Are you able to run the Spark job from the shell/command line? If so, you may want to use an Oozie shell action instead. The Oozie Spark action in HDP 2.3.4 is still a technical preview and is not yet supported. The tech note below was released recommending shell actions or Java actions instead.

https://community.hortonworks.com/content/kbentry/51582/how-to-use-oozie-shell-action-to-run-a-spark...

--------------------

Begin Tech Note

--------------------

The Oozie Spark action is not supported in HDP 2.3.x and HDP 2.4.0, and there is no workaround for it, especially in a Kerberos environment. Instead, we can use either a Java action or a shell action to launch a Spark job from an Oozie workflow. In this article, we discuss how to use an Oozie shell action to run a Spark job in a Kerberos environment.

Prerequisites:

1. The Spark client must be installed on every host where a NodeManager is running, because we have no control over which node the shell action will run on.

2. Optionally, if the Spark job needs to interact with an HBase cluster, the HBase client must be installed on every host as well.
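To verify prerequisite 1 across the cluster, one option is to ask YARN for the NodeManager hosts and test for spark-submit on each one over SSH. This is a sketch under assumptions: passwordless SSH from the machine running it, the Hadoop 2.x `yarn node -list` layout (host:port in the first column, node state in the second), and the spark-submit path used in script.sh in step 1. Both function names are hypothetical.

```shell
# Hypothetical helpers, not part of HDP or Oozie.
# Print the hostnames of RUNNING NodeManagers as reported by "yarn node -list".
nodemanager_hosts() {
  yarn node -list 2>/dev/null |
    awk '$2 == "RUNNING" { sub(/:[0-9]+$/, "", $1); print $1 }'
}

# Check over SSH that spark-submit is executable on every NodeManager host.
# Prints one OK/MISSING line per host; returns non-zero if any host lacks it.
check_spark_clients() {
  local spark_submit="${1:-/usr/hdp/current/spark-client/bin/spark-submit}"
  local host rc=0
  for host in $(nodemanager_hosts); do
    # -n: ssh reads stdin from /dev/null instead of consuming the caller's input
    if ssh -n "$host" test -x "$spark_submit"; then
      echo "OK      $host"
    else
      echo "MISSING $host"
      rc=1
    fi
  done
  return "$rc"
}
```

The same loop can be reused for the optional HBase client check by passing a different path as the first argument.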

Steps:

1. Create a shell script containing the spark-submit command. For example, in script.sh:

#!/bin/bash
# "keytab" and "spark-examples.jar" are the symlink names created by the <file> entries in workflow.xml
/usr/hdp/current/spark-client/bin/spark-submit \
  --keytab keytab \
  --principal ambari-qa-falconJ@FALCONJSECURE.COM \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 500m \
  --num-executors 1 \
  --executor-memory 500m \
  --executor-cores 1 \
  spark-examples.jar 3

2. Prepare the Kerberos keytab that the Spark job will use. In this example we use the Ambari smoke test user, whose keytab Ambari has already generated at /etc/security/keytabs/smokeuser.headless.keytab.
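Before wiring the keytab into the workflow, it can help to confirm that the principal passed to `--principal` in script.sh is actually stored in the keytab. `klist -kt <keytab>` (MIT Kerberos) lists the keytab entries; the hypothetical helper below only extracts the distinct principal names, assuming klist's usual three header lines with the principal in the last column.

```shell
# Hypothetical helper: print the distinct principals stored in a keytab.
# Assumes MIT Kerberos "klist -kt" output: three header lines, then one
# entry per line with the principal in the last column.
keytab_principals() {
  klist -kt "$1" | awk 'NR > 3 { print $NF }' | sort -u
}

# Example:
#   keytab_principals /etc/security/keytabs/smokeuser.headless.keytab
```

The output should include the principal used in script.sh (ambari-qa-falconJ@FALCONJSECURE.COM in this example); if it does not, spark-submit will not be able to log in with that keytab.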

3. Create the Oozie workflow with a shell action that executes the script created above. For example, in workflow.xml:

<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.4">
  <start to="shellAction"/>
  <action name="shellAction">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>script.sh</exec>
      <file>/user/oozie/shell/script.sh#script.sh</file>
      <file>/user/oozie/shell/smokeuser.headless.keytab#keytab</file>
      <file>/user/oozie/shell/spark-examples.jar#spark-examples.jar</file>
      <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="killAction"/>
  </action>
  <kill name="killAction">
    <message>"Killed job due to error"</message>
  </kill>
  <end name="end"/>
</workflow-app>

4. Create the Oozie job properties file. For example, in job.properties:

nameNode=hdfs://falconJ2.sec.support.com:8020

jobTracker=falconJ2.sec.support.com:8050

queueName=default

oozie.wf.application.path=${nameNode}/user/oozie/shell

oozie.use.system.libpath=true
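Every `${...}` variable in workflow.xml has to resolve when the job is submitted; the workflow above references `${jobTracker}` and `${nameNode}`, so both must be defined in job.properties (Oozie can also read values from a config-default.xml, which this sketch ignores). The hypothetical helper below is not an Oozie command: it lists plain `${name}` references in a workflow that have no `name=value` line in a properties file, and skips EL functions such as `${wf:id()}`.

```shell
# Hypothetical helper: print each ${name} referenced in a workflow.xml that
# has no "name=value" line in the given properties file. EL functions such
# as ${wf:id()} do not match the pattern and are skipped.
missing_properties() {
  local workflow="$1" props="$2" var
  grep -o '[$]{[A-Za-z_][A-Za-z0-9_.]*}' "$workflow" |
    sed 's/^[$]{//; s/}$//' |
    sort -u |
    while read -r var; do
      # Compare the trimmed text before the first "=" with the variable name.
      if ! awk -F= -v k="$var" '
            { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
            name == k { found = 1 }
            END { exit !found }' "$props"; then
        echo "$var"
      fi
    done
}

# Example:
#   missing_properties workflow.xml job.properties   # no output means all defined
```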

5. Upload the following files to the Oozie workflow application path in HDFS (in this example, /user/oozie/shell):

- workflow.xml

- smokeuser.headless.keytab

- script.sh

- Spark uber jar (in this example, /usr/hdp/current/spark-client/lib/spark-examples*.jar, stored in HDFS as spark-examples.jar to match the workflow)

- Any other configuration files referenced in the workflow (optional)
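Step 5 can be scripted with the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands. This is a minimal sketch, assuming the hdfs client is on PATH and the current user may write to the application path; the function name and its `local_path:name_in_hdfs` argument convention are invented for this example so that the Spark examples jar can be stored under the spark-examples.jar name that workflow.xml refers to.

```shell
# Hypothetical helper: create an Oozie application directory in HDFS and copy
# files into it, overwriting older copies. Each argument after the directory is
# either "local_path" or "local_path:name_in_hdfs" (a convention invented for
# this sketch) so a file can be renamed on upload.
upload_oozie_app() {
  local app_dir="$1" spec src name
  shift
  hdfs dfs -mkdir -p "$app_dir" || return 1
  for spec in "$@"; do
    src="${spec%%:*}"
    name="${spec#*:}"
    if [ "$name" = "$spec" ]; then
      # No ":" given: keep the local file name.
      name="$(basename "$src")"
    fi
    hdfs dfs -put -f "$src" "$app_dir/$name" || return 1
  done
}

# Example with the files from the tech note:
#   jar=$(ls /usr/hdp/current/spark-client/lib/spark-examples*.jar)
#   upload_oozie_app /user/oozie/shell workflow.xml script.sh \
#     /etc/security/keytabs/smokeuser.headless.keytab "$jar:spark-examples.jar"
```

The jar path is resolved into a variable first because a glob followed by `:spark-examples.jar` would not expand.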

6. Run the workflow with the oozie command-line client. For example:

oozie job -oozie http://<oozie-server>:11000/oozie -config job.properties -run
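If the submitting shell should wait for the result, the job ID printed by `-run` (a line such as `job: 0000002-...-oozie-oozi-W`) can be polled with `oozie job -info`. A rough sketch, assuming the Oozie 4.x CLI output format (a `Status : <STATE>` line in the `-info` report); the function name is hypothetical and the URL argument is the same `http://<oozie-server>:11000/oozie` placeholder as above.

```shell
# Hypothetical helper: submit a workflow and poll until it leaves PREP/RUNNING.
# Prints "<job-id> <final-status>" and returns 0 only for SUCCEEDED.
# Assumes "oozie job -run" prints "job: <id>" and "oozie job -info <id>"
# prints a "Status : <STATE>" line, as the Oozie 4.x CLI does.
run_oozie_workflow() {
  local url="$1" props="$2" interval="${3:-10}" job_id status
  job_id="$(oozie job -oozie "$url" -config "$props" -run | sed -n 's/^job: *//p')"
  if [ -z "$job_id" ]; then
    echo "submission failed" >&2
    return 1
  fi
  while :; do
    status="$(oozie job -oozie "$url" -info "$job_id" |
      sed -n 's/^Status *: *\([A-Z]*\).*/\1/p' | head -n 1)"
    case "$status" in
      PREP|RUNNING) sleep "$interval" ;;
      *)  # SUCCEEDED, KILLED, FAILED, SUSPENDED, or an unparsable report
        echo "$job_id $status"
        [ "$status" = "SUCCEEDED" ]
        return
        ;;
    esac
  done
}

# Example:
#   run_oozie_workflow http://<oozie-server>:11000/oozie job.properties 30
```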

--------------------

End Tech Note

--------------------

See a similar/related response here:

https://community.hortonworks.com/questions/22772/oozie-spark-action-giving-key-not-found-spark-home...


Re: spark in oozie is not working

There is a similar issue here:

https://community.hortonworks.com/questions/23132/i-am-getting-error-in-oozie-workflow-what-i-have-d...

Please try the suggestion recommended in that post and update this thread with whether it worked.
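A note on the error itself: 0.0.0.0:8032 is Hadoop's built-in default for `yarn.resourcemanager.address`, so a refused connection to that address usually means the Spark YARN client inside the Oozie launcher did not pick up the cluster's yarn-site.xml (the question's job.properties points `jobTracker` at sandbox.hortonworks.com:8050 instead). A quick first check is what the host's configuration actually says. The helper below is a hypothetical sketch, not a Hadoop tool; it assumes the one-tag-per-line layout Ambari writes and is not a general XML parser.

```shell
# Hypothetical helper: print the <value> of a property in a Hadoop *-site.xml.
# Assumes <name> and <value> each sit on one line (or share a line), as in
# Ambari-generated files; it is not a general XML parser.
get_hadoop_prop() {
  local key="$1" file="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}

# Example (assuming the standard HDP client config directory):
#   get_hadoop_prop yarn.resourcemanager.address /etc/hadoop/conf/yarn-site.xml
```

If this prints the same host:port that `jobTracker` uses, the cluster configuration itself looks right, and the more likely problem is that the Spark client running inside the launcher is not loading it.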
