11-15-2017 07:26 PM
Hi,
I'm trying to use the HDP sandbox. I'm using the latest available sandbox, HDP 2.6.1.
I wrote a simple Spark application (Java).
The application generates some data and saves it as a CSV file on HDFS.
When I submit the job from the command line (spark-submit), it works fine.
Now I'm trying to run the job via Oozie (shell action).
I found a few articles on how to do it, e.g. https://community.hortonworks.com/articles/51582/how-to-use-oozie-shell-action-to-run-a-spark-job-i-1.html
But I changed a few parameters in the script: --master yarn --deploy-mode cluster.
workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="Spark2WordCount">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${shScript}</exec>
            <file>${shScriptPath}#${shScript}</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
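For reference, here is a sketch of the same submit step using Oozie's native spark action instead of a shell action. This is untested; the schema version, the master/mode values, and the paths reused from above are assumptions on my side:

```xml
<!-- Untested sketch: native spark action instead of a shell action. -->
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>Spark2WordCount</name>
        <class>com.samples.wordcount_spark2.App</class>
        <jar>${nameNode}/user/maria_dev/apps/word-count-spark2/lib/original-wordcount_spark2-0.0.1-SNAPSHOT.jar</jar>
        <arg>${nameNode}/user/maria_dev/apps/word-count-spark2/input.txt</arg>
        <arg>${nameNode}/user/maria_dev/apps/word-count-spark2/output.txt</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If this route is used, job.properties would presumably also need oozie.action.sharelib.for.spark=spark2 so the Spark2 sharelib is picked up on HDP 2.6 (again, an assumption).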
job.properties:
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8032
master=yarn
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/maria_dev/apps/word-count-spark2
shScriptPath=${oozie.wf.application.path}/submit-job.sh
shScript=submit-job.sh
script file:
#!/bin/bash
ip_address="sandbox.hortonworks.com"
main_class="com.samples.wordcount_spark2.App"
jar_app_name="hdfs://${ip_address}:8020/user/maria_dev/apps/word-count-spark2/lib/original-wordcount_spark2-0.0.1-SNAPSHOT.jar"
master_node_url="yarn"
deploy_mode="cluster"
hdfs_input_path="hdfs://${ip_address}:8020/user/maria_dev/apps/word-count-spark2/input.txt"
hdfs_output_path="hdfs://${ip_address}:8020/user/maria_dev/apps/word-count-spark2/output.txt"

/usr/hdp/current/spark2-client/bin/spark-submit --verbose \
    --class "${main_class}" \
    --master "${master_node_url}" \
    --deploy-mode "${deploy_mode}" \
    "${jar_app_name}" "${hdfs_input_path}" "${hdfs_output_path}"
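Not shown above: before running, the workflow, script, jar, and input need to be staged in the HDFS application directory. A sketch of that upload step (the HDFS paths come from job.properties; the local file names, e.g. the target/ jar location, are assumptions):

```shell
# Sketch: stage the application files in HDFS before submitting the
# Oozie job. Local file names (target/..., input.txt) are assumptions.
app_dir="/user/maria_dev/apps/word-count-spark2"

if command -v hdfs >/dev/null 2>&1; then
    # Create the app directory (and lib/ for the jar), then upload.
    hdfs dfs -mkdir -p "${app_dir}/lib"
    hdfs dfs -put -f workflow.xml submit-job.sh input.txt "${app_dir}/"
    hdfs dfs -put -f target/original-wordcount_spark2-0.0.1-SNAPSHOT.jar "${app_dir}/lib/"
else
    echo "hdfs client not found; run this on the sandbox"
fi
```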
To run the oozie job I use the command below:
oozie job -config job.properties -run
The job shows as RUNNING in the Oozie UI.
$ oozie job -info 0000012-171114162604519-oozie-oozi-W
Job ID : 0000012-171114162604519-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : Spark2WordCount
App Path : hdfs://sandbox.hortonworks.com:8020/user/maria_dev/apps/word-count-spark2
Status : RUNNING
Run : 0
User : maria_dev
Group : -
Created : 2017-11-14 18:55 GMT
Started : 2017-11-14 18:55 GMT
Last Modified : 2017-11-14 19:50 GMT
Ended : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000012-171114162604519-oozie-oozi-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000012-171114162604519-oozie-oozi-W@shell-node RUNNING job_1510681883130_0014 RUNNING -
------------------------------------------------------------------------------------------------------------------------------------
In the Resource Manager I also see 2 new applications (the second app starts after some time).
$ yarn application -list
17/11/14 19:57:40 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8032
17/11/14 19:57:41 INFO client.AHSProxy: Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):2
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1510681883130_0015 com.pingidentity.pingid.samples.wordcount_spark2.App SPARK maria_dev default RUNNING UNDEFINED 10% http://172.17.0.2:33153
application_1510681883130_0014 oozie:launcher:T=shell:W=Spark2WordCount:A=shell-node:ID=0000012-171114162604519-oozie-oozi-W MAPREDUCE maria_dev default RUNNING UNDEFINED 95% http://sandbox.hortonworks.com:34577
The problem is that it never stops. It looks like the first app takes all the resources and the second app waits. If I kill the first app (yarn application -kill appId), the second app finishes successfully. Where am I wrong, or what did I miss?

P.S.: When I changed the deploy mode to client, I found in the logs that my main class is not found:
Warning: Skip remote jar hdfs://sandbox.hortonworks.com:8020/user/maria_dev/apps/word-count-spark2/lib/original-wordcount_spark2-0.0.1-SNAPSHOT.jar.
java.lang.ClassNotFoundException: com.samples.wordcount_spark2.App
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:707)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
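For the client-mode failure: the "Skip remote jar" warning suggests spark-submit does not download an hdfs:// jar for the driver classpath when the driver runs locally, which would explain the ClassNotFoundException. An untested workaround sketch is to copy the jar into the local working directory first and submit the local copy (paths match the question above):

```shell
# Untested sketch: localize the application jar before a client-mode
# submit, since the driver classpath needs a local copy of the jar.
jar_hdfs="hdfs://sandbox.hortonworks.com:8020/user/maria_dev/apps/word-count-spark2/lib/original-wordcount_spark2-0.0.1-SNAPSHOT.jar"
jar_local="./$(basename "${jar_hdfs}")"

if command -v hdfs >/dev/null 2>&1; then
    # Fetch the jar from HDFS, then submit the local copy.
    hdfs dfs -get "${jar_hdfs}" "${jar_local}"
    /usr/hdp/current/spark2-client/bin/spark-submit \
        --class com.samples.wordcount_spark2.App \
        --master yarn --deploy-mode client \
        "${jar_local}" "$@"
else
    echo "hdfs client not found; run this on the sandbox"
fi
```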
11-14-2017 10:29 AM
Did you solve this problem?
11-07-2017 02:31 PM
@klksrinivas Did you find a solution for how to run a Spark2 job with Oozie?