Reply
Contributor
Posts: 49
Registered: ‎01-05-2016
Accepted Solution

"Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in Oozie Workflow

Hello, I'm using CDH 5.5.1 with Spark 1.5.0.

 

I'm unsuccessfully trying to execute a simple Spark action (Python script) via Oozie. As for now I just want to be able to run something at all, the script is still a silly example, it doesn't really do anything. It is as follows:

 

## IMOPORT FUNCTIONS
from pyspark.sql.functions import *

## CREATE MAIN DATAFRAME
eventi_DF = sqlContext.table("eventi")

 

I created a simple Oozie Workflow from Hue GUI. I used the following settings for the Spark action:

 

SPARK MASTER: yarn-cluster
MODE: cluster
APP NAME: MySpark
JARS / PY FILES: lib/test.py
MAIN CLASS: org.apache.spark.examples.mllib.JavaALS
ARGUMENTS: <No Arguments Defined>

I've uploaded the Script in HDFS under the Workspace "/user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib" directory, and I'm sure it gets picked up (as just to understand it was meant to be put in this directory I had to work a little bit, fighting a "test.py" not found" exception, that now is not there anymore).

 

As of now, when I try to run the Workflow by pressing the "Play" button on GUI, this is what I get in the Action Logfile:

 

>>> Invoking Spark class now >>>


<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
java.util.NoSuchElementException: key not found: SPARK_HOME
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:943)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:942)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:942)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:630)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:124)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:914)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:185)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:176)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:378)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:296)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://mnethdp01.glistencube.com:8020/user/admin/oozie-oozi/0000000-160416120358569-oozie-oozi-W/spark-3ba6--spark/action-data.seq

Oozie Launcher ends

Now, I guess the problem is:

 

Failing Oozie Launcher, ... key not found: SPARK_HOME

I've been trying hard to set this SPARK_HOME Key in different places. Things I've tried include the following:

 

  • Modified Spark Config in Cloudera Manager and reloaded the Configuration:
Spark Service Environment Advanced Configuration Snippet (Safety Valve):
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
  • Modified Oozie Config in Cloudera Manager and reloaded the Configuration:

 

Oozie Service Environment Advanced Configuration Snippet (Safety Valve)
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
  • Used different ways of invoking the Spark Action inside the Spark Action Definition:

 

SPARK MASTER: local[*]
SPARK MODE: client

SPARK MASTER: yarn-cluster
SPARK MODE: cluster

SPARK MASTER: yarn-client
SPARK MODE: client
  • Modified the "/etc/alternatives/spark-conf/spark-defaults.conf" File manually, adding the following inside it (I did it just on one node, by the way):

 

spark.yarn.appMasterEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
spark.executorEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark

All the above to no success. Apparently I'm not able to set the required key anywhere.

 

What am I doing wrong? Isn't this meant to be pretty straightforward? Thanks in advance for any insights.

 

 

 

 

 

 

Contributor
Posts: 31
Registered: ‎06-26-2015

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

For me, I had to add following Oozie workflow configuration:
oozie.launcher.yarn.app.mapreduce.am.env: SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

yes i know, they could have done better job than this.
Contributor
Posts: 49
Registered: ‎01-05-2016

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

Hi Ben, thanks a whole lot for your reply.

 

May I ask you where exactly you specified that setting?

 

- In the GUI, in some particular field?

- In "workflow.xml", in the Job's directory in HDFS? If yes: as an "arg", as a "property", or..?

- In "job.properties", in the Job's directory in HDFS? If yes: how?

- In some other file? E.g. "/etc/alternatives/spark-conf/spark-defaults.conf"? If yes, how?

 

A snippet of your code would be extremely appreciated!

 

I'm asking you because I've tried all of the above with your suggestion but I did not succeed.

 

Thanks again for your help

Contributor
Posts: 31
Registered: ‎06-26-2015

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

Hi

either in your Hue Oozie workflow editor UI (workflow settings -> Hadoop Properties)

or on your workflow.xml

 

<workflow-app name="Workflow name" xmlns="uri:oozie:workflow:0.5">
  <global>
            <configuration>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                    <value>SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/</value>
                </property>
            </configuration>
  </global>

.....

 

Highlighted
Contributor
Posts: 49
Registered: ‎01-05-2016

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

I got past this! Still no cigar, though. Now I have another error, but I'm going to work on this. It's something different now...

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist
java.io.FileNotFoundException: File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist

Many thanks for your help. I'd never be able to figure this out by myself!

aj
Explorer
Posts: 8
Registered: ‎06-07-2016

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

I'm getting this error also.  Have you managed to solve it?

Contributor
Posts: 49
Registered: ‎01-05-2016

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

Hi aj,

 

yes I did manage to solve it. Please, take a look at the following thread and see if it can be of help. It may seem a bit unrelated from the "test.py not found" issue, but it contains detailed info about how to specify all the needed parameters to let the whole thing run smoothly:

 

 

http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-workflow-Spark-action-using-sim...

 

HTH

aj
Explorer
Posts: 8
Registered: ‎06-07-2016

Re: "Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in

Ah, my error was not using HDFS: for the .py.  Thanks!

Announcements