"Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in Oozie Workflow

Rising Star

Hello, I'm using CDH 5.5.1 with Spark 1.5.0.

 

I'm trying, so far unsuccessfully, to execute a simple Spark action (a Python script) via Oozie. For now I just want to get something running at all, so the script is a trivial example that doesn't really do anything. Here it is:

 

## IMPORT FUNCTIONS
from pyspark.sql.functions import *

## CREATE MAIN DATAFRAME
eventi_DF = sqlContext.table("eventi")

 

I created a simple Oozie Workflow from Hue GUI. I used the following settings for the Spark action:

 

SPARK MASTER: yarn-cluster
MODE: cluster
APP NAME: MySpark
JARS / PY FILES: lib/test.py
MAIN CLASS: org.apache.spark.examples.mllib.JavaALS
ARGUMENTS: <No Arguments Defined>

I've uploaded the script to HDFS, in the workspace directory "/user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib", and I'm sure it gets picked up: it took me a while to work out that it belonged in this directory, and the "test.py not found" exception I was fighting along the way is now gone.

 

As of now, when I try to run the Workflow by pressing the "Play" button in the GUI, this is what I get in the Action Logfile:

 

>>> Invoking Spark class now >>>


<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
java.util.NoSuchElementException: key not found: SPARK_HOME
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:943)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:942)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:942)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:630)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:124)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:914)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:185)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:176)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:378)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:296)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://mnethdp01.glistencube.com:8020/user/admin/oozie-oozi/0000000-160416120358569-oozie-oozi-W/spark-3ba6--spark/action-data.seq

Oozie Launcher ends

Now, I guess the problem is:

 

Failing Oozie Launcher, ... key not found: SPARK_HOME
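
Reading the trace from the bottom up, the exception is raised by `Client.findPySparkArchives` inside the Oozie launcher's own JVM (note the `LauncherMapper.map` frames), and `MapLike.apply` is a plain map lookup that throws when the key is absent. The Python sketch below is only an analogy for that strict lookup, not Spark's actual code. `find_pyspark_lib_dir`, the sample `launcher_env` dict and the `python/lib` subdirectory are assumptions made for illustration.

```python
import os


def find_pyspark_lib_dir(env):
    """Mimic a strict environment lookup: fail loudly if SPARK_HOME is absent."""
    # env["SPARK_HOME"] raises KeyError, the Python counterpart of Scala's
    # NoSuchElementException("key not found: SPARK_HOME").
    return os.path.join(env["SPARK_HOME"], "python", "lib")


# Hypothetical environment of the launcher process when nothing exports
# SPARK_HOME to it (the HADOOP_CONF_DIR entry is illustrative only).
launcher_env = {"HADOOP_CONF_DIR": "/etc/hadoop/conf"}

try:
    find_pyspark_lib_dir(launcher_env)
    lookup_error = None
except KeyError as exc:
    lookup_error = "key not found: {}".format(exc.args[0])

print(lookup_error)
```

The point of the analogy is that the variable has to be present in the environment of the process doing the lookup, which here is the launcher rather than the Spark or Oozie services.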

I've been trying hard to set this SPARK_HOME Key in different places. Things I've tried include the following:

 

  • Modified Spark Config in Cloudera Manager and reloaded the Configuration:

Spark Service Environment Advanced Configuration Snippet (Safety Valve):
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark

  • Modified Oozie Config in Cloudera Manager and reloaded the Configuration:

Oozie Service Environment Advanced Configuration Snippet (Safety Valve):
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark

  • Used different ways of invoking the Spark Action inside the Spark Action Definition:

SPARK MASTER: local[*]
SPARK MODE: client

SPARK MASTER: yarn-cluster
SPARK MODE: cluster

SPARK MASTER: yarn-client
SPARK MODE: client

  • Modified the "/etc/alternatives/spark-conf/spark-defaults.conf" File manually, adding the following inside it (I did it just on one node, by the way):

spark.yarn.appMasterEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
spark.executorEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark

None of the above worked. Apparently I'm not able to set the required key anywhere.

 

What am I doing wrong? Isn't this meant to be pretty straightforward? Thanks in advance for any insights.

1 ACCEPTED SOLUTION

Contributor

Hi,

you can set the property either in the Hue Oozie workflow editor UI (workflow settings -> Hadoop Properties) or in your workflow.xml:

<workflow-app name="Workflow name" xmlns="uri:oozie:workflow:0.5">
  <global>
    <configuration>
      <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
        <value>SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/</value>
      </property>
    </configuration>
  </global>

.....
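
To double-check that a workflow.xml really carries this property before submitting it, something like the following Python sketch could be used. It is not part of Oozie or Hue. `launcher_env` is a hypothetical helper, the sample document is closed right after `<global>` purely so that it parses, and the comma-separated `NAME=value` format for the value is an assumption based on how Hadoop environment properties are usually written.

```python
import xml.etree.ElementTree as ET

LAUNCHER_ENV_KEY = "oozie.launcher.yarn.app.mapreduce.am.env"


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def launcher_env(workflow_xml):
    """Return the variables set through the launcher AM env property as a dict."""
    root = ET.fromstring(workflow_xml)
    env = {}
    for prop in root.iter():
        if _local(prop.tag) != "property":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in prop}
        if fields.get("name") != LAUNCHER_ENV_KEY:
            continue
        # Assumed format: NAME=value pairs separated by commas.
        for pair in fields.get("value", "").split(","):
            name, sep, value = pair.partition("=")
            if sep:
                env[name.strip()] = value.strip()
    return env


# Truncated sample for parsing only; a real workflow also needs its actions.
SAMPLE = """<workflow-app name="Workflow name" xmlns="uri:oozie:workflow:0.5">
  <global>
    <configuration>
      <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
        <value>SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/</value>
      </property>
    </configuration>
  </global>
</workflow-app>"""

print(launcher_env(SAMPLE))
```

An empty dict from `launcher_env` would mean the property never made it into the file, for example because it was entered in a different settings panel.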

 

REPLIES

Contributor
For me, I had to add the following Oozie workflow configuration:
oozie.launcher.yarn.app.mapreduce.am.env: SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

Yes, I know, they could have done a better job than this.
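
A plausible reason this property works where the spark-defaults.conf entries from the question did not: the lookup fails inside the Oozie launcher job, before any Spark application master or executor exists, and the `oozie.launcher.` prefix is how Oozie passes a property to that launcher job. The `LocalContainerLauncher` frames in the trace also suggest the launcher ran as an uber job, inside the MapReduce application master, which would explain why an AM-level environment setting reaches it. The Python sketch below only records that reading. The mapping is an assumption drawn from the property names and the trace, and `affected_process` is a hypothetical helper.

```python
# Assumed mapping from configuration key (or key prefix) to the process whose
# environment it sets. This reflects how the property names are normally read;
# it has not been checked against CDH 5.5.1.
ENV_TARGETS = [
    ("spark.yarn.appMasterEnv.", "Spark application master"),
    ("spark.executorEnv.", "Spark executors"),
    ("oozie.launcher.yarn.app.mapreduce.am.env", "Oozie launcher job"),
]


def affected_process(key):
    """Return the process a setting targets, or None if the key is not listed."""
    for prefix, process in ENV_TARGETS:
        if key.startswith(prefix):
            return process
    return None


for key in (
    "spark.yarn.appMasterEnv.SPARK_HOME",
    "spark.executorEnv.SPARK_HOME",
    "oozie.launcher.yarn.app.mapreduce.am.env",
):
    print("{:45} -> {}".format(key, affected_process(key)))
```

Only the last key targets the process that actually threw "key not found: SPARK_HOME".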

Rising Star

Hi Ben, thanks a whole lot for your reply.

 

May I ask you where exactly you specified that setting?

 

- In the GUI, in some particular field?

- In "workflow.xml", in the Job's directory in HDFS? If yes: as an "arg", as a "property", or..?

- In "job.properties", in the Job's directory in HDFS? If yes: how?

- In some other file? E.g. "/etc/alternatives/spark-conf/spark-defaults.conf"? If yes, how?

 

A snippet of your code would be extremely appreciated!

 

I'm asking you because I've tried all of the above with your suggestion but I did not succeed.

 

Thanks again for your help

Rising Star

I got past this! Still no cigar, though. Now I have another error, but I'm going to work on this. It's something different now...

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist
java.io.FileNotFoundException: File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist

Many thanks for your help. I'd never be able to figure this out by myself!

Explorer

I'm getting this error also.  Have you managed to solve it?

Rising Star

Hi aj,

 

Yes, I did manage to solve it. Please take a look at the following thread and see if it helps. It may seem a bit unrelated to the "test.py not found" issue, but it contains detailed info on how to specify all the parameters needed to make the whole thing run smoothly:

 

 

http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-workflow-Spark-action-using-sim...

 

HTH

Explorer

Ah, my error was not using the hdfs: scheme for the .py file. Thanks!
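
For anyone hitting the same "test.py does not exist" error: the failing path in the log starts with `file:`, meaning the relative `lib/test.py` was resolved against the container's local directory. A minimal Python sketch of turning such a relative entry into a fully qualified HDFS URI follows. `qualify_with_hdfs` is a hypothetical helper, and combining the NameNode address from the launcher log with the workspace path from the question is an assumption about where the script lives.

```python
from urllib.parse import urlparse


def qualify_with_hdfs(path, name_node, workspace):
    """Return path unchanged if it already has a scheme, else an hdfs:// URI."""
    if urlparse(path).scheme:
        return path
    return "{}/{}/{}".format(
        name_node.rstrip("/"), workspace.strip("/"), path.lstrip("/")
    )


# Values taken from the posts above; their combination is an assumption.
NAME_NODE = "hdfs://mnethdp01.glistencube.com:8020"
WORKSPACE = "/user/hue/oozie/workspaces/hue-oozie-1460736691.98"

print(qualify_with_hdfs("lib/test.py", NAME_NODE, WORKSPACE))
```

The resulting URI is what would go into the JARS / PY FILES field in place of the bare `lib/test.py`.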