"Failing Oozie Launcher - key not found: SPARK_HOME" running Spark (Python script) in Oozie Workflow

Hello, I'm using CDH 5.5.1 with Spark 1.5.0.


I'm unsuccessfully trying to execute a simple Spark action (a Python script) via Oozie. For now I just want to be able to run anything at all, so the script is still a trivial example that doesn't really do anything. It is as follows:


from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import *

# spark-submit does not predefine sqlContext (unlike the pyspark shell),
# so the script creates its own context objects
sc = SparkContext()
sqlContext = HiveContext(sc)

eventi_DF = sqlContext.table("eventi")


I created a simple Oozie Workflow from the Hue GUI. I used the following settings for the Spark action:


SPARK MASTER: yarn-cluster
MODE: cluster
MAIN CLASS: org.apache.spark.examples.mllib.JavaALS
ARGUMENTS: <No Arguments Defined>
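
For completeness, the Hue-generated action should correspond to something like this raw workflow.xml fragment (sketched for illustration, not my exact generated XML; the action name, app name, and transitions are placeholders — note that for a Python script the jar element points at the .py file):

```xml
<action name="spark-action">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>MySparkApp</name>
        <!-- for PySpark, the jar element holds the Python script -->
        <jar>test_script.py</jar>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>
```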

I've uploaded the script to HDFS under the workspace's "/user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib" directory, and I'm sure it gets picked up (just figuring out that it was meant to go in this directory took some work, fighting a "… not found" exception that has since gone away).


As of now, when I try to run the Workflow by pressing the "Play" button in the GUI, this is what I get in the action logfile:


>>> Invoking Spark class now >>>

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
java.util.NoSuchElementException: key not found: SPARK_HOME
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:943)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:942)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:942)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:630)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:124)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(
	at org.apache.oozie.action.hadoop.SparkMain.main(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.hadoop.mapred.MapTask.runOldMapper(
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$
	at java.util.concurrent.Executors$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://

Oozie Launcher ends

Now, I guess the problem is:


Failing Oozie Launcher, ... key not found: SPARK_HOME
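
Judging from the stack trace, Client.findPySparkArchives reads SPARK_HOME from the launcher JVM's environment via Scala's sys.env, which throws NoSuchElementException on a missing key. A toy Python illustration of the same failing lookup (the empty dict standing in for the launcher's environment, which evidently lacks SPARK_HOME):

```python
# Scala's sys.env("SPARK_HOME") throws NoSuchElementException when the
# variable is absent; the Python analogue raises KeyError the same way.
launcher_env = {}  # stand-in for the launcher JVM's environment
try:
    spark_home = launcher_env["SPARK_HOME"]
except KeyError as err:
    print("key not found: %s" % err)  # prints: key not found: 'SPARK_HOME'
```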

I've been trying hard to set this SPARK_HOME key in different places. Things I've tried include the following:


  • Modified the Spark config in Cloudera Manager and reloaded the configuration:
Spark Service Environment Advanced Configuration Snippet (Safety Valve)
  • Modified the Oozie config in Cloudera Manager and reloaded the configuration:
Oozie Service Environment Advanced Configuration Snippet (Safety Valve)
  • Tried different master/mode combinations in the Spark action definition:


SPARK MASTER: local[*]
SPARK MODE: client

SPARK MASTER: yarn-cluster
SPARK MODE: cluster

SPARK MASTER: yarn-client
SPARK MODE: client
  • Modified the "/etc/alternatives/spark-conf/spark-defaults.conf" file manually (on just one node, by the way), adding the following inside it:


spark.yarn.appMasterEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
spark.executorEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
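
As I understand it, these two properties are the spark-defaults.conf equivalent of passing --conf flags on the command line, i.e. roughly the following (test_script.py is just a placeholder name here):

```
spark-submit \
  --conf spark.yarn.appMasterEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark \
  --conf spark.executorEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark \
  test_script.py
```

Though if the exception really is thrown in the launcher process itself, I suspect these only affect the application master and executors, not the launcher.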

All of the above, to no avail. Apparently I'm not able to set the required key anywhere.


What am I doing wrong? Isn't this meant to be pretty straightforward? Thanks in advance for any insights.






