Created on 04-16-2016 09:52 AM - edited 09-16-2022 03:14 AM
Hello, I'm using CDH 5.5.1 with Spark 1.5.0.
I'm unsuccessfully trying to execute a simple Spark action (a Python script) via Oozie. For now I just want to be able to run anything at all, so the script is still a silly example that doesn't really do anything. It is as follows:
## IMPORT FUNCTIONS
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import *

## CREATE CONTEXTS (not pre-created when submitted outside the pyspark shell)
sc = SparkContext()
sqlContext = HiveContext(sc)

## CREATE MAIN DATAFRAME
eventi_DF = sqlContext.table("eventi")
I created a simple Oozie Workflow from Hue GUI. I used the following settings for the Spark action:
SPARK MASTER: yarn-cluster
MODE: cluster
APP NAME: MySpark
JARS / PY FILES: lib/test.py
MAIN CLASS: org.apache.spark.examples.mllib.JavaALS
ARGUMENTS: <No Arguments Defined>
I've uploaded the script to HDFS under the workspace directory "/user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib", and I'm sure it gets picked up: just figuring out that it was meant to go in this directory took a little work, fighting a "test.py not found" exception that no longer appears.
As of now, when I try to run the workflow by pressing the "Play" button in the GUI, this is what I get in the action log:
>>> Invoking Spark class now >>>
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
java.util.NoSuchElementException: key not found: SPARK_HOME
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:943)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:942)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:942)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:630)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:124)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:914)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:185)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:176)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:378)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:296)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file:
hdfs://mnethdp01.glistencube.com:8020/user/admin/oozie-oozi/0000000-160416120358569-oozie-oozi-W/spark-3ba6--spark/action-data.seq
Oozie Launcher ends
Now, I guess the problem is this line:
Failing Oozie Launcher, ... key not found: SPARK_HOME
Judging from the stack trace, Client.findPySparkArchives looks SPARK_HOME up in the environment of the process that invokes spark-submit (i.e. the Oozie launcher) in order to locate the PySpark archives, and the variable is apparently not set there.
I've been trying hard to set this SPARK_HOME key in different places. Things I've tried include the following:
- Spark Service Environment Advanced Configuration Snippet (Safety Valve):
  SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
- Oozie Service Environment Advanced Configuration Snippet (Safety Valve):
  SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
- Different master/mode combinations for the action:
  SPARK MASTER: local[*] / SPARK MODE: client
  SPARK MASTER: yarn-cluster / SPARK MODE: cluster
  SPARK MASTER: yarn-client / SPARK MODE: client
- Spark configuration properties:
  spark.yarn.appMasterEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
  spark.executorEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
All the above to no success; apparently I'm not able to set the required key anywhere it actually gets read (see the sketch below for how those last two Spark properties would normally be passed to the action).
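For reference, a minimal sketch of how Spark configuration keys like spark.yarn.appMasterEnv.SPARK_HOME are normally handed to an Oozie Spark action, namely through its <spark-opts> element, which is forwarded to spark-submit. The action name and transition targets here are illustrative, not my literal generated XML; master, mode, name and jar are the values from my settings above:

<action name="spark-MySpark">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>MySpark</name>
        <jar>lib/test.py</jar>
        <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark --conf spark.executorEnv.SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark</spark-opts>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>

Note that these two keys only set the variable inside the launched application (the AM and the executors), not in the client process that throws the exception here, which would explain why they did not help.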
What am I doing wrong? Isn't this meant to be pretty straightforward? Thanks in advance for any insights.
Created 05-12-2016 05:34 PM
Created 05-13-2016 02:42 AM
Hi Ben, thanks a whole lot for your reply.
May I ask you where exactly you specified that setting?
- In the GUI, in some particular field?
- In "workflow.xml", in the Job's directory in HDFS? If yes: as an "arg", as a "property", or..?
- In "job.properties", in the Job's directory in HDFS? If yes: how?
- In some other file? E.g. "/etc/alternatives/spark-conf/spark-defaults.conf"? If yes, how?
A snippet of your code would be extremely appreciated!
I'm asking you because I've tried all of the above with your suggestion but I did not succeed.
Thanks again for your help
Created 05-13-2016 02:45 AM
Hi
either in your Hue Oozie workflow editor UI (workflow settings -> Hadoop Properties)
or in your workflow.xml:
<workflow-app name="Workflow name" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                <value>SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/</value>
            </property>
        </configuration>
    </global>
.....
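In case it is useful, the reason this works, as far as I can tell: the Oozie launcher is a one-mapper MapReduce job, and properties prefixed with oozie.launcher. are applied to that launcher job rather than to your action's own configuration. yarn.app.mapreduce.am.env sets environment variables on the launcher's application master, and (as the LocalContainerLauncher frames in your stack trace show) that is the same JVM that runs SparkMain and invokes spark-submit, so the SPARK_HOME lookup now succeeds. In the Hue editor (workflow settings -> Hadoop Properties) the equivalent is just the name/value pair:

oozie.launcher.yarn.app.mapreduce.am.env = SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/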
Created 05-13-2016 03:03 AM
I got past this! Still no cigar, though. Now I have another error, but I'm going to work on this. It's something different now...
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist
java.io.FileNotFoundException: File file:/hdp01/yarn/nm/usercache/admin/appcache/application_1463068686660_0013/container_1463068686660_0013_01_000001/lib/test.py does not exist
Many thanks for your help. I'd never have been able to figure this out by myself!
Created 10-17-2016 01:12 PM
I'm getting this error also. Have you managed to solve it?
Created 10-17-2016 01:40 PM
Hi aj,
yes, I did manage to solve it. Please take a look at the following thread and see if it can be of help. It may seem a bit unrelated to the "test.py not found" issue, but it contains detailed info about how to specify all the needed parameters to make the whole thing run smoothly:
HTH
Created 10-17-2016 02:18 PM
Ah, my error was not using an hdfs:// path for the .py. Thanks!
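In other words, the action has to reference the .py with a full hdfs:// URI instead of the container-local lib path. Using the workspace path from earlier in this thread purely as an illustration (adjust to your own workspace), the <jar> element would look like:

<jar>hdfs:///user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib/test.py</jar>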