Created on 04-16-2016 09:52 AM - edited 09-16-2022 03:14 AM
Hello, I'm using CDH 5.5.1 with Spark 1.5.0.
I'm unsuccessfully trying to execute a simple Spark action (a Python script) via Oozie. For now I just want to be able to run anything at all; the script is still a trivial example and doesn't really do anything. It is as follows:
## IMPORT FUNCTIONS
from pyspark.sql.functions import *

## CREATE MAIN DATAFRAME
eventi_DF = sqlContext.table("eventi")
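For reference, in yarn-cluster mode the script does not get a predefined sqlContext the way the PySpark shell does, so a standalone version of this example has to build its own contexts. A minimal sketch against the Spark 1.5 API (the HiveContext choice and the app name are my own, not something Oozie supplies):

```
## IMPORT FUNCTIONS
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import *

# No sqlContext is injected in yarn-cluster mode:
# the script must create SparkContext and HiveContext itself.
sc = SparkContext(conf=SparkConf().setAppName("MySpark"))
sqlContext = HiveContext(sc)

## CREATE MAIN DATAFRAME
eventi_DF = sqlContext.table("eventi")
```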
I created a simple Oozie Workflow from Hue GUI. I used the following settings for the Spark action:
SPARK MASTER: yarn-cluster
MODE: cluster
APP NAME: MySpark
JARS / PY FILES: lib/test.py
MAIN CLASS: org.apache.spark.examples.mllib.JavaALS
ARGUMENTS: <No Arguments Defined>
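For completeness, the workflow XML that Hue generates for those settings should look roughly like the following. This is a sketch from memory, not the actual generated file; the action name and the ${jobTracker}/${nameNode} placeholders are assumptions:

```
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>MySpark</name>
        <class>org.apache.spark.examples.mllib.JavaALS</class>
        <jar>lib/test.py</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```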
I've uploaded the script to HDFS under the workspace directory "/user/hue/oozie/workspaces/hue-oozie-1460736691.98/lib", and I'm sure it gets picked up: figuring out that it belonged in this directory took some work, as I was fighting a "test.py not found" exception that is now gone.
As of now, when I try to run the workflow by pressing the "Play" button in the GUI, this is what I get in the action logfile:
>>> Invoking Spark class now >>>
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
java.util.NoSuchElementException: key not found: SPARK_HOME
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:943)
	at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:942)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:942)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:630)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:124)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:914)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:185)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:176)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:378)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:296)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
	at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://mnethdp01.glistencube.com:8020/user/admin/oozie-oozi/0000000-160416120358569-oozie-oozi-W/spark-3ba6--spark/action-data.seq
Oozie Launcher ends
Now, I guess the problem is:
Failing Oozie Launcher, ... key not found: SPARK_HOME
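From the stack trace, the lookup happens inside Spark's own YARN client: findPySparkArchives reads the SPARK_HOME environment variable of the process running spark-submit to locate the pyspark and py4j archives, so the variable would have to be visible to the Oozie launcher job itself, not only to the Spark service. For reference, the sharelib settings a Spark action normally relies on look like this in job.properties (these are standard Oozie properties; I can't say whether they influence this particular lookup):

```
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark
```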
I've been trying hard to set this SPARK_HOME key in different places. Things I've tried include the following:
Spark Service Environment Advanced Configuration Snippet (Safety Valve):
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
Oozie Service Environment Advanced Configuration Snippet (Safety Valve):
SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
SPARK MASTER: local[*], SPARK MODE: client
SPARK MASTER: yarn-cluster, SPARK MODE: cluster
SPARK MASTER: yarn-client, SPARK MODE: client
spark.yarn.appMasterEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
spark.executorEnv.SPARK_HOME /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
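One way to pass those last two properties from the action is through a <spark-opts> element in the workflow XML; a sketch of what that would look like (I'm not certain this is how Hue wires the options field through):

```
<spark-opts>
    --conf spark.yarn.appMasterEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
    --conf spark.executorEnv.SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
</spark-opts>
```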
All of the above, to no avail. Apparently I'm not able to set the required key anywhere.
What am I doing wrong? Isn't this meant to be pretty straightforward? Thanks in advance for any insights.