Created 01-13-2017 08:50 AM
I'm trying to run the SparkPi example from the Spark2 examples jar through Oozie. Attached are the different configuration files for Oozie:
I have the following directory structure on both the local FS and HDFS:
+-~/sparkAction/
  +-job.properties
  +-workflow.xml
  +-lib/
    +-spark-examples_2.11-2.0.0.2.5.3.0-37.jar
    +-spark-hdp-assembly.jar
When I run it as the yarn user with this command:
oozie job -oozie http://kvs-in-merlin04.int.kronos.com:11000/oozie -config job.properties -run
I'm getting the below error:
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:559)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
The Oozie launcher successfully starts SparkPi on YARN, so there are no permission issues. But the Spark program cannot find the SparkSession class at runtime.
Please help...
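One thing worth checking here: in Spark 2.x, org.apache.spark.sql.SparkSession ships in the spark-sql module jar, and neither jar name listed under lib/ above matches that module. The sketch below is a filename-based heuristic only; it does not open the jars. The function name has_spark_sql_jar is made up for illustration, and the glob spark-sql_*.jar assumes the usual Spark artifact naming.

```shell
# Heuristic sketch: succeed if the directory given as $1 holds a jar
# whose name looks like the Spark SQL module (spark-sql_<scala>-<version>.jar).
# has_spark_sql_jar is a hypothetical helper, not part of Spark or Oozie.
has_spark_sql_jar() {
  for jar in "$1"/spark-sql_*.jar; do
    # When the glob matches nothing it stays literal, so test for existence.
    if [ -e "$jar" ]; then
      return 0
    fi
  done
  return 1
}

# Example call, using the directory from the post above:
# has_spark_sql_jar ~/sparkAction/lib && echo "spark-sql jar present" || echo "spark-sql jar missing"
```

A missing spark-sql jar would be consistent with the ClassNotFoundException above, but a name match alone does not prove the right Spark version is on the classpath.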
Created 01-14-2017 08:06 PM
I am not sure if Spark2 is supported via Oozie yet, but let's say it is: did you add the Spark 2 libraries to the Oozie sharelib? That will be your first step.
Created 01-15-2017 09:14 AM
Hi Artem,
Yes, I created a new directory in HDFS and added it to the Oozie libpath as shown below:
oozie.libpath=/user/oozie/share/lib/spark2
I copied all the jars from the Spark2 installation directory /usr/hdp/2.5.3.0-37/spark2/jars into the above HDFS directory, but it still gives me this error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1484116726997_0144 finished with failed status
org.apache.spark.SparkException: Application application_1484116726997_0144 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:289)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:211)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Any ideas about what might be causing this error?
Created 10-10-2017 08:43 AM
You can get more detail from the YARN logs:
yarn logs -applicationId application_1484116726997_0144
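The aggregated logs can be long, so it may help to pull out just the class-loading failures. The following is a minimal sketch: missing_classes is a hypothetical helper name, and the pattern only assumes the standard JVM message format seen in the stack trace earlier in this thread.

```shell
# Sketch: read log text on stdin and print each distinct class named in a
# NoClassDefFoundError or ClassNotFoundException message.
# missing_classes is a hypothetical helper, not part of YARN or Oozie.
missing_classes() {
  grep -oE '(ClassNotFoundException|NoClassDefFoundError): [A-Za-z0-9_.$/]+' \
    | sed -E 's/^[A-Za-z]+: //' \
    | tr '/' '.' \
    | sort -u
}

# Example call, using the application ID from the error above:
# yarn logs -applicationId application_1484116726997_0144 | missing_classes
```

The tr step normalizes the slash-separated form used by NoClassDefFoundError to the dotted form used by ClassNotFoundException, so both messages collapse to one entry per class.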
Created 01-15-2017 03:15 PM
Judging from the comments on the following Jira, Spark2 support will arrive with Oozie 5.0: https://issues.apache.org/jira/plugins/servlet/mobile#issue/OOZIE-2767
Created 02-13-2017 02:28 PM
@Shikhar Agarwal Spark2 is not officially supported in HDP via Oozie, and it is not implemented in Apache Oozie either. Please consider accepting this answer to close the thread. Sorry this isn't much help.
Created 02-13-2017 04:29 PM
Thanks Artem
Created 06-16-2017 01:59 PM
Hi Artem, do you have a Hortonworks link stating that Spark2 is not officially supported in HDP via Oozie? I want to run Spark2 via Oozie on HDP 2.6, and it seems from this doc that Spark2 via Oozie (Oozie 4.2 in HDP 2.6) IS possible. Perhaps the poster didn't copy some libraries or jars to the spark2 sharelib? (Again, see link.)
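For anyone trying the sharelib route, here is a minimal sketch of the job.properties side, assuming a sharelib directory named spark2 has already been populated and registered. oozie.use.system.libpath and oozie.action.sharelib.for.spark are standard Oozie properties; the host names, ports, application path, and the write_job_properties helper are placeholders invented for illustration. Whether the action then runs still depends on the Oozie and HDP versions, as discussed in the replies above.

```shell
# Sketch: write an example job.properties to the path given as $1.
# write_job_properties is a hypothetical helper; all endpoint values
# below are placeholders, not taken from the original post.
write_job_properties() {
  cat > "$1" <<'EOF'
# Placeholder endpoints; replace with your cluster's values.
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8050
# Pull jars from the Oozie system sharelib.
oozie.use.system.libpath=true
# Use the sharelib directory named spark2 for spark actions.
oozie.action.sharelib.for.spark=spark2
# Assumed HDFS location of workflow.xml.
oozie.wf.application.path=${nameNode}/user/${user.name}/sparkAction
EOF
}

# Example call:
# write_job_properties job.properties
# After changing sharelib contents, Oozie has to reload them, for example:
# oozie admin -oozie http://kvs-in-merlin04.int.kronos.com:11000/oozie -sharelibupdate
# oozie admin -oozie http://kvs-in-merlin04.int.kronos.com:11000/oozie -shareliblist spark2
```

The heredoc delimiter is quoted so that ${nameNode} and ${user.name} are written literally and left for Oozie to resolve.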