Created on 02-09-2018 06:24 AM - edited 09-16-2022 05:50 AM
I followed the instructions on the Cloudera documentation site to upgrade my cluster from Spark 1.6 to Spark 2:
- downloaded and installed the CSD on each node (master and data)
- restarted the SCM server
- downloaded the parcel and deployed it to the cluster
- in CM, added the Spark 2 service to the cluster
- added a Spark gateway role to all hosts, master and data
- ran the Python script to set Spark 2 as the default via 'alternatives'
- restarted the cluster
I use SBT and Eclipse to develop code. If I build a simple hello world, upload it via Hue, and run it with an Oozie Spark workflow, the Spark version is still 1.6.
I restarted the cluster but still get this in Hue > Oozie:
Child yarn jobs are found -
Spark Version 1.6.0-cdh5.12.1
Spark Action Main class : org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
Is there a different place I should check to set Spark 2 on YARN?
Here is the error log:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/spark/sql/SparkSession
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
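A NoClassDefFoundError for org/apache/spark/sql/SparkSession is consistent with a job compiled against Spark 2 but launched on the Spark 1.6 runtime, because SparkSession only exists from Spark 2.0 onward. The sketch below is a hypothetical probe (the object name is made up) that prints which Spark API the Oozie launcher actually put on the classpath. It looks classes up by name, so it does not itself fail to link on Spark 1.6. Note that the trace above fails while the launcher reflects on the main class's declared methods, so run the probe from a separate hello-world main class that does not mention SparkSession in any method signature.

```scala
// Hypothetical diagnostic: report which Spark API generation is on the runtime
// classpath without referencing SparkSession directly (a direct reference
// would itself throw NoClassDefFoundError on a Spark 1.6 runtime).
object SparkRuntimeProbe {

  // True if the named class can be loaded (without initializing it).
  def hasClass(name: String): Boolean =
    try { Class.forName(name, false, getClass.getClassLoader); true }
    catch { case _: ClassNotFoundException | _: LinkageError => false }

  // SparkSession was introduced in Spark 2.0; SparkContext exists in 1.x and 2.x.
  def describe(): String =
    if (hasClass("org.apache.spark.sql.SparkSession")) "Spark 2.x API present"
    else if (hasClass("org.apache.spark.SparkContext")) "Spark 1.x runtime (no SparkSession)"
    else "no Spark classes on the classpath"

  def main(args: Array[String]): Unit =
    println(describe())
}
```

The printed line appears in the launcher's stdout in Hue, which shows whether a sharelib change actually took effect.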
Created 02-12-2018 07:27 AM
I figured it out.
After upgrading to Spark 2, I had to copy all the Spark 2 jar files to a new HDFS directory, /user/oozie/share/lib/spark2, and include the oozie-sharelib-spark jar there as well.
In the workflow, I set the following configuration property to spark2 (the name of the new directory containing all the Spark 2 jars) and updated the sharelib in Oozie:
<property>
    <name>oozie.action.sharelib.for.spark</name>
    <value>spark2</value>
</property>
Support says this setup is not supported as of CDH 5.12, though.
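Besides the action's configuration block, Oozie also reads oozie.action.sharelib.for.spark from the job's job.properties file, which switches every Spark action in that job to the spark2 directory. The sketch below only renders those entries as text. The object name and the application path are hypothetical, and oozie.use.system.libpath=true is included because Oozie adds sharelib jars to an action only when the system libpath is enabled.

```scala
// Hypothetical helper that renders the job.properties entries needed to route
// an Oozie <spark> action to the "spark2" sharelib directory.
object Spark2JobProperties {

  def entries(appPath: String): Seq[(String, String)] = Seq(
    // Put the Oozie system sharelib on the action classpath.
    "oozie.use.system.libpath" -> "true",
    // Use the spark2 sharelib directory instead of the default spark one.
    "oozie.action.sharelib.for.spark" -> "spark2",
    // HDFS directory that holds workflow.xml (hypothetical path supplied by caller).
    "oozie.wf.application.path" -> appPath
  )

  // One key=value pair per line, in the order given.
  def render(entries: Seq[(String, String)]): String =
    entries.map { case (k, v) => s"$k=$v" }.mkString("", "\n", "\n")

  def main(args: Array[String]): Unit =
    print(render(entries("hdfs://nameservice1/user/example/apps/spark2-hello")))
}
```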
Created 09-13-2018 10:39 AM
Hi,
We are using CM 5.12.1 with both Spark on YARN and Spark 2. Our developers can run Oozie jobs using Spark on YARN, and they can also see it in the Hue Oozie editor. Now they want Oozie jobs to run on Spark 2 as well. If I perform the steps you described to add Spark 2 to Oozie, will Spark 2 show up in the Hue Oozie editor, and will developers be able to run spark2-submit through Oozie?
Appreciate your help
Created 09-14-2018 07:29 AM
Sparamas,
It's really cool that you want to run both versions of Spark. I personally decided to use only Spark 2. The setup is tricky, but I have written a few instructions for you in my blog post, Spark 2 via Hue Shell Workflow. If I get around to it, I will also post them on GitHub.
If you skip the step that sets spark2-submit as the default via alternatives, which sounds like your case, then you must call spark2-submit explicitly inside the shell scripts of your Hue workflow.
I hope this helps you,
J. Levi
Created 09-14-2018 10:48 AM
Thanks Levi. I'll try calling spark2-submit inside the shell scripts via the Hue workflow and see how it goes from there.
Created 09-18-2018 12:53 PM
Hi Levi,
Now they say they want to move completely to Spark 2 for Oozie jobs. In that case, the steps you gave (adding the Spark 2 libraries and moving the jar files) should work, correct? And for the property you mentioned, do I need to configure it from Cloudera Manager, or add it directly to the Oozie.xml file?
<property>
    <name>oozie.action.sharelib.for.spark</name>
    <value>spark2</value>
</property>
Thanks
Created 09-19-2018 12:33 PM
Created 09-19-2018 02:52 PM
Thanks Levi
Created on 09-25-2018 10:51 AM - edited 09-25-2018 11:47 AM
I'm wondering if anyone can help with my issue. I followed Eric's blog and can now submit jobs through Oozie with Spark 2 on YARN. However, when I try to write to a Hive table through Spark, I get an error. In my session, I enable Hive support:
sparkBuilder.enableHiveSupport()
I'm then trying to run an ALTER TABLE via spark.sql:
transactions.sparkSession.sql(s"ALTER TABLE transactions DROP IF EXISTS PARTITION(OHREQUID_PART = ${r.getInt(0)})")
If I run this through spark2-submit, it works fine, but if I run it through Oozie, I get the following error:
User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface.get_all_functions()Lorg/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse;
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface.get_all_functions()Lorg/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse;
I've copied all of the Spark jars from /opt/cloudera/parcels/SPARK2/lib/spark2/jars/ to /user/oozie/share/lib/lib_<slid>/spark2/, uploaded hive-site.xml to the same directory, and also copied in the oozie-sharelib-spark jar. I chown'd all of the files to oozie:oozie. I also have oozie.action.sharelib.for.spark=spark2 set in my properties file, and I made sure the jars show up in oozie shareliblist spark2.
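The lib_<slid> part of that path is Oozie's timestamped sharelib directory (for example lib_20180613213413 in the listing below). When several of these exist, the Oozie server reads the newest one, so that is where a spark2 subdirectory has to go. A minimal sketch of picking it from a directory listing you pass in (for instance copied from hdfs dfs -ls /user/oozie/share/lib). The object name is hypothetical, and the 14-digit yyyyMMddHHmmss stamp format is an assumption based on the paths in this thread.

```scala
// Hypothetical helper: given paths listed under /user/oozie/share/lib,
// return the newest lib_<yyyyMMddHHmmss> sharelib directory, if any.
object ShareLibDir {

  // Matches ".../lib_<14 digits>" with an optional trailing slash.
  private val LibDir = """.*/lib_(\d{14})/?""".r

  def newest(paths: Seq[String]): Option[String] = {
    val stamped = paths.collect { case p @ LibDir(ts) => (ts, p) }
    // Fixed-width yyyyMMddHHmmss stamps sort chronologically as plain strings.
    if (stamped.isEmpty) None else Some(stamped.maxBy(_._1)._2)
  }
}
```

After copying jars into the spark2 subdirectory of that newest directory, running oozie admin -sharelibupdate makes a running Oozie server reload its sharelib list without a restart.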
It seems like a dependency collision to me, but I'm not sure which jar would be causing the issue.
Thanks for any insights
These are the Hive-related jars I have loaded in the spark2 sharelib:
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-metastore-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-serde-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-0.23-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-common-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-scheduler-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-site.xml
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/spark-hive-exec_2.11-2.3.0.cloudera2.jar
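A NoSuchMethodError generally means a class was loaded from a different jar version than the code calling it was compiled against, so a useful first step is to find out which jar actually supplied the Hive metastore classes at runtime. A minimal sketch using the JDK's ProtectionDomain and CodeSource API. The object name is hypothetical, and the two class names are simply the ones from the error message above. Run it inside the Oozie-launched job, since that classpath differs from the one spark2-submit builds.

```scala
// Hypothetical diagnostic: report the jar (or directory) a class was loaded from.
object ClassOrigin {

  // Some(location) for classes loaded from a jar or directory; None for
  // bootstrap JDK classes (they have no CodeSource) or classes not found.
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    // Classes named in the NoSuchMethodError above.
    val suspects = Seq(
      "org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface",
      "org.apache.hadoop.hive.metastore.api.GetAllFunctionsResponse"
    )
    suspects.foreach { name =>
      println(s"$name -> ${locate(name).getOrElse("not found or bootstrap")}")
    }
  }
}
```

If the printed location points at a jar other than the one the Spark 2 Hive integration was built against, that jar is a likely candidate for the collision.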