
Spark 2 on YARN via Oozie Hue wrong version

Explorer

I followed the instructions on the Cloudera documentation site to upgrade Spark 1.6 to Spark 2 on my cluster:

 

- downloaded and installed the CSD on each node, master and data
- restarted the Cloudera Manager (SCM) server
- downloaded the parcel and deployed it to the cluster
- in CM, added the Spark 2 service to the cluster
- added a Spark gateway role to all hosts, master and data
- ran the Python script to set Spark 2 as the default via 'alternatives'
- restarted the cluster
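For what it's worth, the 'alternatives' step can be sanity-checked from the command line. A sketch, assuming the standard SPARK2 parcel layout; exact alternative names and paths may differ on your cluster:

```shell
# Show which binary the 'alternatives' system currently resolves for spark-submit
alternatives --display spark-submit

# With the SPARK2 parcel activated, the Spark 2 launcher should also be on the PATH
which spark2-submit
spark2-submit --version
```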

 

I use SBT and Eclipse to develop code. If I build a simple hello-world job, load it via Hue, and run it with an Oozie Spark workflow, the Spark version is still Spark 1.6.

 

 

I restarted the cluster, but I still get this in Hue > Oozie:

 

Child yarn jobs are found - 
Spark Version 1.6.0-cdh5.12.1
Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

Oozie Spark action configuration

  

Is there a different place I should check to set Spark 2 on YARN?

 

Here is the error log:

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/spark/sql/SparkSession
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)

 

1 ACCEPTED SOLUTION

Explorer

I figured it out. 

 

After upgrading to Spark 2, I had to copy all the Spark 2 jar files to a new HDFS directory, /user/oozie/share/lib/spark2, and include the oozie-sharelib-spark jar as well.
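The copy step might look like the sketch below, assuming the SPARK2 parcel path mentioned later in this thread; adjust paths for your cluster:

```shell
# Create the spark2 directory under the Oozie sharelib root
sudo -u oozie hdfs dfs -mkdir -p /user/oozie/share/lib/spark2

# Copy the Spark 2 jars from the parcel into HDFS
sudo -u oozie hdfs dfs -put /opt/cloudera/parcels/SPARK2/lib/spark2/jars/*.jar \
    /user/oozie/share/lib/spark2/

# Include the oozie-sharelib-spark jar from the existing spark sharelib
sudo -u oozie hdfs dfs -cp '/user/oozie/share/lib/lib_*/spark/oozie-sharelib-spark*.jar' \
    /user/oozie/share/lib/spark2/
```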

 

In the workflow, I had to set a configuration property to the value spark2 (the name of the new directory containing all the Spark 2 jars) and update the sharelib in Oozie:

 

<property>
    <name>oozie.action.sharelib.for.spark</name>
    <value>spark2</value>
</property>
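After the jars are in place, the sharelib needs to be refreshed so Oozie picks up the new spark2 directory. A sketch using the Oozie CLI, with a hypothetical server URL:

```shell
# Tell Oozie to rescan the sharelib (replace the URL with your Oozie server)
oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate

# Verify that the spark2 sharelib is now visible
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist spark2
```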

 

Support says this setup is not supported as of CDH 5.12, though.

 


8 REPLIES

Explorer

Hi,

 

We are using CM 5.12.1 with both Spark on YARN and Spark 2. The developers are able to run Oozie jobs using Spark on YARN, and they also see the UI in Hue under the Oozie editor. Now they want Oozie jobs to run through Spark 2 as well. If I perform the steps you gave to add Spark 2 to Oozie, will we be able to see Spark 2 in the Hue interface under the Oozie editor, and will developers then be able to run spark2-submit through Oozie?

 

Appreciate your help

Explorer

Sparamas,

 

It's really cool that you want to run both versions of Spark. I personally decided to use only Spark 2. The setup is tricky, but I have written a few instructions for you, "Spark 2 via Hue Shell Workflow", on my blog. If I get to it and don't forget, I will post it on GitHub.

 

If you skipped the step that sets spark2-submit as the default with alternatives, which sounds like your case, then you must call spark2-submit explicitly inside the shell scripts in your Hue workflow.
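A minimal wrapper script for a Hue shell action might look like this sketch; the jar name, class, and options are placeholders for your own application:

```shell
#!/bin/bash
# Hue shell-action wrapper: call the Spark 2 launcher explicitly instead of
# relying on the 'alternatives' default (placeholder jar and class names)
spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.HelloWorld \
    hello-world.jar
```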

 

I hope this helps you,

 

J. Levi

Explorer

Thanks, Levi. I'll try calling spark2-submit inside the shell scripts via the Hue workflow and see how it goes from there.

Explorer

Hi Levi,

 

Now they say that they want to move completely to Spark 2 for running Oozie jobs. In that case, the steps you gave, adding the Spark 2 libraries and moving the jar files, should work, correct? And for the property you said to add, do I need to configure it from Cloudera Manager, or go to the Oozie XML file and add the property there?

<property>
    <name>oozie.action.sharelib.for.spark</name>
    <value>spark2</value>
</property>

 Thanks

Explorer

 

I followed the steps from Eric's Blog to customize Oozie with Spark 2.

 

Best of luck with Spark 2!

Explorer

Thanks, Levi

New Contributor

I'm wondering if anyone can help with my issue. I followed the blog from Eric, and I can now submit jobs through Oozie with Spark 2 on YARN. However, when I try to write to a Hive table through Spark, I get an error. In my session, I enable Hive support:

sparkBuilder.enableHiveSupport()

 

I'm then trying to run an ALTER TABLE via spark.sql:

transactions.sparkSession.sql(s"ALTER TABLE transactions DROP IF EXISTS PARTITION(OHREQUID_PART = ${r.getInt(0)})")

 

 

If I run this through spark2-submit, it works fine, but if I run it through Oozie I get the following error:

 

User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface.get_all_functions()Lorg/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse;
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface.get_all_functions()Lorg/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse;

I've copied all of the Spark jars from /opt/cloudera/parcels/SPARK2/lib/spark2/jars/ to /user/oozie/share/lib/lib_<slid>/spark2/, uploaded the hive-site.xml to the same directory, and also copied in the oozie-sharelib-spark.jar. I chown'd all of the files to oozie:oozie. I also have oozie.action.sharelib.for.spark=spark2 set in my properties file, and I made sure the jars show up in oozie shareliblist spark2.

 

It seems like a dependency collision to me, but I'm not sure which jar would be causing the issue.

 

Thanks for any insights

 

 

 

These are the jars I have loaded in the spark2 sharelib related to hive:

 

hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-metastore-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-serde-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-0.23-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-common-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-shims-scheduler-1.1.0-cdh5.13.3.jar
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/hive-site.xml
hdfs://nameservice1/user/oozie/share/lib/lib_20180613213413/spark2/spark-hive-exec_2.11-2.3.0.cloudera2.jar