02-09-2018 06:24 AM - edited 02-09-2018 06:33 AM
I followed the instructions on the Cloudera documentation site to upgrade Spark 1.6 to Spark 2 on my cluster:
- downloaded and installed CSD on each node, master & data
- restarted scm server
- downloaded the parcel and deployed to the cluster
- in CM I added the service Spark 2 to the cluster
- added Spark gateway to all hosts, master and data
- ran python script to set Spark 2 as the default via 'alternatives'
- restarted the cluster
I use SBT and Eclipse to develop code. If I build a simple hello world, upload it via Hue, and run it through an Oozie Spark workflow, the Spark version is still Spark 1.6.
I restarted the cluster but still get this in Hue > Oozie:
Child yarn jobs are found -
Spark Version 1.6.0-cdh5.12.1
Spark Action Main class : org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
Is there a different place I should check to set Spark 2 on YARN?
Here is the error log:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/spark/sql/SparkSession
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
02-12-2018 07:27 AM
I figured it out.
After upgrading to Spark 2, I had to copy all the Spark 2 jar files to a new HDFS directory, /user/oozie/share/lib/spark2, and include the oozie-sharelib-spark jar there as well.
In the workflow, I had to set the following configuration property to spark2 (the name of the new directory containing all the Spark 2 jars), and then update the share lib in Oozie:
<property>
  <name>oozie.action.sharelib.for.spark</name>
  <value>spark2</value>
</property>
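For anyone wondering where that property goes, here is a rough sketch of a Spark action with it placed in the action's configuration block. This is not my actual workflow: the workflow and action names, the class name, the jar path, and the schema versions are placeholders I am assuming for illustration, and ${jobTracker} / ${nameNode} are expected to come from job.properties.

```xml
<!-- Sketch only: names, class, jar path, and schema versions are hypothetical. -->
<workflow-app name="spark2-hello-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="spark2-hello"/>
  <action name="spark2-hello">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Point the Spark action at the spark2 share lib directory instead of the default spark one -->
        <property>
          <name>oozie.action.sharelib.for.spark</name>
          <value>spark2</value>
        </property>
      </configuration>
      <master>yarn</master>
      <mode>cluster</mode>
      <name>HelloWorld</name>
      <!-- Hypothetical main class and jar location -->
      <class>com.example.HelloWorld</class>
      <jar>${nameNode}/user/${wf:user()}/apps/hello/lib/hello-world.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

If the Oozie launcher still reports Spark 1.6 after this, it is probably worth checking that the share lib update actually picked up the new spark2 directory.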
Support says this setup is not supported as of CDH 5.12, though.