Member since: 10-13-2017
Posts: 7
Kudos Received: 3
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
12257 | 02-12-2018 07:27 AM |
09-19-2018
12:33 PM
1 Kudo
I followed the steps from Eric's Blog to customize Oozie with Spark 2. Best of luck with Spark 2!
09-14-2018
07:29 AM
1 Kudo
Sparamas, it's really cool that you want to run both versions of Spark. I personally decided to use only Spark 2. The setup is tricky, but I have written a few instructions for you, Spark 2 via Hue Shell Workflow, in my blog. If I get to it, I will also post them on GitHub. If you skip the part that sets up spark_submit2 as the default with alternatives, which sounds like your case, then you must define spark_submit2 inside the shell scripts that run via the Hue workflow. I hope this helps you, J. Levi
08-16-2018
04:26 PM
This definitely was the issue and the fix for me. I saw both jar files already in the java/lib/security dir and failed to replace them with the downloaded UnlimitedJCEPolicyJDK8 jar files. It took me a long time to get back to replacing the files. After doing so, I restarted CMS successfully. Thanks for the post!
02-12-2018
07:27 AM
1 Kudo
I figured it out. After upgrading Spark2, I had to copy all the jar files to a new HDFS dir, /user/oozie/share/lib/spark2, and include the oozie-sharelib-spark jar as well. In the workflow, I had to set a conf parameter with the value spark2 (the name of the new dir containing all the Spark2 jars), and I updated the share lib in Oozie.
<property>
<name>oozie.action.sharelib.for.spark</name>
<value>spark2</value>
</property>
Support says this setup is not supported as of CDH 5.12, though.
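A rough sketch of those steps, written as a small Python helper that only assembles the commands and the workflow property so they can be reviewed before running anything. The local jar directory, the Oozie URL, the location of the oozie-sharelib-spark jar, and the function names are assumptions for illustration; only the HDFS dir and the property name and value come from the post. `hdfs dfs -mkdir`, `hdfs dfs -put`, and `oozie admin -sharelibupdate` are the standard CLI calls.

```python
import xml.etree.ElementTree as ET

# Assumed locations; adjust them for your own cluster.
LOCAL_SPARK2_JARS = "/opt/cloudera/parcels/SPARK2/lib/spark2/jars"  # assumption
SHARELIB_DIR = "/user/oozie/share/lib/spark2"                       # from the post
OOZIE_URL = "http://oozie-host:11000/oozie"                         # hypothetical host


def sharelib_commands(oozie_spark_jar, local_jars=LOCAL_SPARK2_JARS,
                      hdfs_dir=SHARELIB_DIR, oozie_url=OOZIE_URL):
    """Return the CLI commands, as argument lists, in the order to run them.

    oozie_spark_jar is a local copy of the oozie-sharelib-spark jar
    (its real location is not given in the post, so it is a parameter).
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # The glob is expanded by the shell when the printed line is run.
        ["hdfs", "dfs", "-put", local_jars + "/*.jar", hdfs_dir],
        ["hdfs", "dfs", "-put", oozie_spark_jar, hdfs_dir],
        ["oozie", "admin", "-oozie", oozie_url, "-sharelibupdate"],
    ]


def sharelib_property(sharelib_name="spark2"):
    """Build the workflow <property> that points the Spark action at the new dir."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "oozie.action.sharelib.for.spark"
    ET.SubElement(prop, "value").text = sharelib_name
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in sharelib_commands("oozie-sharelib-spark.jar"):  # hypothetical file name
        print(" ".join(cmd))
    print(sharelib_property())
```

Printing rather than executing keeps the sketch safe to try on a machine without a cluster; the printed lines can then be run by hand as the appropriate HDFS user.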
02-12-2018
07:05 AM
I'm running the Spark2 submit command line successfully in both local and YARN cluster mode in CDH 5.12. I experience the same problem with saveAsTable when I run it in a Hue Oozie workflow, even though I loaded all the Spark2 libraries to share/lib and pointed my workflow to that new dir.
Since I have hundreds of tables, and some of them change structure over time, I am unable to declare Hive tables by hand. On the command line, Spark autogenerates the Hive table, as parquet, if it does not exist. Append mode also works well, though I have not tried the insert feature.
It is very tricky to run Spark2 cluster mode jobs. I made sure I entered the spark-submit parameters first, before my job arguments. See how I run the job below:
$ spark-submit --version
version 2.2.0.cloudera2
Using Scala version 2.11.8 ...
$ pwd
/home/hg/app/spark/demo
$ ls -1
ojdbc8.jar
oracle_schema.properties
demo_2.11-0.1.0-SNAPSHOT.jar
$ spark-submit --verbose --class com.hernandezgraham.demo --master yarn --deploy-mode cluster --jars ojdbc8.jar --driver-class-path ojdbc8.jar --files oracle_schema.properties demo_2.11-0.1.0-SNAPSHOT.jar "argument1" "argument2"
$ hdfs dfs -ls /user/hive/warehouse/demo
drwxr-xr-x - hdfs hive 0 2018-02-12 09:39 /user/hive/warehouse/demo
$ hive
hive> show databases;
OK
default
hive> show tables;
OK
demo
hive> select * from demo limit 1;
OK
1 hguser
Time taken: 0.557 seconds, Fetched: 1 row(s)
Even though Spark 2 executes my code successfully in Oozie workflows, it still does not write the file or the Hive table. Perhaps there is a bug fix in 5.12 that covers only the command line. The documentation is not quite clear for Hue.
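The ordering rule mentioned above (spark-submit options first, then the application jar, then the job arguments) can be sketched as a small Python helper. spark-submit treats everything after the application jar as arguments for the application itself, so an option placed after the jar would be handed to the job instead of to Spark. The function name and its parameters are made up for illustration; the values in the example are the ones from the command above.

```python
import shlex


def build_spark_submit(app_jar, main_class, app_args, master="yarn",
                       deploy_mode="cluster", jars=(), files=(),
                       driver_class_path=None, verbose=False):
    """Assemble a spark-submit argument list with options before the app jar."""
    cmd = ["spark-submit"]
    if verbose:
        cmd.append("--verbose")
    cmd += ["--class", main_class, "--master", master,
            "--deploy-mode", deploy_mode]
    if jars:
        cmd += ["--jars", ",".join(jars)]      # comma-separated list
    if driver_class_path:
        cmd += ["--driver-class-path", driver_class_path]
    if files:
        cmd += ["--files", ",".join(files)]    # comma-separated list
    cmd.append(app_jar)      # everything after this belongs to the job
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        app_jar="demo_2.11-0.1.0-SNAPSHOT.jar",
        main_class="com.hernandezgraham.demo",
        app_args=["argument1", "argument2"],
        jars=["ojdbc8.jar"],
        files=["oracle_schema.properties"],
        driver_class_path="ojdbc8.jar",
        verbose=True,
    )
    # Print a shell-safe command line rather than running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list keeps each argument separate, which avoids quoting mistakes if the list is later passed to `subprocess.run`.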
02-09-2018
06:24 AM
I followed the instructions on the Cloudera documentation site to upgrade Spark 1.6 to Spark 2 in my cluster:
- downloaded and installed the CSD on each node, master and data
- restarted the SCM server
- downloaded the parcel and deployed it to the cluster
- in CM, added the Spark 2 service to the cluster
- added the Spark gateway to all hosts, master and data
- ran the python script to set Spark 2 as the default via 'alternatives'
- restarted the cluster
I use SBT and Eclipse to develop code. If I build a simple hello world, load it via Hue, and then run it in an Oozie Spark workflow, the Spark version is still Spark 1.6. I restarted the cluster but still get this in Hue > Oozie:
Child yarn jobs are found -
Spark Version 1.6.0-cdh5.12.1
Spark Action Main class : org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration
Is there a different place I should check to set Spark 2 on YARN? Here is the error log:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/spark/sql/SparkSession
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
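A note on reading this log: SparkSession was only added in Spark 2.0, so a NoClassDefFoundError for org/apache/spark/sql/SparkSession next to "Spark Version 1.6.0" suggests the Oozie launcher is still loading the Spark 1.6 jars while the application was built against Spark 2. A minimal Python sketch of that check, assuming the launcher log is available as plain text (the function name is made up for illustration):

```python
import re

MISSING_SESSION = "java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession"


def launcher_uses_spark1(log_text):
    """Return True if the log reports Spark 1.x and a missing SparkSession class."""
    match = re.search(r"Spark Version\s+(\d+)\.", log_text)
    if match is None:
        return False  # no version line, nothing to conclude
    major = int(match.group(1))
    return major == 1 and MISSING_SESSION in log_text


if __name__ == "__main__":
    sample = (
        "Spark Version 1.6.0-cdh5.12.1\n"
        "java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession\n"
    )
    if launcher_uses_spark1(sample):
        print("Launcher is on Spark 1.x jars; check the Oozie sharelib setting.")
```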