About jhernandez

jhernandez · ‎09-19-2018

I followed the steps from Eric's Blog to customize Oozie with Spark 2. Best of luck with Spark 2!

jhernandez · ‎09-14-2018

Sparamas, it's really cool that you want to run both versions of Spark. I personally decided to use only Spark 2. The setup is tricky, but I have written some instructions for running Spark 2 via a Hue Shell workflow in my blog. If I get to it, I will also post them on GitHub. If you skip the step that sets spark_submit2 as the default with alternatives, which sounds like your case, then you must define spark_submit2 inside the shell scripts that the Hue workflow runs. I hope this helps you, J. Levi

jhernandez · ‎08-16-2018

This was definitely the issue, and the fix, for me. I saw both jar files already in the java/lib/security dir and failed to replace them with the downloaded UnlimitedJCEPolicyJDK8 jar files. It took me a long time to get back to replacing the files. After doing so, I restarted CMS successfully. Thanks for the post!

jhernandez · ‎02-12-2018

I figured it out. After upgrading Spark2, I had to copy all the jar files to a new HDFS dir, /user/oozie/share/lib/spark2, and include the oozie-sharelib-spark jar as well. In the workflow, I had to set a conf parameter with the value spark2 (the new dir containing all the spark2 jars), and I updated the share lib in Oozie.

<property>
  <name>oozie.action.sharelib.for.spark</name>
  <value>spark2</value>
</property>

Support says this setup is not supported as of CDH 5.12, though.
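The steps above can be sketched as a small shell script. This is only a sketch under stated assumptions, not the exact commands used in the post: the local jar directory (SPARK2_JARS), the location of the oozie-sharelib-spark jar (OOZIE_SPARK_JAR) and the Oozie URL (OOZIE_URL) are placeholders that will differ per cluster. Only the target dir /user/oozie/share/lib/spark2 comes from the post. It prints the commands by default (DRY_RUN=1) so it can be reviewed before anything touches HDFS.

```shell
#!/bin/sh
# Sketch: stage Spark 2 jars in a dedicated Oozie sharelib dir and refresh Oozie.
# ASSUMPTIONS (not from the post): SPARK2_JARS, OOZIE_SPARK_JAR and OOZIE_URL
# are placeholder values; replace them with the real locations on your cluster.

SHARELIB_DIR="${SHARELIB_DIR:-/user/oozie/share/lib/spark2}"                # dir named in the post
SPARK2_JARS="${SPARK2_JARS:-/opt/cloudera/parcels/SPARK2/lib/spark2/jars}"  # assumed parcel path
OOZIE_SPARK_JAR="${OOZIE_SPARK_JAR:-/path/to/oozie-sharelib-spark.jar}"     # hypothetical path
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"                      # assumed Oozie URL
DRY_RUN="${DRY_RUN:-1}"                                                     # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

stage_spark2_sharelib() {
  # Create the new sharelib dir and copy the Spark 2 jars into it.
  run hdfs dfs -mkdir -p "$SHARELIB_DIR"
  run hdfs dfs -put -f "$SPARK2_JARS"/*.jar "$SHARELIB_DIR/"
  # The post also includes the oozie-sharelib-spark jar in the same dir.
  run hdfs dfs -put -f "$OOZIE_SPARK_JAR" "$SHARELIB_DIR/"
  # Ask the Oozie server to reload its sharelib, then list what it sees for spark2.
  run oozie admin -oozie "$OOZIE_URL" -sharelibupdate
  run oozie admin -oozie "$OOZIE_URL" -shareliblist spark2
}

stage_spark2_sharelib
```

Run it once as is to inspect the printed commands, then with DRY_RUN=0 as a user allowed to write to the Oozie sharelib. The workflow still needs the oozie.action.sharelib.for.spark property shown above.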

jhernandez · ‎02-12-2018

I'm running the Spark2 submit command line successfully in local and yarn cluster mode in CDH 5.12. I experience the same problem with saveAsTable when I run it in a Hue Oozie workflow, even though I loaded all Spark2 libraries to share/lib and pointed my workflow to that new dir. Since I have hundreds of tables, and some of them change structure over time, I am unable to declare Hive tables by hand. On the command line, Spark autogenerates the Hive table, as parquet, if it does not exist. Append mode also works well, though I have not tried the insert feature.

It is very tricky to run Spark2 cluster mode jobs. I made sure I entered the spark-submit parameters first, before my job arguments. See how I run the job below:

$ spark-submit --version
version 2.2.0.cloudera2
Using Scala version 2.11.8
...

$ pwd
/home/hg/app/spark/demo

$ ls -1
ojdbc8.jar
oracle_schema.properties
demo_2.11-0.1.0-SNAPSHOT.jar

$ spark-submit --verbose --class com.hernandezgraham.demo --master yarn --deploy-mode cluster --jars ojdbc8.jar --driver-class-path ojdbc8.jar --files oracle_schema.properties demo_2.11-0.1.0-SNAPSHOT.jar "argument1" "argument2"

$ hdfs dfs -ls /user/hive/warehouse/demo
drwxr-xr-x - hdfs hive 0 2018-02-12 09:39 /user/hive/warehouse/demo

$ hive
hive> show databases;
OK
default
hive> show tables;
OK
demo
hive> select * from demo limit 1;
OK
1 hguser
Time taken: 0.557 seconds, Fetched: 1 row(s)

Even though Spark 2 executes my code successfully in Oozie workflows, it still does not write the file and the Hive table. Perhaps that is a bug fix in 5.12 for the command line. The documentation is not quite clear for Hue.
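The argument ordering described above (spark-submit options first, then the application jar, then the job arguments) can be kept in one place with a small wrapper. This is a sketch, not the script used in the post: the function name build_submit_cmd is hypothetical, and the option values are copied from the command shown above. spark-submit treats everything after the application jar as arguments for the application itself, which is why the options must come before it.

```shell
#!/bin/sh
# Sketch: assemble a spark-submit command with the options before the
# application jar and the job arguments after it.
# build_submit_cmd is a hypothetical helper name; the values come from the post.

APP_JAR="demo_2.11-0.1.0-SNAPSHOT.jar"

# Prints the command one token per line; "$@" holds the job arguments.
build_submit_cmd() {
  printf '%s\n' spark-submit \
    --verbose \
    --class com.hernandezgraham.demo \
    --master yarn \
    --deploy-mode cluster \
    --jars ojdbc8.jar \
    --driver-class-path ojdbc8.jar \
    --files oracle_schema.properties \
    "$APP_JAR" \
    "$@"
}

# Show the assembled command for the two job arguments used in the post.
build_submit_cmd "argument1" "argument2"
```

Printing one token per line makes it easy to confirm that nothing intended for spark-submit ends up after the jar, where it would be handed to the job instead.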

jhernandez · ‎02-09-2018

I followed the instructions on the Cloudera documentation site to upgrade Spark 1.6 to Spark 2 in my cluster:

- downloaded and installed the CSD on each node, master & data
- restarted the scm server
- downloaded the parcel and deployed it to the cluster
- in CM, added the Spark 2 service to the cluster
- added the Spark gateway to all hosts, master and data
- ran the python script to set Spark 2 as the default via 'alternatives'
- restarted the cluster

I use SBT and Eclipse to develop code. If I write a simple hello world, load it via Hue, and then use an Oozie Spark workflow, the Spark version is still Spark 1.6. I restarted the cluster but still get this in Hue > Oozie:

Child yarn jobs are found -
Spark Version 1.6.0-cdh5.12.1
Spark Action Main class : org.apache.spark.deploy.SparkSubmit
Oozie Spark action configuration

Is there a different place I should check to set Spark 2 on YARN? Here is the error log:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/spark/sql/SparkSession
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
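A quick way to confirm which Spark version the Oozie launcher actually picked up is to scan its log for the "Spark Version" line quoted above. The sketch below assumes the log contains that line in the same form as in the post; the function names launcher_spark_version and check_spark2 are hypothetical. A 1.x result is consistent with the NoClassDefFoundError, since SparkSession only exists from Spark 2.0 onward.

```shell
#!/bin/sh
# Sketch: read an Oozie launcher log on stdin and report the Spark version it used.
# ASSUMPTION: the log contains a "Spark Version <version>" line, as quoted in the post.
# launcher_spark_version and check_spark2 are hypothetical helper names.

# Extract the first version string that follows "Spark Version".
launcher_spark_version() {
  sed -n 's/.*Spark Version[[:space:]]*\([0-9][0-9A-Za-z._-]*\).*/\1/p' | head -n 1
}

# Return 0 for Spark 2.x, 1 for any other version, 2 if no version line was found.
check_spark2() {
  version="$(launcher_spark_version)"
  case "$version" in
    2.*) echo "OK: launcher used Spark $version" ;;
    "")  echo "No 'Spark Version' line found"; return 2 ;;
    *)   echo "Mismatch: launcher used Spark $version, expected 2.x"; return 1 ;;
  esac
}

# Example with the line quoted in the post (reports a mismatch).
echo "Spark Version 1.6.0-cdh5.12.1" | check_spark2 || true
```

Saving the launcher log to a file and piping it through check_spark2 gives the same check for a real run.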

Last Visited	‎08-14-2019 04:18 PM

Member Since	‎10-13-2017 12:45 PM
Posts	7
Kudos received	3

Cloudera Community

Re: Spark 2 on YARN via Oozie Hue wrong version

Re: Spark 2 on YARN via Oozie Hue wrong version

Re: Spark 2 on YARN via Oozie Hue wrong version

Re: Can't start Activity Monitor and Zookeeper aft...

Re: Spark 2 on YARN via Oozie Hue wrong version

Re: Spark 2 Can't write dataframe to parquet table

Spark 2 on YARN via Oozie Hue wrong version