05-23-2018
02:29 PM
This article discusses the manual Oozie sharelib update process and the prerequisite for updating the Spark Oozie sharelib.

Copy the sharelib to a local directory:
# mkdir oozie_share_lib
# hadoop fs -copyToLocal <current-share-lib-directory> oozie_share_lib/lib

Once the existing Oozie sharelib has been copied from HDFS to the local directory as above, update the sharelib:

# /usr/hdp/current/oozie-client/bin/oozie-setup.sh sharelib create -fs /user/oozie/share/lib/ -locallib oozie_share_lib/

This creates a new sharelib, including the Spark Oozie sharelib:

the destination path for sharelib is: /user/oozie/share/lib/lib_20180502070613
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502070613
Creating new spark directory in /user/oozie/share/lib/lib_20180502070613
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying oozie_share_lib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark hive site to /user/oozie/share/lib/lib_20180502070613/spark

However, the corresponding HDFS directory shows that the Spark libraries were not added to the Spark Oozie sharelib:

$ hadoop fs -ls /user/oozie/share/lib/lib_20180502070613/spark
Found 1 items
-rwxrwxrwx 3 oozie hadoop 191121639 2018-05-02 07:18 /user/oozie/share/lib/lib_20180502070613/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar

This means the Oozie sharelib update did not work as expected for Spark, even though the output reports "Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark". Note that the log lists no individual files after "Copying local spark libraries". The Spark client was not installed on the node where the sharelib update command was run.

[Screenshot: no-spark-client-installed.png]

On a node where the Spark client is installed, the Oozie sharelib update does properly update the Spark Oozie sharelib:

the destination path for sharelib is: /user/oozie/share/lib/lib_20180502064112
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502064112
Creating new spark directory in /user/oozie/share/lib/lib_20180502064112
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying oozie-new-sharelib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-examples-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-core-3.2.10.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-hdp-assembly.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-rdbms-3.2.9.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-api-jdo-3.2.6.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/pyspark.zip to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/py4j-0.9-src.zip to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/PY4J_LICENSE.txt
Copying local spark hive site to /user/oozie/share/lib/lib_20180502064112/spark
Copying /etc/spark/conf/hive-site.xml to /user/oozie/share/lib/lib_20180502064112/spark

Here Oozie picks up the files from the local Spark installation (the libraries under /usr/hdp/2.6.3.0-235/spark and /etc/spark/conf/hive-site.xml) and copies them to /user/oozie/share/lib/lib_20180502064112/spark in HDFS, because the Spark client is installed on this node.

[Screenshot: spark-client-installed.png]

$ hadoop fs -ls /user/oozie/share/lib/lib_20180502064112/spark
Found 8 items
-rw-r--r-- 3 oozie hdfs 339666 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r-- 3 oozie hdfs 1890075 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-core-3.2.10.jar
-rw-r--r-- 3 oozie hdfs 1809447 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r-- 3 oozie hdfs 1918 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/hive-site.xml
-rw-r--r-- 3 oozie hdfs 23278 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar
-rw-r--r-- 3 oozie hdfs 44846 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/py4j-0.9-src.zip
-rw-r--r-- 3 oozie hdfs 358253 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/pyspark.zip
-rw-r--r-- 3 oozie hdfs 191121639 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar

In summary, to get a properly updated Spark Oozie sharelib, the Spark client must be installed on the node/server where the manual Oozie sharelib update is run.
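
The two checks this article performs by hand (is the Spark client present on the node, and did the Spark sharelib directory in HDFS receive all the expected files) could be scripted along the following lines. This is only a rough sketch: the function names spark_client_present and spark_sharelib_complete are hypothetical helpers, not part of Oozie or HDP, and the sketch assumes the Spark 1.x layout seen in the log output above (a spark-assembly jar under lib/). The list of expected files is taken from the successful listing above and may differ for other versions.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not Oozie or HDP commands.

# Return 0 if the given directory looks like a Spark client install,
# i.e. it contains lib/spark-assembly-*.jar (assumed Spark 1.x layout).
spark_client_present() {
  local spark_home="$1"
  local jar
  for jar in "$spark_home"/lib/spark-assembly-*.jar; do
    # If the glob matches nothing it stays literal and -f fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}

# Read the output of `hadoop fs -ls <sharelib>/spark` on stdin and return 0
# only if every kind of file seen in the successful run is listed.
spark_sharelib_complete() {
  local listing pattern
  listing="$(cat)"
  for pattern in 'oozie-sharelib-spark-.*\.jar' 'spark-assembly-.*\.jar' \
                 'datanucleus-core-.*\.jar' 'pyspark\.zip' 'hive-site\.xml'; do
    # Each pattern must match a file name at the end of some listing line.
    printf '%s\n' "$listing" | grep -Eq "/${pattern}\$" || return 1
  done
  return 0
}

# Example usage (paths taken from the output shown in this article):
#   spark_client_present /usr/hdp/2.6.3.0-235/spark \
#     || echo "Spark client not found; run the sharelib update from another node"
#   hadoop fs -ls /user/oozie/share/lib/lib_20180502064112/spark \
#     | spark_sharelib_complete && echo "Spark sharelib looks complete"
```

Running the first check before oozie-setup.sh, and the second one after it, would have flagged the failing run above, where the spark directory contained only the spark-assembly jar.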