Add CLASSPATH to Oozie workflow job

Contributor

I wrote a Spark SQL application in Java that accesses Hive tables, and packaged it as a jar file that can be run with spark-submit.

Now I want to run this jar as an Oozie workflow (and later as a coordinator, once the workflow works). When I try, the job fails and the Oozie job logs show:

java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

What I did was find the jar in $HIVE_HOME/lib that contains that class, copy it into the lib directory under my Oozie workflow's root path, and add this to the Spark action in workflow.xml:

<spark-opts> --jars lib/*.jar</spark-opts>

But this only leads to another java.lang.NoClassDefFoundError for a different missing class. I repeated the process (find the jar, copy it, rerun the job), and the same thing happened every time. It looks like the job depends on many of the jars in my Hive lib directory.

What I don't understand is that when I run the jar with spark-submit from the shell, it works fine: I can SELECT from and INSERT into my Hive tables. This only happens with Oozie. It looks like Spark can no longer see the Hive libraries when it runs inside an Oozie workflow job. Can someone explain why this happens?

How do I add or reference the necessary classes/jars on the Oozie classpath?

I am using Cloudera Quickstart VM CDH 5.4.0, Spark 1.4.0, Oozie 4.1.0.

Mentor
Unlike your spark-submit shell command, Oozie does not invoke or use the scripts that set up local classpaths for its actions, because it has to ship dependencies through the distributed cache instead.

Take a look at how the ShareLib works and how you can override it for your action to include a system one: http://archive.cloudera.com/cdh5/cdh/5/oozie/WorkflowFunctionalSpec.html#a17_HDFS_Share_Libraries_fo... In your case, if you use the java action, you can make it include the "hive" share-lib, which adds all the Hive jars to the distributed cache classpath.
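A minimal sketch of that override applied to the Spark action from the question, assuming oozie.use.system.libpath=true is set in job.properties and that the Oozie ShareLib contains a "hive" directory alongside "spark" (as CDH 5 ShareLibs typically do). The property oozie.action.sharelib.for.spark selects which ShareLib directories Oozie adds for this action; listing "spark,hive" should ship the Hive jars (including the one with HiveConf) through the distributed cache. The action name, main class, jar path, and the "end"/"fail" transition targets are hypothetical placeholders. For a java action, the equivalent property would be oozie.action.sharelib.for.java with the value "hive".

```xml
<!-- Hypothetical action; the configuration property is the relevant part -->
<action name="spark-hive-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Use the "hive" ShareLib in addition to "spark", so the Hive
                 jars are added to the action classpath via the distributed cache -->
            <property>
                <name>oozie.action.sharelib.for.spark</name>
                <value>spark,hive</value>
            </property>
        </configuration>
        <master>yarn-cluster</master>
        <name>SparkHiveJob</name>
        <!-- Placeholder main class and jar location -->
        <class>com.example.SparkHiveJob</class>
        <jar>${nameNode}/user/cloudera/spark-hive-app/lib/spark-hive-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The same property can also be set once in job.properties (oozie.action.sharelib.for.spark=spark,hive) instead of inside the action.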

Rising Star

Hi Harsh,

Actually, I'm still a little confused about the four ways described under "One Last Thing" in http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/
I tried all of them, but none worked (I'm using Hue in CDH with an Oozie workflow). Here is what I tried for each of the four ways:
 
For way 1:
The post recommends "oozie.libpath=/path/to/jars,another/path/to/jars".
I added oozie.libpath=hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark
or
oozie.libpath=hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark/guava-16.0.1.jar
and oozie.use.system.libpath=true is set by default.
Neither worked.
 
For way 2:
I added guava-16.0.1.jar to the "lib" directory next to the current workspace's workflow.xml in HDFS. It didn't work.
 
For way 3:
I cannot find an <archive> tag in the Spark action for the path to a single jar, so I had no way to try way 3.
 
For way 4:
I added guava-16.0.1.jar to the ShareLib (e.g. hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark) and set oozie.use.system.libpath=true in job.properties, but it still doesn't work.
 
Could you please give me any suggestions? Thanks very much for your help, I appreciate it!

New Contributor

Hi,

Could you please suggest how to add a Python script, wrapped in a bash script, that is located in another HDFS directory under /apps?

Regards,

 Murari
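A possible approach, sketched under assumptions rather than taken from this thread: an Oozie shell action can pull files from any HDFS path through `<file>` elements, which place them in the action's working directory under the name after `#`. The wrapper script can then run the Python script from that directory (for example with `python job.py`). The paths /apps/scripts/wrapper.sh and /apps/scripts/job.py, the action name, and the "end"/"fail" transitions are hypothetical; ${jobTracker} and ${nameNode} are assumed to be defined in job.properties.

```xml
<!-- Hypothetical paths: /apps/scripts/wrapper.sh and /apps/scripts/job.py -->
<action name="run-python-wrapper">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Command to run; resolves to the symlink created by the first <file> -->
        <exec>wrapper.sh</exec>
        <!-- Copy both scripts from the other HDFS directory into the
             action's working directory; the name after '#' is the symlink -->
        <file>${nameNode}/apps/scripts/wrapper.sh#wrapper.sh</file>
        <file>${nameNode}/apps/scripts/job.py#job.py</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```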

Rising Star

Hi jdb, how did you add the external jar to Oozie, and what configuration did you change to make the Oozie classpath pick up the new jar? Thanks!