Expert Contributor
Posts: 71
Registered: ‎03-04-2015

How to inject local classpath of 3rd-party libs to Oozie Spark action?

We have a Spark app that uses CLABS Phoenix to access HBase tables.  It works from the command line, and I am now trying to set it up as an Oozie action.  However, I am having trouble getting the class paths into Oozie through the Hue 3.9 GUI (CDH 5.7).

 

The previous related questions that I can find (such as this) as well as this blog post all suggest making physical copies of the library jars and putting them in HDFS, either in (1) the workflow/lib directory or (2) the Oozie sharelib directory.  However, the Phoenix package has 70+ files (~210 MB) and is already installed across the entire cluster.  It seems inefficient and wasteful to upload all of that into HDFS and shuffle it around the network unnecessarily.

 

With spark-submit, we can pass in the paths using "spark.driver.extraClassPath" and "spark.executor.extraClassPath".  However, according to OOZIE-2277, that is not possible with Oozie.  Setting them in <action><spark><configuration><property> just gets ignored:

 

Warning: Ignoring non-spark config property: "spark.executor.extraClassPath=/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/*:/opt/spark/lib/*"
Warning: Ignoring non-spark config property: "spark.driver.extraClassPath=/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/*:/opt/spark/lib/*"
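For reference, this is roughly what the (ignored) configuration block looks like in the workflow definition — the property names and values are the ones from the warnings above; the action name and surrounding elements are illustrative:

```xml
<action name="spark-phoenix">
  <spark xmlns="uri:oozie:spark-action:0.1">
    ...
    <configuration>
      <property>
        <name>spark.executor.extraClassPath</name>
        <value>/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/*:/opt/spark/lib/*</value>
      </property>
      <property>
        <name>spark.driver.extraClassPath</name>
        <value>/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/*:/opt/spark/lib/*</value>
      </property>
    </configuration>
    ...
  </spark>
</action>
```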

 

The same log file shows that "spark.driver.extraClassPath" and "spark.executor.extraClassPath" are being populated, apparently from the Oozie sharelib contents.  Is there a way to append to them, through an environment variable or otherwise?
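In case someone suggests it: I believe the Spark action schema also accepts a <spark-opts> element, where --conf flags can be passed the same way as on the spark-submit command line.  I haven't confirmed whether extraClassPath survives there either, but for reference it would look roughly like this:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
  ...
  <spark-opts>--conf spark.driver.extraClassPath=/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/* --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/*:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/lib/*</spark-opts>
  ...
</spark>
```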

 

Thanks,

Miles

 


Re: How to inject local classpath of 3rd-party libs to Oozie Spark action?

I tried uploading the Phoenix jars to a separate HDFS location, then pointing oozie.libpath at it in the workflow definition.  Now it causes the AM launch to fail:

 

2017-01-18 13:56:13,053 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskLog.createLogSyncer()Ljava/util/concurrent/ScheduledExecutorService;
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:258)
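For reference, the libpath setup was along these lines — the HDFS path here is illustrative, not the actual one:

```
# job.properties
oozie.use.system.libpath=true
# separate HDFS dir holding the Phoenix jars (path is hypothetical)
oozie.libpath=${nameNode}/user/oozie/phoenix-libs
```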

 

I'd really prefer not to mess with the Oozie sharelib - it seems to be treated as part of the CDH installation, and the blog post didn't really explain how users should append 3rd-party content to it.  Besides, Phoenix is only used by a subset of our workflows anyway.

 

Could the problem be with the Hue-Oozie integration?  This is a very confusing area - I'd appreciate any tips anyone has!

 


Re: How to inject local classpath of 3rd-party libs to Oozie Spark action?

When I copied the Phoenix client jar to the workflow/lib directory, Oozie included it in the Spark container.  However, the AppMaster now fails to launch:

 

(syslog)

2017-01-18 22:52:30,735 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskLog.createLogSyncer()Ljava/util/concurrent/ScheduledExecutorService;
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:258)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:241)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1456)
2017-01-18 22:52:30,754 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

 

This looks like OOZIE-2389.  Using the workaround suggested there, I was able to launch the Spark task, but org.apache.spark.deploy.SparkSubmit.main() failed immediately with no further information.

 

I used phoenix-4.7.0-clabs-phoenix1.3.0-client.jar, not the *thin-client.jar, which doesn't contain the org.apache.phoenix.spark driver.  Does it have any dependent jars that need to be copied along, or any version conflict with CDH 5.7.1?
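As a sanity check along those lines, one can list the jar's contents to confirm which packages it actually ships before copying it into workflow/lib.  A runnable sketch (it builds a tiny stand-in jar so the listing step works anywhere; on the cluster you'd point jar tf, or unzip -l, at the real phoenix-4.7.0-clabs-phoenix1.3.0-client.jar instead):

```shell
# On the cluster the real check would be:
#   jar tf phoenix-4.7.0-clabs-phoenix1.3.0-client.jar | grep org/apache/phoenix/spark
# A jar is just a zip, so build a stand-in jar here to make the snippet self-contained.
workdir=$(mktemp -d)
mkdir -p "$workdir/org/apache/phoenix/spark"
: > "$workdir/org/apache/phoenix/spark/DefaultSource.class"
(cd "$workdir" && python3 -m zipfile -c client.jar org)

# The actual check: does the jar ship the Spark integration package?
python3 -m zipfile -l "$workdir/client.jar" | grep 'org/apache/phoenix/spark' \
  && echo "spark integration present"
```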