
how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

Dear,

 

We use Hue and Oozie on CDH 5.5.0 to run a Spark action. The Spark job uses spark-cassandra-connector_2.10-1.5.0-M2.jar.

 

The job runs successfully with the spark-submit command, but it fails when run through Oozie on CDH.

 

The error suggests that Oozie on CDH 5.5.0 cannot find guava-16.0.1.jar (a dependency of DSE 4.8.3 Cassandra). Do you know how to add an external guava-16.0.1.jar to the CDH Oozie classpath? Thanks!

-----

 

>>> Invoking Spark class now >>>

 

 

<<< Invocation of Main class completed <<<

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, com.google.common.reflect.TypeToken.isPrimitive()Z
java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:142)
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:136)
	at com.datastax.driver.core.TypeCodec$BlobCodec.<init>(TypeCodec.java:609)
	at com.datastax.driver.core.TypeCodec$BlobCodec.<clinit>(TypeCodec.java:606)
	at com.datastax.driver.core.CodecRegistry.<clinit>(CodecRegistry.java:147)
	at com.datastax.driver.core.Configuration$Builder.build(Configuration.java:259)
	at com.datastax.driver.core.Cluster$Builder.getConfiguration(Cluster.java:1135)
	at com.datastax.driver.core.Cluster.<init>(Cluster.java:111)
	at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:178)
	at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1152)
	at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:85)
	at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
	at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
	at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
	at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
	at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
	at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
	at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
	at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
	at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:241)
	at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:263)
	at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:36)
	at TestCassandra$.main(TestCassandra.scala:44)
	at TestCassandra.main(TestCassandra.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 

-----

 

 

 

 

19 REPLIES

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

I also added guava-16.0.1.jar to the HDFS directory /user/oozie/share/lib/lib_20151201085935/spark, set its owner to "oozie:oozie" and its mode to 777, but Oozie still could not find the jar.

 

Here are the job.properties and workflow.xml as shown for the workflow in Hue on CDH 5.5.0.

 

job.properties:

oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=ip-10-0-4-248.us-west-1.compute.internal:8032
nameNode=hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020

 

workflow.xml:

<workflow-app name="sparktest-cassandra" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-b23b"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-b23b">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[4]</master>
            <mode>client</mode>
            <name>sparktest-cassandra</name>
            <class>TestCassandra</class>
            <jar>lib/sparktest.jar</jar>
            <spark-opts>--driver-class-path /opt/cloudera/parcels/CDH/jars/guava-16.0.1.jar --jars lib/*.jar</spark-opts>
            <arg>s3n://gridx-output/sparktest/ </arg>
            <arg>10</arg>
            <arg>3</arg>
            <arg>2</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

 

Any help would be appreciated!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

The job runs successfully with the following spark-submit command, but it fails when run through Oozie on CDH.

 

spark-submit --master local[4] --class TestCassandra --jars /tmp/zlp1/cassandra-driver-core-2.2.0-rc3.jar,/tmp/zlp1/spark-cassandra-connector_2.10-1.5.0-M2.jar,/tmp/zlp1/jsr166e-1.1.0.jar --driver-class-path /opt/cloudera/parcels/CDH/jars/guava-16.0.1.jar sparktest.jar s3n://gridx-output/sparktest/ 10 3 2

 

 

I would highly appreciate it if someone could help!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

This issue in CDH has blocked us for a long time. Can you help us out ASAP? Thanks!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

I tried ways 1, 2, and 4 described in http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/, but it still doesn't work.

 

Can you give a detailed example with the command steps? Thanks!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

Actually, I'm still a little confused about the 4 ways mentioned under "One Last Thing" in http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/

I tried all of them, but none worked. (I'm using Hue and an Oozie workflow on CDH.) Here is what I tried for each of the 4 ways:

 

For way 1:

It recommends "oozie.libpath=/path/to/jars,another/path/to/jars"

I set oozie.libpath=hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark or oozie.libpath=hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark/guava-16.0.1.jar

and oozie.use.system.libpath=true is set by default.

Neither works.

 

For way 2:

I added guava-16.0.1.jar to the “lib” directory next to the current workspace's workflow.xml in HDFS, but it doesn't work.

 

For way 3:

I cannot find any <archive> tag in a Spark action that takes the path to a single jar, so I have no way to try way 3.

 

For way 4:

I added guava-16.0.1.jar to the ShareLib (e.g. hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark) and set oozie.use.system.libpath=true in job.properties, but it still doesn't work.

 

Could you please give any suggestions? Thanks very much for your help! I appreciate it!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

Does anybody know how to add an external jar to the CDH Oozie classpath for a Spark action?

Can anyone help? Appreciated!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Contributor

Hi,

 

I think you need to re-read those 4 ways a little more carefully.  #1 clearly states:

There is no need to ever point [oozie.libpath] at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.

which is exactly what you tried.  

 

Any of the 4 methods should work.  (Except for #3: we're currently aware of a known issue where the Spark Action does not allow <file> or <archive> tags; we're planning on fixing that in a later release.)

 

That said, since you're trying to replace a jar in the ShareLib, you need to go with #4 anyway and replace the jar in the Spark ShareLib subdir (which you did already).

Keep in mind that this means any Spark action that anyone runs will pick up this modification.  If you want to protect other users and workflows from your changes, you can create a new dir in the ShareLib, say "spark_guava_16", and set "oozie.action.sharelib.for.spark" to "spark_guava_16".  This is also described in the blog post in the "Overriding the ShareLib" section.  If that's not a concern, then you don't need to bother.
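
A rough sketch of that private-ShareLib setup, for illustration only. It just prints the HDFS and Oozie commands so they can be reviewed before anything is run. The function name private_sharelib_plan and the variable names are made up; the lib_<timestamp> path and Oozie URL are the ones quoted in this thread. It assumes a stock Spark ShareLib directory that still contains guava-14.0.1.jar, and that guava-16.0.1.jar sits in the current local directory.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands for a private Spark ShareLib directory.
# All values are illustrative defaults; override them for your cluster.
SHARELIB_ROOT="${SHARELIB_ROOT:-/user/oozie/share/lib/lib_20151201085935}"
NEW_DIR="${NEW_DIR:-spark_guava_16}"
OOZIE_URL="${OOZIE_URL:-http://10.0.4.248:11000/oozie}"

private_sharelib_plan() {
  src="$SHARELIB_ROOT/spark"
  dst="$SHARELIB_ROOT/$NEW_DIR"
  # 1. Create the new directory next to the stock "spark" one.
  echo "hdfs dfs -mkdir -p $dst"
  # 2. Copy the stock Spark ShareLib jars (glob quoted so HDFS expands it).
  echo "hdfs dfs -cp '$src/*' $dst/"
  # 3. Swap Guava 14 for Guava 16 in the copy only (assumed jar names).
  echo "hdfs dfs -rm $dst/guava-14.0.1.jar"
  echo "hdfs dfs -put guava-16.0.1.jar $dst/"
  # 4. Ask the Oozie server to rescan the ShareLib.
  echo "oozie admin -oozie $OOZIE_URL -sharelibupdate"
  # 5. Property to add to job.properties so the Spark action uses the copy.
  echo "# job.properties: oozie.action.sharelib.for.spark=$NEW_DIR"
}

private_sharelib_plan
```

The printed commands would normally be run as a user with write access to the ShareLib (typically oozie). Leaving the stock "spark" directory untouched is the point of this variant: only workflows that set oozie.action.sharelib.for.spark see the Guava 16 jar.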

 

Please check the following:

  1. Run the oozie admin -shareliblist spark command.  It will print out a list of the jars from the Spark sharelib directory that Oozie is currently aware of and using.  If you did replace the guava 14 jar with the 16 jar there, it should show up in that output.  If not, you need to restart the Oozie server or run the oozie admin -sharelibupdate command.  Also pay attention to the lib_<timestamp> directory in the output; perhaps you're changing an old directory.
  2. When you run the job, look at the stdout from the Launcher Job.  It prints out a lot of useful information, including the classpath.  Do you see the guava 16 or 14 jar there?
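
Both checks come down to spotting which Guava jars appear in a list of paths. A small helper along these lines may make that easier. The name guava_jars is made up for illustration; it simply extracts anything that looks like guava-<version>.jar from whatever text is piped in, such as the -shareliblist output or the classpath line copied from the launcher's stdout.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct Guava jar names found in text read from stdin.
# Works on ':'-separated classpaths, comma-separated lists, or one path per line.
guava_jars() {
  grep -oE 'guava-[0-9][0-9A-Za-z.+-]*\.jar' | sort -u
}

# Example with a made-up classpath. Two different versions in the output
# would mean two Guava jars are competing on the same classpath.
printf '%s\n' '/a/lib/guava-14.0.1.jar:/a/lib/spark.jar:/b/guava-16.0.1.jar' | guava_jars
```

For the first check, the same function can be fed from the server, e.g. oozie admin -oozie http://10.0.4.248:11000/oozie -shareliblist spark | guava_jars (URL taken from this thread).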

 

However, as I said in the email thread in the oozie mailing list, Spark is expecting guava 14.  Guava tends to not be very compatible across major versions; so you may encounter other problems if you force it to use guava 16.

Software Engineer | Cloudera, Inc. | http://cloudera.com

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

Thanks Robert for your quick response! Appreciated.

 

1. I also tried #1 with only oozie.use.system.libpath=true set in job.properties (without setting oozie.libpath), but it still didn't work.

 

2. I ran the following command:

oozie admin -shareliblist spark -oozie http://10.0.4.248:11000/oozie

 

The output lists only guava-16.0.1.jar (not guava-14.0.1.jar), as shown below:

 

hdfs://ip-10-0-4-248.us-west-1.compute.internal:8020/user/oozie/share/lib/lib_20151201085935/spark/guava-16.0.1.jar

 

I did restart Oozie after replacing the jar with guava-16.0.1.jar.

 

But it still reports the following NoSuchMethodError:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, com.google.common.reflect.TypeToken.isPrimitive()Z
java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:142)
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:136)

 

3. There is only one lib_<timestamp> directory (lib_20151201085935) under /user/oozie/share/lib in HDFS.

 

Can you help? Thanks!

Re: how to add external guava-16.0.1.jar in CDH oozie classpath

Explorer

Hi Robert,

 

To make the Guava version consistent between Spark and Cassandra, I tried to make both use guava-16.0.1.jar with the following steps:

 

1. Built spark-assembly-1.5.3-hadoop2.6.0.jar with guava 16.0.1 myself

2. Renamed it to spark-assembly-1.5.0-cdh5.5.0-hadoop2.6.0-cdh5.5.0.jar under /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars

3. Restarted the CDH cluster (including ZooKeeper, HDFS, YARN, Hive, Oozie, Hue) with Cloudera Manager

4. Reran the Spark action job in Hue/Oozie

 

It still had the same issue:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, com.google.common.reflect.TypeToken.isPrimitive()Z
java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:142)
	at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:136)

 

Apart from Spark's own Guava version, do you know whether anything else brings a different Guava version (not 16.0.1) into the Oozie Spark action? How can we fix this issue? Thanks a lot!