CDH 5.3 CLASSPATH Issue?

Contributor

After upgrading from 5.2.1 to 5.3, YARN jobs are failing because they are unable to find classes within assembly jars, for example:

java.lang.NoSuchMethodError: org.joda.time.DateTime.now(Lorg/joda/time/DateTimeZone;)Lorg/joda/time/DateTime;

This is how we had been calling the applications:

HADOOP_CLASSPATH=/path/to/jar/application.jar:$(hbase classpath) /usr/bin/hadoop jar /path/to/jar/application.jar

Re-compiling to ensure everything uses the 5.3 Maven repositories, etc., makes no difference - same issue.
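One quick way to confirm the version mismatch behind that trace is to ask javap which joda-time jar actually provides the DateTime.now(DateTimeZone) overload; it only exists in joda-time 2.x. A diagnostic sketch, assuming the default parcel paths (the exact location of the 1.6 jar may differ on your install):

# The 2.2 jar under the parcel's jars/ directory has the overload:
$ javap -classpath /opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar org.joda.time.DateTime | grep ' now('
# The 1.6 jar linked into hadoop/lib prints nothing - the method is not there:
$ javap -classpath /opt/cloudera/parcels/CDH/lib/hadoop/lib/joda-time-1.6.jar org.joda.time.DateTime | grep ' now('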

 

1 ACCEPTED SOLUTION

Explorer

The workaround for this is to break the joda-time link in your CDH bundle on the gateway machines where you submit the MapReduce jobs, at this location: $CDH_ROOT/lib/hadoop-mapreduce

The link was put there in error in CDH 5.3, and a patch has been submitted to remove it in the next release. If you use non-default install locations, searching every directory that shows up when running the command 'hadoop classpath' for joda-time-1.6.jar will help you find the errant link. The fix should apply to both Spark and MR jobs.
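A sketch of that search and removal, assuming the joda-time-1.6.jar filename mentioned above (adjust $CDH_ROOT for your layout):

# Look through every directory on the Hadoop classpath for the stray 1.6 jar:
$ for d in $(hadoop classpath | tr ':' '\n' | sed 's|/\*$||' | sort -u); do
    ls "$d"/joda-time-1.6*.jar 2>/dev/null
  done
# Then break the errant link where it turns up, e.g.:
$ unlink $CDH_ROOT/lib/hadoop-mapreduce/joda-time-1.6.jar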


8 REPLIES


I'm having the same problem.

 

It appears that in CDH 5.3 they're shipping joda-time 1.6 in the hadoop/lib directory, so it's being added to the YARN classpath.

I think this is in error.

 

Could someone from Cloudera confirm?

 

The quickest (and dirtiest) fix would be to remove joda-time.jar / joda-time-1.6.jar from /opt/cloudera/parcels/CDH/lib/hadoop/client and hadoop-mapreduce.
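A minimal sketch of that removal, assuming the default parcel location (in a parcel install these entries are typically symlinks into the parcel's jars/ directory, so you would only be removing links, not the jars themselves):

$ cd /opt/cloudera/parcels/CDH/lib
$ rm hadoop/client/joda-time*.jar hadoop-mapreduce/joda-time*.jar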

 

-Adam

New Contributor

The solution we found to this problem is to override the Hadoop classpath using environment variables:

export HADOOP_USER_CLASSPATH_FIRST=true

export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar

You could remove the joda-time library from the CDH installation and/or add a new version, or you could add the above lines to the script called "hadoop" in the CDH installation. Both of those solutions mean hacking your installation for all your users, though, which I'm not really comfortable with: once you start changing your installation, you can no longer compare what happens on your cluster with what other people running CDH 5.3 see. The downside of not modifying the cluster installation is of course obvious as well: every user has to deal with the problem individually.
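A middle ground is a small wrapper script that users can share without touching the CDH install. A sketch, assuming the joda-time-2.2.jar parcel path above (the hadoop-joda script name is made up for illustration):

#!/bin/sh
# hadoop-joda: run hadoop with joda-time 2.2 forced ahead of the bundled 1.6
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}
exec hadoop "$@"

Invoked the same way as the original command, e.g. hadoop-joda jar /path/to/jar/application.jar.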

 

Cheers-

New Contributor

Hello,

 

I have the same issue with joda-time methods not found since CDH 5.3.0.

 

However, the fix suggested with 'export HADOOP...' did not work for me, probably because I'm using Spark, and I don't understand well how the dependencies are affected by which config option...

 

Anyway, in my case, I'm using sbt assembly to deliver a jar with a more recent joda-time (I also checked inside the jar that the right version is actually bundled), yet it's still failing: the Spark jobs are still using the old joda-time.
Does anyone know how to deal with this?
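For reference, a sketch of how to double-check what the assembly bundles (the assembly.jar name is a placeholder for your sbt assembly output):

# List the joda-time classes packed into the assembly:
$ unzip -l assembly.jar | grep 'org/joda/time/DateTime.class'
# Verify the 2.x-only DateTime.now(DateTimeZone) overload made it in:
$ javap -classpath assembly.jar org.joda.time.DateTime | grep ' now('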

 

 

In the meantime, I simply rolled my cluster back to the 5.2.1 parcels.

 

Yet I didn't find any other tickets about this.

Can someone from Cloudera confirm whether this is actually a bug in CDH 5.3 that will be fixed in the next release, or whether we must find a long-term way to fix it ourselves in our environments?

 

Thanks and regards

 

Contributor

I had tried the environment variable approach as well when I originally posted this - it didn't work for me either.

 

Having to go in and delete the file from the cluster definitely does not sound like an attractive approach.

New Contributor
Hello,

Yes, this is the problem: you compile and package your sbt assembly with a good, new version of joda-time, and when you run, you pick up the wrong joda-time (version 1.6, from 2011). The reason this happens is that you're not running your program as if it were a normal Scala/Java program:

$ java -cp assembly.jar my.class.name arg1 arg2

Using this command, you directly control the classpath with the -cp switch. I'm sure that if you run your program this way it will work, but of course then you can only run on a single node.

To run using Hadoop MapReduce or Spark, you execute a command that will run your program for you. For Hadoop MR:

$ hadoop jar assembly.jar my.class.name arg1 arg2

The "hadoop" command is actually a script which starts a Java/Scala program for you, with a classpath that contains your assembly JAR plus a bunch of other stuff that allows the program to work with the Hadoop infrastructure. And this is where the joda-time JAR comes in: the hadoop script adds it to the classpath before your assembly JAR, so it takes precedence over what you have in your JAR.
So the trick, in my case, is to manipulate the hadoop script into constructing a classpath that has joda-time 2.2 (or whatever you need; 2.7 is the current latest) before any potential occurrence of joda-time 1.6 on the classpath.

In your case, you're using Spark, so I guess you use spark-submit. What you have to do is look at the options of that command and find a way to add something to the classpath before any other stuff it may add. I know you can pass a configuration file; perhaps you can add a property in there to put joda-time 2.2 on the classpath? Another option, if you're allowed, is to edit the spark-submit script directly, or to remove joda-time 1.6 from the cluster installation. I would only resort to that when Cloudera advises to do so, though. Perhaps components in the cluster need joda-time 1.6 for something, so you might break more than you fix.
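For the spark-submit route, a hedged sketch: these classpath options exist in Spark 1.x, but the exact behaviour varies by version, so treat it as a starting point rather than a confirmed fix (the jar path is the parcel location mentioned earlier in the thread; my.class.name and assembly.jar are placeholders):

# Prepend the newer joda-time to both the driver and executor classpaths:
$ spark-submit \
    --driver-class-path /opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar \
    --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar \
    --class my.class.name assembly.jar arg1 arg2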
Good luck,
Paul

Guru

I just wanted to let everyone on this thread know that we have informed engineering about this issue and they are investigating.  I will report back on this thread when I have an update.

 

Thank you for reporting this issue, we truly value your feedback!

 

Clint


New Contributor

Thanks for the workaround; it has worked for us over the past week in the meantime.

I just upgraded to CDH 5.3.1, and it did indeed fix the issue for me, so the workaround is no longer needed.

 

Thanks again for the quick fix.

Regards,