01-16-2015
03:57 AM
Hello,

Yes, this is the problem: you compile and package your sbt assembly with a good, new version of joda-time, but at run time you pick up the wrong joda-time (version 1.6, from 2011).

The reason this happens is that you're not running your program as a normal Scala/Java program:

$ java -cp assembly.jar my.class.name arg1 arg2

With this command you directly control the classpath via the -cp switch. I'm sure that if you run your program this way it will work, but of course then you can only run on a single node.

To run on Hadoop MapReduce or Spark, you execute a command that runs your program for you. For Hadoop MR:

$ hadoop jar assembly.jar my.class.name arg1 arg2

The "hadoop" command is actually a script that starts a Java/Scala program for you, with a classpath containing your assembly JAR plus a bunch of other stuff that allows the program to talk to the Hadoop infrastructure. And this is where the joda-time JAR comes in: the hadoop script puts it on the classpath before your assembly JAR, so it takes precedence over what you have in your JAR.

So the trick, in my case, is to get the hadoop script to construct a classpath that has joda-time 2.2 (or whatever you need; 2.7 is the current latest) before any potential occurrence of joda-time 1.6.

In your case you're using Spark, so I guess you use spark-submit. Look at the options of that command and find a way to add something to the classpath before anything else it may add. I know you can pass a configuration file; perhaps you can add a property there to put joda-time 2.2 on the classpath?

Another option, if you're allowed, is to edit the spark-submit script directly, or to remove joda-time 1.6 from the cluster installation. I would only resort to that when Cloudera advises it, though: components in the cluster may need joda-time 1.6 for something, so you might break more than you fix.
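For the spark-submit route, something like the following might work; this is a sketch, not tested on your cluster. The parcel path and class name are just examples, and I'm assuming the spark.driver.extraClassPath / spark.executor.extraClassPath properties are available in your Spark version (they let you prepend entries to the driver and executor classpaths):

```shell
# Hypothetical paths and class name; adjust for your cluster.
# Prepends joda-time 2.2 to both the driver and executor classpaths,
# so it should win over the 1.6 JAR that the launch scripts add.
JODA_JAR=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar

spark-submit \
  --class my.class.name \
  --conf spark.driver.extraClassPath="$JODA_JAR" \
  --conf spark.executor.extraClassPath="$JODA_JAR" \
  assembly.jar arg1 arg2
```

Check spark-submit --help on your installation for the exact property names your version supports.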
Good luck, Paul
01-15-2015
12:15 AM
The solution we found to this problem is to override the Hadoop classpath, using environment variables:

export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar

You could remove the joda-time library from the CDH installation and/or add a newer version, or you could add the lines above to the "hadoop" script in the CDH installation. But both of those solutions mean hacking your installation for all your users, which I'm not really comfortable with: once you start changing your installation, you can no longer compare what happens on your cluster to what other people running CDH 5.3 see.

The downside is of course obvious as well: if you do not modify the cluster installation, every user has to deal with the problem individually.

Cheers-
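One way to keep this per-user without touching the installation is a small wrapper script. This is a sketch; the script name run-job.sh and the class name are made up, and the parcel path is the one from the exports above:

```shell
#!/bin/sh
# run-job.sh (hypothetical): per-user wrapper so the CDH install stays untouched.
# HADOOP_USER_CLASSPATH_FIRST makes the hadoop script put HADOOP_CLASSPATH
# entries ahead of its own, so joda-time 2.2 takes precedence over 1.6.
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/joda-time-2.2.jar

# Forward all arguments to the job unchanged.
hadoop jar assembly.jar my.class.name "$@"
```

Each user keeps their own copy (or you share it somewhere readable), and the cluster installation stays identical to stock CDH 5.3.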