
Spark 1.3.0 missing classes


Contributor

 

I installed CDH 5.4 with Cloudera Manager from http://archive.cloudera.com/cdh5/parcels/5.4/.

Now when I start pyspark or spark-shell, I get the exception: "Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module"

 

This might be caused by libraries missing from the spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar.

There is no com.fasterxml subdirectory in the lib directory of the jar.
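
A quick way to confirm what the assembly actually contains (a sketch; the parcel path below is the default Cloudera Manager location and may differ on your system):

# List the assembly's contents and look for bundled Jackson classes
jar tf /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar | grep 'com/fasterxml/jackson' | head
# No output means the classes are not bundled in the jar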

 

In the 1.3.1 assembly downloaded from the Apache Spark website there is a com.fasterxml lib directory, and starting spark-shell and pyspark works perfectly.

 

How do I replace the Spark version in the CDH 5.4 parcel with the Apache Spark version (for Hadoop 2.6)? Is it enough to just upload the new spark-assembly to the Spark HDFS directory?
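
For the HDFS part of the question: on YARN, Spark 1.x locates the assembly through the spark.yarn.jar property, so uploading a different assembly only covers the executor side; the local scripts and jars stay as they are. A sketch of that step (the HDFS path is just an example):

# Upload the Apache-built assembly to HDFS (example location)
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put spark-assembly-1.3.1-hadoop2.6.0.jar /user/spark/share/lib/
# Then point at it in spark-defaults.conf:
# spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.3.1-hadoop2.6.0.jar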

 

And is there something wrong with the CDH 5.4 Spark jar?

Thanks for any assistance.


Re: Spark 1.3.0 missing classes

Master Collaborator

No, CDH 5.4 works correctly. My shells start fine. You have another problem: maybe some modification you have made, environment variables, or something specific to your app.
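
If you want to rule that out quickly, a few checks (a sketch; config paths assume a standard CDH layout):

# Look for stray Spark or classpath overrides in the environment
env | grep -iE 'spark|classpath'
# Inspect local overrides in the active Spark config
cat /etc/spark/conf/spark-env.sh 2>/dev/null
cat /etc/spark/conf/spark-defaults.conf 2>/dev/null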

Re: Spark 1.3.0 missing classes

Contributor

OK, I finally got Spark working by upgrading to Spark 1.3.1, downloaded from the Spark project website.

 

I replaced all the old Spark jars and executables in parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/spark, and now everything works.
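
For anyone trying the same thing, roughly what that replacement looks like (a sketch of an unsupported modification, as the next reply points out; back up the originals first):

PARCEL=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27
# Keep a copy of the parcel's original Spark directory
cp -r "$PARCEL/lib/spark" "$PARCEL/lib/spark.orig"
# Copy the Apache 1.3.1 distribution's jars and scripts over the CDH ones
cp spark-1.3.1-bin-hadoop2.6/lib/*.jar "$PARCEL/lib/spark/lib/"
cp spark-1.3.1-bin-hadoop2.6/bin/* "$PARCEL/lib/spark/bin/"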

 

I didn't change the environment variables or other config files after installing CDH 5.4, so that could not be the cause of Spark not working.

Re: Spark 1.3.0 missing classes

Master Collaborator

I don't think this is a great idea, and would not advise anyone to do this. You're incompletely modifying the default deployment in a way that does not necessarily work with the rest of the ecosystem. Since CDH 5.4 works as-is (try a new VM if you don't believe me), it must be something with your environment.

Re: Spark 1.3.0 missing classes

Contributor

 

I agree this is not the ideal fix, but it seems to work.

 

To find out what is wrong with my environment config, I could search for another week. Something must have gone slightly wrong when I did the upgrade from 5.3 to 5.4.
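
One place leftovers from a 5.3-to-5.4 upgrade tend to hide is the alternatives-managed config links (a sketch; the command is update-alternatives on Debian-based systems):

# See which spark-conf the system currently points at
alternatives --display spark-conf
# Look for stale config directories left over from the old release
ls -d /etc/spark/conf*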

 

When a new CDH version comes out, I will change everything back to the officially supported version.

Re: Spark 1.3.0 missing classes

New Contributor

I recently upgraded Spark from 1.2 to 1.3 through the package upgrade route, and I'm also facing the same issue:

 

2015-05-07 04:26:51,625 ERROR akka.actor.ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-3] shutting down ActorSystem [sparkMaster]
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:185)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:181)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:181)
        at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:98)
        at org.apache.spark.deploy.master.Master.preStart(Master.scala:146)
        at akka.actor.ActorCell.create(ActorCell.scala:562)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:425)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
        at akka.dispatch.Mailbox.run(Mailbox.scala:218)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.Module
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 22 more

 

I don't find the fasterxml libraries in the spark-assembly jars.
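
One way to check whether any jar on the runtime classpath provides the missing class (a sketch; adjust the path to wherever compute-classpath.sh lives in a package install):

# Search every jar on Spark's computed classpath for the missing class
for j in $(/usr/lib/spark/bin/compute-classpath.sh 2>/dev/null | tr ':' '\n' | grep '\.jar$'); do
  unzip -l "$j" 2>/dev/null | grep -q 'com/fasterxml/jackson/databind/Module.class' && echo "$j"
done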

Re: Spark 1.3.0 missing classes

New Contributor

Got it working by building the complete Spark 1.3.0_5.4.0 from Cloudera's GitHub repo.
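
Roughly the steps involved (a sketch; the branch name is a guess at Cloudera's naming, so check https://github.com/cloudera/spark for the exact tag):

git clone https://github.com/cloudera/spark.git
cd spark
git checkout cdh5-1.3.0_5.4.0   # guessed release branch; verify against the repo
# Build against the same Hadoop version the parcel ships
mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.4.0 -DskipTests clean package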

Re: Spark 1.3.0 missing classes

Master Collaborator

I think this is all patching over some deeper problem in your configuration then. You have old versions of something on some classpath. Nothing requires you to rebuild any code.

Re: Spark 1.3.0 missing classes

New Contributor

I had the same issue running spark-submit on CDH 5.4.0, and a search led to this forum post.

 

Running a Spark build normally produces a spark-assembly-<version>.jar that includes all of the jackson classes (com.fasterxml.jackson.*) and also includes the class that is missing here.

 

Looking at the Cloudera-built spark-assembly jar, it doesn't contain any jackson classes at all.

 

Instead, Cloudera separately adds a bunch of jackson jars from its jars folder to the classpath. Note that jackson-module-scala is missing from this list:

 

[ah_tmp_guest@apollo-mini-jp-cdhspark-001 bin]$ ./compute-classpath.sh | tr ':' '\n' | grep jackson

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-annotations-2.2.3.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-core-2.2.3.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-core-asl-1.8.8.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-databind-2.2.3.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-jaxrs-1.8.8.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-mapper-asl-1.8.8.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-xc-1.8.8.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/parquet/lib/parquet-jackson-1.5.0-cdh5.4.0.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-jaxrs-1.9.2.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-xc-1.9.2.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-annotations-2.3.0.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-core-2.3.1.jar

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/jackson-databind-2.3.1.jar

 

 

So it really appears that this is a Cloudera bug. What am I missing?

Also, here's a Jira for this issue that someone filed back in May. No love at all:

 

https://issues.cloudera.org/browse/DISTRO-725
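
If jackson-module-scala really is the missing piece, one stopgap (my assumption, not a Cloudera-sanctioned fix) would be to put a matching jar on the driver and executor classpaths, choosing the version that matches the bundled jackson-databind:

# Fetch jackson-module-scala matching the 2.2.3 Jackson jars above (assumed compatible)
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/module/jackson-module-scala_2.10/2.2.3/jackson-module-scala_2.10-2.2.3.jar
# Then add it in spark-defaults.conf:
# spark.driver.extraClassPath   /opt/jars/jackson-module-scala_2.10-2.2.3.jar
# spark.executor.extraClassPath /opt/jars/jackson-module-scala_2.10-2.2.3.jar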

 

 

Re: Spark 1.3.0 missing classes

Master Collaborator

Generally, Spark distributions like this are built using Spark's "hadoop-provided" profile, which I suspect is not how you are building Spark. That is, Hadoop classes and a bunch of their dependencies are provided at runtime by the cluster. This is why you find a lot less bundled in the CDH assembly (or in your own build with hadoop-provided). That much is not a bug, no.
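
For comparison, building an assembly the "provided" way looks roughly like this (a sketch; profile names can shift between Spark versions, so check the pom for 1.3.x):

# Build Spark without bundling Hadoop and its transitive dependencies into the assembly
mvn -Pyarn -Phadoop-2.6 -Phadoop-provided -DskipTests clean package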

 

I am not clear on the cause in the OP, since CDH does not show this error if you simply run pyspark. There must be more to it that's missing from the description, if it's not indeed just down to a local deployment problem.