Support Questions


NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

Explorer

I keep getting a

java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper

when calling show() on a DataFrame. I'm running through the shell (spark-shell --master yarn). The shell recognizes the schema when creating the DataFrame, but any action on the data throws the NoClassDefFoundError while trying to instantiate AvroWrapper. I've tried adding avro-mapred-1.8.0.jar to my $HDFS_USER/lib directory on the cluster and also included it with the --jars option when launching the shell. Neither worked. Any advice would be greatly appreciated. Example code below:


scala> import org.apache.spark.sql._
scala> import com.databricks.spark.avro._
scala> val sqc = new SQLContext(sc)
scala> val df = sqc.read.avro("my_avro_file") // recognizes the schema and creates the DataFrame object
scala> df.show // this is where I get NoClassDefFoundError


6 Replies

Rising Star

Try starting spark-shell with the following packages:


--packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
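For reference, the full launch command would be something like this (a sketch; the coordinates are the ones above — match the spark-avro and avro-mapred versions to your cluster's Spark and Scala build):

```shell
# Launch spark-shell on YARN and let Ivy resolve the Avro packages.
# Versions shown assume a Scala 2.10 / Spark 1.x build; adjust as needed.
spark-shell --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
```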


Explorer

Unfortunately, that did not solve the problem. My coworker, who is on a Mac, doesn't run into this at all, and for the life of me I cannot figure out why my Ubuntu box does. I can run in local mode just fine; it's only when I run on the cluster that the error appears.
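Since local mode works but YARN doesn't, one diagnostic worth trying (a sketch, not a confirmed fix) is to check what's actually on the cluster side: whether the Spark assembly jar bundles the class at all, and whether the uploaded jar really landed in the HDFS lib directory. The assembly path below is the stock CDH location; adjust for your install.

```shell
# Does the CDH Spark assembly bundle AvroWrapper? (path from a stock CDH install)
unzip -l /usr/lib/spark/lib/spark-assembly-*.jar | grep AvroWrapper

# Confirm the avro-mapred jar actually reached the HDFS lib directory:
hdfs dfs -ls "$HDFS_USER/lib/"
```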

Explorer

Not sure if this will help, but here's the output when I launch spark-shell:


Ivy Default Cache set to: /home/deandaj/.ivy2/cache
The jars for the packages stored in: /home/deandaj/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-avro_2.10;2.0.1 in local-m2-cache
	found org.apache.avro#avro;1.7.6 in central
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
	found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
	found com.thoughtworks.paranamer#paranamer;2.3 in central
	found org.xerial.snappy#snappy-java;1.0.5 in central
	found org.apache.commons#commons-compress;1.4.1 in central
	found org.tukaani#xz;1.0 in central
	found org.slf4j#slf4j-api;1.6.4 in local-m2-cache
	found org.apache.avro#avro-mapred;1.7.7 in local-m2-cache
	found org.apache.avro#avro-ipc;1.7.7 in local-m2-cache
	found org.apache.avro#avro;1.7.7 in central
	found org.mortbay.jetty#jetty;6.1.26 in local-m2-cache
	found org.mortbay.jetty#jetty-util;6.1.26 in local-m2-cache
	found io.netty#netty;3.4.0.Final in local-m2-cache
	found org.apache.velocity#velocity;1.7 in local-m2-cache
	found commons-collections#commons-collections;3.2.1 in local-m2-cache
	found commons-lang#commons-lang;2.4 in local-m2-cache
	found org.mortbay.jetty#servlet-api;2.5-20081211 in local-m2-cache
:: resolution report :: resolve 6814ms :: artifacts dl 8ms
	:: modules in use:
	com.databricks#spark-avro_2.10;2.0.1 from local-m2-cache in [default]
	com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
	commons-collections#commons-collections;3.2.1 from local-m2-cache in [default]
	commons-lang#commons-lang;2.4 from local-m2-cache in [default]
	io.netty#netty;3.4.0.Final from local-m2-cache in [default]
	org.apache.avro#avro;1.7.7 from central in [default]
	org.apache.avro#avro-ipc;1.7.7 from local-m2-cache in [default]
	org.apache.avro#avro-mapred;1.7.7 from local-m2-cache in [default]
	org.apache.commons#commons-compress;1.4.1 from central in [default]
	org.apache.velocity#velocity;1.7 from local-m2-cache in [default]
	org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
	org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
	org.mortbay.jetty#jetty;6.1.26 from local-m2-cache in [default]
	org.mortbay.jetty#jetty-util;6.1.26 from local-m2-cache in [default]
	org.mortbay.jetty#servlet-api;2.5-20081211 from local-m2-cache in [default]
	org.slf4j#slf4j-api;1.6.4 from local-m2-cache in [default]
	org.tukaani#xz;1.0 from central in [default]
	org.xerial.snappy#snappy-java;1.0.5 from central in [default]
	:: evicted modules:
	org.apache.avro#avro;1.7.6 by [org.apache.avro#avro;1.7.7] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   19  |   4   |   4   |   1   ||   18  |   0   |
	---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
	unknown resolver null

	unknown resolver null

	unknown resolver null


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 18 already retrieved (0kB/9ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/06/16 12:50:12 WARN util.Utils: Your hostname, jeff-ubuntu resolves to a loopback address: 127.0.1.1; using 10.104.1.90 instead (on interface eth0)
16/06/16 12:50:12 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/16 12:50:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc (master = yarn-client, app id = application_1465929284005_0026).

Explorer

If anyone else runs into this problem: I finally solved it by removing the CDH Spark package and downloading Spark from http://spark.apache.org/downloads.html instead. After that, everything works fine. I'm not sure what the issue was with the CDH version.
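For anyone following along, the swap was roughly this (a sketch; the exact tarball name is an assumption — pick the build matching your Hadoop version from the downloads page):

```shell
# Fetch a stock Apache Spark build instead of the CDH package.
# Tarball name is an example; choose the Hadoop variant matching your cluster.
wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
export SPARK_HOME="$PWD/spark-1.6.0-bin-hadoop2.6"

# Launch as before, with the Avro packages:
"$SPARK_HOME/bin/spark-shell" --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
```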

Rising Star

This looks strange. Your console output lists these lines:


com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency


Can you try once with:


--packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3


I suspect a version compatibility issue between avro-mapred and spark-avro.

New Contributor

I am facing the same issue while trying to process an Avro file in PySpark:


java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper


Spark version 1.5.0-cdh5.5.2
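For completeness, this is roughly the setup that reproduces it (a sketch; the spark-avro version here is an assumption — use the release line that matches Spark 1.5):

```shell
# Launch PySpark with the spark-avro package (version is an assumption):
pyspark --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1

# Then, inside the shell:
#   df = sqlContext.read.format("com.databricks.spark.avro").load("/path/to/file.avro")
#   df.show()   # the action that raises the NoClassDefFoundError
```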