NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

New Contributor

I keep getting a

java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper

when calling show() on a DataFrame. I'm attempting to do this through the shell (spark-shell --master yarn). The shell recognizes the schema when creating the DataFrame, but any action on the data throws the NoClassDefFoundError while trying to instantiate the AvroWrapper. I've tried adding avro-mapred-1.8.0.jar to my $HDFS_USER/lib directory on the cluster and even included it with the --jars option when launching the shell. Neither option worked. Any advice would be greatly appreciated. Example code below:

scala> import org.apache.spark.sql._
scala> import com.databricks.spark.avro._
scala> val sqc = new SQLContext(sc)
scala> val df = sqc.read.avro("my_avro_file") // recognizes the schema and creates the DataFrame object
scala> df.show // this is where I get NoClassDefFoundError
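The launch attempt described above would look something like this (a sketch; the jar path is hypothetical and should point at wherever the jar lives locally):

```shell
# Sketch of the attempted launch described above; the jar path is
# hypothetical. --jars takes comma-separated local paths and ships
# the listed jars to the driver and executors.
spark-shell --master yarn --jars /path/to/avro-mapred-1.8.0.jar
```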

6 Replies

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

Contributor

Try starting spark-shell with the following packages:

--packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
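A full invocation might look like this (a sketch, assuming a YARN cluster and the Scala 2.10 build of Spark 1.6; the versions are the ones suggested above, not a tested combination):

```shell
# Sketch: resolve spark-avro plus a matching avro-mapred from Maven
# Central so org/apache/avro/mapred/AvroWrapper reaches the driver
# and executor classpaths.
spark-shell --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
```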

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

New Contributor

Unfortunately that did not solve the problem. My coworker, who is on a Mac, doesn't run into this problem, and for the life of me I cannot figure out why my Ubuntu box does. I can run in local mode just fine; it's only when I run on the cluster that I hit this issue.

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

New Contributor

Not sure if this will help, but here's the output when I launch spark-shell:

Ivy Default Cache set to: /home/deandaj/.ivy2/cache
The jars for the packages stored in: /home/deandaj/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-avro_2.10;2.0.1 in local-m2-cache
	found org.apache.avro#avro;1.7.6 in central
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
	found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
	found com.thoughtworks.paranamer#paranamer;2.3 in central
	found org.xerial.snappy#snappy-java;1.0.5 in central
	found org.apache.commons#commons-compress;1.4.1 in central
	found org.tukaani#xz;1.0 in central
	found org.slf4j#slf4j-api;1.6.4 in local-m2-cache
	found org.apache.avro#avro-mapred;1.7.7 in local-m2-cache
	found org.apache.avro#avro-ipc;1.7.7 in local-m2-cache
	found org.apache.avro#avro;1.7.7 in central
	found org.mortbay.jetty#jetty;6.1.26 in local-m2-cache
	found org.mortbay.jetty#jetty-util;6.1.26 in local-m2-cache
	found io.netty#netty;3.4.0.Final in local-m2-cache
	found org.apache.velocity#velocity;1.7 in local-m2-cache
	found commons-collections#commons-collections;3.2.1 in local-m2-cache
	found commons-lang#commons-lang;2.4 in local-m2-cache
	found org.mortbay.jetty#servlet-api;2.5-20081211 in local-m2-cache
:: resolution report :: resolve 6814ms :: artifacts dl 8ms
	:: modules in use:
	com.databricks#spark-avro_2.10;2.0.1 from local-m2-cache in [default]
	com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
	commons-collections#commons-collections;3.2.1 from local-m2-cache in [default]
	commons-lang#commons-lang;2.4 from local-m2-cache in [default]
	io.netty#netty;3.4.0.Final from local-m2-cache in [default]
	org.apache.avro#avro;1.7.7 from central in [default]
	org.apache.avro#avro-ipc;1.7.7 from local-m2-cache in [default]
	org.apache.avro#avro-mapred;1.7.7 from local-m2-cache in [default]
	org.apache.commons#commons-compress;1.4.1 from central in [default]
	org.apache.velocity#velocity;1.7 from local-m2-cache in [default]
	org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
	org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
	org.mortbay.jetty#jetty;6.1.26 from local-m2-cache in [default]
	org.mortbay.jetty#jetty-util;6.1.26 from local-m2-cache in [default]
	org.mortbay.jetty#servlet-api;2.5-20081211 from local-m2-cache in [default]
	org.slf4j#slf4j-api;1.6.4 from local-m2-cache in [default]
	org.tukaani#xz;1.0 from central in [default]
	org.xerial.snappy#snappy-java;1.0.5 from central in [default]
	:: evicted modules:
	org.apache.avro#avro;1.7.6 by [org.apache.avro#avro;1.7.7] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   19  |   4   |   4   |   1   ||   18  |   0   |
	---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
	unknown resolver null

	unknown resolver null

	unknown resolver null


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 18 already retrieved (0kB/9ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/06/16 12:50:12 WARN util.Utils: Your hostname, jeff-ubuntu resolves to a loopback address: 127.0.1.1; using 10.104.1.90 instead (on interface eth0)
16/06/16 12:50:12 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/16 12:50:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc (master = yarn-client, app id = application_1465929284005_0026).

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

New Contributor

If anyone else runs into this problem, I finally solved it. I removed the CDH Spark package and downloaded Spark from http://spark.apache.org/downloads.html. After that, everything works fine. Not sure what the issue was with the CDH version.
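The fix described above might be reproduced roughly like this (a sketch; the mirror URL and Hadoop build are assumptions, so pick the matching package from the downloads page):

```shell
# Sketch: swap the CDH spark-shell for a stock Apache Spark build.
# URL and Hadoop build are assumptions; match your cluster's versions.
wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
./spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
```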

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

Contributor

This looks strange. Your console output lists the lines below:

com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency

Can you try once with:

--packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3

I suspect a version compatibility issue between avro-mapred and spark-avro.
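One way to narrow down such a mismatch (a suggestion beyond the reply above, not a confirmed fix): check whether the Spark assembly jar from the console output actually bundles the failing class.

```shell
# Diagnostic sketch: list the Spark assembly's entries and look for the
# class that fails to load; the jar path is the one from the log above.
# No output means the class must be supplied via --jars/--packages.
unzip -l /usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar \
  | grep 'org/apache/avro/mapred/AvroWrapper'
```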

Re: NoClassDefFoundError when using avro in spark-shell (CDH 5.6)

New Contributor

I am facing the same issue while trying to process an Avro file in PySpark:

java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper

Spark version 1.5.0-cdh5.5.2
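The --packages approach suggested earlier in the thread may be worth trying from PySpark too (a sketch; the versions are assumptions and should match your Spark/Scala build):

```shell
# Sketch: pass spark-avro and avro-mapred to pyspark the same way as
# to spark-shell; versions here are assumptions for a Scala 2.10 build.
pyspark --master yarn \
  --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
```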