NoClassDefFoundError when using avro in spark-shell (CDH 5.6)


I keep getting a

java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper

when calling show() on a DataFrame object. I'm attempting to do this through the shell (spark-shell --master yarn). I can see that the shell recognizes the schema when creating the DataFrame object, but if I execute any action on the data, it always throws the NoClassDefFoundError when trying to instantiate the AvroWrapper. I've tried adding avro-mapred-1.8.0.jar to my $HDFS_USER/lib directory on the cluster and even included it using the --jars option when launching the shell. Neither of these options worked. Any advice would be greatly appreciated. Below is example code:


scala> import org.apache.spark.sql._
scala> import com.databricks.spark.avro._
scala> val sqc = new SQLContext(sc)
scala> val df = sqc.read.avro("my_avro_file") // recognizes the schema and creates the DataFrame object
scala> df.show() // this is where I get NoClassDefFoundError



Try starting spark-shell with the following packages:


--packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7



Unfortunately, that did not solve the problem. My coworker, who is on a Mac, doesn't run into this problem, and for the life of me I cannot figure out why my Ubuntu box has this issue. I can run in local mode just fine; it's only when I run on the cluster that I hit this issue.
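Since the code works in local mode but fails on YARN, one way to narrow this down is to check whether the class is visible on the driver versus on the executors. A minimal sketch, assuming a hypothetical helper object `ClasspathCheck` (not part of Spark or spark-avro) that tries to load a class by name through the thread's context class loader; the commented lines show how it might be used from spark-shell.

```scala
// Hypothetical helper, not part of Spark or spark-avro: reports whether a class
// can be loaded through the current thread's context class loader.
object ClasspathCheck {
  def isLoadable(className: String): Boolean =
    try {
      // initialize = false: load the class without running its static initializers
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false // class is not on this classpath
      case _: LinkageError           => false // e.g. NoClassDefFoundError for a missing dependency
    }
}

// In spark-shell, the driver side can be checked directly:
//   ClasspathCheck.isLoadable("org.apache.avro.mapred.AvroWrapper")
//
// and the executor side by running the same check inside tasks (assumes the
// REPL-defined object is shipped to executors, as spark-shell normally does):
//   sc.parallelize(1 to 100, 10)
//     .map(_ => (java.net.InetAddress.getLocalHost.getHostName,
//                ClasspathCheck.isLoadable("org.apache.avro.mapred.AvroWrapper")))
//     .distinct()
//     .collect()
//     .foreach(println)
```

If the driver reports `true` but some executors report `false`, the jar is reaching the shell but not the YARN containers, which would be consistent with local mode working while cluster mode fails.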


Not sure if this will help, but here's the output when I launch spark-shell:


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/06/16 12:50:12 WARN util.Utils: Your hostname, jeff-ubuntu resolves to a loopback address:; using instead (on interface eth0)
16/06/16 12:50:12 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/16 12:50:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc (master = yarn-client, app id = application_1465929284005_0026).


In case anyone else runs into this problem: I finally solved it. I removed the CDH Spark package and downloaded it from After that, everything works fine. I'm not sure what the issue was with the CDH version.

This looks strange. Your console output listed the lines below:



com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency


Can you try once with:



--packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3


I suspect a version compatibility issue between avro-mapred and spark-avro.
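To test the version-mismatch hunch, it can help to see which jar each Avro class is actually loaded from: if `org.apache.avro.mapred.AvroWrapper` and the core `org.apache.avro.Schema` class come from jars of different Avro versions, that would point to a conflict. A minimal sketch, using a hypothetical helper `ClassOrigin` (not part of Spark or Avro) that reads the loaded class's `CodeSource` location.

```scala
// Hypothetical helper, not part of Spark or Avro: reports where a class was loaded from.
//   None           -> the class could not be loaded at all
//   Some(location) -> URL of the jar/directory, or "<no code source>" for classes
//                     without one (e.g. JDK classes from the bootstrap loader)
object ClassOrigin {
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<no code source>")
      Some(location)
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }
}

// Example (in spark-shell): compare where avro-mapred and core Avro come from.
//   Seq("org.apache.avro.mapred.AvroWrapper", "org.apache.avro.Schema")
//     .foreach(name => println(s"$name -> ${ClassOrigin.locate(name)}"))
```

This only inspects the JVM it runs in; to see what the executors load, the same call can be placed inside a `map` over an RDD.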

I am also facing the same issue while trying to process an Avro file in PySpark:


“java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper”.


Spark version 1.5.0-cdh5.5.2