Created on 06-14-2016 11:07 AM - edited 09-16-2022 03:25 AM
I keep getting a
java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper
when calling show() on a DataFrame object. I'm attempting to do this through the shell (spark-shell --master yarn). I can see that the shell recognizes the schema when creating the DataFrame object, but any action on the data throws the NoClassDefFoundError when it tries to instantiate the AvroWrapper. I've tried adding avro-mapred-1.8.0.jar to my $HDFS_USER/lib directory on the cluster and even included it using the --jars option when launching the shell. Neither of these options worked. Any advice would be greatly appreciated. Below is example code:
scala> import org.apache.spark.sql._
scala> import com.databricks.spark.avro._
scala> val sqc = new SQLContext(sc)
scala> val df = sqc.read.avro("my_avro_file") // recognizes the schema and creates the DataFrame object
scala> df.show // this is where I get NoClassDefFoundError
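For what it's worth, the .avro call above is just sugar for the explicit format string, so presumably this spelling fails the same way (df2 is just an illustrative name):

// equivalent read without relying on the com.databricks.spark.avro._ implicits
val df2 = sqc.read.format("com.databricks.spark.avro").load("my_avro_file")
df2.show // presumably hits the same NoClassDefFoundError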
Created on 06-15-2016 04:55 AM - edited 06-15-2016 05:00 AM
Try starting spark-shell with the following packages:
--packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
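Once the shell is up, a quick sanity check (just a sketch) is to confirm the resolved jars made it into the session and that the driver can see the class at all:

// the jars resolved from --packages should show up under spark.jars
sc.getConf.getOption("spark.jars").foreach(println)
// throws ClassNotFoundException immediately if avro-mapred is still missing
Class.forName("org.apache.avro.mapred.AvroWrapper")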
Created 06-16-2016 12:48 PM
Unfortunately that did not solve the problem. My coworker, who is on a Mac, doesn't run into it, and for the life of me I cannot figure out why my Ubuntu box does. I can run in local mode just fine; it's only when I try to run on the cluster that the error appears.
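Since local mode works but YARN doesn't, here's a rough sketch (assuming the failure is executor-side) to ask the executors where, if anywhere, they can load the class from:

// run one lookup per task on the executors and report the jar it resolves to
// (getCodeSource can legitimately be null for classes baked into the runtime)
sc.parallelize(1 to sc.defaultParallelism).map { _ =>
  try {
    val src = Class.forName("org.apache.avro.mapred.AvroWrapper")
      .getProtectionDomain.getCodeSource
    if (src == null) "loaded (no code source)" else src.getLocation.toString
  } catch {
    case e: Throwable => s"not visible: $e"
  }
}.distinct.collect().foreach(println)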
Created 06-16-2016 12:51 PM
Not sure if this will help, but here's the output when I launch spark-shell:
Ivy Default Cache set to: /home/deandaj/.ivy2/cache
The jars for the packages stored in: /home/deandaj/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-avro_2.10;2.0.1 in local-m2-cache
    found org.apache.avro#avro;1.7.6 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.5 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found org.slf4j#slf4j-api;1.6.4 in local-m2-cache
    found org.apache.avro#avro-mapred;1.7.7 in local-m2-cache
    found org.apache.avro#avro-ipc;1.7.7 in local-m2-cache
    found org.apache.avro#avro;1.7.7 in central
    found org.mortbay.jetty#jetty;6.1.26 in local-m2-cache
    found org.mortbay.jetty#jetty-util;6.1.26 in local-m2-cache
    found io.netty#netty;3.4.0.Final in local-m2-cache
    found org.apache.velocity#velocity;1.7 in local-m2-cache
    found commons-collections#commons-collections;3.2.1 in local-m2-cache
    found commons-lang#commons-lang;2.4 in local-m2-cache
    found org.mortbay.jetty#servlet-api;2.5-20081211 in local-m2-cache
:: resolution report :: resolve 6814ms :: artifacts dl 8ms
    :: modules in use:
    com.databricks#spark-avro_2.10;2.0.1 from local-m2-cache in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    commons-collections#commons-collections;3.2.1 from local-m2-cache in [default]
    commons-lang#commons-lang;2.4 from local-m2-cache in [default]
    io.netty#netty;3.4.0.Final from local-m2-cache in [default]
    org.apache.avro#avro;1.7.7 from central in [default]
    org.apache.avro#avro-ipc;1.7.7 from local-m2-cache in [default]
    org.apache.avro#avro-mapred;1.7.7 from local-m2-cache in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.velocity#velocity;1.7 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
    org.mortbay.jetty#jetty;6.1.26 from local-m2-cache in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from local-m2-cache in [default]
    org.mortbay.jetty#servlet-api;2.5-20081211 from local-m2-cache in [default]
    org.slf4j#slf4j-api;1.6.4 from local-m2-cache in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.5 from central in [default]
    :: evicted modules:
    org.apache.avro#avro;1.7.6 by [org.apache.avro#avro;1.7.7] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   19  |   4   |   4   |   1   ||   18  |   0   |
    ---------------------------------------------------------------------
:: problems summary ::
:::: ERRORS
    unknown resolver null
    unknown resolver null
    unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 18 already retrieved (0kB/9ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/06/16 12:50:12 WARN util.Utils: Your hostname, jeff-ubuntu resolves to a loopback address: 127.0.1.1; using 10.104.1.90 instead (on interface eth0)
16/06/16 12:50:12 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/16 12:50:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc (master = yarn-client, app id = application_1465929284005_0026).
Created 06-21-2016 10:34 AM
If anyone else runs into this problem, I finally solved it. I removed the CDH Spark package and downloaded Spark from http://spark.apache.org/downloads.html instead. After that everything works fine. Not sure what the issue was with the CDH version.
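If it helps anyone compare installs, here's a quick sketch to confirm which build the shell is actually running (paths will differ per setup):

// version string of the running Spark
sc.version
// where the Spark classes themselves were loaded from (CDH assembly vs. Apache tarball)
classOf[org.apache.spark.SparkContext].getProtectionDomain.getCodeSource.getLocation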
Created 06-21-2016 11:35 PM
This looks strange. Your console output lists the lines below:
com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency
Can you try once with:
--packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3
I suspect a version compatibility issue between avro-mapred and spark-avro.
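If you build an application jar instead of using the shell, one way to keep the pair aligned is to pin both in the build. A minimal build.sbt sketch; the versions here are assumptions you should match to your cluster:

// build.sbt sketch: pin spark-avro and avro-mapred together
// (versions are illustrative; match them to your Spark/CDH distribution)
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "1.6.0" % "provided",
  "com.databricks"   %% "spark-avro"  % "2.0.1",
  // the hadoop2 classifier matches the newer MapReduce API that Spark uses
  "org.apache.avro"  %  "avro-mapred" % "1.7.7" classifier "hadoop2"
)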
Created 07-18-2016 12:22 AM
I am also facing the same issue while trying to process an Avro file in PySpark:
java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper
Spark version: 1.5.0-cdh5.5.2