10-16-2017 03:37 PM - edited 10-16-2017 04:40 PM
The class org.apache.avro.mapreduce.AvroRecordReaderBase contains a reference to org.apache.avro.hadoop.io.AvroSerialization. (Not to be confused with another class by the same name in the mapred package.) The jar file avro-mapred-1.7.6.jar contains a definition of that AvroSerialization class, but avro-mapred-1.7.6-cdh5.12.1.jar does not. The avro-tools-1.7.6-cdh5.12.1.jar does contain a definition of the class I need, but it also includes a lot of other unrelated packages (e.g. amazonaws) that cause conflicts. The same is true of the cdh5.10.1 version.
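For anyone who wants to repeat the comparison between jars, here is a minimal sketch of the check. It only looks for the .class entry inside each jar file and does not load anything. The class name JarClassCheck and the method containsClass are hypothetical names made up for this illustration; the jar paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {
    /**
     * Returns true if the jar at jarPath has a .class entry for the given
     * fully qualified class name. (Hypothetical helper, for illustration only.)
     */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // org.apache.avro.hadoop.io.AvroSerialization
        //   -> org/apache/avro/hadoop/io/AvroSerialization.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String cls = "org.apache.avro.hadoop.io.AvroSerialization";
        // Each argument is a path to a jar file to inspect.
        for (String path : args) {
            System.out.println(path + " contains " + cls + ": " + containsClass(path, cls));
        }
    }
}
```

Run it with the jar paths as arguments, for example `java JarClassCheck avro-mapred-1.7.6.jar avro-mapred-1.7.6-cdh5.12.1.jar`, to see which of them actually carries the class.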
When I try to use avro-mapred-1.7.6.jar instead (the build without the cdh5 suffix), I run into a different error at runtime:
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
I am testing this on a minicluster, with the 1.7.6-cdh5.12.1 version of the other Hadoop files.
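An IncompatibleClassChangeError of this form generally means a library was compiled against an API in which TaskAttemptContext was a class, while the Hadoop on the runtime classpath defines it as an interface (it was a class in the Hadoop 1 mapreduce API and became an interface in Hadoop 2). Whether that is what is happening with the unsuffixed avro-mapred-1.7.6.jar is an assumption on my part and not something confirmed here. The sketch below only reports what the runtime classpath provides; ApiShapeCheck and describe are hypothetical names for this illustration, and it needs to be run with the same classpath as the failing job.

```java
public class ApiShapeCheck {
    /**
     * Reports whether the named type is an interface, a class, or missing
     * on the current classpath. (Hypothetical helper, for illustration only.)
     */
    static String describe(String typeName) {
        try {
            // initialize=false: inspect the type without running its static initializers
            Class<?> c = Class.forName(typeName, false, ApiShapeCheck.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.TaskAttemptContext";
        System.out.println(name + " is: " + describe(name));
    }
}
```

If this reports an interface, then any jar that expects a class at that name was built against a different Hadoop API than the one on the classpath.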
What should I do to resolve this reference? I'd like to avoid building avro-mapred.jar from source myself.
Update: I see now that the AvroSerialization class is in the jar from the Cloudera tarball, avro-1.7.6-cdh5.12.1/dist/java/avro-mapred-1.7.6-cdh5.12.1-hadoop2.jar, but it is not in the jar I downloaded from https://mvnrepository.com/artifact/org.apache.avro/avro-mapred/1.7.6-cdh5.12.1