Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.

Solved Go to solution
Highlighted

Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.

Expert Contributor

Hello - i'm reading an Avro file from HDFS - and it seems to be giving exception -

Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
at karan.scala.readuri.ReadAvroFromURI$.readAvroFromURI(ReadAvroFromURI.scala:52)
at karan.scala.readuri.ReadAvroFromURI$.delayedEndpoint$karan$scala$readuri$ReadAvroFromURI$1(ReadAvroFromURI.scala:29)
at karan.scala.readuri.ReadAvroFromURI$delayedInit$body.apply(ReadAvroFromURI.scala:24)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$anonfun$main$1.apply(App.scala:76)
at scala.App$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at karan.scala.readuri.ReadAvroFromURI$.main(ReadAvroFromURI.scala:24)
at karan.scala.readuri.ReadAvroFromURI.main(ReadAvroFromURI.scala)

Here is the Code :

conf.set("fs.defaultFS", "hdfs://localhost:9000")
val inputF = "hdfs://localhost:9000/avro/emp.avsc"
val inPath = new Path(inputF)
val fs = FileSystem.get(URI.create(inputF), conf)
val inStream = new BufferedInputStream(fs.open(inPath))
val reader = new DataFileStream(inStream, new GenericDatumReader())

the DataFileStream.java seems to be looking for magic bytes, to determine if this is Avro, and it is not finding this, and throwing error

void initialize(InputStream in) throws IOException {  this.header = new Header();  this.vin = DecoderFactory.get().binaryDecoder(in, vin);  byte[] magic = new byte[DataFileConstants.MAGIC.length];  try {    vin.readFixed(magic);                         // read magic} catch (IOException e) {    throw new IOException("Not a data file.");}  if (!Arrays.equals(DataFileConstants.MAGIC, magic))    throw new IOException("Not a data file.");

Any ideas on how to fix this ?

The file is fine, and i'm able to do a cat of the file (shown below) :

hdfs dfs -cat hdfs://localhost:9000/avro/emp.avsc

18/03/01 15:30:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

{"namespace": "tutorialspoint.com","type": "record","name": "emp","fields": [{"name": "name", "type": "string"},{"name": "id", "type": "int"},{"name": "salary", "type": "int"},{"name": "age", "type": "int"},{"name": "address", "type": "string"}]}
1 ACCEPTED SOLUTION

Accepted Solutions

Re: Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.

Expert Contributor

issue is fixed, since this is not an Avro file but just an Avro schema . this needs to read as a text file.

1 REPLY 1

Re: Reading avro file from HDFS using Scala - Exception in thread "main" java.io.IOException: Not a data file.

Expert Contributor

issue is fixed, since this is not an Avro file but just an Avro schema . this needs to read as a text file.