Hive on Avro - Reading 0 byte file throws IOException: Not a data file.

Contributor

When using Hive (v.14) on Avro, org.apache.avro.file.DataFileReader throws java.io.IOException: Not a data file. whenever it encounters a 0-byte file. The 0-byte file is the result of file rotation during Storm bolt writes to HDFS.

"The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws an exception when trying to read an empty file (because the empty file lacks the magic number marking it as Avro). It seems like it would be straightforward to modify AvroGenericRecordReader to detect an empty file and then behave sensibly. For example, next() would always return false, getPos() would return zero, etc."
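The behaviour proposed in the quote above can be sketched roughly as follows. This is only an illustration using the Java standard library, not the Hive code: RecordSource, Opener and EmptyAwareOpen are hypothetical names, and Opener stands in for whatever code would actually construct the DataFileReader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class EmptyAwareOpen {
    /** Hypothetical stand-in for the reader methods mentioned in the quote. */
    interface RecordSource {
        boolean next() throws IOException;
        long getPos() throws IOException;
    }

    /** Hypothetical stand-in for the code that opens a real Avro reader. */
    interface Opener {
        RecordSource open(Path file) throws IOException;
    }

    /** Source used for zero-length files: no records, position fixed at zero. */
    static final RecordSource EMPTY = new RecordSource() {
        @Override public boolean next() { return false; }
        @Override public long getPos() { return 0L; }
    };

    /** Returns EMPTY for a 0-byte file, otherwise delegates to the real opener. */
    static RecordSource open(Path file, Opener opener) throws IOException {
        if (Files.size(file) == 0L) {
            // Never hand an empty file to the real reader; it has no magic number to check.
            return EMPTY;
        }
        return opener.open(file);
    }
}
```

With this shape, the "Not a data file" exception can only come from a file that has content but no valid Avro header, which is still worth reporting.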

Is altering AvroGenericRecordReader feasible here?

Kris

1 ACCEPTED SOLUTION

Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.

Explorer

It was suggested that Avro's native reader itself skip such files, but the Avro project declined that option in https://issues.apache.org/jira/browse/AVRO-1530 and suggested that clients ignore zero-length files instead.

The issue has been patched on the Hive side:

https://issues.apache.org/jira/browse/HIVE-11977

-Darwin
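For clusters that do not yet have the Hive patch, the client-side approach of ignoring zero-length files can be sketched like this. It is a minimal illustration against the local filesystem using only the Java standard library; NonEmptyInputs is a hypothetical helper, and on HDFS the analogous length check would presumably use FileStatus.getLen() instead of Files.size().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NonEmptyInputs {
    /** Lists the regular files in dir, leaving out any zero-length ones. */
    static List<Path> list(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                // A 0-byte file cannot contain the Avro header, so skip it up front.
                if (Files.isRegularFile(p) && Files.size(p) > 0L) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```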

3 REPLIES

Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.

Master Collaborator

From what you described, the issue should be dealt with in the Storm bolt by not writing empty files. Hive is, in some sense, doing the right thing by throwing an error on an empty file. From a fix standpoint, I would think a change on the Storm side would be easier, since you only need to recompile your topology with the fix rather than recompile all of Hive.
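One way a writer can avoid leaving empty files behind on rotation is to create each file lazily, on the first write after a rotation. The sketch below shows that idea with the Java standard library only; LazyRotatingWriter and its part-N.avro naming are hypothetical and are not Storm's HDFS bolt API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyRotatingWriter {
    private final Path dir;
    private int index = 0;
    private OutputStream out; // stays null until the first write after a rotation

    LazyRotatingWriter(Path dir) {
        this.dir = dir;
    }

    /** Opens the current file on demand, then appends the record bytes. */
    void write(byte[] record) throws IOException {
        if (out == null) {
            out = Files.newOutputStream(dir.resolve("part-" + index + ".avro"));
        }
        out.write(record);
    }

    /** Closes the current file if one was opened; otherwise creates nothing. */
    void rotate() throws IOException {
        if (out != null) {
            out.close();
            out = null;
            index++;
        }
    }
}
```

Because a file only exists once something has been written to it, a rotation that fires during an idle period produces no file at all.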

Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.

Contributor

Yes, I have been working with Aaron on this one.