
Hive on Avro - Reading 0 byte file throws IOException: Not a data file.


When using Hive (v.14) on Avro, org.apache.avro.file.DataFileReader throws "java.io.IOException: Not a data file." when it encounters a 0-byte file. The 0-byte file is the result of file rotation during Storm bolt writes to HDFS.

"The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws an exception when trying to read an empty file (because the empty file lacks the magic number marking it as Avro). It seems like it would be straightforward to modify AvroGenericRecordReader to detect an empty file and then behave sensibly. For example, next() would always return false, getPos() would return zero, etc."
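The behavior described in the quote can be sketched in plain Java. This is only an illustration, assuming a local file and java.nio rather than HDFS. EmptyAwareReader, RecordSource, and Opener are hypothetical names, not Hive or Avro classes. The only Avro fact relied on is that a container file starts with the four bytes 'O', 'b', 'j', 1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch; not the actual AvroGenericRecordReader.
class EmptyAwareReader {
    // Magic number at the start of every Avro object container file.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Minimal stand-in for the reader methods mentioned in the question.
    interface RecordSource {
        boolean next() throws IOException;
        long getPos() throws IOException;
    }

    // Stand-in for whatever opens the real Avro reader.
    interface Opener {
        RecordSource open(Path file) throws IOException;
    }

    // Reader for a 0-byte file: never yields a record, position stays at 0.
    static final RecordSource EMPTY = new RecordSource() {
        public boolean next() { return false; }
        public long getPos() { return 0L; }
    };

    // Returns EMPTY for a 0-byte file; otherwise delegates to the real opener.
    static RecordSource open(Path file, Opener realOpener) throws IOException {
        if (Files.size(file) == 0L) {
            return EMPTY;
        }
        return realOpener.open(file);
    }

    // True only if the file begins with the Avro magic number.
    static boolean hasAvroMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[AVRO_MAGIC.length];
            int read = in.readNBytes(header, 0, header.length);
            return read == header.length && Arrays.equals(header, AVRO_MAGIC);
        }
    }
}
```

With this shape, the size check happens before any attempt to read the header, so the "Not a data file" path is never reached for a rotated 0-byte file.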

Is altering AvroGenericRecordReader feasible here?

Kris

1 ACCEPTED SOLUTION


It was suggested that Avro's native reader itself skip such files, but the Avro project declined that option in https://issues.apache.org/jira/browse/AVRO-1530 and suggested that clients ignore zero-length files.

The issue has been patched on the Hive side:

https://issues.apache.org/jira/browse/HIVE-11977
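For clients that cannot pick up the Hive patch, the "ignore zero-length files" suggestion can be approximated by filtering inputs before they reach the reader. The sketch below is not the HIVE-11977 change. It assumes a local directory and java.nio instead of the HDFS FileSystem API, and NonEmptyInputs and nonEmptyFiles are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: selects only the files worth handing to an Avro reader.
class NonEmptyInputs {
    // Lists regular files in dir whose length is greater than zero, sorted by path.
    static List<Path> nonEmptyFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Skip directories and 0-byte files left behind by rotation.
                if (Files.isRegularFile(entry) && Files.size(entry) > 0L) {
                    result.add(entry);
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

A caller would then open a reader only for the paths returned by nonEmptyFiles, so a 0-byte file never reaches DataFileReader.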

-Darwin


3 REPLIES


From what you described, the issue should be dealt with in the Storm bolt by not writing empty files. In some sense, Hive is doing the right thing by throwing an error on an empty file. From a fix standpoint, I would think changing the Storm side would be easier, since you only need to recompile your topology with the fix rather than recompile all of Hive.


Yes, I have been working with Aaron on this one.