Hive on Avro - Reading 0 byte file throws IOException: Not a data file.
Labels: Apache Hive
Created ‎09-28-2015 01:07 PM
When using Hive (v.14) on Avro, org.apache.avro.file.DataFileReader throws "java.io.IOException: Not a data file." when it encounters a 0 byte file. The 0 byte file is the result of file rotation during Storm bolt writes to HDFS.
"The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws an exception when trying to read an empty file (because the empty file lacks the magic number marking it as Avro). It seems like it would be straightforward to modify AvroGenericRecordReader to detect an empty file and then behave sensibly. For example, next() would always return false, getPos() would return zero, etc."
Is altering AvroGenericRecordReader feasible here?
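To make the idea in the quote concrete, here is a minimal sketch of an "empty-aware" reader. It is not Hive's AvroGenericRecordReader: the class name EmptyAwareLineReader is hypothetical, plain text lines stand in for Avro records, and getPos() counts records rather than bytes. The only point is the guard: check the file size first, and if it is zero, never open the underlying reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in for a record reader; lines of text play the role of records.
final class EmptyAwareLineReader {
    private final Iterator<String> records;
    private long pos = 0; // number of records consumed so far (assumption: not a byte offset)

    EmptyAwareLineReader(Path file) throws IOException {
        if (Files.size(file) == 0) {
            // Zero-byte file: there is no header to parse, so expose an empty stream
            // instead of handing the file to a reader that would throw.
            records = Collections.emptyIterator();
        } else {
            // Placeholder for opening the real container-format reader.
            records = Files.readAllLines(file).iterator();
        }
    }

    // Advances to the next record; always false for an empty file.
    boolean next() {
        if (!records.hasNext()) {
            return false;
        }
        records.next();
        pos++;
        return true;
    }

    // Stays at zero for an empty file.
    long getPos() {
        return pos;
    }
}
```

With this shape, a caller looping on next() simply sees zero records for a rotated-but-empty file and moves on.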
Kris
Created ‎10-09-2015 03:46 PM
It was suggested that Avro's native reader itself skip such files, but the Avro project declined that option in https://issues.apache.org/jira/browse/AVRO-1530 and suggested that clients ignore zero-length files instead.
The issue has been patched on the Hive side:
https://issues.apache.org/jira/browse/HIVE-11977
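For anyone who cannot pick up the Hive patch yet, "clients ignore zero-length files" can be approximated in your own reading code. The sketch below is an illustration only and is not taken from either JIRA: the class name NonEmptyFiles is made up, and it uses java.nio on a local directory. On HDFS the same check would presumably be made against the length reported by FileStatus.getLen().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists only the files that are safe to hand to a data file reader.
final class NonEmptyFiles {
    static List<Path> list(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // Skip directories and zero-length files left behind by rotation.
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```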
-Darwin
Created ‎09-28-2015 03:21 PM
From what you described, the issue should be dealt with in the Storm bolt by not writing empty files in the first place. Hive is, in some sense, doing the right thing by throwing an error on an empty file. From a fix standpoint, I would think a change on the Storm side is easier: you only need to recompile your topology with the fix, rather than recompile all of Hive.
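One way to avoid empty files on the writer side is to create the file lazily, on the first record, so that a rotation with nothing written leaves nothing behind. This is a rough sketch under assumptions: LazyFileWriter is a made-up class, it writes to a local path with java.nio, and it is not the Storm HDFS bolt API. How you would hook this into your bolt depends on how it manages rotation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer that creates its target file only when the first record arrives.
final class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private OutputStream out; // stays null until the first write

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(byte[] record) throws IOException {
        if (out == null) {
            // First record: only now does the file come into existence.
            out = Files.newOutputStream(target);
        }
        out.write(record);
    }

    @Override
    public void close() throws IOException {
        // Nothing was written, so there is no stream to close and no file on disk.
        if (out != null) {
            out.close();
        }
    }
}
```

Rotating then means closing the current LazyFileWriter and constructing a new one for the next path. If no tuples arrive before the next rotation, no 0 byte file is produced.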
Created ‎10-13-2015 11:25 PM
Yes, I have been working with Aaron on this one.
