<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94393#M7664</link>
    <description>&lt;P&gt;It was suggested that Avro's native reader itself skip such files, but the Avro project declined that option in &lt;A href="https://issues.apache.org/jira/browse/AVRO-1530" target="_blank"&gt;https://issues.apache.org/jira/browse/AVRO-1530&lt;/A&gt; and suggested that clients ignore zero-length files instead.&lt;/P&gt;&lt;P&gt;The issue has been patched on the Hive side:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-11977" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-11977&lt;/A&gt;&lt;/P&gt;&lt;P&gt;-Darwin&lt;/P&gt;</description>
    <pubDate>Fri, 09 Oct 2015 22:46:11 GMT</pubDate>
    <dc:creator>dtraver</dc:creator>
    <dc:date>2015-10-09T22:46:11Z</dc:date>
    <item>
      <title>Hive on Avro - Reading 0 byte file throws IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94391#M7662</link>
      <description>&lt;P&gt;When using Hive (v.14) on Avro, org.apache.avro.file.DataFileReader throws java.io.IOException: Not a data file. when it encounters a 0 byte file. This 0 byte file is the result of file rotation during Storm bolt writes to HDFS.&lt;/P&gt;&lt;P&gt;"The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader creates a new org.apache.avro.file.DataFileReader and DataFileReader throws an exception when trying to read an empty file (because the empty file lacks the magic number marking it as avro). It seems like it would be straightforward to modify AvroGenericRecordReader to detect an empty file and then behave sensibly. For example, next() would always return false; getPos() would return zero, etc."&lt;/P&gt;&lt;P&gt;Is altering AvroGenericRecordReader feasible here?&lt;/P&gt;&lt;P&gt;Kris&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2015 20:07:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94391#M7662</guid>
      <dc:creator>kkane</dc:creator>
      <dc:date>2015-09-28T20:07:07Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94392#M7663</link>
      <description>&lt;P&gt;From what you described, the issue should be dealt with in the Storm bolt by not writing empty files. In some sense, Hive is doing the right thing by throwing an error on an empty file. From a fix standpoint, I would think modifying the Storm side would be easier, as you only need to recompile your topology with the fix rather than recompile all of Hive.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2015 22:21:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94392#M7663</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-09-28T22:21:02Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94393#M7664</link>
      <description>&lt;P&gt;It was suggested that Avro's native reader itself skip such files, but the Avro project declined that option in &lt;A href="https://issues.apache.org/jira/browse/AVRO-1530" target="_blank"&gt;https://issues.apache.org/jira/browse/AVRO-1530&lt;/A&gt; and suggested that clients ignore zero-length files instead.&lt;/P&gt;&lt;P&gt;The issue has been patched on the Hive side:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-11977" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-11977&lt;/A&gt;&lt;/P&gt;&lt;P&gt;-Darwin&lt;/P&gt;</description>
      <pubDate>Fri, 09 Oct 2015 22:46:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94393#M7664</guid>
      <dc:creator>dtraver</dc:creator>
      <dc:date>2015-10-09T22:46:11Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Avro - Reading 0 byte file throws IOException: Not a data file.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94394#M7665</link>
      <description>&lt;P&gt;Yes, I have been working with Aaron on this one.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 06:25:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Avro-Reading-0-byte-file-throws-IOException-Not-a/m-p/94394#M7665</guid>
      <dc:creator>kkane</dc:creator>
      <dc:date>2015-10-14T06:25:31Z</dc:date>
    </item>
  </channel>
</rss>

