How can I set an ORC InStream buffer size in Hive?

Rising Star

Running this statement

INSERT INTO TABLE FOO PARTITION(partition_date)
SELECT DISTINCT [columns from BAR]
FROM BAR
LEFT OUTER JOIN FOO ON (BAR.application.id = FOO.unique_id)
WHERE FOO.unique_id IS NULL

fails with the stack trace below. The only setting I could find that seemed relevant was hive.exec.orc.default.buffer.size, but I confirmed it is already set to the default value of 262,144. FOO is an ORC table with about 3.8B rows; BAR is an external Avro table. I'm running HDP 2.3.4 with Hive 1.2.1.
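
(For reference, this is how I confirmed the current value from the Hive CLI; issuing SET with just a property name echoes its value.)

-- Prints the effective value, e.g. hive.exec.orc.default.buffer.size=262144
SET hive.exec.orc.default.buffer.size;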

Anyone have suggestions for addressing this?

Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 146215
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringDirectTreeReader.next(TreeReaderFactory.java:1554)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringTreeReader.next(TreeReaderFactory.java:1397)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:249)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:186)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)


@Aaron Dossett

Try increasing the property below in core-site.xml:

<property>
  <name>io.file.buffer.size</name>
  <value>146215</value>
</property>
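
If that alone doesn't change anything, you could also try pinning the write-side ORC buffer at the table level. This is only a sketch (adjust the table name to yours; orc.compress.size is the ORC table property for the compression chunk size):

-- Sketch: pin the compression buffer for files written from now on.
-- Files already on disk keep the size recorded in their footers.
ALTER TABLE FOO SET TBLPROPERTIES ('orc.compress.size'='262144');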

Rising Star

Thanks @Divakar Annapureddy! I checked, and that value is currently 131072 in core-site.xml. I tried overriding it in Hive with "set io.file.buffer.size=146215" and got the same error message. In other words, the reader still uses a 32K buffer rather than the value from core-site.xml or the one I set through Hive.
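
My working theory (unconfirmed) is that the ORC reader sizes its decompression buffer from metadata recorded in each file, which would explain why neither core-site.xml nor a session-level override has any effect. To see what the files were actually written with, I plan to inspect one with Hive's ORC file dump utility (the path below is just an example):

# Dumps file metadata; the output should include "Compression: ZLIB"
# and "Compression size: ...", i.e. the buffer size baked into the file.
hive --orcfiledump /apps/hive/warehouse/foo/partition_date=2016-01-01/000000_0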


@Aaron Dossett

Can you please try the suggestion in this link as well?

Rising Star

Interesting, it looks like I'm seeing a similar error in a different context (my version of Hive doesn't have any of the LLAP functionality, as I understand it).

New Contributor

I thought this was HIVE-12450 ("OrcFileMergeOperator does not use correct compression buffer size"), but perhaps there is still a problem here.