How can I set an ORC InStream buffer size in Hive?

Contributor

Running this statement

INSERT INTO TABLE FOO PARTITION(partition_date) SELECT DISTINCT [columns from BAR] FROM BAR LEFT OUTER JOIN FOO ON (BAR.application.id = FOO.unique_id) WHERE FOO.unique_id IS NULL

fails with the stack trace below. The only setting I could find that seemed relevant was hive.exec.orc.default.buffer.size, but I confirmed that it is already set to the default value of 262,144. FOO has about 3.8B rows and is an ORC table; BAR is an external Avro table. I'm running HDP 2.3.4 with Hive 1.2.1.

Anyone have suggestions for addressing this?

Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 146215
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringDirectTreeReader.next(TreeReaderFactory.java:1554)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringTreeReader.next(TreeReaderFactory.java:1397)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:249)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:186)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
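
To make the error message easier to read, here is a minimal Python sketch of the check that appears to produce it. It assumes the 3-byte compressed-chunk header layout described in the ORC specification (chunk length shifted left by one bit, with the low bit flagging an uncompressed "original" chunk, stored little-endian). The function names are hypothetical and this is not Hive's actual code; it only illustrates that "size" is the buffer size the reader is using and "needed" is the chunk length declared in the header.

```python
HEADER_SIZE = 3  # each ORC compressed chunk starts with a 3-byte header


def encode_chunk_header(chunk_length, is_original=False):
    """Build a 3-byte header: (length << 1 | is_original), little-endian."""
    value = (chunk_length << 1) | int(is_original)
    return value.to_bytes(HEADER_SIZE, "little")


def decode_chunk_header(header):
    """Return (chunk_length, is_original) from a 3-byte header."""
    b0, b1, b2 = header[0], header[1], header[2]
    is_original = (b0 & 0x01) == 1
    chunk_length = (b2 << 15) | (b1 << 7) | (b0 >> 1)
    return chunk_length, is_original


def check_chunk_fits(header, buffer_size):
    """Raise if the chunk declared by the header exceeds the reader's buffer."""
    chunk_length, is_original = decode_chunk_header(header)
    if chunk_length > buffer_size:
        raise ValueError(
            f"Buffer size too small. size = {buffer_size} needed = {chunk_length}"
        )
    return chunk_length, is_original


if __name__ == "__main__":
    header = encode_chunk_header(146215)
    try:
        check_chunk_fits(header, 32768)
    except ValueError as err:
        print(err)
```

With the values from the trace, a header declaring a 146215-byte chunk fails against a 32768-byte buffer but would fit within the 262,144-byte default, which suggests the reader is picking up a smaller buffer size than the one the data was written with.
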
5 REPLIES

Re: How can I set an ORC InStream buffer size in Hive?

@Aaron Dossett

Try increasing the property below in core-site.xml:

<property>
  <name>io.file.buffer.size</name>
  <value>146215</value>
</property>

Re: How can I set an ORC InStream buffer size in Hive?

Contributor

Thanks @Divakar Annapureddy! I checked, and that value is currently 131072 in core-site.xml. I tried overriding it in Hive with "set io.file.buffer.size=146215" and got the same error message. In other words, the reader still uses a buffer size of 32K, not the value in core-site.xml or the one I set through Hive.

Re: How can I set an ORC InStream buffer size in Hive?

@Aaron Dossett

Can you please try this as well link

Re: How can I set an ORC InStream buffer size in Hive?

Contributor

Interesting. It looks like I'm seeing a similar error in a different context (my version of Hive doesn't have any of the LLAP functionality, as I understand it).

Re: How can I set an ORC InStream buffer size in Hive?

New Contributor

I thought this was HIVE-12450 ("OrcFileMergeOperator does not use correct compression buffer size").

But perhaps there is still a problem here.