Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Tez IndexOutOfBoundsException in select distinct

Solved Go to solution

Tez IndexOutOfBoundsException in select distinct

Rising Star

Hi all,

I get an java.lang.IndexOutOfBoundsException while trying to execute a select distinct(...) on a big hive table (about 60 GB).

This is the log of the Tez vertex:

2016-02-23 16:35:03,039 [ERROR] [TezChild] |tez.TezProcessor|: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:326)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
    ... 16 more
Caused by: java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:567)
    at java.nio.ByteBuffer.get(ByteBuffer.java:686)
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:285)
    at org.apache.hadoop.hdfs.BlockReaderLocal.readWithBounceBuffer(BlockReaderLocal.java:609)
    at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:569)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:737)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:793)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:853)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
    at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
    at org.apache.hadoop.hive.ql.exec.Utilities.skipHeader(Utilities.java:3911)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:337)
    ... 22 more

I already tried to disable vectorization and to increment the tez container size, but nothing changed.

If I execute the query on the same table, but with less data inside, all goes right.

Do you already seen this kind of error?

Thank you,

D.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Tez IndexOutOfBoundsException in select distinct

Rising Star

Ignoring the actual backtrace (which is a bug), I have seen issues with uncompressed text tables in Tez related to Hive's use of Hadoop-1 APIs.

Try re-running with

set mapreduce.input.fileinputformat.split.minsize=67108864;
or alternatively, compress the files before loading with gzip with something like this https://gist.github.com/t3rmin4t0r/49e391eab4fbdfdc8ce1
4 REPLIES 4

Re: Tez IndexOutOfBoundsException in select distinct

Rising Star

This looks awfully like an HDFS bug and entirely unrelated to Tez. The IndexOutOfBounds is thrown from HDFS block local readers.

Re: Tez IndexOutOfBoundsException in select distinct

Rising Star

Ignoring the actual backtrace (which is a bug), I have seen issues with uncompressed text tables in Tez related to Hive's use of Hadoop-1 APIs.

Try re-running with

set mapreduce.input.fileinputformat.split.minsize=67108864;
or alternatively, compress the files before loading with gzip with something like this https://gist.github.com/t3rmin4t0r/49e391eab4fbdfdc8ce1
Highlighted

Re: Tez IndexOutOfBoundsException in select distinct

New Contributor

Hi Davide,

setting

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat is a good workaround.

;) Bye

Re: Tez IndexOutOfBoundsException in select distinct

New Contributor

sort merge join to false worked fine for me.

hive.auto.convert.sortmerge.join=false

--Pravat Sutar

Don't have an account?
Coming from Hortonworks? Activate your account here