
CSV file causing Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

New Member

Hi,

I have an external partitioned table whose partition is backed by four CSV files of less than 2 GB each. One of the files causes the problem as soon as it is added to the partition directory.

Splitting that file into two equal parts by row count (4071510 / 2 rows each) with the command-line "split" utility solves the problem. I cannot figure out what the underlying problem is. Here's the stack trace:

Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461850869883_0002_6_00, diagnostics=[Task failed, taskId=task_1461850869883_0002_6_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException


New Member

Hive doesn't impose a limit on the maximum row count. You said it's an external partitioned table; did you add the partitions using MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)?
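
For context, a rough sketch of what that command looks like in HiveQL. The table name mytable is an assumption borrowed from the ALTER TABLE statement posted later in this thread, and the SHOW PARTITIONS check is only an illustration, not something the poster ran.

```sql
-- Scan the table's location and register any partition directories
-- that are missing from the metastore (table name is hypothetical).
MSCK REPAIR TABLE mytable;

-- List the partitions the metastore knows about afterwards.
SHOW PARTITIONS mytable;
```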

New Member

No, I added the partition using:

ALTER TABLE mytable
ADD PARTITION (partitioncolumn="2016-04-30")
LOCATION '/user/data/partitioncolumn=2016-04-30';

By the way, splitting the file into two and adding both parts to the same partition folder works. Whenever I put the complete file in the folder, I get this exception.


@Ammar Rizvi Can you try setting hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and then execute the same query?
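
For reference, a minimal sketch of applying this suggestion in a Hive session. The property name and class are taken from the reply above; the SELECT is only a placeholder for "the same query" (the original query was not posted), and it assumes the table and partition column names from the earlier ALTER TABLE statement.

```sql
-- Session-level override: applies only to the current Hive session.
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Placeholder for the failing query (hypothetical; substitute your own).
SELECT COUNT(*)
FROM mytable
WHERE partitioncolumn = '2016-04-30';
```

Running SET hive.tez.input.format; with no value prints the current setting, which is a quick way to confirm that the override is in effect before re-running the query.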

New Member

Worked like a charm! Could you please also leave a short comment so that I can understand what was happening and how this setting fixed it?

Much appreciated!