CSV file causing Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Hi,

I have an external partitioned table whose partition is built from 4 CSV files, each smaller than 2 GB. One of these files triggers the error as soon as it is added to the partition directory.

Splitting that file into two equal parts by row count (4,071,510 rows in total, so 2,035,755 per part) with the command-line "split" utility solves the problem. I am completely unable to figure out what the underlying cause is. Here's the stack trace:

Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461850869883_0002_6_00, diagnostics=[Task failed, taskId=task_1461850869883_0002_6_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException

Hive doesn't have a limit on the maximum row count. You said it's an external partitioned table; did you add the partitions using

  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)?
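For comparison, the MSCK route could look like the sketch below. It assumes the table is the "mytable" named later in this thread and that the partition directories under the table's LOCATION follow Hive's key=value naming (for example partitioncolumn=2016-04-30).

```sql
-- Sketch: scan the table's LOCATION and register any partition
-- directories the metastore does not know about yet.
-- Assumes directory names follow the partitioncolumn=<value> layout.
MSCK REPAIR TABLE mytable;

-- List the partitions the metastore now tracks, to confirm the result.
SHOW PARTITIONS mytable;
```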

No, I added the partition using:

ALTER TABLE mytable
ADD PARTITION (partitioncolumn="2016-04-30")
LOCATION '/user/data/partitioncolumn=2016-04-30';

By the way, splitting the file in two and adding both halves to the same partition folder works. Whenever I put the complete file in the folder, I get this exception.

@Ammar Rizvi Can you try setting hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and then executing the same query?
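In a Hive CLI or Beeline session, the suggestion amounts to the statements below. This is a minimal sketch: the failing query itself is not shown in the thread, so it is left as a placeholder comment. In stock Apache Hive the default for this property is org.apache.hadoop.hive.ql.io.HiveInputFormat, but distributions may ship a different default, so printing the current value first is worthwhile.

```sql
-- Print the value currently in effect for this session.
SET hive.tez.input.format;

-- Override the Tez input format for the current session only.
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Re-run the query that previously failed with return code 2 here.
```

Because SET only affects the current session, the cluster-wide default stays unchanged; making it permanent would mean changing hive-site.xml instead.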

Worked like a charm! Could you please also leave a short comment so that I can understand what was happening and how this setting fixed it?

Much appreciated!