Support Questions

ammar_rizvi · ‎05-02-2016

Hi,

I have an external partitioned table, and the partition consists of 4 CSV files of less than 2 GB each. One of the files causes the query to fail as soon as it is added to the partition directory.

Splitting the file into two equal parts (by row count, 4071510/2) with the command-line "split" utility solves the problem. I am completely unable to figure out what the problem is. Here's the stack trace:

Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461850869883_0002_6_00, diagnostics=[Task failed, taskId=task_1461850869883_0002_6_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException

rnalakurthi · ‎05-02-2016

Hive has no limit on the maximum row count. You said it's an external partitioned table; did you add the partitions using

MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)?

ammar_rizvi · ‎05-02-2016

No, I added the partition using:

ALTER TABLE mytable
ADD PARTITION (partitioncolumn="2016-04-30")
LOCATION '/user/data/partitioncolumn=2016-04-30'

By the way, splitting the file into two and adding both parts to the same partition folder works. When I put the complete file in the folder, it always gives me this exception.

pardeep_kumar · ‎05-04-2016

@Ammar Rizvi Can you try setting hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and then execute the same query?
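
As a rough sketch, applying this in a Hive session might look like the following. The property name and value are the ones given above; the table name and partition value are taken from the earlier post, while the SELECT itself is only a hypothetical stand-in for whichever query was failing. The override lasts for the current session only.

```sql
-- Print the current value; if nothing has overridden it, this is
-- org.apache.hadoop.hive.ql.io.HiveInputFormat.
SET hive.tez.input.format;

-- Switch the Tez input format for this session only.
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Re-run the failing query (hypothetical example against the partition above).
SELECT COUNT(*)
FROM mytable
WHERE partitioncolumn = '2016-04-30';
```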

ammar_rizvi · ‎05-10-2016

Worked like a charm! Could you please also leave a short comment so that I can understand what was happening and how this setting fixed it?

Much appreciated!

Cloudera Community

CSV file causing Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask