CSV file causing Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Labels: Apache Hive
Created 05-02-2016 12:39 PM
Hi,
I have an external partitioned table, and one partition is made up of 4 CSV files, each smaller than 2 GB. One of these files triggers the error as soon as it is added to the partition directory.
Splitting that file into two equal halves by row count (4071510 rows, i.e. 2035755 per part) with the command-line split command makes the problem go away. I can't figure out what the underlying problem is. Here's the stack trace:
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461850869883_0002_6_00, diagnostics=[Task failed, taskId=task_1461850869883_0002_6_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException
Created 05-02-2016 02:19 PM
Hive doesn't impose a limit on the maximum row count. You said it's an external partitioned table; did you add the partitions using MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)?
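For reference, a minimal HiveQL sketch of the two forms, assuming the table is the mytable used in the asker's ALTER TABLE statement. MSCK REPAIR TABLE scans the table's location and registers partition directories the metastore doesn't know about yet; as far as I know, ALTER TABLE ... RECOVER PARTITIONS is the equivalent syntax in Amazon EMR's build of Hive rather than stock Apache Hive.

```sql
-- Register partition directories under the table's location that are
-- missing from the metastore (stock Apache Hive syntax).
MSCK REPAIR TABLE mytable;

-- Equivalent on Amazon EMR's Hive; may not parse on stock Apache Hive.
-- ALTER TABLE mytable RECOVER PARTITIONS;

-- List the partitions the metastore now knows about.
SHOW PARTITIONS mytable;
```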
Created 05-02-2016 02:22 PM
No, I added the partition using:
ALTER TABLE mytable ADD PARTITION (partitioncolumn="2016-04-30") LOCATION '/user/data/partitioncolumn=2016-04-30';
By the way, splitting the file into two and putting both parts in the same partition folder works. Whenever I put the complete file in the folder, I get this exception.
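One way to double-check what Hive actually sees for that partition is sketched below, reusing the table name, partition value and path from the ALTER TABLE statement above. The dfs line assumes the Hive CLI (or a Beeline session where dfs commands are permitted).

```sql
-- Confirm the partition is registered in the metastore.
SHOW PARTITIONS mytable;

-- Show partition metadata, including the Location it points to.
DESCRIBE FORMATTED mytable PARTITION (partitioncolumn='2016-04-30');

-- List the CSV files currently in the partition directory.
dfs -ls /user/data/partitioncolumn=2016-04-30;
```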
Created 05-04-2016 11:18 AM
@Ammar Rizvi Can you try setting hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and then execute the same query?
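A minimal sketch of trying the suggestion in a single Hive session. SET changes the value for the current session only, so the global configuration is untouched. The final SELECT is a hypothetical stand-in for whichever query was failing.

```sql
-- Print the current value first, so it can be restored later if needed.
SET hive.tez.input.format;

-- Switch the Tez input format for this session only.
SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Re-run the failing query (hypothetical example against the affected partition).
SELECT COUNT(*) FROM mytable WHERE partitioncolumn = '2016-04-30';
```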
Created 05-10-2016 03:35 PM
Worked like a charm! Could you please also leave a short comment so that I can understand what was happening and how this setting fixed it?
Much appreciated!
