Support Questions
Find answers, ask questions, and share your expertise

Inserts into a bucketed table fail randomly with Hive on Tez

New Contributor

The map phase of inserts into a bucketed table randomly fails with the error "Vertex <vertex_id> [Map 1] failed as task <task_id> failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0".

The task ultimately fails because every attempt reports "<attempt_id> being failed for too many output errors. failureFraction=0.2, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0"

This happens more often if the table is ACID enabled and a delete operation is performed before the inserts.
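For context, a minimal reproduction of the pattern described above looks roughly like this (table, column, and staging-table names are placeholders, not the actual schema):

```sql
-- Hypothetical bucketed ACID table; names are illustrative only
CREATE TABLE test_bucketed (id INT, val STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO test_bucketed VALUES (1, 'a'), (2, 'b');

-- Delete before the bulk inserts, which makes the failure more likely
DELETE FROM test_bucketed WHERE id = 1;

-- The map phase of this insert then fails intermittently
INSERT INTO test_bucketed SELECT id, val FROM staging_table;
```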

I have tried the following:

  • Changed tez.task.launch.cmd-opts to use parallel GC.
  • tez.runtime.shuffle.max.allowed.failed.fetch.fraction = 0.95
  • tez.runtime.shuffle.failed.check.since-last.completion=false
  • tez.runtime.shuffle.fetch.buffer.percent = 0.1
  • tez.runtime.shuffle.memory.limit.percent = 0.25
  • tez.runtime.shuffle.ssl.enable=false
  • Deleted ".../usercache/<user>/filecache" and ".../usercache/<user>/appcache"
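For reference, the Tez shuffle settings above were applied as session-level overrides (assuming they are honored per-session on this cluster; otherwise they would go into tez-site.xml):

```sql
-- Session-level overrides, values as listed above
SET tez.runtime.shuffle.max.allowed.failed.fetch.fraction=0.95;
SET tez.runtime.shuffle.failed.check.since-last.completion=false;
SET tez.runtime.shuffle.fetch.buffer.percent=0.1;
SET tez.runtime.shuffle.memory.limit.percent=0.25;
SET tez.runtime.shuffle.ssl.enable=false;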

Please advise as to what might be a solution, and whether anyone else is able to successfully run a large number of inserts on a bucketed table via Tez.

@Namit Maheshwari @kerra @Deepesh @Ram Baskaran @Sindhu



Hi Anant,

One quick thing to check is whether there are delta files in the table's HDFS directories. If there are, please run compaction and make sure they are merged into base files before trying again.
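To check and trigger this, something like the following should work (the database and table names below are placeholders):

```sql
-- List pending/running compactions across the warehouse
SHOW COMPACTIONS;

-- Request a major compaction, which merges delta files into a new base file
ALTER TABLE mydb.my_bucketed_table COMPACT 'major';

-- Compaction runs asynchronously; re-check until it shows as succeeded
SHOW COMPACTIONS;
```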

Hope that helps.



New Contributor

@kerra Thanks for the response. I do recreate the table before every batch of operations, and the problem is still reproducible. Running compaction after every insert would be quite impractical. I have raised this in hopes of getting a solution.