The map phase for inserts into a bucketed table randomly fails with the error "Vertex <vertex_id> [Map 1] failed as task <task_id> failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0".
The task itself fails because every attempt ends with "<attempt_id> being failed for too many output errors. failureFraction=0.2, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0"
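For reference, here is how I read the numbers in that message, as a rough Python sketch. It is only my interpretation of the logged fields, not the actual Tez code: I am assuming the fraction is the number of unique failed-output reports divided by the number of consumer tasks, and that crossing any one of the three limits fails the attempt. The function name and the consumer count of 5 are hypothetical (5 is simply what makes 1 report come out as 0.2).

```python
def exceeds_output_failure_limits(
    unique_failed_reports,
    consumer_tasks,
    read_error_timespan_sec,
    max_fraction=0.1,        # MAX_ALLOWED_OUTPUT_FAILURES_FRACTION from the log
    max_failures=10,         # MAX_ALLOWED_OUTPUT_FAILURES from the log
    max_read_error_sec=300,  # MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC from the log
):
    """Return True if the attempt would be failed under the assumed rules."""
    # Assumption: fraction = unique failed-output reports / consumer tasks.
    failure_fraction = unique_failed_reports / consumer_tasks
    return (
        failure_fraction > max_fraction
        or unique_failed_reports >= max_failures
        or read_error_timespan_sec >= max_read_error_sec
    )


# Values from the error above; consumer_tasks=5 is a guess that yields 0.2.
print(exceeds_output_failure_limits(1, 5, 0))
```

If that reading is right, a single failed output report is enough to fail the attempt whenever there are fewer than 10 consumers, which would fit the random nature of the failures.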
This happens more often if the table is ACID-enabled and a delete operation is performed before the inserts.
I have tried the following:
Please advise on what a solution might be, and whether anyone else has been able to successfully run a large number of inserts on a bucketed table via Tez.
One quick thing to check is whether there are any delta files in the table's HDFS directories. If there are, please run compaction and make sure they have been converted to base files before trying again.
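If it helps, here is a small Python sketch for spotting leftover delta directories in a listing of the table's folder (for example, paths copied from `hdfs dfs -ls -R` output). It assumes the usual Hive ACID naming, `base_<writeId>` for base directories and `delta_...` or `delete_delta_...` for deltas; the helper name and the sample paths are made up for illustration. Compaction itself can be requested from Hive with `ALTER TABLE <table> COMPACT 'major'` and its progress followed with `SHOW COMPACTIONS`.

```python
import re

# Assumed Hive ACID naming: delta_<min>_<max>[_suffix] and delete_delta_<min>_<max>[_suffix].
DELTA_DIR = re.compile(r"^(delete_)?delta_\d+_\d+(_\w+)?$")


def find_delta_dirs(paths):
    """Return the paths whose last component looks like a delta directory."""
    hits = []
    for path in paths:
        # Take the final path component, ignoring any trailing slash.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if DELTA_DIR.match(name):
            hits.append(path)
    return hits


# Hypothetical listing of a table directory.
listing = [
    "/warehouse/demo.db/events/base_0000005",
    "/warehouse/demo.db/events/delta_0000006_0000006_0000",
    "/warehouse/demo.db/events/delete_delta_0000007_0000007_0000",
]

for path in find_delta_dirs(listing):
    print("delta still present:", path)
```

An empty result after a major compaction has finished and the cleaner has run would suggest that only base files are left.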
Hope that helps.