
What happens if one of the Spark tasks fails while inserting data into Hive?

Contributor

I came across a situation while inserting data into a Hive table from another table. The query was processed using two MR jobs; one succeeded and the other failed. I could see that a few records had been inserted into the target table. That was expected, since the two MR jobs were processed independently and the operation is not transactional.

I am trying to understand what happens if the same thing occurs while inserting data into Hive using Spark. If one of the executors/tasks fails and reaches the retry limit, will the job terminate completely, or will partial data be inserted into the table?
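For context, here is a minimal sketch of what I am running. The table names `source_table` and `target_table` are placeholders, and I am on Spark's default per-task retry setting:

```scala
import org.apache.spark.sql.SparkSession

object HiveInsertExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-insert-example")
      // spark.task.maxFailures is the per-task retry limit (default 4);
      // once a task has failed this many times, its stage is aborted.
      .config("spark.task.maxFailures", "4")
      .enableHiveSupport()
      .getOrCreate()

    // Runs as a single Spark job made up of one or more stages.
    spark.sql("INSERT INTO TABLE target_table SELECT * FROM source_table")

    spark.stop()
  }
}
```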

Thanks in advance.

1 ACCEPTED SOLUTION

Super Guru

If I understand correctly: your Hive INSERT query was split into two stages, each processed by a separate MR job; the second job failed, leaving inconsistent (partial) data in the destination table. A Spark job also consists of stages, but the stages are linked by lineage, so if one stage fails after its tasks have exhausted the executor retry limit, the complete job fails.
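As a rough sketch of how this surfaces on the driver side (`df` and `target_table` here are placeholders, not your actual names):

```scala
import org.apache.spark.SparkException

try {
  // The write is an action: it triggers every stage in df's lineage.
  df.write.insertInto("target_table")
} catch {
  case e: SparkException =>
    // Reached once a stage aborts after its tasks exhaust their retries,
    // e.g. "Job aborted due to stage failure: Task ... failed 4 times".
    // No further stages run, because later stages depend on the failed
    // one through the lineage.
    println(s"Insert failed, whole job aborted: ${e.getMessage}")
}
```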


2 REPLIES

Contributor

Thanks @Rajkumar Singh