Created 12-03-2016 05:29 PM
I came across a situation when inserting data into a Hive table from another table. The query was processed using two MR jobs: one succeeded and the other failed. I could see that a few records had been inserted into the target table. That made sense to me, since the two MR jobs were processed independently and the operation is not transactional.
I am trying to understand what happens if the same thing occurs while inserting data into Hive using Spark. If one of the executors/tasks fails and reaches its retry limit, will the job terminate completely, or will partial data be inserted into the table?
Thanks in advance.
Created 12-03-2016 05:35 PM
As I understand it, your Hive INSERT query spun up two stages processed as two MR jobs, and the failure of the last job left inconsistent data in the destination table. A Spark job also consists of stages, but its stages are linked by lineage: if one stage fails after the executor has exhausted its retry attempts, the complete job fails.
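To illustrate, here is a minimal sketch of a Hive insert run through Spark SQL. The table names source_table and target_table are hypothetical, and spark.task.maxFailures is shown only to highlight the retry setting that governs when a task failure escalates to a stage (and thus job) failure:

```scala
// Minimal sketch: Hive insert via Spark SQL.
// Assumptions: a Hive-enabled Spark build, and hypothetical table names
// source_table / target_table.
import org.apache.spark.sql.SparkSession

object HiveInsertExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveInsertExample")
      // Each task is retried up to 4 times (the default); once a task
      // exhausts its retries, its stage fails, and the whole job aborts.
      .config("spark.task.maxFailures", "4")
      .enableHiveSupport()
      .getOrCreate()

    // The INSERT runs as a single Spark job whose stages are tied together
    // by lineage, so a stage failure aborts the entire statement instead of
    // leaving one of several independent jobs half-finished, as can happen
    // with a Hive query chained across multiple MR jobs.
    spark.sql("INSERT INTO TABLE target_table SELECT * FROM source_table")

    spark.stop()
  }
}
```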
Created 12-03-2016 05:40 PM
Thanks @Rajkumar Singh