Created 12-03-2016 05:29 PM
I came across a situation when inserting data into a Hive table from another table. The query was processed using two MR jobs: one succeeded and the other failed. I could see that a few records had been inserted into the target table. That made sense to me, since the two MR jobs were processed independently and the operation is not transactional.
I am trying to understand what happens if the same thing occurs while inserting data into Hive using Spark. If one of the executors/tasks fails and reaches its retry limit, will the job terminate completely, or will partial data be inserted into the table?
Thanks in advance.
Created 12-03-2016 05:35 PM
As I understand it, your Hive INSERT query spun up two stages processed as two MR jobs, and the failure of the last job left inconsistent data in the destination table. A Spark job also consists of stages, but its stages are linked by lineage: if one stage fails after the executor has exhausted its retry attempts, the complete job fails.
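To illustrate, here is a minimal sketch of a Hive insert run through Spark SQL. The table names source_table and target_table are hypothetical, and spark.task.maxFailures is shown only to highlight the retry setting that governs when a task failure escalates to a stage (and thus job) failure:

```scala
// Minimal sketch: Hive insert via Spark SQL.
// Assumptions: a Hive-enabled Spark build, and hypothetical table names
// source_table / target_table.
import org.apache.spark.sql.SparkSession

object HiveInsertExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveInsertExample")
      // Each task is retried up to 4 times (the default); once a task
      // exhausts its retries, its stage fails, and the whole job aborts.
      .config("spark.task.maxFailures", "4")
      .enableHiveSupport()
      .getOrCreate()

    // The INSERT runs as a single Spark job whose stages are tied together
    // by lineage, so a stage failure aborts the entire statement instead of
    // leaving one of several independent jobs half-finished, as can happen
    // with a Hive query chained across multiple MR jobs.
    spark.sql("INSERT INTO TABLE target_table SELECT * FROM source_table")

    spark.stop()
  }
}
```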
Created 12-03-2016 05:40 PM
Thanks @Rajkumar Singh