Support Questions


Updates and Inserts


We are receiving hourly JSON data into HDFS; the volume is about 7 GB per hour.

  • When a matching record is found in the final table, update (or delete) it.
  • If the record is not matched in the final dataset, insert it.

What is the best way to do upserts (updates and inserts) in Hadoop for a large dataset: Hive, HBase, or NiFi? What would the flow look like? Can anyone help us with the flow?
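One common flow is to land the hourly JSON in a staging table and then use Hive's MERGE statement (available on ACID tables since Hive 2.2) to apply the matched/not-matched logic described above in a single statement. A minimal sketch, assuming hypothetical table and column names (`final_table`, `staging_table`, key column `id`, and an `op` flag marking deletes):

```sql
-- Sketch only: table, column, and flag names are assumptions, not from the thread.
-- staging_table holds the latest hourly batch; final_table is a transactional
-- (ACID) table bucketed and stored as ORC, as MERGE requires.
MERGE INTO final_table AS t
USING staging_table AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE                 -- matched + delete flag
WHEN MATCHED THEN UPDATE SET val = s.val, ts = s.ts     -- matched: update in place
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val, s.ts, s.op); -- new record: insert
```

In this sketch NiFi (or any ingest tool) only needs to land the JSON into the staging table's HDFS location each hour; the MERGE then reconciles it into the final table.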


Expert Contributor


@Shu When using this for a larger dataset, the MERGE takes a long time to complete. The final table grows by 150 GB every day, so scanning the final table and applying the updates takes more than an hour. Is there any alternative approach?
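One common mitigation, assuming the final table is partitioned (the partition column `ds` below is a hypothetical example, not from the thread), is to add the target table's partition predicate to the ON clause so the MERGE only scans the partitions the hourly batch can actually touch, instead of the whole table:

```sql
-- Sketch only: assumes final_table is partitioned by ds and the batch
-- only affects a known window of partitions.
MERGE INTO final_table AS t
USING staging_table AS s
ON t.id = s.id
   AND t.ds >= '2018-06-01'          -- prune: scan only recent partitions
WHEN MATCHED THEN UPDATE SET val = s.val, ts = s.ts
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val, s.ts, s.ds);
```

Whether this helps depends on late-arriving data: if an update can land in any historical partition, the full scan is hard to avoid in Hive, and a storage layer built for upserts (e.g. HBase) may fit better.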

New Contributor

@Eugene Koifman We are facing what appears to be a limitation of Hive 1.2 ACID tables. We use MERGE to load mutable data into Hive ACID tables, but reading these ACID tables with Pig or with Spark seems to be a problem.

Can a Hive ACID table (Hive 1.2) be read into Apache Pig using HCatLoader (or other means), or into Spark using SQLContext (or other means)?

For Spark, it seems it is only possible to read ACID tables if the table is fully compacted, i.e. no delta folders exist in any partition. Details are in the following JIRA.
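Given that constraint, one workaround before reading from Spark is to trigger a major compaction so the delta folders are merged into base files. A sketch using the table name from the error below (`dwh.acid_table`; the partition spec is a hypothetical example):

```sql
-- Major compaction rewrites base + delta files into a new base,
-- removing the delta folders Spark cannot read.
ALTER TABLE dwh.acid_table PARTITION (ds = '2018-06-01') COMPACT 'major';

-- Compaction runs asynchronously; check progress before reading from Spark.
SHOW COMPACTIONS;
```

This only helps until the next MERGE creates new deltas, so it is a stopgap rather than a fix.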

However, I wanted to know whether Apache Pig supports reading Hive ACID tables at all.

When I tried reading both an unpartitioned and a partitioned ACID table with Pig 0.16, I got 0 records:

Successfully read 0 records from: "dwh.acid_table"

HDP version 2.6.5

Spark version 2.3

Pig version 0.16

Hive version 1.2
