Update and Inserts

Explorer

We receive hourly JSON data into HDFS, about 7 GB per hour.

  • When a matching record is found in the final table, update (or delete) it.
  • When no matching record is found in the final dataset, insert the record.

What is the best way to do upserts (updates and inserts) in Hadoop for a large dataset: Hive, HBase, or NiFi? What would the flow look like? Can anyone help us with the flow? (Attachment: updates.txt)
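The matching rule described in the two bullets can be sketched in plain Python, independent of Hive, HBase, or NiFi. This is only an illustration of the intended semantics, not a scalable implementation: the key field `id` and the `deleted` flag are hypothetical names, since the post does not say how a record is matched or how a delete is signalled.

```python
from typing import Dict, Hashable, Iterable

Record = Dict[str, object]


def apply_upserts(
    final: Dict[Hashable, Record],
    incoming: Iterable[Record],
    key: str = "id",              # hypothetical match key
    delete_flag: str = "deleted",  # hypothetical delete marker
) -> Dict[Hashable, Record]:
    """Return a new table with the incoming batch applied.

    Matched record   -> update it, or delete it if the delete flag is set.
    Unmatched record -> insert it.
    """
    merged = dict(final)  # shallow copy, the original table is left untouched
    for rec in incoming:
        k = rec[key]
        if k in merged and rec.get(delete_flag):
            del merged[k]    # matched and flagged: delete
        else:
            merged[k] = rec  # matched: update, not matched: insert
    return merged


final_table = {"a": {"id": "a", "v": 1}, "b": {"id": "b", "v": 2}}
hourly_batch = [
    {"id": "a", "v": 10},           # matched: update
    {"id": "b", "deleted": True},   # matched: delete
    {"id": "c", "v": 3},            # not matched: insert
]
result = apply_upserts(final_table, hourly_batch)
```

In Hive the same rule would normally be expressed as a MERGE statement against an ACID table; the sketch only pins down what "matched" and "not matched" are expected to do.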

Re: Update and Inserts

Expert Contributor
Re: Update and Inserts

Explorer

@Shu With a larger dataset, the merge takes a long time to complete. The final table grows by 150 GB every day, so scanning it and applying the updates takes more than an hour. Is there an alternative approach?

Re: Update and Inserts

New Contributor

@Eugene Koifman We are facing an issue that seems to be a limitation of Hive 1.2 ACID tables. We use MERGE to load mutable data into Hive ACID tables, but loading or reading these tables with Pig or Spark seems to be a problem.

Can ACID tables in Hive 1.2 be read into Apache Pig using HCatLoader (or by other means), or into Spark using SQLContext (or by other means)?

For Spark, it seems ACID tables can only be read if the table is fully compacted, i.e. no delta folders exist in any partition. Details are in the following JIRA:

https://issues.apache.org/jira/browse/SPARK-15348
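As a rough illustration of the "fully compacted" condition above, the following sketch checks a listing of directory names for leftover delta folders. It assumes delta directories are named like `delta_<min>_<max>` (optionally prefixed with `delete_` or suffixed with a statement id in later Hive versions); the helper names and the sample paths are hypothetical, and the listing itself would have to come from something like `hdfs dfs -ls` on each partition directory.

```python
import re
from typing import Iterable, List

# Assumed naming scheme for ACID delta directories (not taken from the thread).
DELTA_DIR = re.compile(r"^(delete_)?delta_\d+_\d+(_\d+)?$")


def delta_dirs(entries: Iterable[str]) -> List[str]:
    """Return the entries whose last path component looks like a delta directory."""
    found = []
    for entry in entries:
        name = entry.rstrip("/").split("/")[-1]
        if DELTA_DIR.match(name):
            found.append(entry)
    return found


def is_fully_compacted(entries: Iterable[str]) -> bool:
    """True when the listing contains no delta directories."""
    return not delta_dirs(entries)


# Hypothetical listings of one partition before and after a major compaction.
before = [
    "/warehouse/dwh.db/acid_table/base_0000010",
    "/warehouse/dwh.db/acid_table/delta_0000011_0000011",
]
after = ["/warehouse/dwh.db/acid_table/base_0000011"]
```

A check like this would need to pass for every partition of the table before attempting the Spark read described in the JIRA.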

However, I wanted to know whether reading Hive ACID tables is supported at all in Apache Pig.

When I tried reading both an unpartitioned and a partitioned ACID table in Pig 0.16, 0 records were read:

Successfully read 0 records from: "dwh.acid_table"

HDP version 2.6.5

Spark version 2.3

Pig version 0.16

Hive version 1.2
