Hive incremental updates

I would like suggestions on an incremental update strategy for our tables. We have a set of source tables that are refreshed daily in the source system (DB2), and we need to refresh them in the Hive database as well. Which approach would you suggest?

The source tables receive new inserts as well as updates to existing records.

1) Approach 1: Use HBase to store the data, since updates are allowed, and build a Hive external table that refers to it. I am not sure whether this will affect queries that join the Hive-HBase table with large ORC Hive tables.

2) Approach 2: Use the four-step incremental table approach suggested by Hortonworks (HDP): https://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/

1 ACCEPTED SOLUTION

Hi @Abhijeet Rajput,

Prior to HDP 2.6, you'll need to use the solution outlined in #2. HDP 2.6 includes Hive MERGE, so you can now create a staging table and execute a MERGE statement against an ACID-enabled table: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
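
As a rough illustration of the MERGE route, here is a minimal sketch. The table names (customer, customer_staging) and the columns (id, name, modified_date) are hypothetical and not taken from your schema, and it assumes ACID transactions are already enabled on the cluster (for example hive.support.concurrency=true and hive.txn.manager set to DbTxnManager). Adjust the bucketing and column list to your own tables.

```sql
-- Hypothetical target table. On the Hive version shipped with HDP 2.6,
-- ACID tables must be stored as ORC, bucketed, and marked transactional.
CREATE TABLE customer (
  id            INT,
  name          STRING,
  modified_date TIMESTAMP
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Hypothetical staging table holding the daily delta pulled from DB2
-- (new rows plus changed rows). It does not need to be transactional.
CREATE TABLE customer_staging (
  id            INT,
  name          STRING,
  modified_date TIMESTAMP
)
STORED AS ORC;

-- Apply the delta: update rows that already exist, insert the rest.
MERGE INTO customer AS t
USING customer_staging AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET name = s.name, modified_date = s.modified_date
WHEN NOT MATCHED THEN
  INSERT VALUES (s.id, s.name, s.modified_date);
```

Note that the UPDATE clause cannot change the bucketing column (id here) or any partition columns, so the join key should be stable. After each successful merge you would typically truncate or overwrite the staging table before loading the next day's delta.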
