Created 03-30-2016 01:49 PM
(1) I have created a Hive external table, and the data comes from Netezza into HDFS.
(2) Every day we have to append incremental data to this table. What should we do if data in the base table has also changed?
I can append as much as I want, but if changes happen in the base table, for example a few rows or a few columns change, how can I make the incremental load work?
Every day I have to take the same table from Netezza into HDFS. Most probably we can append with data_date.
Created 03-30-2016 02:09 PM
The below link should cover your requirements. It shows a strategy for incremental updates/ingest. It also covers the scenario where base data may change:
http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
Created 03-30-2016 02:43 PM
1) New data is added
You can import data using Sqoop or the Netezza load/unload functions. Sqoop provides delta loading by a timestamp or id column (any column that increments continuously).
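As a rough sketch of the Sqoop delta load, the snippet below only assembles the command line for an incremental import; it does not run it. The host, database, table, column, and directory names are made up for illustration, and credentials are left out on purpose.

```python
import shlex


def build_sqoop_incremental_import(jdbc_url, table, target_dir,
                                   check_column, last_value, mode="append"):
    """Return the argument list for a Sqoop incremental import."""
    # Sqoop accepts two incremental modes: "append" for a column that only
    # grows (such as an id) and "lastmodified" for an update timestamp.
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,
        "--check-column", check_column,
        # Only rows whose check column is greater than this value are imported.
        "--last-value", str(last_value),
    ]


# All values below are hypothetical placeholders.
cmd = build_sqoop_incremental_import(
    jdbc_url="jdbc:netezza://nz-host:5480/salesdb",
    table="ORDERS",
    target_dir="/data/orders/incremental",
    check_column="ORDER_ID",
    last_value=1000000,
)
print(shlex.join(cmd))
```

You would keep track of the last imported value between runs (a saved Sqoop job can do this for you) and add your own connection options before executing the command.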
2) Old data is changed
This is the bigger problem. Hive has transactions, but the feature is still very new.
2.1 Changed small dimension tables
A good approach is to just reload them, as long as they are under a couple of GB and you have a nightly window to do it.
2.2 Changes to big fact tables
Bigger problem.
- You can use Hive ACID transactions, but as noted they are still new.
- Alternatively, you can take a manual approach: add a version column to your table and write your queries so that they use only the newest version of each row.
- The last possibility is to load the delta changes and then merge them into the existing table in Hadoop. While loading terabytes of data into a Hadoop cluster can be a bottleneck, re-creating a table by joining the old data with the new is very fast because it runs in parallel in the cluster.
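The merge logic behind the last two options can be sketched in plain Python: combine the old rows with the delta and keep only the newest version of each key. In Hive you would express the same thing as a query over the base and delta tables so that it runs in parallel; the column names `id` and `modified_date` here are assumptions for the example, not something from your schema.

```python
def merge_delta(base_rows, delta_rows, key="id", version="modified_date"):
    """Return one row per key, keeping the row with the greatest version."""
    latest = {}
    # Base rows are visited first, then delta rows.
    for row in list(base_rows) + list(delta_rows):
        current = latest.get(row[key])
        # ">=" means that on a tie the later row (the delta) wins.
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Hypothetical sample data; ISO dates compare correctly as strings.
base = [
    {"id": 1, "amount": 10, "modified_date": "2016-03-28"},
    {"id": 2, "amount": 20, "modified_date": "2016-03-28"},
]
delta = [
    {"id": 2, "amount": 25, "modified_date": "2016-03-30"},  # changed row
    {"id": 3, "amount": 30, "modified_date": "2016-03-30"},  # new row
]

merged = merge_delta(base, delta)
for row in merged:
    print(row)
```

Row 1 is carried over unchanged, row 2 is replaced by its newer version, and row 3 is added. The merged result then becomes the new base table for the next day's load.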
Created 03-30-2016 06:45 PM
Thanks a lot