04-11-2017 05:34 PM
Can we discuss architecture for a data warehouse solution?
We have a sales data mart containing a sales fact table, a Product dimension, and a Customer dimension.
Suppose we want to produce a sample BI report using Hadoop: revenue by customer per day.
If we go with Hadoop, where do we store the data? Do we need to store it in Hive or Impala?
Also, how do we do incremental loading into a Hive table? Do we need to delete the whole data set and reinsert everything?
04-11-2017 07:27 PM
The data will be stored in HDFS (if you are not using Kudu). The purpose of Hive or Impala is not to store data; each is just a layer on top of HDFS for querying the data. You can use either Hive or Impala on HDFS data; it is your choice, based on your needs.
You can use incremental import in Sqoop.
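To make that concrete, here is a minimal sketch of exposing files already sitting in HDFS as a queryable table. The table name, columns, delimiter, and HDFS path are all assumptions for illustration, not from the thread:

```shell
# Hypothetical example: the files stay in HDFS; the external table is
# only metadata layered on top of them. Dropping the table later would
# not delete the underlying files.
hive -e "
CREATE EXTERNAL TABLE sales_fact (
  customer_id BIGINT,
  product_id  BIGINT,
  sale_date   STRING,
  revenue     DECIMAL(18,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/warehouse/sales_fact';
"
```

Because Hive and Impala share the same metastore, Impala can query the same table after you run `INVALIDATE METADATA sales_fact;` in impala-shell.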
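As a sketch of what a Sqoop incremental import might look like (the JDBC URL, table name, and check column are hypothetical placeholders):

```shell
# Hypothetical incremental import: appends only rows whose sale_id is
# greater than the recorded --last-value to the existing HDFS directory,
# instead of reloading the whole table.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table sales_fact \
  --target-dir /data/warehouse/sales_fact \
  --incremental append \
  --check-column sale_id \
  --last-value 0
```

If source rows can be updated (not just inserted), `--incremental lastmodified` with a timestamp check column is the usual alternative, and a saved `sqoop job` can track the last value between runs automatically.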
04-11-2017 09:00 PM
For analytics, if we don't build a Hive or Impala layer, how will people query the data from HDFS?
Also, if we do a Sqoop incremental import, how will it update the Hive/Impala table if the Hive files are stored in a different format than the one the Sqoop import produces?
I am looking to get some suggestions here.