Explorer
Posts: 27
Registered: ‎01-20-2017

How to build a Data warehouse using Hadoop Cluster

Can we discuss the architecture for a data warehouse solution?

 

We have a sales data mart containing a sales fact table plus Product and Customer dimension tables.

 

Suppose we want to produce a sample BI report using Hadoop: revenue by customer per day.

 

If we go with Hadoop, where should the data be stored? Do we need to store it in Hive or Impala?

Also, how do we do incremental loading into a Hive table? Do we need to delete all the data and reinsert everything?

 

Thanks

 

 

 

 

Posts: 235
Topics: 11
Kudos: 36
Solutions: 22
Registered: ‎09-02-2016

Re: How to build a Data warehouse using Hadoop Cluster

@dmishraoc

 

The data will be stored in HDFS (unless you are using Kudu). Hive and Impala do not store the data themselves; each is just a query layer on top of HDFS. You can use either Hive or Impala to query the HDFS data, depending on your needs.
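
To make the "query layer" idea concrete, here is a rough sketch of the revenue-by-customer-per-day report from the question. It uses Python's built-in sqlite3 purely as a local stand-in for Hive or Impala, so it shows the shape of the query, not a Hadoop setup. The table and column names (sales_fact, customer_dim, sale_date, amount and so on) and the sample rows are my assumptions, not taken from your data mart.

```python
import sqlite3

# Report query: join the fact table to the customer dimension,
# then sum revenue per customer per day.
# All table/column names here are hypothetical.
REPORT_SQL = """
SELECT c.customer_name, f.sale_date, SUM(f.amount) AS revenue
FROM sales_fact f
JOIN customer_dim c ON c.customer_id = f.customer_id
GROUP BY c.customer_name, f.sale_date
ORDER BY c.customer_name, f.sale_date
"""

def build_demo_db():
    """Create an in-memory database with a tiny, made-up star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_dim (
            customer_id INTEGER PRIMARY KEY,
            customer_name TEXT
        );
        CREATE TABLE sales_fact (
            sale_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            product_id INTEGER,
            sale_date TEXT,
            amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 10, "2017-01-20", 100.0),
            (2, 1, 11, "2017-01-20", 50.0),
            (3, 2, 10, "2017-01-20", 75.0),
            (4, 1, 10, "2017-01-21", 20.0),
        ],
    )
    return conn

def revenue_by_customer_per_day(conn):
    """Run the report and return (customer_name, sale_date, revenue) rows."""
    return conn.execute(REPORT_SQL).fetchall()

if __name__ == "__main__":
    for row in revenue_by_customer_per_day(build_demo_db()):
        print(row)
```

The SELECT itself is plain join-and-group-by SQL, so I would expect it to carry over to Hive or Impala with little change once the tables are defined over your HDFS files, but I have not run it there.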

 

For incremental loading, you can use Sqoop's incremental import:

https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
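
As a rough illustration of what an incremental import call might look like, here is a small Python helper that only assembles the sqoop command line (it does not run anything). The JDBC URL, table name, target directory, check column and last value are all placeholders I made up; the flags are the incremental-import arguments described in the user guide linked above.

```python
import shlex

def sqoop_incremental_append(jdbc_url, table, target_dir, check_column, last_value):
    """Build the argument list for a Sqoop incremental import in append mode.

    Only rows whose check_column is greater than last_value are imported,
    so the existing data does not have to be deleted and reloaded.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]

# Hypothetical values, for illustration only.
cmd = sqoop_incremental_append(
    jdbc_url="jdbc:mysql://dbhost/sales",
    table="sales_fact",
    target_dir="/data/staging/sales_fact",
    check_column="sale_id",
    last_value=1000,
)

if __name__ == "__main__":
    # Print a shell-safe version of the command for copy/paste.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Append mode fits tables where new rows are only ever added with an increasing key. If existing rows can be updated in the source, the user guide describes a lastmodified mode that checks a timestamp column instead.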

 

Explorer
Posts: 27
Registered: ‎01-20-2017

Re: How to build a Data warehouse using Hadoop Cluster

For analytics, if we don't build Hive or Impala tables, how will people query the data in HDFS?

Also, if we do a Sqoop incremental import, how will it update the Hive/Impala table if the Hive files are stored in a different format from the one the Sqoop import produces?

 

I am looking for some suggestions here.

 

 

 

 
