Explorer
Posts: 29
Registered: ‎01-20-2017

How is HBase used in a data warehouse architecture?

Hi,

I want to understand how HBase is used when loading data into warehouse tables.

HBase is basically meant for retrieving a small number of records. But in a business intelligence application, where a high volume of data is analyzed, how is HBase useful? Is HBase used as a staging area, or is it useful for reporting applications?

Cloudera Employee
Posts: 4
Registered: ‎03-28-2017

Re: How is HBase used in a data warehouse architecture?

HBase is designed for high-throughput reads and writes, and it is much more commonly used as a data backend for online applications. You are correct that data warehousing is optimized for long scans and sequential reads. The most common design is to store your data in HDFS and query it with Impala. One catch is that it can be difficult to update a record if it changes. Cloudera has developed and open-sourced Kudu to allow fast long scans of data and easy updating of records at the same time.

For data warehousing, the recommended setup is HDFS or Kudu for storage and Impala for querying. HBase is designed for a different use case and data access pattern.
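
To make the difference in access patterns concrete, here is a rough sketch in plain Python. It does not use HBase, HDFS, Kudu, or Impala; the sample records and the names `get_order` and `revenue_by_region` are made up for illustration. It only contrasts a keyed lookup of one record (the pattern HBase serves) with an aggregation that scans whole columns (the pattern a warehouse query follows).

```python
from collections import defaultdict

# Hypothetical sales records; all names and values are illustrative only.
rows = [
    {"order_id": "o1", "region": "EU", "amount": 120.0},
    {"order_id": "o2", "region": "US", "amount": 80.0},
    {"order_id": "o3", "region": "EU", "amount": 50.0},
]

# Key-value style access: records are indexed by row key,
# so fetching one record touches only that record.
by_key = {r["order_id"]: r for r in rows}

def get_order(order_id):
    """Point lookup of a single record by its key."""
    return by_key[order_id]

# Columnar style access: values of each column are stored together,
# so an analytic query reads only the columns it needs, end to end.
columns = {
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

def revenue_by_region():
    """Scan two full columns and aggregate amount per region."""
    totals = defaultdict(float)
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] += amount
    return dict(totals)
```

A BI report usually looks like `revenue_by_region()`: it reads every row but only a few columns. An online application usually looks like `get_order("o2")`: it reads one row by key.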

Explorer
Posts: 29
Registered: ‎01-20-2017

Re: How is HBase used in a data warehouse architecture?

I see that Hive is also used for querying. But Hive runs MapReduce jobs, which is slow.

I don't know the difference between Hive and Impala.

What architecture should we follow in that case?

Also, for incremental loading, it looks like we have to clear the table and rebuild the whole table.

So what architecture should we follow for a data warehouse, or for a business intelligence application where we are analyzing massive volumes of data?
