03-28-2017 04:43 PM
I want to understand how HBase is used when loading data warehouse tables.
HBase is basically for retrieving a small number of records. But in a business intelligence application,
where a high volume of data is analyzed, how is HBase useful? Is HBase used as a staging area, or is it useful for reporting applications?
03-29-2017 08:56 AM
HBase is designed to have high throughput read and write capabilities. It is much more commonly used as a data backend for online applications. You are correct that for data warehousing you are optimizing for long scans/sequential reads. The most common design is to use HDFS to store your data and Impala to query it. One catch here is that it can be difficult to update a record if it changes. Cloudera has developed and open sourced Kudu to simultaneously allow fast long scans of data and allow for easy updating of records.
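To make the update problem concrete, here is a minimal sketch in plain Python. It is not HDFS, Impala, or Kudu code; the function names and sample rows are hypothetical. It only contrasts the amount of work needed to change one record in append-only storage with the work needed in storage keyed by a primary key.

```python
# Hypothetical illustration only: plain Python, not the HDFS, Impala, or Kudu API.

def update_immutable(rows, key, new_row):
    """Append-only style storage: changing one record means rewriting every row."""
    rewritten = [new_row if r["id"] == key else r for r in rows]
    return rewritten, len(rewritten)  # second value: number of rows written

def update_keyed(table, key, new_row):
    """Primary-key style storage: only the changed record is written."""
    table[key] = new_row
    return table, 1

rows = [{"id": i, "amount": i * 10} for i in range(5)]

# Rewrite-everything approach: all 5 rows are written to change 1.
new_rows, written_full = update_immutable(rows, 3, {"id": 3, "amount": 99})

# Keyed approach: a single row is written.
table = {r["id"]: r for r in rows}
table, written_keyed = update_keyed(table, 3, {"id": 3, "amount": 99})
```

On a real table the first approach means rewriting whole files or partitions, which is why record-level changes are awkward on plain HDFS data.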
For data warehousing, HDFS or Kudu for storage and Impala for querying is recommended. HBase is designed for a different use case and data access pattern.
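The difference in access pattern can be sketched roughly as follows. This is plain Python with made-up data and hypothetical function names, not the HBase or Impala API: the first function stands in for the single-key reads an online application issues, the second for the full scan with aggregation that a reporting query performs.

```python
# Hypothetical illustration only: plain Python, not the HBase or Impala API.

# Made-up data set: 1000 orders keyed by a row key.
orders = {
    f"order-{i:04d}": {"region": "EU" if i % 2 else "US", "amount": i}
    for i in range(1000)
}

def point_lookup(store, row_key):
    """Online-application pattern: fetch one record by its key."""
    return store[row_key]

def scan_total_by_region(store):
    """Warehouse pattern: read every record and aggregate."""
    totals = {}
    for row in store.values():
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals

one_order = point_lookup(orders, "order-0042")  # touches a single record
totals = scan_total_by_region(orders)           # touches all 1000 records
```

A store tuned for the first function is not necessarily efficient for the second, and the reverse also holds.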
03-29-2017 09:41 AM
I see that Hive is also used for querying, but Hive runs MapReduce jobs, which are slow.
I don't know the difference between Hive and Impala.
What architecture should we follow in that case?
Also, for incremental loading, it looks like we have to clear the table and rebuild the whole table.
So what architecture should we follow for a data warehouse, or to build a business intelligence
application where we are analyzing massive volumes of data?