When should data be stored in HBase and accessed through a Hive external table?

Hi ,

I am new to NoSQL, so please help me understand. I have noticed many people discussing accessing data through a Hive external table when the data is actually stored in HBase. I just want to understand why we don't store the data directly in Hive. There must be some reason for storing it in HBase.

Earlier versions of Hive did not support ACID properties, but this can now be achieved well in Hive itself.

1 ACCEPTED SOLUTION

Super Collaborator

There might be an application that already stores its data in HBase, and other people would like to query this data with SQL or want to combine it with data from other Hive tables.

It is also possible that the amount of data being inserted or updated is an argument for using HBase. In principle, HBase has features for handling large volumes of data quickly with memory-based processing, while Hive itself is a SQL layer that relies on other storage engines, so the data ends up stored one way or another in HDFS (or whatever your storage system is). HBase also uses HDFS as its persistence layer, but inserted data is available for queries even before it is written to disk.
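To make the memory-first idea concrete, here is a toy Python sketch. It is not HBase's implementation: the class name, the flush threshold, and the dict-based "files" are all assumptions for illustration, and it leaves out details such as HBase's write-ahead log. It only shows that a write can be read back while it still sits in memory, before anything is flushed to the persistence layer.

```python
class ToyRegionStore:
    """Toy model of a memory-first write path (hypothetical, not HBase code)."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}        # recent writes, held in memory
        self.flushed_files = []   # immutable snapshots standing in for files on HDFS
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        # A write lands in memory first and is readable immediately.
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Move the in-memory contents to a new "file" and start fresh.
        if self.memstore:
            self.flushed_files.append(dict(self.memstore))
            self.memstore.clear()

    def get(self, row_key):
        # Check memory first, then the flushed snapshots from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for snapshot in reversed(self.flushed_files):
            if row_key in snapshot:
                return snapshot[row_key]
        return None


store = ToyRegionStore(flush_threshold=3)
store.put("user#1", "alice")
# The value is readable although nothing has been flushed yet.
print(store.get("user#1"), len(store.flushed_files))
```

After two more `put` calls the toy store flushes, and the same `get` is then answered from the flushed snapshot instead of memory.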

So a typical use case is that data is inserted and updated online in HBase, while someone needs to combine that data with other data in SQL queries. I think it is much less common to insert and update HBase tables only via Hive, but the reasons can vary widely anyway, e.g. the policies of the ops team, the know-how of the people involved, a cluster that has evolved using different tools, established dev or ops procedures, etc.
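As a rough illustration of that setup, the Python helper below renders the HiveQL for an external table that points at an existing HBase table. The storage handler class and the `hbase.columns.mapping` and `hbase.table.name` properties come from Hive's HBase integration. The table names, the column family `cf`, and the columns are made up for this example, and the helper function itself is hypothetical.

```python
def hbase_external_table_ddl(hive_table, hbase_table, key_column, columns):
    """Render HiveQL for an external Hive table backed by an existing HBase table.

    key_column: (hive_column, hive_type) mapped to the HBase row key.
    columns: list of (hive_column, hive_type, "family:qualifier") tuples.
    """
    key_name, key_type = key_column
    column_defs = [f"{key_name} {key_type}"]
    column_defs += [f"{name} {hive_type}" for name, hive_type, _ in columns]
    # ":key" maps the first Hive column to the HBase row key; no spaces in the mapping.
    mapping = ",".join([":key"] + [hbase_col for _, _, hbase_col in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(column_defs)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


# Hypothetical names: an HBase table "users" with column family "cf".
ddl = hbase_external_table_ddl(
    hive_table="hbase_users",
    hbase_table="users",
    key_column=("user_id", "STRING"),
    columns=[("name", "STRING", "cf:name"), ("city", "STRING", "cf:city")],
)
print(ddl)
```

Once such a table exists, the application keeps writing to HBase as before, and Hive users can join `hbase_users` with ordinary Hive tables in regular SQL queries.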

3 REPLIES

@Harald Berghoff

Thank you for the explanation. I now understand when and where HBase is used with Hive.

Super Collaborator

It would be great if you 'accept' the answer if you consider it helpful.
