Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Hive and HBase: Which is best in terms of storage

Expert Contributor

Both Hive and HBase store their data on HDFS. So which is best in terms of storage, memory, etc.?

1 ACCEPTED SOLUTION

Super Guru

Apache Hive and Apache HBase are fundamentally different systems with completely different architectures. As such, which one is more efficient really depends on the application's use cases. It's impossible to state generically that Hive or HBase is better or worse than the other, and the fact that they both use HDFS for storing data is irrelevant.

Please quantify your application requirements if you'd like an answer about whether Hive or HBase is better for you.

2 REPLIES

Master Guru

I completely agree with Josh. Hive is best suited for EDW-type querying. HBase is a key-value store, so you need to know your questions before designing the PDM. Both use HDFS as the underlying storage. If you know which queries will be run and have a defined access path model, Phoenix/HBase will give you the lowest latency. If you are looking for general BI queries and can't define the access path up front, Hive is the way to go.
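
To make the "know your access path up front" point a bit more concrete, here is a rough sketch in plain Python. It is not HBase, Phoenix, or Hive code; `SortedKV`, `row_key`, and the sample order rows are all made up for illustration. It only mimics the general idea of a store that keeps rows sorted by key: a query that matches the key design touches just the matching rows, while a query the key was not designed for has to look at every row.

```python
from bisect import bisect_left, insort


class SortedKV:
    """Toy in-memory stand-in for a store that keeps rows sorted by row key.

    Hypothetical class for illustration only; not an HBase or Phoenix API.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = row

    def prefix_scan(self, prefix):
        """Return rows whose key starts with prefix, reading only those rows."""
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append(self._rows[self._keys[i]])
            i += 1
        return out

    def full_scan(self, predicate):
        """Return rows matching predicate; every row has to be examined."""
        return [self._rows[k] for k in self._keys if predicate(self._rows[k])]


def row_key(customer_id, order_date):
    # Assumed key design: chosen because the known question is
    # "give me all orders for customer X".
    return f"{customer_id}#{order_date}"


store = SortedKV()
store.put(row_key("c001", "2016-01-05"), {"customer": "c001", "amount": 40})
store.put(row_key("c002", "2016-01-07"), {"customer": "c002", "amount": 75})
store.put(row_key("c001", "2016-02-11"), {"customer": "c001", "amount": 120})

# Planned access path: matches the key design, so only c001's rows are read.
orders_c001 = store.prefix_scan("c001#")

# Ad hoc question the key was not designed for: falls back to a full scan.
big_orders = store.full_scan(lambda r: r["amount"] > 50)
```

If most of your questions look like the second query and you can't predict them ahead of time, that is the situation where the key-value model stops helping and a general SQL engine over the data is the better fit.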