Support Questions

Which is better for performance: Hive 3 or HBase?

I want to save events from Kafka into the Hadoop ecosystem and also retrieve them with low latency. I can use Hive 3 or HBase. Which one should I use?

1 ACCEPTED SOLUTION

Super Guru
@Vinit_Kratin,

If low response time is a must, then Hive is not an option; Impala should be faster.

HBase tables are designed around a row key, which means any data you want to retrieve has to be queried by that key. So for your event-based data, if you can use an event ID to store and retrieve events, then HBase might be suitable.
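To make the row key point concrete, here is a minimal Python sketch. It is a toy in-memory model, not the HBase client API: the class `SortedKeyStore`, its method names, and the `evt-0001` style IDs are all made up for illustration. It only assumes that rows are kept sorted by key, so a lookup by key touches one row while a lookup on any other field has to walk every row.

```python
import bisect


class SortedKeyStore:
    """Toy stand-in for a table whose rows are kept sorted by row key.

    Hypothetical illustration only; this is not how you talk to HBase.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, row_key, columns):
        # Insert the key in sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = dict(columns)

    def get(self, row_key):
        # Point lookup by row key: no other rows are examined.
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        # Range read: jump to the first matching key, stop at the first miss.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result

    def filter_by_column(self, name, value):
        # Lookup on a non-key column: every row has to be inspected.
        return [(k, self._rows[k]) for k in self._keys
                if self._rows[k].get(name) == value]


store = SortedKeyStore()
store.put("evt-0001", {"type": "login", "user": "alice"})
store.put("evt-0002", {"type": "purchase", "user": "bob"})
store.put("evt-0003", {"type": "login", "user": "bob"})

by_id = store.get("evt-0002")                      # cheap: keyed lookup
logins = store.filter_by_column("type", "login")   # expensive: full walk
```

If most of your reads look like `get` (or a prefix scan over well-chosen keys), a key-based store fits. If they look like `filter_by_column`, a SQL engine is likely the better match.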

I still suggest going through the differences between HBase, Hive, and Impala in detail to get a better understanding. Google is your friend.

Cheers
Eric

4 REPLIES

Super Guru
@Vinit_Kratin,

You can't compare Hive with HBase directly; it really depends on the use case: how you want to store your data, how you will query it, whether you need to delete data, and so on.

You will need to explain a bit more about what you want to achieve before anyone can say which is better, Hive or HBase.

A quick Google search returned the useful links below:
https://community.cloudera.com/t5/Support-Questions/When-to-use-Hive-and-Hbase/m-p/206167
https://community.cloudera.com/t5/Community-Articles/The-Differences-between-Pig-Hive-and-HBase/ta-p...

https://www.dezyre.com/article/hive-vs-hbase-different-technologies-that-work-better-together/322

Cheers
Eric

@EricL 

Thanks a lot for your reply.
What I want to do is store events in the database.
These events are structured and need to be queried multiple times.
Deletion is not required. Low response time is a must.
Please let me know if I need to give you any more information.
