Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

What is the best, most performant method to join a Hive table with data in HBase?

I have batch data stored in Hive and realtime streaming data stored in HBase. I would like to create a view in Hive which joins a table in Hive with data in HBase. Using Hive on HBase is extremely slow. Is there a better way to accomplish this?

1 ACCEPTED SOLUTION

Expert Contributor

Sort and bucket the Hive table on the join key, read the bucketed table in a MapReduce program, and hit HBase only when the key changes. This requires programming effort, but it is very effective. Bucketing the Hive table ensures that a particular key goes to only one bucket, and sorting keeps rows with the same key adjacent, so you hit HBase once per key.
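The "hit HBase when the key changes" step can be sketched as below. This is a minimal illustration in Python rather than an actual MapReduce job. It assumes the rows of one bucket arrive sorted by key. The names `join_sorted_rows` and `fake_hbase_get` are hypothetical, and `fake_hbase_get` is a stand-in for a real HBase client call.

```python
def join_sorted_rows(rows, lookup):
    """Join key-sorted (key, payload) rows with a remote store.

    `lookup` is called once per run of equal keys, not once per row.
    Assumes `rows` is sorted (or at least grouped) by key.
    """
    have_key = False
    current_key = None
    current_value = None
    for key, payload in rows:
        # Only go to the remote store when the key differs from the last row.
        if not have_key or key != current_key:
            current_key = key
            current_value = lookup(key)
            have_key = True
        yield key, payload, current_value


# Hypothetical stand-in for an HBase get; records which keys were requested.
calls = []

def fake_hbase_get(key):
    calls.append(key)
    return {"a": "A-live", "b": "B-live"}.get(key)


bucket = [("a", 1), ("a", 2), ("b", 3)]  # one sorted bucket of Hive rows
joined = list(join_sorted_rows(bucket, fake_hbase_get))
print(joined)
print(calls)  # one lookup per distinct key
```

Because each key lives in exactly one bucket, no other task repeats the lookup for that key.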


2 REPLIES

@hrongali@hortonworks.com I think a Hive UDF could implement the same logic and would be easier to consume than a MapReduce program. I think this UDF from Brickhouse does this:

https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/hbase/CachedGetUDF.java
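As a rough illustration of the idea behind a caching lookup function (this is not the Brickhouse code, which I have not reproduced here), the sketch below wraps a lookup in an in-memory cache so that repeated keys do not trigger repeated gets. The names `make_cached_lookup`, `slow_get`, and the `max_entries` parameter are hypothetical.

```python
from functools import lru_cache


def make_cached_lookup(lookup, max_entries=10000):
    """Wrap `lookup` so each distinct key is fetched at most once
    while it remains in the bounded LRU cache."""
    @lru_cache(maxsize=max_entries)
    def cached(key):
        return lookup(key)
    return cached


# Hypothetical stand-in for an HBase get; records which keys were requested.
requested = []

def slow_get(key):
    requested.append(key)
    return key.upper()


cached_get = make_cached_lookup(slow_get)
results = [cached_get(k) for k in ["a", "b", "a", "a", "b"]]
print(results)
print(requested)  # each distinct key fetched once
```

Unlike the sort-and-bucket approach, a cache does not need sorted input, but its benefit depends on how many distinct keys fit in memory.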