What is the best, most performant method to join a Hive table with data in HBase?

I have batch data stored in Hive and real-time streaming data stored in HBase. I would like to create a view in Hive that joins a Hive table with the data in HBase. Querying HBase through Hive (Hive on HBase) is extremely slow. Is there a better way to accomplish this?

1 ACCEPTED SOLUTION

Expert Contributor

Sort and bucket the Hive table on the join key, read the bucketed table in a MapReduce program, and query HBase only when the key changes. This requires programming effort but is very effective. Bucketing the Hive table ensures that a particular key goes to only one bucket, so you hit HBase once per key.
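
To illustrate the "hit HBase only when the key changes" idea, here is a minimal sketch in plain Python rather than an actual MapReduce job. It assumes the rows arrive already sorted by join key, as they would when read from one sorted bucket. The names `join_sorted_rows`, `fake_get`, and `fake_hbase` are hypothetical; `fake_get` is only a stand-in for a real HBase Get call.

```python
from itertools import groupby
from operator import itemgetter


def join_sorted_rows(sorted_rows, lookup):
    """Join (key, value) rows, already sorted by key, against an external store.

    `lookup` is called once per run of equal keys, not once per row.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        remote = lookup(key)  # single lookup when the key changes
        for _, value in group:
            yield key, value, remote


# Stand-in for HBase (assumption): a dict plus a log of which keys were fetched.
fake_hbase = {"a": "A-live", "b": "B-live"}
calls = []


def fake_get(key):
    calls.append(key)
    return fake_hbase.get(key)  # None when the key is absent from the store


rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4)]  # sorted by key
joined = list(join_sorted_rows(rows, fake_get))
```

The saving depends on the sort order: if equal keys were scattered through the input, `groupby` would start a new run each time and the lookup would be repeated.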

2 REPLIES

@hrongali@hortonworks.com I think a Hive UDF could implement the same logic and would be easier to consume than a MapReduce program. I believe this UDF from Brickhouse does this:

https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/hbase/CachedGetUDF.java
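
As a rough illustration of the general idea of a cached lookup (not the Brickhouse implementation, which I have not reproduced here), the sketch below wraps a lookup function in a bounded in-memory cache so repeated keys do not trigger repeated fetches. The names `make_cached_lookup`, `store`, and `fetch` are hypothetical, and `fetch` stands in for a real HBase Get.

```python
from functools import lru_cache


def make_cached_lookup(lookup, max_entries=10000):
    """Wrap `lookup` so each distinct key is fetched at most once while cached."""

    @lru_cache(maxsize=max_entries)
    def cached(key):
        return lookup(key)

    return cached


# Stand-in for HBase (assumption): a dict plus a log of which keys were fetched.
store = {"a": "A-live", "b": "B-live"}
fetched = []


def fetch(key):
    fetched.append(key)
    return store.get(key)


cached_get = make_cached_lookup(fetch)

# Keys arrive unsorted here; the cache removes the repeated fetches.
results = [cached_get(k) for k in ["a", "b", "a", "a", "b"]]
```

Unlike the sort-and-bucket approach, this does not need sorted input, but memory grows with the number of distinct keys up to `max_entries`, and a cached value can be stale if the HBase row changes while the query runs.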