Archives of Support Questions (Read Only)

SQLShaw · 11-12-2015

I have batch data stored in Hive and realtime streaming data stored in HBase. I would like to create a view in Hive which joins a table in Hive with data in HBase. Using Hive on HBase is extremely slow. Is there a better way to accomplish this?

hrongali · 11-12-2015

Bucket and sort the Hive table on the join key, then read the bucketed table in a MapReduce program and hit HBase only when the key changes. This requires some programming effort, but it is very effective. Bucketing guarantees that all rows for a given key go to exactly one bucket, and sorting keeps those rows together within the bucket, so you hit HBase only once per key.
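A minimal sketch of the key-change logic, in plain Java (16+, for records) so it runs without a cluster. It assumes the rows of one bucket arrive already sorted by the join key, which is what sorting the bucketed table provides. `KeyValueStore`, `HiveRow` and `JoinedRow` are hypothetical names, and `KeyValueStore.get` only stands in for an HBase Get; it is not the HBase client API. Keys missing from the store are kept with a null value (left-outer semantics).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SortedBucketJoin {

    /** Hypothetical stand-in for a point lookup against HBase (one Get per call). */
    interface KeyValueStore {
        Optional<String> get(String rowKey);
    }

    /** A row from the bucketed, sorted Hive table: join key plus one payload column. */
    record HiveRow(String key, String value) {}

    /** Joined output: Hive columns plus the HBase value (null if the key is absent). */
    record JoinedRow(String key, String hiveValue, String hbaseValue) {}

    /**
     * Joins rows that are already sorted by key. Equal keys are contiguous,
     * so the store is queried once per distinct key and the result is reused.
     */
    static List<JoinedRow> join(List<HiveRow> sortedRows, KeyValueStore store) {
        List<JoinedRow> out = new ArrayList<>();
        String currentKey = null;
        String currentLookup = null;
        for (HiveRow row : sortedRows) {
            if (!row.key().equals(currentKey)) {
                // Key changed: the only place the store is hit.
                currentKey = row.key();
                currentLookup = store.get(currentKey).orElse(null);
            }
            out.add(new JoinedRow(row.key(), row.value(), currentLookup));
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory map playing the role of the HBase table (assumption for the demo).
        Map<String, String> fakeHBase = Map.of("a", "A-live", "b", "B-live");
        int[] calls = {0};
        KeyValueStore store = key -> {
            calls[0]++;
            return Optional.ofNullable(fakeHBase.get(key));
        };
        List<HiveRow> rows = List.of(
                new HiveRow("a", "1"), new HiveRow("a", "2"),
                new HiveRow("b", "3"), new HiveRow("c", "4"));
        join(rows, store).forEach(System.out::println);
        System.out.println("lookups: " + calls[0]); // 3 distinct keys -> 3 lookups
    }
}
```

With four rows over three distinct keys, the store is queried three times instead of four; the saving grows with the number of rows per key. In a real job this loop would sit in the mapper that reads the bucket files.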

gbraccialli3 · 11-12-2015

A Hive UDF could probably implement the same logic and would be easier to consume than a MapReduce program. I believe this UDF from Brickhouse does this:

https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/hbase/CachedGetUDF.java
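For comparison, a plain-Java sketch of the caching idea behind a UDF like that one. It does not reproduce Brickhouse's `CachedGetUDF`; the `CachedLookup` class and its loader function (a stand-in for an HBase Get) are hypothetical. Lookups are memoized in a small LRU cache built on `LinkedHashMap`, so repeated keys within one task skip the round trip to HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class CachedLookup {

    private final Function<String, Optional<String>> loader; // hypothetical HBase Get
    private final Map<String, Optional<String>> cache;
    private int loads = 0;

    CachedLookup(Function<String, Optional<String>> loader, int maxEntries) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used first;
        // removeEldestEntry evicts that entry once the cache exceeds maxEntries.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Optional<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the value, calling the loader only on a miss (misses are cached too). */
    String get(String key) {
        Optional<String> v = cache.get(key);
        if (v == null) {
            loads++;
            v = loader.apply(key);
            cache.put(key, v);
        }
        return v.orElse(null);
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        // In-memory map playing the role of the HBase table (assumption for the demo).
        Map<String, String> fakeHBase = Map.of("a", "A-live", "b", "B-live");
        CachedLookup lookup = new CachedLookup(k -> Optional.ofNullable(fakeHBase.get(k)), 2);
        for (String k : new String[] {"a", "b", "a", "c", "a", "b"}) {
            System.out.println(k + " -> " + lookup.get(k));
        }
        System.out.println("loads: " + lookup.loadCount()); // prints "loads: 4"
    }
}
```

Unlike the sorted-bucket approach, a cache does not need the Hive table to be bucketed or sorted, but it only saves a lookup while the key is still cached; here the 2-entry bound makes `c` evict `b`, so `b` is loaded again at the end.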

Cloudera Community

What is the best (most performant) method to join a Hive table with data in HBase?