Created 11-12-2015 03:56 PM
I have batch data stored in Hive and realtime streaming data stored in HBase. I would like to create a view in Hive which joins a table in Hive with data in HBase. Using Hive on HBase is extremely slow. Is there a better way to accomplish this?
Created 11-12-2015 04:00 PM
Sort and bucket the Hive table, read the bucketed table in a MapReduce program, and hit HBase only when the key changes. This requires programming effort, but it is very effective. Bucketing the Hive table ensures that a particular key goes to only one bucket, and sorting keeps the rows for that key next to each other, so you hit HBase once per key.
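A minimal sketch of the "look up only when the key changes" idea, written in plain Python rather than as a real MapReduce job. It assumes the rows arrive already sorted by key, as they would from a sorted, bucketed table. The names `join_sorted_rows` and `lookup` are hypothetical; `lookup` stands in for whatever performs the actual HBase Get.

```python
from itertools import groupby
from operator import itemgetter


def join_sorted_rows(rows, lookup):
    """Join (key, value) rows with one lookup per run of identical keys.

    rows   -- iterable of (key, value) tuples, assumed sorted by key
    lookup -- hypothetical callable standing in for an HBase Get on `key`
    """
    # groupby yields one group per run of consecutive equal keys,
    # so sorted input means exactly one lookup per distinct key.
    for key, group in groupby(rows, key=itemgetter(0)):
        hbase_record = lookup(key)
        for _, value in group:
            yield key, value, hbase_record


if __name__ == "__main__":
    # In-memory dict used in place of HBase, for illustration only.
    fake_hbase = {"a": "A-record", "b": "B-record"}
    rows = [("a", 1), ("a", 2), ("b", 3)]
    for joined in join_sorted_rows(rows, fake_hbase.get):
        print(joined)
```

If the input is not sorted, the same key can show up in more than one run and will be looked up again each time, which is why the sort matters as much as the bucketing.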
Created 11-12-2015 07:33 PM
@hrongali@hortonworks.com I think a Hive UDF could implement the same logic and would be easier to consume than a MapReduce program. I think this UDF from Brickhouse does this:
https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/hbase/CachedGetUDF.java
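For illustration, here is the general idea of a cached lookup, which is what the UDF's name suggests. This is a sketch in plain Python, not the Brickhouse implementation. `make_cached_get` and `fetch` are hypothetical names, and `fetch` stands in for the real HBase Get.

```python
from functools import lru_cache


def make_cached_get(fetch, maxsize=10000):
    """Wrap a lookup function so repeated keys are served from memory.

    fetch   -- hypothetical callable standing in for an HBase Get
    maxsize -- assumed cache bound; tune to the memory available
    """
    @lru_cache(maxsize=maxsize)
    def cached_get(key):
        # Only reached on a cache miss; hits never call fetch.
        return fetch(key)

    return cached_get


if __name__ == "__main__":
    # In-memory dict used in place of HBase, for illustration only.
    fake_hbase = {"a": "A-record", "b": "B-record"}
    get = make_cached_get(fake_hbase.get)
    for key in ["a", "b", "a", "a"]:
        print(key, get(key))
```

Unlike the sort-and-bucket approach, a cache does not need sorted input, but it only helps when keys repeat often enough to stay in the cache, and cached values can go stale if the HBase data changes during the query.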