
HBase Hive Integration

I have created an HBase table whose column family keeps 5 versions:

create 'tablename',{NAME => 'cf', VERSIONS => 5}


Then I inserted two rows (row1 and row2), updating the cf:name value of row2 twice:

put 'tablename','row1','cf:id','row1id'
put 'tablename','row1','cf:name','row1name'
put 'tablename','row2','cf:id','row2id'
put 'tablename','row2','cf:name','row2name'
put 'tablename','row2','cf:name','row2nameupdate'
put 'tablename','row2','cf:name','row2nameupdateagain'

When I select the data with a plain scan, I get only the latest updated values. When I request multiple versions with the command below, I get the data of the different versions:

scan 'tablename',{RAW => true, VERSIONS => 5}
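To make the version behaviour concrete, here is a toy in-memory model of one column family (not HBase's implementation). Assumptions: the class and method names (`VersionedTable`, `put`, `scan`, `max_versions`) are hypothetical, timestamps are simple counters rather than HBase's millisecond timestamps, and versions beyond the limit are dropped immediately, whereas HBase filters them on read and purges them at compaction.

```python
from collections import defaultdict

class VersionedTable:
    """Hypothetical toy model of one HBase column family: each (row, column)
    cell keeps at most max_versions values, newest timestamp first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions  # plays the role of {VERSIONS => 5} on 'cf'
        self.cells = defaultdict(list)    # (row, column) -> [(ts, value), ...]

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # simplification: drop excess versions at once

    def scan(self, versions=1):
        """Return {(row, column): [values...]} with at most `versions` values per cell.
        versions=1 mirrors a plain scan; versions=5 mirrors {VERSIONS => 5}."""
        return {key: [v for _, v in vals[:versions]]
                for key, vals in sorted(self.cells.items())}

# Replay the puts from the question, using counters 1..6 as timestamps.
table = VersionedTable(max_versions=5)
puts = [
    ("row1", "cf:id", "row1id"),
    ("row1", "cf:name", "row1name"),
    ("row2", "cf:id", "row2id"),
    ("row2", "cf:name", "row2name"),
    ("row2", "cf:name", "row2nameupdate"),
    ("row2", "cf:name", "row2nameupdateagain"),
]
for ts, (row, col, val) in enumerate(puts, start=1):
    table.put(row, col, val, ts)

print(table.scan()[("row2", "cf:name")])
# ['row2nameupdateagain']
print(table.scan(versions=5)[("row2", "cf:name")])
# ['row2nameupdateagain', 'row2nameupdate', 'row2name']
```

The single-version read corresponds to what the question calls "the latest updated data"; asking for up to 5 versions returns all three values written to row2's cf:name, newest first.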

Next, I created a Hive external table pointing to this HBase table:

CREATE EXTERNAL TABLE hive_timestampupdate(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "tablename");
select * from hive_timestampupdate;

When I query hive_timestampupdate, I can see the data, but by default I only get the latest version of each cell, based on its timestamp. I want to query the data of the different versions here as well.

**Is there a Hive command that will fetch the data of the different HBase versions?**

Thanks in advance.


Re: HBase Hive Integration

This is currently not possible in Hive. The HBase storage handler's record reader builds its scans with the default settings, which return only the latest version of each cell [1].

[1] - https://github.com/cloudera/hive/blob/cdh5.4.0-release/hbase-handler/src/java/org/apache/hadoop/hive...
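The consequence for the table in the question can be sketched with a toy model (not Hive's code): with the mapping `":key,cf:name"`, each HBase row becomes one Hive row `(key, value)`, and because the scan requests a single version, only the newest cf:name value survives. The function name `hive_rows` and the counter timestamps are hypothetical, chosen to match the puts in the question.

```python
# Hypothetical toy data: (row, column) -> [(timestamp, value), ...] for cf:name,
# using counter timestamps in the order the puts were issued.
cells = {
    ("row1", "cf:name"): [(2, "row1name")],
    ("row2", "cf:name"): [(4, "row2name"), (5, "row2nameupdate"), (6, "row2nameupdateagain")],
}

def hive_rows(cells, column="cf:name"):
    """Project HBase rows onto Hive's (key, value) columns as declared by
    "hbase.columns.mapping" = ":key,cf:name", keeping only the newest version
    of each cell, as a default single-version scan does."""
    rows = []
    for (row_key, col), versions in sorted(cells.items()):
        if col != column:
            continue
        _, latest = max(versions)  # highest timestamp wins; older versions are never seen
        rows.append((row_key, latest))
    return rows

print(hive_rows(cells))
# [('row1', 'row1name'), ('row2', 'row2nameupdateagain')]
```

The values "row2name" and "row2nameupdate" are still stored in HBase, but they never reach the Hive query because the scan that feeds it asks for one version only.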