1. Create a Hive-managed HBase table:

CREATE TABLE MyHBaseTable(MyKey string, Col1 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1")
TBLPROPERTIES ("hbase.table.name" = "t2");
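Here, MyHBaseTable is the Hive table being created, and "hbase.table.name" = "t2" names the underlying HBase table, t2, which Hive creates automatically. The mapping ":key,colfam:col1" stores MyKey as the HBase row key and Col1 in the HBase column colfam:col1.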
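To make the column mapping concrete, here is a toy Python sketch (not Hive code, and not how the storage handler is implemented) of how a positional "hbase.columns.mapping" string pairs each Hive column with either the row key (":key") or a family:qualifier cell. The function name map_row and the sample values are hypothetical; only ":key" and "family:qualifier" entries are handled.

```python
# Toy illustration only: pair each Hive column, by position, with an HBase
# row key (":key") or a "family:qualifier" cell, as the mapping string says.

def map_row(hive_columns, mapping, values):
    """Return (row_key, {"family:qualifier": value}) for one Hive row."""
    specs = [s.strip() for s in mapping.split(",")]
    if len(specs) != len(hive_columns):
        raise ValueError("mapping needs one entry per Hive column")
    row_key = None
    cells = {}
    for spec, value in zip(specs, values):
        if spec == ":key":
            row_key = value      # this column becomes the HBase row key
        else:
            cells[spec] = value  # stored in the named family:qualifier cell
    if row_key is None:
        raise ValueError("mapping needs a :key entry")
    return row_key, cells


key, cells = map_row(["MyKey", "Col1"], ":key,colfam:col1", ["100", "Alice"])
print(key, cells)  # 100 {'colfam:col1': 'Alice'}
```

Because MyKey is the first column and ":key" is the first mapping entry, whatever the SELECT puts in the first position (eid in step 2) ends up as the row key.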
2. Insert rows from employee_100:

INSERT INTO TABLE MyHBaseTable SELECT eid, name FROM employee_100;
Question: The command above overwrites the data already in MyHBaseTable, but I expected it to append. Has anyone run into this issue? Any help is appreciated.
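You have mapped the Hive column MyKey to the HBase row key (via ":key"), so each eid value you insert becomes a row key. If you want new rows rather than replacing existing ones, make sure every inserted row uses a unique row key.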
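A toy Python model of this behavior follows. It treats an HBase table as a map from row key to cells, which is a simplification and not the HBase client API. The sample rows and the batch tags "load1"/"load2" are hypothetical. Loading the same eids twice leaves the row count unchanged, while a key that also includes a per-load tag produces new rows.

```python
# Toy model (not the HBase client API): a table behaves like a map from
# row key to that row's cells, so writing an existing key replaces it.

def load(table, rows, make_key):
    """Write (eid, name) rows into `table`, using make_key to build row keys."""
    for eid, name in rows:
        table[make_key(eid)] = {"colfam:col1": name}


employees = [(1, "Alice"), (2, "Bob")]  # hypothetical sample rows

# Row key = eid: the second load hits the same keys and overwrites them.
by_eid = {}
load(by_eid, employees, lambda eid: str(eid))
load(by_eid, employees, lambda eid: str(eid))
print(len(by_eid))  # 2

# Row key = eid plus a per-load tag: each load adds new rows.
by_batch = {}
for batch in ("load1", "load2"):
    load(by_batch, employees, lambda eid: f"{eid}_{batch}")
print(len(by_batch))  # 4
```

In Hive, one way to get the same effect would be to build MyKey from eid plus some value that differs per load (for example with CONCAT). Which value to use depends on how you want to identify each load.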
@Amit Dass HBase does not store duplicate row keys, so if you repeat INSERT INTO ... SELECT ... FROM with the same keys, you simply overwrite your data. You can, however, increase the number of versions of each cell that HBase retains. To keep 5 versions of your data, run this in the HBase shell:
alter 't2', NAME => 'colfam', VERSIONS => 5
You can increase the versions, but from Hive you will still see only the latest one. Quoting from here: "there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp."
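The interaction between VERSIONS and Hive can be sketched with a toy Python model. It is not the HBase API: the class ToyTable, its methods, and the counter standing in for HBase write timestamps are all hypothetical. Each cell keeps at most max_versions values, newest first. get_latest mirrors what a Hive query returns, and get_versions shows the older versions that are retained in HBase but, per the quote above, not reachable from Hive.

```python
from collections import defaultdict, deque
from itertools import count


class ToyTable:
    """Toy model of HBase cell versioning (not the real HBase API)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._clock = count(1)  # hypothetical stand-in for write timestamps
        # (row_key, column) -> (timestamp, value) pairs, newest first;
        # maxlen drops the oldest version once the cap is reached
        self._cells = defaultdict(lambda: deque(maxlen=self.max_versions))

    def put(self, row_key, column, value):
        self._cells[(row_key, column)].appendleft((next(self._clock), value))

    def get_latest(self, row_key, column):
        """What a Hive query sees: only the newest version."""
        versions = self._cells.get((row_key, column))
        return versions[0][1] if versions else None

    def get_versions(self, row_key, column):
        """All retained versions, newest first."""
        return [value for _, value in self._cells.get((row_key, column), ())]


t2 = ToyTable(max_versions=5)  # like VERSIONS => 5 on 'colfam'
for value in ["v1", "v2", "v3", "v4", "v5", "v6"]:
    t2.put("100", "colfam:col1", value)
print(t2.get_latest("100", "colfam:col1"))    # v6
print(t2.get_versions("100", "colfam:col1"))  # ['v6', 'v5', 'v4', 'v3', 'v2']
```

Six writes to the same row key still leave one row. The oldest version is discarded once five are stored, and the "Hive view" only ever returns the newest value.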