Created 10-04-2016 01:36 PM
Hi, I'm trying to load a simple dataset into HBase using a Pig script. I have looked at a few websites: some use org.apache.pig.backend.hadoop.hbase.HBaseStorage, and others use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Can someone please let me know which is the correct method, and what the difference between the two is?
Created 10-04-2016 10:48 PM
@Mahesh Mallikarjunappa The Hive storage handler (HBaseStorageHandler) can be used to load data into HBase via Hive. A simple example follows:
-- Create a hive-managed HBase table
CREATE TABLE MyHBaseTable(MyKey string, Col1 string, Col2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1,colfam:col2")
TBLPROPERTIES("hbase.table.name" = "MyNamespace:MyTable")
;

-- Insert data into it
INSERT INTO TABLE MyHBaseTable
SELECT SourceKey, SourceCol1, SourceCol2
FROM SourceHiveTable
;
And from Pig, you can read from that same source and write to that same target using Pig's HBaseStorage as follows:
pig -useHCatalog -f script.pig
where script.pig is as follows:
RawData = LOAD 'SourceHiveTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
KeepColumns = FOREACH RawData GENERATE SourceKey, SourceCol1, SourceCol2;
STORE KeepColumns INTO 'hbase://MyNamespace:MyTable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');
Note: You don't specify the key in the STORE statement - the first column is always the key.
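Since the original question was about loading a simple dataset, here is a minimal sketch of the same STORE fed from a plain delimited file instead of a Hive table. The input path /tmp/source_data.csv and its three-field layout are assumptions made for illustration only, and the script assumes the target HBase table and the colfam column family already exist.

```
-- Hypothetical input: a comma-separated file in HDFS with three fields per line
RawData = LOAD '/tmp/source_data.csv' USING PigStorage(',')
          AS (SourceKey:chararray, SourceCol1:chararray, SourceCol2:chararray);

-- The first field (SourceKey) becomes the HBase row key;
-- the remaining fields map, in order, to the columns listed below
STORE RawData INTO 'hbase://MyNamespace:MyTable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');
```

Because this variant does not use HCatLoader, the -useHCatalog flag should not be needed; the HBase client jars still have to be available on Pig's classpath.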
Hope this helps!
Created 10-04-2016 02:31 PM
@Mahesh Mallikarjunappa When you use Pig to load into HBase, use org.apache.pig.backend.hadoop.hbase.HBaseStorage; when you use Hive to load into HBase, use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Each one is specific to its own technology.
Created 10-05-2016 01:41 PM
TBLPROPERTIES("hbase.table.name"="MyNamespace:MyTable")
With this property I am getting the following error: "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.NamespaceNotFoundException: org.apache.hadoop.hbase.NamespaceNotFoundException: MyNamespace" Please help.
Created 07-03-2018 05:25 AM
In HBase, create the namespace first (in the HBase shell: create_namespace 'MyNamespace') and then create the table; this avoids the NamespaceNotFoundException.
Created 10-05-2016 02:25 PM
@Amit Dass, first create the table in Hive inside a database; that table's name should match the table name given in TBLPROPERTIES.