
Loading data into HBase using Pig script

Super Collaborator

Hi, I'm trying to load a simple dataset into HBase using a Pig script. I have referred to a few websites, but some use org.apache.pig.backend.hadoop.hbase.HBaseStorage and others use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Can someone please let me know which is the correct method and what the difference between the two is?

1 ACCEPTED SOLUTION

Contributor

@Mahesh Mallikarjunappa The Hive HBase storage handler can be used to load data into HBase via Hive. A simple example follows:

-- Create a hive-managed HBase table
CREATE TABLE MyHBaseTable(MyKey string, Col1 string, Col2 string)
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1,colfam:col2")
   TBLPROPERTIES("hbase.table.name" = "MyNamespace:MyTable")
;

-- Insert data into it
INSERT INTO TABLE MyHBaseTable
   SELECT SourceKey, SourceCol1, SourceCol2
   FROM SourceHiveTable
;

And from Pig, you can read from that same source and write to that same target using Pig's HBaseStorage as follows:

pig -useHCatalog -f script.pig

where script.pig is as follows:

-- Read the source Hive table through HCatalog
RawData = LOAD 'SourceHiveTable'
          USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep only the columns that map to the HBase table
KeepColumns = FOREACH RawData
              GENERATE SourceKey, SourceCol1, SourceCol2;

-- Write to HBase; the first field (SourceKey) becomes the row key
STORE KeepColumns
INTO 'hbase://MyNamespace:MyTable'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');

Note: you don't list the row key in the HBaseStorage column mapping; the first field of the relation being stored is always used as the row key.

Hope this helps!


5 REPLIES

Master Guru

@Mahesh Mallikarjunappa When you use Pig to load into HBase, use org.apache.pig.backend.hadoop.hbase.HBaseStorage; when you use Hive to load into HBase, use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Each is specific to its own technology.
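
For illustration, here is a minimal Pig-only sketch that loads a delimited file straight into HBase with HBaseStorage (the input path, table name, and column family below are placeholders, not taken from this thread):

-- Load a tab-delimited file from HDFS (placeholder path and schema)
raw_data = LOAD '/tmp/input.tsv'
           USING PigStorage('\t')
           AS (rowkey:chararray, col1:chararray, col2:chararray);

-- Write into HBase; the first field (rowkey) becomes the HBase row key
STORE raw_data
INTO 'hbase://MyNamespace:MyTable'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');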


Expert Contributor

TBLPROPERTIES("hbase.table.name"="MyNamespace:MyTable")

I am getting "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.NamespaceNotFoundException: org.apache.hadoop.hbase.NamespaceNotFoundException: MyNamespace)". Please help us here.

New Contributor

In HBase, create the namespace and then create the table to avoid this error.
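
For example, from the HBase shell (using the same placeholder names as the accepted solution):

# In the hbase shell, create the namespace first, then the table with its column family
create_namespace 'MyNamespace'
create 'MyNamespace:MyTable', 'colfam'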

Super Collaborator

@Amit Dass, first create the table in Hive inside a database, and that Hive table name should match the table name given in TBLPROPERTIES.