Loading data into HBase using Pig script

Expert Contributor

Hi, I'm trying to load a simple dataset into HBase using a Pig script. I have looked at a few websites: some use org.apache.pig.backend.hadoop.hbase.HBaseStorage and others use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Can someone please tell me which is the correct method and what the difference between the two is?

Accepted Solution

Re: Loading data into HBase using Pig script

New Contributor

@Mahesh Mallikarjunappa The Hive HBase storage handler (org.apache.hadoop.hive.hbase.HBaseStorageHandler) can be used to load data into HBase through Hive. A simple example follows:

-- Create a hive-managed HBase table
CREATE TABLE MyHBaseTable(MyKey string, Col1 string, Col2 string)
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1,colfam:col2")
   TBLPROPERTIES("hbase.table.name" = "MyNamespace:MyTable")
;

-- Insert data into it
INSERT INTO TABLE MyHBaseTable
   SELECT SourceKey, SourceCol1, SourceCol2
   FROM SourceHiveTable
;
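The "hbase.columns.mapping" property in the DDL above is positional: the Nth Hive column of MyHBaseTable is paired with the Nth mapping entry, and ":key" marks the HBase row key. The sketch below is a hypothetical plain-Python model of that pairing, not Hive's HBaseSerDe code; the function name pair_columns_with_mapping is made up for illustration, and the sketch ignores other entry forms Hive accepts (such as whole-family "colfam:" mappings).

```python
# Hypothetical model of how Hive pairs MyHBaseTable's columns with the
# entries of "hbase.columns.mapping". NOT Hive's actual HBaseSerDe code;
# it only handles ':key' and 'family:qualifier' entries.


def pair_columns_with_mapping(hive_columns, mapping):
    """Return a list of (hive_column, target) pairs.

    target is the string "row key" for the ':key' entry, otherwise a
    (family, qualifier) tuple parsed from 'family:qualifier'.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    # This sketch insists on exactly one mapping entry per Hive column.
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(entries)} mapping entries for {len(hive_columns)} Hive columns"
        )
    pairs = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            pairs.append((column, "row key"))
        else:
            family, qualifier = entry.split(":", 1)
            pairs.append((column, (family, qualifier)))
    return pairs


# Same column list and mapping string as the CREATE TABLE statement above.
for column, target in pair_columns_with_mapping(
    ["MyKey", "Col1", "Col2"], ":key,colfam:col1,colfam:col2"
):
    print(column, "->", target)
```

This prints MyKey -> row key, then Col1 -> ('colfam', 'col1') and Col2 -> ('colfam', 'col2'), which is why Col1 and Col2 end up as cells colfam:col1 and colfam:col2 under the row identified by MyKey.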

And from Pig, you can read from the same source table and write to the same target table using Pig's HBaseStorage, as follows:

pig -useHCatalog -f script.pig

Here script.pig contains the following:

RawData = LOAD 'SourceHiveTable'
          USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Hive stores column names in lowercase, so the fields HCatLoader returns are lowercase too
KeepColumns = FOREACH RawData
              GENERATE sourcekey, sourcecol1, sourcecol2;

STORE KeepColumns
INTO 'hbase://MyNamespace:MyTable'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');

Note: you don't list the row key in the HBaseStorage column list; the first field of each tuple is always used as the row key.
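To make the row-key rule concrete, here is a hypothetical plain-Python model of what STORE ... USING HBaseStorage('colfam:col1,colfam:col2') does with one tuple: field 0 becomes the row key and the remaining fields go, in order, to the listed columns. It is not Pig's implementation; the function name tuple_to_hbase_row and the sample values are made up, and the length check is a choice of this sketch rather than a claim about how HBaseStorage reacts to a mismatch.

```python
# Hypothetical model of HBaseStorage's STORE mapping for one Pig tuple.
# NOT Pig's actual code: field 0 -> row key, remaining fields -> the
# listed 'family:qualifier' columns, in order.


def tuple_to_hbase_row(fields, column_list):
    """Return (row_key, {(family, qualifier): value}) for one tuple."""
    columns = [column.strip() for column in column_list.split(",")]
    row_key, values = fields[0], fields[1:]
    # This sketch rejects tuples whose width does not match the column list.
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns) + 1} fields (key + columns), got {len(fields)}"
        )
    cells = {}
    for column, value in zip(columns, values):
        family, qualifier = column.split(":", 1)
        cells[(family, qualifier)] = value
    return row_key, cells


# One made-up tuple shaped like KeepColumns: (key, col1 value, col2 value).
key, cells = tuple_to_hbase_row(("row-1", "a", "b"), "colfam:col1,colfam:col2")
print(key)    # row-1
print(cells)  # {('colfam', 'col1'): 'a', ('colfam', 'col2'): 'b'}
```

The column string passed to HBaseStorage therefore has one entry fewer than the tuple has fields.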

Hope this helps!

5 REPLIES

Re: Loading data into HBase using Pig script

Super Guru

@Mahesh Mallikarjunappa When you use Pig to load data into HBase, use org.apache.pig.backend.hadoop.hbase.HBaseStorage; when you use Hive to load data into HBase, use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Each class is specific to its own tool.

Re: Loading data into HBase using Pig script

Expert Contributor

With TBLPROPERTIES("hbase.table.name"="MyNamespace:MyTable") I am getting this error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.NamespaceNotFoundException: org.apache.hadoop.hbase.NamespaceNotFoundException: MyNamespace

Please help.

Re: Loading data into HBase using Pig script

New Contributor

Create the namespace in HBase first and then create the table; that avoids this error.
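The reason is that HBase reads "MyNamespace:MyTable" as table MyTable inside namespace MyNamespace, and the namespace must already exist before the table can be created (in the HBase shell, create_namespace 'MyNamespace' creates it). The sketch below is a hypothetical plain-Python model of that naming rule only; split_table_name and namespace_to_create are made-up helpers, not part of any HBase API.

```python
# Hypothetical model of HBase's "namespace:qualifier" table naming, used
# only to show why "MyNamespace:MyTable" fails until MyNamespace exists.


def split_table_name(name):
    """Split an HBase table name into (namespace, qualifier)."""
    if ":" in name:
        namespace, qualifier = name.split(":", 1)
        return namespace, qualifier
    # Names without a colon belong to the built-in "default" namespace.
    return "default", name


def namespace_to_create(name, existing_namespaces):
    """Return the namespace that must be created first, or None."""
    namespace, _ = split_table_name(name)
    return None if namespace in existing_namespaces else namespace


# A fresh HBase install has the "default" and "hbase" namespaces.
existing = {"default", "hbase"}
print(namespace_to_create("MyNamespace:MyTable", existing))  # MyNamespace
print(namespace_to_create("MyTable", existing))              # None
```

A table name without a namespace prefix lands in "default", which always exists, so the error only appears when a custom namespace is referenced.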

Re: Loading data into HBase using Pig script

Expert Contributor

@Amit Dass, first create the table in Hive inside a database, and make sure that table name matches the table name in TBLPROPERTIES.
