Loading data into HBase using Pig script
Labels: Apache HBase, Apache Hive, Apache Pig
Created ‎10-04-2016 01:36 PM
Hi, I'm trying to load a simple dataset into HBase using a Pig script. I have looked at a few websites: some use org.apache.pig.backend.hadoop.hbase.HBaseStorage, while others use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Can someone please tell me which is the correct method, and what the difference between the two is?
Created ‎10-04-2016 10:48 PM
@Mahesh Mallikarjunappa The Hive storage handler (HBaseStorageHandler) can be used to load data into HBase via Hive. A simple example follows:
-- Create a Hive-managed HBase table
CREATE TABLE MyHBaseTable(MyKey string, Col1 string, Col2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1,colfam:col2")
TBLPROPERTIES("hbase.table.name" = "MyNamespace:MyTable");

-- Insert data into it
INSERT INTO TABLE MyHBaseTable
SELECT SourceKey, SourceCol1, SourceCol2
FROM SourceHiveTable;
And from Pig, you can read from that same source and write to that same target using Pig's HBaseStorage as follows:
pig -useHCatalog -f script.pig
Where script.pig is as below:
RawData = LOAD 'SourceHiveTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
KeepColumns = FOREACH RawData GENERATE SourceKey, SourceCol1, SourceCol2;
STORE KeepColumns INTO 'hbase://MyNamespace:MyTable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');
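Note: You don't specify the key in the STORE statement. The first field of the relation (SourceKey here) is always used as the row key, and the remaining fields map in order to the columns listed in HBaseStorage.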
Hope this helps!
Created ‎10-04-2016 02:31 PM
@Mahesh Mallikarjunappa When you use Pig to load into HBase, use org.apache.pig.backend.hadoop.hbase.HBaseStorage; when you use Hive to load into HBase, use org.apache.hadoop.hive.hbase.HBaseStorageHandler. Each is specific to its own technology.
Created ‎10-05-2016 01:41 PM
TBLPROPERTIES("hbase.table.name"="MyNamespace:MyTable")
With this setting I am getting "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.NamespaceNotFoundException: org.apache.hadoop.hbase.NamespaceNotFoundException: MyNamespace". Please help us here.
Created ‎07-03-2018 05:25 AM
In HBase, create the namespace first and then create the table to avoid this error.
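As a rough sketch of that fix, the namespace from the thread (MyNamespace) could be created with the HBase shell before re-running the Hive CREATE TABLE. The file name create_ns.hbase is just a placeholder I made up, and this assumes the hbase client is on your PATH and your user is allowed to create namespaces:

```shell
#!/bin/sh
# Write the HBase shell commands to a file.
# 'create_ns.hbase' is a hypothetical file name; 'MyNamespace' comes from the thread.
cat > create_ns.hbase <<'EOF'
create_namespace 'MyNamespace'
list_namespace
exit
EOF

# Run the commands only if the HBase client is available (assumed to be on PATH).
if command -v hbase >/dev/null 2>&1; then
  hbase shell ./create_ns.hbase
else
  echo "hbase not found; commands were written to create_ns.hbase"
fi
```

Only the namespace is created here. As far as I understand, a Hive-managed table using HBaseStorageHandler creates the HBase table itself, so pre-creating MyNamespace:MyTable in HBase would likely mean switching the Hive statement to CREATE EXTERNAL TABLE instead.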
Created ‎10-05-2016 02:25 PM
@Amit Dass, first create the table in Hive inside a database; that table's name should match the table name given in TBLPROPERTIES.
