STORE Pig OUTPUT into MULTIPLE HBase TABLES

Expert Contributor

Hi All,

How can we store the output of a Pig script into multiple HBase tables? The HBase tables are already created; I need to store each specific value in its own specific table.

For example, I have output like this:

(word1){data}
(word2){data}
(word3){data}
(word4){data}

I need to store this output in the already created tables. The table names are:

word1
word2
word3
word4

The output should be stored in the existing tables as follows:

word1 ----> (word1){data}
word2 ----> (word2){data} 
word3 ----> (word3){data}           

Any suggestions?

Thank you.

1 ACCEPTED SOLUTION

Master Mentor

You would need to assign an alias to each row and specify a separate STORE command per row.

4 REPLIES 4

Expert Contributor

Thanks for your reply, Artem Ervits.

Could you please give me an example of that? It would be very helpful.

Master Mentor

@Mohan V this is not efficient, but it does what you're asking:

grunt> fs -cat text
1 a
2 b
3 c
grunt> data = load 'text' using PigStorage(' ') AS (id:long, letter:chararray);
grunt> A = FILTER data by letter == 'a';
grunt> B = FILTER data by letter == 'b';
grunt> C = FILTER data by letter == 'c';
grunt> STORE A into 'hbase://a' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
2016-09-07 16:04:29,421 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 698875904 to monitor. collectionUsageThreshold = 489213120, usageThreshold = 489213120
...
grunt> STORE B into 'hbase://b' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
...
grunt> STORE C into 'hbase://c' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
Now in the HBase shell, assuming the tables were created beforehand with:
create 'a', 'cf'
create 'b', 'cf'
create 'c', 'cf'
hbase(main):001:0> scan 'a'
ROW                            COLUMN+CELL
 1                             column=cf:letter, timestamp=1473264279802, value=a
1 row(s) in 0.2610 seconds


hbase(main):002:0> scan 'b'
ROW                            COLUMN+CELL
 2                             column=cf:letter, timestamp=1473264324881, value=b
1 row(s) in 0.0160 seconds


hbase(main):003:0> scan 'c'
ROW                            COLUMN+CELL
 3                             column=cf:letter, timestamp=1473264429688, value=c
1 row(s) in 0.0140 seconds
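
As a possible alternative to the three FILTER statements above, Pig's SPLIT operator can route rows to several aliases in one statement. The sketch below reuses the `text` input, the tables `a`, `b`, `c` and the column family `cf` from the session above. When run as a script rather than line by line in grunt, Pig's multi-query execution can usually serve all three STORE statements from a single read of the input. Treat it as an untested sketch under those assumptions, not a measured improvement.

```pig
-- Load the same space-delimited file used in the session above.
data = LOAD 'text' USING PigStorage(' ') AS (id:long, letter:chararray);

-- Route each row to one alias based on its letter.
SPLIT data INTO A IF letter == 'a',
                B IF letter == 'b',
                C IF letter == 'c';

-- One STORE per target table; the first field (id) becomes the HBase row key.
STORE A INTO 'hbase://a' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
STORE B INTO 'hbase://b' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
STORE C INTO 'hbase://c' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
```

Rows that match none of the conditions are dropped by SPLIT, which matches the behavior of the separate FILTER statements.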

Expert Contributor

@Artem Ervits thanks for your valuable explanation.

Based on that, I tried it a different way.

That is, instead of storing the output to a text file and loading it back with PigStorage, I filtered on the word directly and tried to store the result in HBase.

Above I described only a simplified scenario. Here are the actual script and data I used.

Output & Script:

A = FOREACH (GROUP epoch BY epochtime) {
    data = FOREACH epoch GENERATE created_at, id, user_id, text;
    GENERATE group AS pattern, data;
};

This produced the following output:

(word1_1473344765_265217609700,{(Wed Apr 20 07:23:20 +0000 2016,252479809098223616,450990391,rt @joey7barton: ..give a word1 about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ...),(Wed Apr 22 07:23:20 +0000 2016,252455630361747457,118179886,@dawnriseth word1 and then we will have to prove it again by reelecting obama in 2016, 2020... this race-baiting never ends.)}) 
(word2_1473344765_265217609700,{(Wed Apr 21 07:23:20 +0000 2016,252370526411051008,845912316,@maarionymcmb word2 mere ta dit tu va resté chez toi dnc tu restes !),(Wed Apr 23 07:23:20 +0000 2016,252213169567711232,14596856,rt @chernynkaya: "have you noticed lately that word2 is getting credit for the president being in the lead except pres. obama?"  ...)})

Then, without dumping it or storing it to a file, I tried this:

B = FILTER A BY pattern == 'word1_1473325383_265214120940';
describe B;

B: {pattern: chararray,data: {(json::created_at: chararray,json::id: chararray,json::user_id: chararray,json::text: chararray)}}

STORE B into 'hbase://word1_1473325383_265214120940' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');

The job reported success, but no data was stored in the table. When I checked the logs, I found the warning below.

2016-09-08 19:45:46,223 [Readahead Thread #2] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile EBADF: Bad file descriptor

Please let me know what I am missing here.

Thank you.
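
Two things may be worth checking here, offered as guesses rather than a confirmed diagnosis. First, the pattern in the FILTER (`word1_1473325383_265214120940`) differs from the keys in the sample output (`word1_1473344765_265217609700`), so B may simply be empty. Second, HBaseStorage takes the first field as the row key and maps each remaining field to one column, so a bag of tweets is usually easier to store after flattening it into one row per tweet. The sketch below assumes the relation A from the script above, a target table with a column family named `cf`, and the tweet `id` as row key. The aliases `flat_tweets` and `tweets` and the column qualifiers are illustrative choices, not taken from the original script.

```pig
-- Assumes A is the grouped relation produced by the script above.
B = FILTER A BY pattern == 'word1_1473325383_265214120940';

-- Emit one tuple per tweet; the AS clause gives the flattened fields short names.
flat_tweets = FOREACH B GENERATE FLATTEN(data)
    AS (created_at:chararray, id:chararray, user_id:chararray, text:chararray);

-- HBaseStorage uses the first field as the row key, so put id first.
tweets = FOREACH flat_tweets GENERATE id, created_at, user_id, text;

-- The remaining three fields map, in order, to the three listed columns.
STORE tweets INTO 'hbase://word1_1473325383_265214120940'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:created_at cf:user_id cf:text');
```

Running `DUMP B;` before the STORE is a quick way to confirm that the filter actually matches any rows.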