Member since: 06-03-2016
Posts: 66
Kudos Received: 21
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3297 | 12-03-2016 08:51 AM
 | 1769 | 09-15-2016 06:39 AM
 | 1972 | 09-12-2016 01:20 PM
 | 2278 | 09-11-2016 07:04 AM
 | 1889 | 09-09-2016 12:19 PM
09-12-2016
04:12 AM
Thank you gkeys... you are the best...
09-11-2016
12:48 PM
@Mohan V Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂
09-09-2016
12:19 PM
1 Kudo
I think I got it on my own. Actually, I had forgotten the credentials and entered the wrong password. In the end it worked once I entered the right credentials.
09-08-2016
12:51 PM
2 Kudos
@Mohan V I would:
1. Land the data in a landing zone in HDFS. Decide whether to keep this zone going forward (you may want to reuse the raw data).
2. Use Pig scripts to transform the data into tab-delimited output for your HBase tables (see next step). Importantly, this involves inserting a key as the first column of the resulting TSV file; HBase of course is all about well-designed keys. You will use Pig's CONCAT() function to create a key from existing fields, and it is often useful to join the fields with a "-" separating each one in the resulting composite key. Each TSV output is used to bulk load a single HBase table (next step) and should be written to a tmp dir in HDFS to be used as input there; see the sketch at the end of this post. Note: you could take your Pig scripting to the next level and create a single flexible Pig script that produces TSV output for all HBase tables (see https://community.hortonworks.com/content/kbentry/51884/pig-doing-yoga-how-to-build-superflexible-pig-scri.html), though this is not necessary.
3. Do a bulk import into your HBase table for each TSV. See the following links on bulk imports (inserting record by record will be much too slow for large tables): http://hbase.apache.org/0.94/book/arch.bulk.load.html and http://hbase.apache.org/book.html#importtsv

I have used this workflow frequently, including loading 2.53 billion relational records into an HBase table. The more you do it, the more you find yourself automating it.
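To make step 2 concrete, here is a minimal Pig sketch; the input path, field names, and output path are assumptions for illustration, not from the original post. It builds a composite row key with nested CONCAT() calls and writes tab-delimited output to a tmp dir for the bulk load:

-- load the raw data from the HDFS landing zone (hypothetical path and schema)
raw = LOAD '/landing/raw_data' USING PigStorage(',')
      AS (customer_id:chararray, region:chararray, amount:chararray);

-- build the composite row key as the first column; older Pig versions accept
-- only two arguments to CONCAT(), hence the nesting, with '-' separating parts
keyed = FOREACH raw GENERATE
        CONCAT(CONCAT(region, '-'), customer_id) AS rowkey,
        customer_id, region, amount;

-- write tab-delimited output to a tmp dir in HDFS for the bulk load step
STORE keyed INTO '/tmp/hbase_tsv/customers' USING PigStorage('\t');

The TSV written to the tmp dir is then bulk loaded into the target HBase table with importtsv, per the links in step 3.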
09-08-2016
02:37 PM
@Artem Ervits thanks for your valuable explanation. Using that, I tried it another way: instead of storing the output to a text file and loading it back with PigStorage, I tried to filter by word directly and store the result into HBase. Above I mentioned only the scenario I need; here are the actual script and data that I used. Script and output:
A = foreach (group epoch BY epochtime) { data = foreach epoch generate created_at, id, user_id, text; generate group as pattern, data; }
Using this, I got the output below:
(word1_1473344765_265217609700,{(Wed Apr 20 07:23:20 +0000 2016,252479809098223616,450990391,rt @joey7barton: ..give a word1 about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ...),(Wed Apr 22 07:23:20 +0000 2016,252455630361747457,118179886,@dawnriseth word1 and then we will have to prove it again by reelecting obama in 2016, 2020... this race-baiting never ends.)})
(word2_1473344765_265217609700,{(Wed Apr 21 07:23:20 +0000 2016,252370526411051008,845912316,@maarionymcmb word2 mere ta dit tu va resté chez toi dnc tu restes !),(Wed Apr 23 07:23:20 +0000 2016,252213169567711232,14596856,rt @chernynkaya: "have you noticed lately that word2 is getting credit for the president being in the lead except pres. obama?" ...)})
Now, without dumping or storing it into a file, I tried this:
B = FILTER A BY pattern == 'word1_1473325383_265214120940';
describe B;
B: {pattern: chararray,data: {(json::created_at: chararray,json::id: chararray,json::user_id: chararray,json::text: chararray)}}
STORE B into 'hbase://word1_1473325383_265214120940' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
The output reported success, but no data was stored in the table. When I checked the logs, the warning below appeared: 2016-09-08 19:45:46,223 [Readahead Thread #2] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile
EBADF: Bad file descriptor
Please don't hesitate to suggest what I am missing here. Thank you.
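For context, here is a hedged sketch of a flattened-store variant (the column mapping is an assumption, not a confirmed fix for the EBADF warning): the grouped bag is flattened first so HBaseStorage receives the row key as the first scalar field followed by one field per column:

-- flatten the bag so each row becomes (rowkey, created_at, id, user_id, text)
flat = FOREACH B GENERATE pattern AS rowkey, FLATTEN(data);

-- map the four flattened fields to columns in the 'cf' column family
STORE flat INTO 'hbase://word1_1473325383_265214120940'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:created_at cf:id cf:user_id cf:text');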
09-12-2016
01:20 PM
1 Kudo
Thanks for your reply, Artem Ervits. I think it was because of the different versions I had used in my script. When I used matching versions of Elephant Bird, as suggested by @gkeys, it worked fine for me. Script:
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER json-simple-1.1.1.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
extracted = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited,(long)$0#'favorite_count' as favorite_count,(long)$0#'retweet_count' as retweet_count,(boolean)$0#'retweeted' as retweeted,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') as place;
dump extracted;
And it worked fine.
09-07-2016
07:18 AM
I think I got it on my own. Here is the script that I have written:
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime, created_at, id, user_id, text;
res1 = foreach (group epoch by epochtime) { data = foreach epoch generate created_at, id, user_id, text; generate group, data; };
dump res1;
09-06-2016
01:07 PM
I think I found the answer on my own:
B = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));
I just removed A. from the inner CONCAT, and it worked fine.
09-02-2016
06:19 PM
1 Kudo
As far as using Pig to insert the data into an HBase table, these links should be helpful: https://community.hortonworks.com/questions/31164/hbase-insert-from-pig.html and http://princetonits.com/blog/technology/loading-customer-data-into-hbase-using-a-pig-script/
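For a quick illustration of what those links describe, here is a minimal sketch; the file path, schema, table name, and column names are made up for the example. The first field of the relation becomes the HBase row key, and the remaining fields map to the listed columns:

-- load a simple delimited file (hypothetical path and schema)
customers = LOAD '/data/customers.csv' USING PigStorage(',')
            AS (id:chararray, name:chararray, city:chararray);

-- 'id' becomes the row key; name and city go into the 'info' column family
STORE customers INTO 'hbase://customers'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:city');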