Member since: 06-03-2016
Posts: 66
Kudos Received: 21
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3297 | 12-03-2016 08:51 AM
 | 1769 | 09-15-2016 06:39 AM
 | 1972 | 09-12-2016 01:20 PM
 | 2278 | 09-11-2016 07:04 AM
 | 1889 | 09-09-2016 12:19 PM
09-12-2016
04:12 AM
Thank you gkeys... you are the best...
09-11-2016
12:48 PM
@Mohan V Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂
09-09-2016
12:19 PM
1 Kudo
I think I got it on my own. Actually, I had forgotten the credentials and entered the wrong password. In the end it worked once I entered the right credentials.
09-08-2016
12:51 PM
2 Kudos
@Mohan V I would:
1. Land the data in a landing zone in HDFS. Decide whether to keep this zone going forward (you may want to reuse the raw data).
2. Use Pig scripts to transform the data into tab-delimited output for your HBase tables (see next step). Importantly, this involves inserting a key as the first column of the resulting TSV file; HBase of course is all about well-designed keys. You will use Pig's CONCAT() function to create a key from existing fields, and it is often useful to join the fields with a "-" separating each one in the resulting composite key. Each TSV output is used to bulk load a single HBase table (next step) and should be written to a tmp dir in HDFS to be used as input there; see the sketch at the end of this post. Note: you could take your Pig scripting to the next level and create a single flexible Pig script that produces TSV output for all HBase tables (see https://community.hortonworks.com/content/kbentry/51884/pig-doing-yoga-how-to-build-superflexible-pig-scri.html), though this is not necessary.
3. Do a bulk import into your HBase table for each TSV. See the following links on bulk imports (inserting record by record will be much too slow for large tables): http://hbase.apache.org/0.94/book/arch.bulk.load.html and http://hbase.apache.org/book.html#importtsv

I have used this workflow frequently, including loading 2.53 billion relational records into an HBase table. The more you do it, the more you find yourself automating it.
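To make step 2 concrete, here is a minimal Pig sketch; the input path, field names, and output path are assumptions for illustration, not from the original post. It builds a composite row key with nested CONCAT() calls and writes tab-delimited output to a tmp dir for the bulk load:

-- load the raw data from the HDFS landing zone (hypothetical path and schema)
raw = LOAD '/landing/raw_data' USING PigStorage(',')
      AS (customer_id:chararray, region:chararray, amount:chararray);

-- build the composite row key as the first column; older Pig versions accept
-- only two arguments to CONCAT(), hence the nesting, with '-' separating parts
keyed = FOREACH raw GENERATE
        CONCAT(CONCAT(region, '-'), customer_id) AS rowkey,
        customer_id, region, amount;

-- write tab-delimited output to a tmp dir in HDFS for the bulk load step
STORE keyed INTO '/tmp/hbase_tsv/customers' USING PigStorage('\t');

The TSV written to the tmp dir is then bulk loaded into the target HBase table with importtsv, per the links in step 3.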
09-08-2016
02:37 PM
@Artem Ervits thanks for your valuable explanation. Using that, I tried it another way: instead of storing the output to a text file and loading it back with PigStorage, I tried to filter by word directly and store the result into HBase. Above I mentioned only the scenario I need; here are the actual script and data that I used. Script and output:
A = foreach (group epoch BY epochtime) { data = foreach epoch generate created_at, id, user_id, text; generate group as pattern, data; }
Using this, I got the output below:
(word1_1473344765_265217609700,{(Wed Apr 20 07:23:20 +0000 2016,252479809098223616,450990391,rt @joey7barton: ..give a word1 about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ...),(Wed Apr 22 07:23:20 +0000 2016,252455630361747457,118179886,@dawnriseth word1 and then we will have to prove it again by reelecting obama in 2016, 2020... this race-baiting never ends.)})
(word2_1473344765_265217609700,{(Wed Apr 21 07:23:20 +0000 2016,252370526411051008,845912316,@maarionymcmb word2 mere ta dit tu va resté chez toi dnc tu restes !),(Wed Apr 23 07:23:20 +0000 2016,252213169567711232,14596856,rt @chernynkaya: "have you noticed lately that word2 is getting credit for the president being in the lead except pres. obama?" ...)})
Now, without dumping or storing it into a file, I tried this:
B = FILTER A BY pattern == 'word1_1473325383_265214120940';
describe B;
B: {pattern: chararray,data: {(json::created_at: chararray,json::id: chararray,json::user_id: chararray,json::text: chararray)}}
STORE B into 'hbase://word1_1473325383_265214120940' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
The output reported success, but no data was stored in the table. When I checked the logs, the warning below appeared: 2016-09-08 19:45:46,223 [Readahead Thread #2] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile
EBADF: Bad file descriptor
Please don't hesitate to suggest what I am missing here. Thank you.
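For context, here is a hedged sketch of a flattened-store variant (the column mapping is an assumption, not a confirmed fix for the EBADF warning): the grouped bag is flattened first so HBaseStorage receives the row key as the first scalar field followed by one field per column:

-- flatten the bag so each row becomes (rowkey, created_at, id, user_id, text)
flat = FOREACH B GENERATE pattern AS rowkey, FLATTEN(data);

-- map the four flattened fields to columns in the 'cf' column family
STORE flat INTO 'hbase://word1_1473325383_265214120940'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:created_at cf:id cf:user_id cf:text');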
09-12-2016
01:20 PM
1 Kudo
Thanks for your reply, Artem Ervits. I think it was because of the different versions I had used in my script. When I used matching versions of Elephant Bird, as suggested by @gkeys, it worked fine for me. Script:
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER json-simple-1.1.1.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
extracted = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited,(long)$0#'favorite_count' as favorite_count,(long)$0#'retweet_count' as retweet_count,(boolean)$0#'retweeted' as retweeted,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') as place;
dump extracted;
And it worked fine.
09-07-2016
07:18 AM
I think I got it on my own. Here is the script that I have written:
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime, created_at, id, user_id, text;
res1 = foreach (group epoch by epochtime) { data = foreach epoch generate created_at, id, user_id, text; generate group, data; };
dump res1;
09-06-2016
01:07 PM
I think I found the answer on my own:
B = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));
I just removed A. from the inner CONCAT, and it worked fine.
09-02-2016
06:19 PM
1 Kudo
As far as using Pig to insert the data into an HBase table, these links should be helpful: https://community.hortonworks.com/questions/31164/hbase-insert-from-pig.html and http://princetonits.com/blog/technology/loading-customer-data-into-hbase-using-a-pig-script/
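For a quick illustration of what those links describe, here is a minimal sketch; the file path, schema, table name, and column names are made up for the example. The first field of the relation becomes the HBase row key, and the remaining fields map to the listed columns:

-- load a simple delimited file (hypothetical path and schema)
customers = LOAD '/data/customers.csv' USING PigStorage(',')
            AS (id:chararray, name:chararray, city:chararray);

-- 'id' becomes the row key; name and city go into the 'info' column family
STORE customers INTO 'hbase://customers'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:city');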