Member since
06-03-2016
66
Posts
21
Kudos Received
7
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3297 | 12-03-2016 08:51 AM
 | 1776 | 09-15-2016 06:39 AM
 | 1973 | 09-12-2016 01:20 PM
 | 2280 | 09-11-2016 07:04 AM
 | 1889 | 09-09-2016 12:19 PM
09-08-2016
02:37 PM
@Artem Ervits thanks for your valuable explanation. Using that, I tried it another way: instead of storing the output to a text file and loading it back with PigStorage, I filtered on the word first and tried to store the result directly into HBase. Above I mentioned only the scenario I need; here are the actual script and data I used.

Script:

A = foreach (group epoch BY epochtime) {
    data = foreach epoch generate created_at, id, user_id, text;
    generate group as pattern, data;
}

With this I got the output below:
(word1_1473344765_265217609700,{(Wed Apr 20 07:23:20 +0000 2016,252479809098223616,450990391,rt @joey7barton: ..give a word1 about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ...),(Wed Apr 22 07:23:20 +0000 2016,252455630361747457,118179886,@dawnriseth word1 and then we will have to prove it again by reelecting obama in 2016, 2020... this race-baiting never ends.)})
(word2_1473344765_265217609700,{(Wed Apr 21 07:23:20 +0000 2016,252370526411051008,845912316,@maarionymcmb word2 mere ta dit tu va resté chez toi dnc tu restes !),(Wed Apr 23 07:23:20 +0000 2016,252213169567711232,14596856,rt @chernynkaya: "have you noticed lately that word2 is getting credit for the president being in the lead except pres. obama?" ...)})
Now, without dumping it or storing it into a file, I tried this:
B = FILTER A BY pattern == 'word1_1473325383_265214120940';
describe B;
B: {pattern: chararray,data: {(json::created_at: chararray,json::id: chararray,json::user_id: chararray,json::text: chararray)}}
STORE B into 'hbase://word1_1473325383_265214120940' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
The store reported success, but no data was written to the table. When I checked the logs, I found this warning:

2016-09-08 19:45:46,223 [Readahead Thread #2] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile
EBADF: Bad file descriptor

Please suggest what I am missing here. Thank you.
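One thing that may be worth checking (a sketch, not a verified fix): HBaseStorage treats the first field of each tuple as the HBase row key and maps the remaining fields onto the listed columns. In B the second field is a bag, and every tuple inside it shares the same pattern key. Flattening the bag and using a per-record row key (the tweet id here, as an assumption) may behave more predictably:

```pig
-- hypothetical reshaping before the store; field names follow the describe output above
flat  = FOREACH B GENERATE FLATTEN(data);
-- first field becomes the row key; remaining fields map to the cf: columns in order
keyed = FOREACH flat GENERATE json::id, json::created_at, json::user_id, json::text;
STORE keyed INTO 'hbase://word1_1473325383_265214120940'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:created_at cf:user_id cf:text');
```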
09-08-2016
12:01 PM
3 Kudos
Hi all, we are trying to migrate our existing RDBMS (SQL database) system to Hadoop, and we plan to use HBase for it. However, we are not sure how to denormalize the SQL data to store it in HBase's column-oriented format. Is it possible? If yes, what would be the best approach, and which HBase version is required? Any suggestions?
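In case it helps the discussion, here is a minimal sketch of one common pattern (all file, table, and column names below are made up for illustration): export the relational tables to HDFS (e.g. with Sqoop), join them in Pig, and store one denormalized wide row per entity into HBase via HBaseStorage:

```pig
-- hypothetical CSV exports of two RDBMS tables
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (cust_id:chararray, name:chararray, city:chararray);
orders    = LOAD 'orders.csv' USING PigStorage(',')
            AS (order_id:chararray, cust_id:chararray, amount:chararray);
joined    = JOIN orders BY cust_id, customers BY cust_id;
-- denormalize: one wide row per order, keyed by order_id (first field = row key)
wide = FOREACH joined GENERATE orders::order_id, orders::amount,
                               customers::name, customers::city;
STORE wide INTO 'hbase://orders_wide'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:amount cf:name cf:city');
```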
Labels:
- Apache Hadoop
- Apache HBase
09-07-2016
01:49 PM
Thanks for your reply, Artem Ervits. Can you please give me an example of that? It would be very helpful for me.
09-07-2016
01:34 PM
1 Kudo
Hi all, how can we store the output of Pig into multiple HBase tables? The HBase tables are already created; each specific value needs to go into its specific table. For example, I got the output as:

(word1){data}
(word2){data}
(word3){data}
(word4){data}

The already created tables are named word1, word2, word3, and word4. Now the output should be stored into them as:

word1 ----> (word1){data}
word2 ----> (word2){data}
word3 ----> (word3){data}

Any suggestions? Thank you.
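A sketch of one way to do this with SPLIT (assuming the relation is named A with schema (word:chararray, data), and that the four tables already exist):

```pig
-- route each tuple to its own relation by the word field
SPLIT A INTO w1 IF word == 'word1',
             w2 IF word == 'word2',
             w3 IF word == 'word3',
             w4 IF word == 'word4';
-- one STORE per target table; the first field is used as the row key
STORE w1 INTO 'hbase://word1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
STORE w2 INTO 'hbase://word2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
STORE w3 INTO 'hbase://word3' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
STORE w4 INTO 'hbase://word4' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:data');
```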
Labels:
- Apache Hadoop
- Apache HBase
- Apache Pig
09-07-2016
12:09 PM
1 Kudo
Trying to load a JSON file that has null values in it, using the elephant-bird JsonLoader.

sample.json:

{"created_at":"Mon Aug 22 10:48:23 +0000 2016","id":767674772662607873,"id_str":"767674772662607873","text":"KPIT Image Result for https:\/\/t.co\/Nas2ZnF1zZ... https:\/\/t.co\/9TnelwtIvm","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":123,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/Nas2ZnF1zZ","expanded_url":"http:\/\/miltonious.com\/","display_url":"miltonious.com","indices":[24,47]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1471862903167"}

script:

REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
json = LOAD 'sample.json' USING JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;
dump json;

When I dump json, I get the following output and warning:

(Mon Aug 22 10:48:23 +0000 2016,767674772662607873,767674772662607873,google Image Result for Twitter Web Client,false,1234,12345,3214,43215,,,,,,,,,,,,,,)

WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {complete json}

From the warning, I guess the loader is returning null for records that contain null values. So how can we load JSON that has null values in it? I also tried another way:

json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json;

This gives: Schema for json unknown.

Please suggest. Thanks.
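For what it's worth, a sketch of the map-based way of using elephant-bird's loader, which tolerates nulls because a missing or null key simply comes back as null from the map lookup (field names follow the sample record above):

```pig
REGISTER json-simple-1.1.1.jar;
REGISTER elephant-bird-pig-4.3.jar;
REGISTER elephant-bird-core-4.1.jar;
REGISTER elephant-bird-hadoop-compat-4.3.jar;
-- no schema string: each record is loaded as a single map
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
fields = FOREACH json GENERATE (chararray)$0#'created_at' AS created_at,
                               (chararray)$0#'id_str'     AS id_str,
                               (chararray)$0#'text'       AS text;
dump fields;
```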
Labels:
- Apache Hadoop
- Apache Pig
09-07-2016
07:18 AM
I think I got it on my own. Here is the script I wrote:

res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime, created_at, id, user_id, text;
res1 = FOREACH (GROUP epoch BY epochtime) GENERATE group, epoch;
dump res1;
09-06-2016
07:09 PM
Hi all, sorry for the wrong phrasing of my question. I have a scenario where I need to process a words.txt file and a data.txt file.

words.txt:

word1
word2
word3
word4

data.txt:

{"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

I need to get the output as (word1_epochtime){complete data that matched on the text attribute}, i.e.
(word1_1234567890){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

I have got the output as (word1){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616} by using this script:

-- 1. load words.txt
-- 2. load data.txt
c = cross words, data;
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
e = foreach (group d BY word) generate group, d;

And I got the epoch time with the words as:

time = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));

But I am unable to CONCAT the words with the time. How can I get the output as (word1_epochtime){data}?
Please feel free to suggest. Mohan.V
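A sketch of one way around this (assuming the aliases above): compute the epoch-time key inside the FOREACH that follows the FILTER, so the word and the timestamp are concatenated in the same projection instead of living in two different relations:

```pig
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
-- build word_epochtime per matched row; CurrentTime() takes no arguments
keyed = FOREACH d GENERATE
            CONCAT(CONCAT(words::word, '_'), (chararray)ToUnixTime(CurrentTime())) AS wordtime,
            data::created_at, data::text, data::user_id, data::id;
out = FOREACH (GROUP keyed BY wordtime) GENERATE group, keyed;
```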
Labels:
- Apache Hadoop
- Apache Pig
09-06-2016
01:07 PM
I think I found the answer on my own:

B = FOREACH words GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime()));

I just removed the A. prefix from the inner CONCAT, and it worked fine.
09-06-2016
06:40 AM
Hi all, I am new to Pig and trying to learn on my own. I have written a script to get the epoch time with each word read from a words.txt file. Here is the script:

words = LOAD 'words.txt' AS (word:chararray);
B = FOREACH words GENERATE CONCAT(CONCAT(A.word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;

The issue is: if words.txt has only one word, it gives the proper output, but if it has multiple words, like

word1
word2
word3
word4
then it gives the following error:

ERROR 1066: Unable to open iterator for alias B
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (word1 ), 2nd :(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" )
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (word1 ), 2nd :(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" )
at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:122)
at o

Please suggest how to solve this issue. Thank you. Mohan.V
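For reference, a minimal sketch of a version that avoids the scalar dereference (the A.word in the inner CONCAT is what makes Pig read the relation as a scalar and raise "Scalar has more than one row"; projecting the field directly keeps it per-row):

```pig
words = LOAD 'words.txt' AS (word:chararray);
-- refer to the field 'word' directly, not via a relation prefix
B = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));
dump B;
```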
Labels:
- Apache Hadoop
- Apache Pig
09-02-2016
02:21 PM
I think I got it too. Please correct me if I'm wrong:

A = LOAD 'words.txt' AS (word:chararray);
B = FOREACH A GENERATE CONCAT(CONCAT(A.word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;