Created on 09-06-2016 12:32 PM - edited 09-16-2022 03:37 AM
Sorry for the poorly phrased question. I am new to Cloudera and completely new to Pig, and I am trying to experiment on my own.
I have a scenario where I need to process a words.txt file and a data.txt file.
words.txt
word1 word2 word3 word4
data.txt
{"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}
I need to get the output as
(word1_epochtime){complete data which matched in text attribute}
i.e.
(word1_1234567890){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}
So far I have got the output as
(word1){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}
by using this script.
load words.txt
load data.txt
c = CROSS words, data;
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*',words::word),'.*'));
e = FOREACH (GROUP d BY word) {data};
I also got the epoch time appended to the words with
time = FOREACH words GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime(created_at)));
But I am unable to CONCAT the words with the time in the grouped result.
How can I get the output as
(word1_epochtime){data}
Any suggestions on the above are welcome.
Thank you.
Created 09-07-2016 05:15 AM
I think I got it on my own.
Here is the script that I have written:
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*',words::word),'.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime(created_at))) as epochtime;
res1 = foreach (group epoch by epochtime){data};
dump res1;
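For anyone landing here later, the approach above can be written out as a complete script. This is only a rough sketch under a few assumptions that are not stated in the thread: that words.txt holds space-separated words, that data.txt holds one JSON object per line with exactly the four fields shown (in that order), that the relation names raw_words, words and data are free to use, and that the epoch part is meant to be the current time (CurrentTime() called without arguments) rather than a value derived from created_at.

```pig
-- Assumption: words.txt contains space-separated words on each line.
raw_words = LOAD 'words.txt' AS (line:chararray);
words = FOREACH raw_words GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Assumption: data.txt has one JSON object per line with these four fields.
data = LOAD 'data.txt'
       USING JsonLoader('created_at:chararray, text:chararray, user_id:long, id:long');

-- Pair every word with every record, then keep the pairs where the text contains the word.
-- MATCHES must match the whole string, hence the '.*' on both sides.
c = CROSS words, data;
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));

-- Build the word_epochtime key and carry the record fields along with it.
epoch = FOREACH res GENERATE
            CONCAT(CONCAT(words::word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime,
            data::created_at AS created_at,
            data::text       AS text,
            data::user_id    AS user_id,
            data::id         AS id;

-- Group by the key and keep only the record fields inside the bag.
res1 = FOREACH (GROUP epoch BY epochtime)
       GENERATE group AS epochtime, epoch.(created_at, text, user_id, id) AS matches;

DUMP res1;
```

Carrying the record fields through the FOREACH is what lets the final bag hold the matched data; generating only the key, as in the script above, would leave nothing but the key to group. Note also that a word containing regex metacharacters would be interpreted as a pattern by MATCHES.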