Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

PIG: CONCAT A relation OUTPUT to another RELATION

Explorer

Sorry for the poor phrasing of the question. I am new to Cloudera and completely new to Pig, and I am trying to experiment on my own.

I have a scenario where I need to process a words.txt file and a data.txt file.

words.txt

word1
word2
word3
word4

data.txt

{"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

I need to get the output as

(word1_epochtime){complete data which matched in text attribute}

i.e.

(word1_1234567890){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

I have got the output as

(word1){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

by using this script.

load words.txt
load data.txt
c = CROSS words, data;
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
e = FOREACH (GROUP d BY word) GENERATE group, d;

and I got the epochtime with the words as

time = FOREACH words GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime(created_at)));

But I am unable to combine the word_time value with the matched data.

How can I get the output as

(word1_epochtime){data}

Any suggestions are welcome.

Thank you.

1 ACCEPTED SOLUTION

Explorer

I think I got it on my own.

Here is the script that I have written:

res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*',words::word),'.*'));
epoch = FOREACH res GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime(created_at))) AS epochtime;
res1 = FOREACH (GROUP epoch BY epochtime) GENERATE group, epoch;
dump res1;
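
The FOREACH above generates only the epochtime key, so the record fields would have to be carried along with it for the grouped bag to contain the matched data. A fuller end-to-end sketch of the same approach might look like the following. The load statements are assumptions, since the post does not show them: the JsonLoader schema is guessed from the sample record, and CurrentTime() is called without arguments, so the key reflects the time of the run rather than created_at.

```pig
-- Sketch only: the loaders and schemas below are assumptions, not taken from the post.
words = LOAD 'words.txt' AS (word:chararray);
data  = LOAD 'data.txt'
        USING JsonLoader('created_at:chararray, text:chararray, user_id:long, id:long');

-- Pair every word with every record, then keep the pairs whose text contains the word.
c   = CROSS words, data;
res = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));

-- Build the word_epochtime key and keep the record fields alongside it.
-- CurrentTime() is the current time, not a value parsed from created_at.
epoch = FOREACH res GENERATE
            CONCAT(CONCAT(words::word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime,
            data::created_at AS created_at,
            data::text       AS text,
            data::user_id    AS user_id,
            data::id         AS id;

-- One output row per key, with the matching records collected in a bag.
res1 = FOREACH (GROUP epoch BY epochtime)
       GENERATE group, epoch.(created_at, text, user_id, id);

DUMP res1;
```

Note that CROSS pairs every word with every record, so this is likely to become expensive on larger inputs, and that each word is used as a regular expression inside MATCHES, so words containing regex metacharacters would need escaping.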

 

 
