Created on 06-05-2017 01:11 PM - edited 08-17-2019 11:13 PM
Hi guys, I have been struggling with my Pig code and haven't arrived at the desired result, which is why I'm reaching out. I have a file with some information, and my idea is to get a count per reference number. As an overview, this is what I have done:
The third step ran, but its output is a huge tuple: for every group it repeats the reference number once per tuple in the grouping bag, so the output looks like:
Then I tried the fourth step. It gave me the list of counts, but I lost the reference_number. I would like the same list of counts with the reference number shown just once per group.
Thanks so much for your help team. @Lester Martin
Created 06-06-2017 07:23 AM
Hi @Andres Urrego,
You need to modify your third step like this:
october_total_station = FOREACH october_station_gr GENERATE FLATTEN(group) , COUNT(october.s_station);
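For context, here is a minimal sketch of the whole flow in Pig Latin. The original steps are not shown in the thread, so the input path, the delimiter, and the schema below are assumptions for illustration only; the relation names follow the ones used above. After a GROUP, each record has two fields: `group` (the key, here the station reference) and a bag named after the grouped relation (`october`) holding every matching tuple. Projecting a field out of that bag without aggregating it is what produces the huge repeated tuples, while COUNT collapses the bag into one number per key.

```pig
-- Assumption: hypothetical input path, delimiter, and single-column schema.
october = LOAD 'october_data.csv' USING PigStorage(',')
          AS (s_station:chararray);

-- One record per distinct s_station: (group, {bag of october tuples})
october_station_gr = GROUP october BY s_station;

-- Emit the key once, plus the size of its bag.
-- COUNT skips tuples whose first field is null; COUNT_STAR counts them all.
october_total_station = FOREACH october_station_gr
                        GENERATE group AS s_station, COUNT(october) AS total;

DUMP october_total_station;
```

Running `DESCRIBE october_station_gr;` before the FOREACH should show the `group` field and the nested `october` bag, which makes it easier to see what each GENERATE expression refers to.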
Created 06-06-2017 08:00 PM
Got it, I'm starting to understand how grouping works in Pig. Actually, to be sure, I did:
october_gr_counting = FOREACH october_station_gr GENERATE group, COUNT(october);
Thanks so much, buddy.