Hi guys, I have been struggling with my Pig code and haven't arrived at the desired result, which is why I'm asking for your help. I have a file with some information, and my idea is to get a count per reference number. As an overview, this is what I have done:
The third step worked, but the problem is that it generates a huge tuple: for each reference number, the grouping bag repeats that number in every tuple it contains, so the output looks like:
Then I tried the fourth step, but although I got the list of counts, I lost the reference_number. I would like to get the same list, but with the reference number appearing just once per row.
Thanks so much for your help, team. @Lester Martin
Got it, I'm starting to understand how grouping works in Pig. To be sure, this is what I actually did:
october_gr_counting = FOREACH october_station_gr GENERATE group, COUNT(october);
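For anyone following along, here is a minimal sketch of how that line might fit into a complete script. The input path, the delimiter, and the schema (including the `station` field and the choice of `reference_number` as the grouping key) are assumptions made for illustration, since they are not shown in this thread; only the relation names `october`, `october_station_gr`, and `october_gr_counting` come from the line above.

```pig
-- Hypothetical input file and schema; adjust to the real data.
october = LOAD 'october.csv' USING PigStorage(',')
          AS (reference_number:chararray, station:chararray);

-- GROUP yields one record per key: (group, {bag of matching october tuples}).
-- The key is stored once in the field named "group"; the bag still carries
-- the full original tuples, which is why dumping this relation looks huge.
october_station_gr = GROUP october BY reference_number;

-- Project the key once and reduce the bag to its size.
october_gr_counting = FOREACH october_station_gr
                      GENERATE group AS reference_number, COUNT(october) AS total;

DUMP october_gr_counting;
```

Running `DESCRIBE october_station_gr;` before the FOREACH is a quick way to check the shape of the grouped relation. Note also that `COUNT` skips tuples whose first field is null; if every tuple in the bag should be counted regardless, `COUNT_STAR(october)` may be the better fit.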
Thanks so much, buddy.