How to get the desired grouping result in Pig

Hi guys, I have been struggling with my Pig code and haven't arrived at the desired result, which is why I'm reaching out to you. I have a file with some information, and my idea is to get a count by reference number. As an overview, this is what I have done:

[Image: 16011-pig.png]

The third step worked, but the problem is that it generates a huge tuple: the reference number is repeated for every tuple in the grouping bag that contains that number, so the output looks like this:

[Image: 16012-pig.png]

Then I tried the fourth step. Although I got the list of counts, I lost the reference_number, so I would like to get the same list with the reference code appearing just once.

Thanks so much for your help, team. @Lester Martin

1 ACCEPTED SOLUTION

Hi @Andres Urrego,

You need to modify your third step like this:

october_total_station = FOREACH october_station_gr GENERATE FLATTEN(group) , COUNT(october.s_station);
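
For context, here is a minimal sketch of how the whole pipeline might look. The original script was posted as an image, so the input path, the delimiter, and the schema below (including grouping by a field named reference_number) are assumptions for illustration, not the asker's actual code. The point is that GROUP yields one tuple per key of the form (group, {bag of matching tuples}), so generating the key plus an aggregate over the bag gives one row per reference number.

```pig
-- Hypothetical input: path, delimiter, and schema are assumptions.
october = LOAD 'october.csv' USING PigStorage(',')
          AS (s_station:chararray, reference_number:chararray);

-- GROUP produces one tuple per key: (group, {bag of matching october tuples}).
october_station_gr = GROUP october BY reference_number;

-- Emit the key once plus the size of its bag, instead of the whole bag.
-- With a single grouping field, group is already a scalar value.
october_total_station = FOREACH october_station_gr
                        GENERATE group AS reference_number,
                                 COUNT(october.s_station) AS total;

DUMP october_total_station;
```

When grouping by several fields, group is a tuple, and FLATTEN(group) is what unpacks it into separate columns.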

2 REPLIES

Got it, I'm starting to understand how grouping works in Pig. Actually, to be sure, I did:

october_gr_counting = FOREACH october_station_gr GENERATE group, COUNT(october);

Thanks so much, buddy.
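
A note on the two variants in this thread: COUNT(october) and COUNT(october.s_station) can differ when the data contains nulls. Pig's COUNT skips a tuple whose first field is null, while COUNT_STAR counts every tuple in the bag. The sketch below reuses the relation names from the thread; the output aliases are made up for illustration.

```pig
-- Compare the counting variants side by side (aliases are hypothetical).
october_gr_counting = FOREACH october_station_gr GENERATE
    group,
    COUNT(october)           AS non_null_first_field, -- skips tuples whose first field is null
    COUNT(october.s_station) AS non_null_station,     -- skips tuples where s_station is null
    COUNT_STAR(october)      AS all_rows;             -- counts every tuple, nulls included
```

If the three columns agree for every group, the choice between them does not matter for this data set.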