Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to get the desired grouping result in Pig

Expert Contributor

Hi guys, I have been struggling with my Pig code and haven't arrived at the desired result, so I'm turning to you for help. I have a file with some information, and my idea is to get a count per reference number. As an overview, this is what I have done:

16011-pig.png

The third step worked, but the problem is that it generates a huge tuple: the reference number is repeated for every tuple in the grouping bag that contains it, so the output looks like this:

16012-pig.png

Then I tried the fourth step. I got the list of counts, but I lost the reference_number. I would like to get the same list with the reference number shown just once per group.

Thanks so much for your help, team. @Lester Martin

1 ACCEPTED SOLUTION

New Member

Hi @Andres Urrego,

You need to modify your third step like this:

october_total_station = FOREACH october_station_gr GENERATE FLATTEN(group) , COUNT(october.s_station);
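
For context, here is a minimal sketch of the whole flow, since the original screenshots are not reproduced here. The input path, the delimiter, and the second field (reading) are assumptions made up for illustration; only the relation names and s_station come from this thread.

```pig
-- Hypothetical input path and schema (assumed, not from the original post).
october = LOAD 'october.csv' USING PigStorage(',')
          AS (s_station:chararray, reading:int);

-- GROUP yields one tuple per key: (group, {bag of matching october tuples}).
october_station_gr = GROUP october BY s_station;

-- Emit the key once per group, followed by the number of tuples in its bag.
october_total_station = FOREACH october_station_gr
                        GENERATE group AS s_station, COUNT(october) AS total;

DUMP october_total_station;
```

The key appears once per output row because it is taken from the group field rather than projected out of the bag. Note that COUNT skips tuples whose first field is null; if those rows should be counted too, COUNT_STAR is the built-in to look at.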


REPLIES

Expert Contributor

Got it, I'm starting to understand how grouping works in Pig. To be sure, I actually ran:

october_gr_counting = FOREACH october_station_gr GENERATE group, COUNT(october);

Thanks so much, buddy.