How to get the desired grouping result in Pig
- Labels: Apache Hadoop, Apache Pig
Created on 06-05-2017 01:11 PM - edited 08-17-2019 11:13 PM
Hi guys, I have been struggling with my Pig code and haven't arrived at the desired result, which is why I'm asking for your help. I have a file with some information, and my idea is to get a count by reference number. As an overview, this is what I have done:
The third step worked, but the problem is that it generates a huge tuple: the reference number is repeated for every tuple in the grouping bag that contains it, so the output looks like:
Then I tried the fourth step. Although I got the list of counts, I lost the reference_number. I would like to get the same list, but with the reference number shown just once per group.
Thanks so much for your help, team. @Lester Martin
Created 06-06-2017 07:23 AM
Hi @Andres Urrego,
You need to modify your third step like this:
october_total_station = FOREACH october_station_gr GENERATE FLATTEN(group) , COUNT(october.s_station);
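For context, here is a minimal sketch of the whole pipeline around that statement. The original LOAD and GROUP steps were not shown in the thread, so the file name `october.csv`, the delimiter, the schema (including `other_field`), and the choice of `s_station` as the grouping key are all assumptions; only the relation names `october`, `october_station_gr`, and `october_total_station` come from the posts.

```pig
-- Assumed input: file name, delimiter, and schema are hypothetical.
october = LOAD 'october.csv' USING PigStorage(',')
          AS (s_station:chararray, other_field:chararray);

-- Assumed grouping key. Each output row is (group, {bag of matching october rows}).
october_station_gr = GROUP october BY s_station;

-- Projecting a field from the bag returns one entry per row in the group,
-- so the key value is repeated inside a long bag:
-- verbose = FOREACH october_station_gr GENERATE october.s_station, COUNT(october);

-- 'group' holds the key exactly once per group, so use it instead.
october_total_station = FOREACH october_station_gr
                        GENERATE FLATTEN(group), COUNT(october.s_station) AS total;

DUMP october_total_station;
```

With a single grouping key, `group` is already a scalar, so `FLATTEN(group)` and plain `group` should give the same output; `FLATTEN` matters when grouping by several fields, where `group` is a tuple.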
Created 06-06-2017 08:00 PM
Got it, I'm starting to understand how grouping works in Pig. To be sure, I actually ran:
october_gr_counting = FOREACH october_station_gr GENERATE group, COUNT(october);
Thanks so much, buddy.
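A small note on the two counting variants used in this thread, as a sketch that assumes `october_station_gr` was built with `GROUP october BY s_station` (the GROUP step itself was not shown). `COUNT` skips tuples whose first field is null, so `COUNT(october)` and `COUNT(october.s_station)` can differ when the data contains nulls, while `COUNT_STAR` counts every tuple in the bag. The aliases below are hypothetical names for illustration.

```pig
-- Compare the counting variants side by side for each group.
october_gr_counting = FOREACH october_station_gr GENERATE
    group                    AS station_key,        -- grouping key, once per group
    COUNT(october)           AS rows_first_nonnull, -- skips rows whose first field is null
    COUNT(october.s_station) AS rows_station_set,   -- skips rows where s_station is null
    COUNT_STAR(october)      AS rows_all;           -- counts every row in the bag
```

If the data has no nulls, all three counts should agree.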