Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

pig - Filter output of cogroup having NULL

Hi,

This is my schema after cogroup:

C: {group: chararray,A: {(Name: chararray,Team: chararray,Positions: {T: (position: chararray)},Role: map[])},A1: {(Name1: chararray,Team1: chararray,Points: int)}}

I want to filter out the records of C whose A1 bag is empty, like the record below:

(Jake Fox,{(Jake Fox,Chicago Cubs,{(Infielder),(Catcher),(Outfielder),(First_baseman)},[hit_by_pitch#5,games#89,on_base_percentage#0.305,grand_slams#1,home_runs#11,sacrifice_flies#6,at_bats#230,gdb#6,ibbs#1,base_on_balls#15,hits#58,rbis#45,slugging_percentage#0.457,batting_average#0.252,doubles#14,runs#26,strikeouts#49])},{})

I tried a nested FOREACH, but it did not help: the output was an empty bag.

Could someone post the query? Many thanks!

1 ACCEPTED SOLUTION

Hi @Revathy Mourouguessane,

You can use IsEmpty to check whether A1 is empty. Try something like this:

grouped = COGROUP ..... ;
filtered = FILTER grouped BY not IsEmpty($2);
DUMP filtered;
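Applied to the schema in the question, the same filter can refer to the bag by name instead of by position. This is only a sketch: it assumes C is the relation produced by the COGROUP (which the question does not show), and C_nonempty is a made-up alias.

```pig
-- Assumes C already exists with fields (group, A, A1), as in the question's schema.
-- Keep only the groups whose A1 bag contains at least one tuple.
C_nonempty = FILTER C BY NOT IsEmpty(A1);
DUMP C_nonempty;
```

Using the field name A1 rather than $2 keeps the filter correct if the order of the COGROUP inputs ever changes.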

Here's an example that shows how this works for something similar:

cat > owners.csv
adam,cat
adam,dog
alex,fish
david,horse
alice,cat
steve,dog

cat > pets.csv
nemo,fish
fido,dog
rex,dog
paws,cat
wiskers,cat

owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray,animal:chararray);
pets = LOAD 'pets.csv' USING PigStorage(',') AS (name:chararray,animal:chararray);
grouped = COGROUP owners BY animal, pets BY animal;
filtered = FILTER grouped BY NOT IsEmpty($2);

DUMP grouped;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(horse,{(david,horse)},{})
(fish,{(alex,fish)},{(nemo,fish)})

DUMP filtered;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(fish,{(alex,fish)},{(nemo,fish)})
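If the unmatched groups are also of interest, SPLIT can separate the two cases in one statement. The sketch below reuses the owners.csv and pets.csv files from the example above; the aliases with_pets and without_pets are illustrative names only.

```pig
owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray, animal:chararray);
pets   = LOAD 'pets.csv'   USING PigStorage(',') AS (name:chararray, animal:chararray);
grouped = COGROUP owners BY animal, pets BY animal;

-- Route each group by whether its pets bag is empty.
SPLIT grouped INTO with_pets IF NOT IsEmpty(pets),
                   without_pets IF IsEmpty(pets);

-- with_pets holds the same groups as 'filtered' above;
-- without_pets holds only the horse group, whose pets bag is {}.
DUMP without_pets;
```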

3 REPLIES

Hi @Revathy Mourouguessane, have you tried this solution?

Hi Abdel, I haven't tried this one; I used a JOIN instead. I will give it a try. Thank you.
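For comparison, a JOIN-based approach might look roughly like the sketch below. The actual join query was not posted, so this uses the owners.csv and pets.csv files from the accepted solution, and the aliases joined, result and pet_name are illustrative. An inner JOIN also drops keys that have no match on the other side, but it returns flat tuples rather than the per-key bags that COGROUP produces.

```pig
owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray, animal:chararray);
pets   = LOAD 'pets.csv'   USING PigStorage(',') AS (name:chararray, animal:chararray);

-- Inner join: 'horse' has no row in pets, so that owner record is dropped.
joined = JOIN owners BY animal, pets BY animal;

-- Fields that exist on both sides are disambiguated with the alias:: prefix.
result = FOREACH joined GENERATE owners::owner AS owner,
                                 owners::animal AS animal,
                                 pets::name AS pet_name;
DUMP result;
```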