pig - Filter output of cogroup having NULL


Hi,

This is my schema after cogroup:

C: {group: chararray,A: {(Name: chararray,Team: chararray,Positions: {T: (position: chararray)},Role: map[])},A1: {(Name1: chararray,Team1: chararray,Points: int)}}

I want to filter out the records of C whose A1 bag is empty, like the record below:

(Jake Fox,{(Jake Fox,Chicago Cubs,{(Infielder),(Catcher),(Outfielder),(First_baseman)},[hit_by_pitch#5,games#89,on_base_percentage#0.305,grand_slams#1,home_runs#11,sacrifice_flies#6,at_bats#230,gdb#6,ibbs#1,base_on_balls#15,hits#58,rbis#45,slugging_percentage#0.457,batting_average#0.252,doubles#14,runs#26,strikeouts#49])},{})

I tried a nested foreach, but it did not help: the output was an empty bag.

Could someone post the query? Many thanks!

1 ACCEPTED SOLUTION


Hi @Revathy Mourouguessane,

You can use IsEmpty to check whether A1 is empty. Try something like this:

grouped = COGROUP ..... ;
filtered = FILTER grouped BY not IsEmpty($2);
DUMP filtered;

Here's an example that shows how this works on a similar data set:

cat > owners.csv
adam,cat
adam,dog
alex,fish
david,horse
alice,cat
steve,dog

cat > pets.csv
nemo,fish
fido,dog
rex,dog
paws,cat
wiskers,cat

owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray,animal:chararray);
pets = LOAD 'pets.csv' USING PigStorage(',') AS (name:chararray,animal:chararray);
grouped = COGROUP owners BY animal, pets BY animal;
filtered = FILTER grouped BY not IsEmpty($2);

DUMP grouped;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(horse,{(david,horse)},{})
(fish,{(alex,fish)},{(nemo,fish)})

DUMP filtered;
(cat,{(alice,cat),(adam,cat)},{(wiskers,cat),(paws,cat)})
(dog,{(steve,dog),(adam,dog)},{(rex,dog),(fido,dog)})
(fish,{(alex,fish)},{(nemo,fish)})
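
The same filter can also refer to the bag by name instead of by position, since COGROUP names each output bag after its input relation. The sketch below reuses the owners.csv and pets.csv files from the example above; the aliases with_pets and without_pets are only illustrative names. For the schema in the question, the equivalent condition would presumably be NOT IsEmpty(A1).

```pig
owners = LOAD 'owners.csv' USING PigStorage(',') AS (owner:chararray, animal:chararray);
pets = LOAD 'pets.csv' USING PigStorage(',') AS (name:chararray, animal:chararray);
grouped = COGROUP owners BY animal, pets BY animal;

-- Keep only groups whose "pets" bag has at least one tuple
-- (same result as NOT IsEmpty($2), but referenced by name).
with_pets = FILTER grouped BY NOT IsEmpty(pets);

-- The inverse: groups with no matching pets at all.
without_pets = FILTER grouped BY IsEmpty(pets);

DUMP with_pets;
DUMP without_pets;
```

With the sample data above, without_pets should contain only the horse group, since that is the only row whose second bag is empty in the DUMP of grouped.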




Hi @Revathy Mourouguessane, have you tried this solution?


Hi Abdel, I haven't tried this one yet; I used a JOIN instead. I will try it. Thank you.