Support Questions

Filter out nulls from all columns

New Contributor

Hi Guys,

My sample file has 27 columns of int and chararray data types.

I need to filter out every record that has a null value in any of these columns.

Here is what I am trying:

Sample_fil = FILTER Adzone by (Div is not null)
    and (Zone_yak is not null)
    and (ProdGroup is not null)
    and (Zonename is not null)
    and (Store_fruit is not null)
    and (Comp_Zone is not null)
    and (Department is not null);

This script covers only the first 7 columns. I could write the same condition for the remaining columns, but I am looking for a way to simplify the script. Of the 27 columns, 12 are int and the rest are chararray.

Please give your suggestions.

Regards,

Pradeep.

1 ACCEPTED SOLUTION

User-Defined Functions (UDFs) come to the rescue. Search for "Filter Functions" in http://pig.apache.org/docs/r0.15.0/udf.html and you'll see a rough example of how to do this. Your "isEmpty" (or whatever you call the function) will be implemented differently, though: in yours, you would walk each field of the row (called "input" in that example UDF) and check it for null. If none of the fields is null, return true so the row is kept; if any field is null, return false so FILTER drops it. Once you build the UDF, you can use that boolean directly in your FILTER statement.
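A minimal sketch of the per-row check described above, written as standalone Java so it compiles without the Pig jar. The class name NoNullFieldsSketch, the method hasNoNullFields, and the sample rows are hypothetical, and the row is modeled as a java.util.List<Object> instead of Pig's Tuple. It returns true (keep) only when no field is null, matching the question's goal of dropping any row with a null in any column.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical standalone sketch of the check a Pig filter UDF could perform.
 * The row is modeled as List<Object> (a stand-in for Pig's Tuple, assumed here
 * only so the example compiles without the Pig jar).
 */
public class NoNullFieldsSketch {

    /** Returns true only when the row exists and none of its fields is null. */
    public static boolean hasNoNullFields(List<Object> row) {
        if (row == null) {
            return false; // treat a missing row as "drop"
        }
        for (Object field : row) {
            if (field == null) {
                return false; // a single null field is enough to drop the row
            }
        }
        return true; // every field is non-null, so keep the row
    }

    public static void main(String[] args) {
        // Hypothetical rows mixing int and chararray-like values, as in the question.
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "North", 42),
            Arrays.<Object>asList(2, null, 7),
            Arrays.<Object>asList(3, "South", null)
        );
        for (List<Object> row : rows) {
            System.out.println(row + " -> " + (hasNoNullFields(row) ? "keep" : "drop"));
        }
    }
}
```

Running main should print "keep" for the first row and "drop" for the other two. In an actual UDF, the assumption is that the same loop would sit inside the exec(Tuple input) method of a class extending org.apache.pig.FilterFunc, reading fields with input.size() and input.get(i). After registering the jar, it should then be callable as FILTER Adzone BY YourUdfName(*) (YourUdfName is a placeholder), which passes all 27 columns at once instead of listing each one. Check the Pig UDF manual linked above for the exact build and registration steps.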

If this is your first Pig UDF, there are plenty of examples online, including mine at https://martin.atlassian.net/wiki/x/C4BRAQ. Good luck!


3 REPLIES


It's better to use a UDF in this situation. The link below has a UDF for this:

http://stackoverflow.com/questions/12959001/how-to-filter-records-with-a-null-value-in-pig

Hope this helps.

Regards,

Arun

New Contributor

Yes, I have seen this already; I was just wondering whether there was a better way to do it. Thanks anyway.