Created 11-10-2016 06:10 AM
Hi Guys,
My sample file has 27 columns with INT and Chararray data types.
My requirement is to filter out all the null values which are present in these columns.
What I am trying to do is this:
Sample_fil = FILTER Adzone by (Div is not null) and (Zone_yak is not null) and (ProdGroup is not null) and (Zonename is not null) and (Store_fruit is not null) and (Comp_Zone is not null) and (Department is not null);
This script covers only the first 7 columns. I can write the same condition for the rest of the columns, but I am looking for a way to optimize the script. My sample file has 12 INT columns and the rest are Chararray.
Please give your suggestions.
Regards,
Pradeep.
Created 11-10-2016 01:17 PM
User Defined Functions (UDFs) come to the rescue. Search for "Filter Functions" in http://pig.apache.org/docs/r0.15.0/udf.html and you'll see a rough example of how to do this. Now, your "isEmpty" (or whatever you call the function) will be implemented differently. In yours, you would need to walk each element and check for null. If all of the row's (called "input" in that example UDF) fields are null, then you can ultimately return a boolean value that can be used in your code (after you build the UDF).
If this is your first Pig UDF, there are plenty of examples on the internet, including mine at https://martin.atlassian.net/wiki/x/C4BRAQ. Good luck!
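To make the "walk each element and check for null" idea concrete, here is a minimal sketch of just that logic, kept free of Pig dependencies so it compiles on its own. The names RowNulls, hasNull, HasNull and rownulls.jar are hypothetical, and this version flags a row when any field is null (which matches the AND chain in the question) rather than when all fields are null. The Pig wrapper and the Pig Latin call are shown only as comments and are assumptions about how you might wire it up, not tested code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the Pig docs): reports whether any field of a row is null.
final class RowNulls {
    private RowNulls() {}

    // Returns true if the row itself is null or contains at least one null field.
    static boolean hasNull(List<?> fields) {
        if (fields == null) {
            return true;
        }
        for (Object field : fields) {
            if (field == null) {
                return true;
            }
        }
        return false;
    }

    // Small demo: one complete row, one row with a missing value.
    public static void main(String[] args) {
        System.out.println(hasNull(Arrays.asList(1, "east", 42)));
        System.out.println(hasNull(Arrays.asList(1, null, 42)));
    }
}

// Assumed Pig wrapper (needs the Pig jar on the classpath, so it is left as a comment):
//
//   public class HasNull extends org.apache.pig.FilterFunc {
//       @Override
//       public Boolean exec(org.apache.pig.data.Tuple input) throws java.io.IOException {
//           return RowNulls.hasNull(input == null ? null : input.getAll());
//       }
//   }
//
// Assumed Pig Latin usage, with a hypothetical jar name:
//
//   REGISTER rownulls.jar;
//   Sample_fil = FILTER Adzone BY NOT HasNull(*);
```

Passing * hands every column of the row to the UDF, so the same one-line filter would cover all 27 columns regardless of whether they are INT or Chararray.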
Created 11-16-2016 06:42 AM
It's better to use a UDF in this case. Check the link below; it has a UDF for the same purpose:
http://stackoverflow.com/questions/12959001/how-to-filter-records-with-a-null-value-in-pig
Hope this helps.
Regards,
Arun
Created 11-17-2016 12:24 PM
Yes, I have seen this already; I was just wondering if there was a better way to do it. Thanks anyway.