Apache Pig - Guarantee that all the values in a column are numeric

Explorer

Hi experts, I have a dataset with 4 columns and want to check whether column B contains only numbers. If the job detects a non-numeric value, I want to replace that value with null. Can I do this in Pig, or do I need Python embedded in Pig? Many thanks!

1 ACCEPTED SOLUTION

Super Collaborator

You can use org.apache.pig.piggybank.evaluation.IsNumeric

Something like:

X = foreach Y generate ((org.apache.pig.piggybank.evaluation.IsNumeric($1)==true)?(int)$1:null);
I have applied generate to only one column here; you can add the rest of your columns to the same generate.
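A fuller sketch of the same idea, in case it helps. The jar path, input file, delimiter, and column names (a, b, c, d) are all hypothetical placeholders, and the columns are loaded as chararray on the assumption that the UDF expects string input. Adjust these to match your data.

```pig
-- Hypothetical path: piggybank.jar ships with Pig, but its location depends on your install.
REGISTER /path/to/piggybank.jar;

-- Short alias so the fully qualified class name is not repeated.
DEFINE IsNumeric org.apache.pig.piggybank.evaluation.IsNumeric();

-- Hypothetical input file and schema: four columns, all read as chararray
-- so that non-numeric text in column b reaches the check unchanged.
Y = LOAD 'input.csv' USING PigStorage(',')
    AS (a:chararray, b:chararray, c:chararray, d:chararray);

-- Keep a, c and d as they are; cast b to int when it is numeric, otherwise emit null.
X = FOREACH Y GENERATE
    a,
    ((IsNumeric(b) == true) ? (int)b : null) AS b,
    c,
    d;

DUMP X;
```

The other three columns pass through untouched, so X keeps the same four-column shape as Y.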


3 REPLIES

Explorer

Brilliant 🙂 Just one more question: how can I add a CASE statement (or an IF) to my X relation?

Super Collaborator

You can apply a FILTER or the bincond operator (condition ? value_if_true : value_if_false) to any column(s) of your relation X. You can get more details on the available operators here
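As a rough illustration of both options, here is a minimal sketch. The LOAD line is only a hypothetical stand-in for your relation X (four columns, with b already an int that is null where the value was non-numeric), and the output labels and the threshold of 100 are made up for the example.

```pig
-- Hypothetical stand-in for relation X; replace with your own X.
X = LOAD 'x_output' USING PigStorage(',')
    AS (a:chararray, b:int, c:chararray, d:chararray);

-- FILTER behaves like a WHERE clause: keep only rows where b is a valid number.
X_valid = FILTER X BY b IS NOT NULL;

-- Bincond behaves like an inline if/else on a single column.
X_flag = FOREACH X GENERATE a, b,
    ((b IS NULL) ? 'invalid' : 'valid') AS status;

-- Nested binconds give a multi-branch, CASE-like expression.
X_class = FOREACH X GENERATE a, b,
    ((b IS NULL) ? 'unknown' : ((b < 100) ? 'small' : 'large')) AS category;

DUMP X_class;
```

Newer Pig releases (0.12 onward, as far as I know) also offer a CASE ... WHEN ... THEN ... END expression, which may read better than nested binconds once there are more than two or three branches.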