I have the following part of a script in Apache Pig:
A = foreach Source_Data generate
ToString( ToDate((long) Time), 'yyyy-MM-dd hh:ss:mm') as date,
Store A into '.../newfile';
Now I want to create a new script using a Python UDF to guarantee that the Date column (#1) in my newfile only contains strings in the format 'yyyy-MM-dd hh:ss:mm'.
Is it possible to do that?
Yes. You can either write a new script that uses a regex to test this column and throws away bad values, or do it all in one step by passing the date field to a UDF that checks the formatting.
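A minimal sketch of the UDF approach, assuming a Jython UDF saved as a hypothetical file validate_date.py. The function name is_valid_date is made up for illustration. The regex only checks the shape of the string (digit groups and separators); it does not check that the values form a real date.

```python
import re

# Inside Pig's Jython runtime, `outputSchema` is made available automatically.
# This fallback only lets the file be imported outside Pig, e.g. for local tests.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Shape of 'yyyy-MM-dd hh:ss:mm': 4-2-2 digits, a space, then three 2-digit groups.
# \Z anchors at the very end, so a trailing newline is rejected as well.
DATE_SHAPE = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\Z')


@outputSchema('is_valid:int')
def is_valid_date(value):
    """Return 1 if value has the expected shape, otherwise 0 (nulls count as invalid)."""
    if value is None:
        return 0
    return 1 if DATE_SHAPE.match(value) else 0
```

On the Pig side this could be wired in with something like REGISTER 'validate_date.py' USING jython AS myfuncs; followed by B = FILTER A BY myfuncs.is_valid_date(date) == 1; before the STORE (the alias myfuncs and the relation B are assumptions). For a pure shape check, Pig's built-in MATCHES operator in a FILTER may be enough without any UDF.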