Pig Python UDFs - Guarantee that a String has the format 'yyyy-MM-dd hh:ss:mm'

Contributor

Hi experts,

I have the following part of a script in Apache Pig:

....

A = foreach Source_Data generate (int) ID,
    ToString(ToDate((long) Time), 'yyyy-MM-dd hh:ss:mm') as date,
    (int) Code;

Store A into '.../newfile'; ...

Now I want to create a new script that uses a Python UDF to guarantee that the date column (#1) of my newfile only contains strings in the format 'yyyy-MM-dd hh:ss:mm'.

Is it possible to do that?

Many thanks!

1 ACCEPTED SOLUTION

Master Mentor

Yes. You can either write a new script that uses a regex to test this column and throws away the bad records, or do it all in one step: pass the date field to a UDF and check the formatting there.
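For the UDF route, here is a minimal sketch of what such a check might look like as a Jython UDF. The file name date_check.py and the function name valid_date_or_null are made up for illustration. It also assumes the pattern letters are read the Joda-Time way (hh is the 12-hour clock hour, 01 to 12), and it only checks the shape of the string, not whether the calendar date actually exists.

```python
# date_check.py (hypothetical file name)
#
# Possible usage from Pig, assuming relation A from the question:
#   REGISTER 'date_check.py' USING jython AS date_check;
#   B = FOREACH A GENERATE ID, date_check.valid_date_or_null(date) AS date, Code;
#   C = FILTER B BY date IS NOT NULL;
import re

try:
    outputSchema  # Pig defines this decorator when the script is registered with jython
except NameError:
    def outputSchema(schema):
        # No-op fallback so the module can also be run and tested outside Pig
        def wrap(func):
            return func
        return wrap

# Shape of 'yyyy-MM-dd hh:ss:mm', assuming hh is a 12-hour clock hour (01-12)
# and both ss and mm are two digits in the range 00-59.
DATE_SHAPE = re.compile(
    r'^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) '
    r'(0[1-9]|1[0-2]):[0-5]\d:[0-5]\d\Z'
)

@outputSchema('date:chararray')
def valid_date_or_null(value):
    """Return value unchanged if it has the expected shape, otherwise None."""
    if value is None:
        return None
    if DATE_SHAPE.match(value):
        return value
    return None
```

Returning null for bad values lets the Pig script decide what to do with them: filter them out as in the usage comment, or store them separately for inspection.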
