Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Pig Python UDFs - Guarantee that a String has the format 'yyyy-MM-dd hh:ss:mm'

New Member

Hi experts,

I have the following part of a script in Apache Pig:

....

A = foreach Source_Data generate (int) ID,
    ToString( ToDate((long) Time), 'yyyy-MM-dd hh:ss:mm') as date,
    (int) Code;

Store A into '.../newfile'; ...

Now I want to create a new script that uses a Python UDF to guarantee that the Date column (#1) of my newfile only contains strings in the format 'yyyy-MM-dd hh:ss:mm'.

Is it possible to do that?

Many thanks!

1 ACCEPTED SOLUTION

Master Mentor

You can write a new script that uses a regex to test this column and throw away the bad records, or do it all in one step: pass the date field to a UDF and check the formatting there.
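As a rough illustration of the UDF route, here is a minimal sketch, not a tested production script. The function name `clean_date`, the regex, and the output schema are all assumptions made for the example. It assumes the pattern letters mean what they do in Joda-Time (`hh` is a 01-12 clock hour, `mm` is minutes, `ss` is seconds), so it checks the fields in the order written in the question. It only checks the shape and ranges of the fields, not calendar validity (it would accept February 31). The decorator lookup at the top is also an assumption about how Pig exposes `outputSchema`; the no-op fallback is there only so the file can be tried outside Pig.

```python
import re

# Assumption: Pig makes `outputSchema` available when the script is registered
# with "USING jython"; with streaming_python it is imported from pig_util.
# The last fallback is a no-op so the file can also be run outside Pig.
try:
    outputSchema
except NameError:
    try:
        from pig_util import outputSchema
    except ImportError:
        def outputSchema(schema):
            def decorator(func):
                return func
            return decorator

# Shape of 'yyyy-MM-dd hh:ss:mm': 4-digit year, month 01-12, day 01-31,
# clock hour 01-12, then two 00-59 fields (seconds, then minutes).
DATE_PATTERN = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) "
    r"(0[1-9]|1[0-2]):[0-5]\d:[0-5]\d\Z"
)

@outputSchema("date:chararray")
def clean_date(value):
    """Return the string unchanged if it matches the format, else None."""
    if value is None:
        return None
    if DATE_PATTERN.match(value):
        return value
    return None
```

If this were saved as, say, validate_date.py (a hypothetical file name) and registered with REGISTER 'validate_date.py' USING jython AS validate; the script could call validate.clean_date(date) in a foreach and then filter on the result being not null to drop the bad records.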
