Extract the timestamp from a filename and add it as a new column (say, date) using Pig

Explorer

I have a file named YYYYMMDD_claims_portal.csv. I need only the YYYYMMDD part, stored in a new column (say, date). The file currently has three columns: Claim, User, and ID. I now need to add one more column, date, whose value is the YYYYMMDD taken from the file name. Please help; it's a bit urgent.

Thanks in advance for any help you guys can provide.

1 ACCEPTED SOLUTION

Super Collaborator

@Sumee singh

Please try this:

A = LOAD 'YYYYMMDD_claims_portal.csv' using PigStorage(',','-tagFile');

y = FOREACH A GENERATE SUBSTRING($0,0,8),$1..;

DUMP y;

(With -tagFile, the input file name comes as the first field in each tuple.) You can modify the script from here as you wish.
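To make the effect of `-tagFile` concrete, here is a minimal sketch for a hypothetical input file `20160115_claims_portal.csv`. The date in the name and the sample row in the comments are made up for illustration, and the script assumes a Pig release whose `PigStorage` supports the `-tagFile` option.

```pig
-- Hypothetical file name; replace it with your own YYYYMMDD_claims_portal.csv.
A = LOAD '20160115_claims_portal.csv' USING PigStorage(',', '-tagFile');

-- With -tagFile, $0 holds the file name and $1.. hold the original CSV columns,
-- so a row such as  c1,u1,42  is loaded as  (20160115_claims_portal.csv,c1,u1,42).

-- SUBSTRING(str, start, stop) excludes the stop index, so (0, 8) keeps
-- the first eight characters of the file name: the YYYYMMDD prefix.
y = FOREACH A GENERATE SUBSTRING($0, 0, 8), $1..;

DUMP y;
```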


Explorer

@tsharma

Thanks for your prompt reply. I'll try this approach, but -tagFile tags the file name alongside all the existing columns; what I want is to create a new column, such as date, and store the file name in it.

Thank you.

Super Collaborator

OK, do this:

A = LOAD 'YYYYMMDD_claims_portal.csv' using PigStorage(',','-tagFile') AS (filename:chararray, {other columns as per your schema});

y = FOREACH A GENERATE $1..,SUBSTRING(filename,0,8) AS day;

describe y;

DUMP y;
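Putting the thread together, a fuller sketch might look like the following. The column names come from the question (Claim, User, ID), but the `chararray` types, the glob pattern, and the output directory `claims_with_day` are assumptions for illustration; adjust them to your data.

```pig
-- Load every matching file; -tagFile prepends the file name to each row.
-- The glob and the chararray types are assumptions, not taken from the thread.
A = LOAD '*_claims_portal.csv' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, claim:chararray, user:chararray, id:chararray);

-- Emit the original three columns first, then the YYYYMMDD prefix as a new column.
y = FOREACH A GENERATE claim, user, id, SUBSTRING(filename, 0, 8) AS day;

-- Print the resulting schema to confirm the new column is last.
DESCRIBE y;

-- Write the result as comma-separated text (hypothetical output directory).
STORE y INTO 'claims_with_day' USING PigStorage(',');
```

If a real date type is needed rather than a string, Pig's built-in `ToDate(SUBSTRING(filename, 0, 8), 'yyyyMMdd')` should convert the prefix to a `datetime`, assuming a Pig version that includes the datetime type.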

Explorer

Thanks, @tsharma. This works. Thank you 🙂