Extract timestamp from file name and add it as a new column (say, date) using Pig

Explorer

I have a file named YYYYMMDD_claims_portal.csv. I need only the YYYYMMDD part, stored in a new column (say, date). The data currently has three columns: Claim, User, ID. Now I need to add a fourth column, date, whose value is the YYYYMMDD taken from the file name. Please help, it's a bit urgent.

Thanks in advance for any help you guys can provide.

Accepted solution

Super Collaborator

@Sumee singh

Please try this:

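-- '-tagFile' prepends the input file name as the first field ($0) of every record
A = LOAD 'YYYYMMDD_claims_portal.csv' USING PigStorage(',', '-tagFile');

-- first 8 characters of the file name (YYYYMMDD), followed by all original columns
y = FOREACH A GENERATE SUBSTRING((chararray)$0, 0, 8), $1..;

DUMP y;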

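With '-tagFile', the input file name is added as the first field of each tuple, so SUBSTRING($0, 0, 8) returns the YYYYMMDD part (the end index is exclusive). You can adapt the script from here as needed.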
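Because '-tagFile' tags every record with the name of the file it came from, the same approach should also work across many daily files in one run: load a glob instead of a single literal file name, and each row gets the date of its own source file. A minimal sketch, assuming the files live in a hypothetical HDFS directory /data/claims/ and that the Claim, User, ID columns can be read as chararray (the paths, field names, and types are assumptions, adjust them to your data):

```pig
-- Hypothetical input: every *_claims_portal.csv under /data/claims/
-- '-tagFile' prepends each record's own source file name as the first field
raw = LOAD '/data/claims/*_claims_portal.csv'
      USING PigStorage(',', '-tagFile')
      AS (filename:chararray, claim:chararray, user:chararray, id:chararray);

-- The first 8 characters of the file name are the YYYYMMDD stamp
tagged = FOREACH raw GENERATE claim, user, id, SUBSTRING(filename, 0, 8) AS day;

-- Hypothetical output directory, written as comma-separated text
STORE tagged INTO '/data/claims_with_day' USING PigStorage(',');
```

No per-file changes to the script are needed when new daily files arrive; only the glob has to match them.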

Explorer

@tsharma

Thanks for your prompt reply. I'll try this approach, but with -tagFile the file name is just tagged onto every row along with the other columns. What I want is to create a new column, like date, and store the date from the file name in it.

Thank you.

Super Collaborator

OK, try this:

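-- '-tagFile' adds the file name as the first field; the other names and types
-- (claim, user, id as chararray) are assumed, so replace them with your actual schema
A = LOAD 'YYYYMMDD_claims_portal.csv' USING PigStorage(',', '-tagFile') AS (filename:chararray, claim:chararray, user:chararray, id:chararray);

-- keep the original columns ($1 onward) and append the YYYYMMDD prefix as a new column
y = FOREACH A GENERATE $1.., SUBSTRING(filename, 0, 8) AS day;

DESCRIBE y;

DUMP y;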
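If the new column should hold a real date rather than the YYYYMMDD string, Pig 0.11 and later provide ToDate(string, format), which parses the value with a Joda-Time pattern. A minimal sketch, reusing the assumed claim/user/id schema from above; the column name file_date is illustrative, rename it as you like:

```pig
-- Same load as above; claim/user/id names and types are assumptions
A = LOAD 'YYYYMMDD_claims_portal.csv' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, claim:chararray, user:chararray, id:chararray);

-- Parse the YYYYMMDD file-name prefix into a datetime column (Pig 0.11+)
z = FOREACH A GENERATE claim, user, id,
        ToDate(SUBSTRING(filename, 0, 8), 'yyyyMMdd') AS file_date;

DESCRIBE z;
DUMP z;
```

Keeping the plain chararray version is simpler if downstream consumers only need the string; the datetime type is useful when you want to filter or compare by date inside Pig.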

Explorer

Thanks @tsharma, this works. Thank you 🙂