
Extract timestamp from filename and add it as a new column (say, date) using Pig

New Contributor

I have a file named YYYYMMDD_claims_portal.csv. I need only the YYYYMMDD part, stored as the value of a new column (say, date). The file currently has three columns: Claim, User, ID. I now need to add one more column, date, whose value is the YYYYMMDD taken from the file name. Please help, it's a bit urgent.

Thanks in advance for any help you guys can provide.

1 ACCEPTED SOLUTION

Expert Contributor

@Sumee singh

Please try this:

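-- -tagFile prepends the input file name to every tuple as field $0
A = LOAD 'YYYYMMDD_claims_portal.csv' USING PigStorage(',', '-tagFile');
-- keep the first 8 characters (YYYYMMDD) of the file name, followed by all original columns
y = FOREACH A GENERATE SUBSTRING($0, 0, 8), $1..;
DUMP y;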

With -tagFile, the input file name comes as the first field of each tuple. You can modify the rest as you wish.
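If there are several dated files, the same idea should carry over to a glob pattern, since each row is tagged with the name of the file it came from. This is only a sketch: the glob pattern and the relation names are assumptions for illustration, not something from the original question.

```pig
-- Assumption: all input files follow the YYYYMMDD_claims_portal.csv naming pattern
-- and sit in the current input directory.
-- -tagFile adds only the file name as $0; -tagPath would add the full path instead.
A = LOAD '*_claims_portal.csv' USING PigStorage(',', '-tagFile');

-- The first 8 characters of the file name are the YYYYMMDD stamp of that row's source file.
y = FOREACH A GENERATE SUBSTRING($0, 0, 8), $1..;

DUMP y;
```

Using -tagFile rather than -tagPath matters here, because SUBSTRING($0, 0, 8) only lines up with the date when the field starts with the bare file name.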

New Contributor

@tsharma

Thanks for your prompt reply. I'll try this approach, but with -tagFile the file name is tagged alongside all the existing columns. What I want is to create a new column, such as date, and store the file name value in it.

Thank you.

Expert Contributor

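OK, do this:

-- name the tagged file name field so it can be referenced explicitly
A = LOAD 'YYYYMMDD_claims_portal.csv' USING PigStorage(',', '-tagFile') AS (filename:chararray, {other columns as per your schema});
-- original columns first, then the YYYYMMDD prefix of the file name as a new column
y = FOREACH A GENERATE $1.., SUBSTRING(filename, 0, 8) AS day;
DESCRIBE y;
DUMP y;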
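For the three columns mentioned in the question (Claim, User, ID), a filled-in version might look like the sketch below. The field names, the chararray types, the file_date column name, and the output directory are all assumptions for illustration; adjust them to the real schema.

```pig
-- Assumption: the three original columns are read as chararray; the field names are hypothetical.
A = LOAD 'YYYYMMDD_claims_portal.csv'
    USING PigStorage(',', '-tagFile')
    AS (filename:chararray, claim:chararray, user_name:chararray, id:chararray);

-- Original columns first, then the YYYYMMDD prefix of the file name as the new column.
y = FOREACH A GENERATE claim, user_name, id, SUBSTRING(filename, 0, 8) AS file_date;

-- Check the resulting schema before writing anything out.
DESCRIBE y;

-- 'claims_with_date' is a hypothetical output directory.
STORE y INTO 'claims_with_date' USING PigStorage(',');
```

Naming every field in the AS clause makes the FOREACH easier to read than positional references, at the cost of having to keep the schema in sync with the file.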

New Contributor

Thanks @tsharma. This works. Thank you 🙂
