
Flume: how to create (HDFS) target dir from ingested filename?

Guru

Hi,

I'm curious if it is possible to solve this problem with Flume:

I have a SpoolingDir source watching a directory into which files named in the format "prefixA.prefixB.importantPart.csv" will be moved.

The files shall be put into HDFS (keeping their original filenames) into the corresponding directory "hdfs://basepath/importantPart/", so that the absolute path of a file is "hdfs://basepath/importantPart/prefixA.prefixB.importantPart.csv".

a) How can I parse the filename to extract "importantPart" and build the output HDFS path accordingly? Is this possible at all with Flume?

b) How can I preserve the original filename so that the HDFS sink writes to a file with the same name? Again, is this possible at all?
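To make (a)'s parsing step concrete: a minimal Java sketch (the class name, method name, and the exact regex are my own assumptions about the filename format, not anything Flume-specific) that pulls the third dot-separated part out of a name like "prefixA.prefixB.importantPart.csv":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameParser {
    // Matches "prefixA.prefixB.importantPart.csv" and captures "importantPart"
    // (assumes exactly three dot-separated parts before the .csv extension).
    private static final Pattern NAME =
        Pattern.compile("^[^.]+\\.[^.]+\\.([^.]+)\\.csv$");

    // Returns the captured part, or null if the filename doesn't match.
    public static String importantPart(String filename) {
        Matcher m = NAME.matcher(filename);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(importantPart("prefixA.prefixB.importantPart.csv"));
        // prints "importantPart"
    }
}
```

The same regex could later be reused inside a custom Flume interceptor, since Flume interceptors are plain Java classes.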

 

Yes, I know Flume isn't the right tool for such "file copy" approaches, since it operates on events; nevertheless, it would be interesting to know whether it is possible, or whether someone has already done this.

 

Any hint is highly appreciated.

1 ACCEPTED SOLUTION

Mentor
You could do (a) with the SpoolingDirectory source, as it allows the event to carry the original filename (via a custom sink wrapper that looks for it), but (b) doesn't fit the event-delivery mechanism of Flume and, AFAICT, it's not possible to do directly.
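For reference, the Flume spooling directory source has a basenameHeader option that stores the original file name in an event header, and the HDFS sink's hdfs.path and hdfs.filePrefix accept %{header} escapes. A config sketch under those assumptions (agent/component names and the spool directory are illustrative; the "importantPart" header would still have to be set by a custom interceptor, since the stock regex_extractor interceptor matches against the event body, not headers):

```properties
# Hypothetical agent sketch -- names (a1, src1, ...) and paths are illustrative.
a1.sources = src1
a1.channels = ch1
a1.sinks = snk1

a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /data/incoming
a1.sources.src1.channels = ch1
# store the original file name in the "basename" event header
a1.sources.src1.basenameHeader = true

a1.sinks.snk1.type = hdfs
a1.sinks.snk1.channel = ch1
# %{...} substitutes an event header; "importantPart" would have to be
# added by a custom interceptor that parses the "basename" header
a1.sinks.snk1.hdfs.path = hdfs://basepath/%{importantPart}
a1.sinks.snk1.hdfs.filePrefix = %{basename}
a1.sinks.snk1.hdfs.fileType = DataStream

a1.channels.ch1.type = memory
```

Note that even with hdfs.filePrefix set this way, the HDFS sink appends its own counter/suffix to the file it writes, so the output name will not be byte-for-byte identical to the source file, which is consistent with the point above that (b) isn't directly possible.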


2 REPLIES


Guru
Hi,

many thanks for the explanation. I'll check out the custom sink wrapper approach...