
Flume zip file HDFS ingestions


Hi all,

I would like to use Flume (1.6.0) to poll a directory that is the target of an FTPS transfer, and to read the zip files that arrive there so they can be ingested into HDFS.

I am trying to use the Spooling Directory Source, but it seems to be written to exit with an error if the file is locked.

The file is not written to the target directory in one go; it is appended to incrementally.

Can you please confirm I can't use Spooling Directory Source for this use case?

Any suggestions?

Thank you

Regards


Re: Flume zip file HDFS ingestions


The spooling directory source can't be used on files that are still being appended to. If you need to read from that directory, you would need to do one of the following:
1. Write files that are still being appended to into a different directory, then move them to the spool dir after completion.
2. Use the "ignorePattern" property to exclude files that are actively being written to. This requires that those files be renamed once they are complete.
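
As a rough illustration of option 1, the hand-off from a staging directory to the spool directory could look something like the sketch below. The function name and the directory layout are hypothetical, and it assumes both directories are on the same filesystem so that the move is a single atomic rename; how you detect that the FTPS transfer has finished is left out, since that depends on your transfer tool.

```python
import os
import tempfile
from pathlib import Path


def publish_completed_file(staging_path: Path, spool_dir: Path) -> Path:
    """Move a fully written file from a staging dir into the spool dir.

    Hypothetical helper: call it only after the transfer of
    staging_path has finished.
    """
    target = spool_dir / staging_path.name
    # Refuse to reuse a file name that is already present in the spool dir.
    if target.exists():
        raise FileExistsError(f"{target} already exists in the spool dir")
    # os.replace is an atomic rename when both paths are on the same
    # filesystem, so the source never sees a half-written file.
    os.replace(staging_path, target)
    return target


if __name__ == "__main__":
    # Self-contained demo using throwaway directories.
    root = Path(tempfile.mkdtemp())
    staging, spool = root / "staging", root / "spool"
    staging.mkdir()
    spool.mkdir()
    upload = staging / "batch.zip"
    upload.write_bytes(b"placeholder content")
    print(publish_completed_file(upload, spool))
```

The same rename step could serve option 2 as well: write with a temporary suffix that "ignorePattern" excludes, then rename to the final name once the file is complete.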

Flume isn't generally designed to act as a file transfer mechanism. If you need to copy files to HDFS, it is better to copy them directly with a cron job, or possibly to use the HDFS NFS gateway and write the files directly to HDFS. You could also use an Oozie SSH action to copy the files into HDFS.
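
For the cron job route, a minimal sketch of a script that cron could run is shown below. It just shells out to the standard `hdfs dfs -put` command; the function names and the example paths are made up for illustration, and it assumes the `hdfs` client is on the PATH of the user running the job.

```python
import subprocess


def build_put_command(local_file: str, hdfs_dir: str) -> list[str]:
    """Build the `hdfs dfs -put` command line for one local file."""
    return ["hdfs", "dfs", "-put", local_file, hdfs_dir]


def copy_to_hdfs(local_file: str, hdfs_dir: str) -> None:
    """Copy one completed local file into HDFS.

    check=True raises CalledProcessError if the hdfs client exits
    with a non-zero status, so a failed copy is not silently ignored.
    """
    subprocess.run(build_put_command(local_file, hdfs_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    print(build_put_command("/data/incoming/batch.zip", "/landing/zips"))
```

As with the spool directory, the script should only pick up files whose transfer has already completed.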

Flume is better used for real time event streaming, not as a file transfer mechanism.

-pd