New Contributor

Flume zip file HDFS ingestions


Hi all,

I would like to use Flume (1.6.0) to poll a directory that is the target of an FTPS transfer, and read the zip files it receives in order to ingest them into HDFS.

I am trying to use the Spooling Directory Source, but it seems it is designed to exit with an error if a file is locked (still being written).


The files are not written to the target directory in a single operation; they are appended to over time as the transfer progresses.

Can you please confirm that I can't use the Spooling Directory Source for this use case?

Any suggestions?


Thank you



Cloudera Employee

Re: Flume zip file HDFS ingestions


The spooling directory source can't be used on files that are still being appended to. If you need to read from that directory, you would need to do one of the following:
1. Write files that are currently being appended to into a different directory, and move them to the spool dir once they are complete.
2. Use the "ignorePattern" property to exclude files that are actively being written. This would require that those files be renamed when they are complete.
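If you do go the "ignorePattern" route, an agent configuration along these lines could work. This is only a sketch: the agent/component names (a1, src1, etc.), the paths, and the ".part" suffix convention for in-progress transfers are assumptions for illustration, not taken from your setup. Note also that zip files are binary, so the source's default line-based deserializer would mangle them; you would need a whole-file deserializer such as the BlobDeserializer shipped with the morphline sink jar.

```properties
# Sketch only -- names, paths, and the ".part" convention are assumptions.
a1.sources = src1
a1.channels = ch1
a1.sinks = sk1

a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /data/ftps-landing
# Skip files still being transferred, assuming the FTPS client uploads
# them with a ".part" suffix and renames them on completion.
a1.sources.src1.ignorePattern = ^.*\.part$
# Read each zip as a single binary blob instead of line by line.
a1.sources.src1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
a1.sources.src1.deserializer.maxBlobLength = 100000000
a1.sources.src1.channels = ch1

a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.ch1.dataDirs = /var/lib/flume/data

a1.sinks.sk1.type = hdfs
a1.sinks.sk1.hdfs.path = /flume/ingest/%Y-%m-%d
# Write the event bodies as-is rather than as SequenceFiles.
a1.sinks.sk1.hdfs.fileType = DataStream
# Needed because the path uses time escapes and the events carry no
# timestamp header.
a1.sinks.sk1.hdfs.useLocalTimeStamp = true
a1.sinks.sk1.channel = ch1
```

Keep in mind that the file channel and the HDFS sink will roll events into new files on their own schedule, so the original file boundaries are not preserved, which is part of why Flume is a poor fit here.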

Flume isn't generally designed to perform as a file transfer mechanism. If you need to copy files to HDFS, it would be recommended to copy them directly with a cron job, possibly use the HDFS NFS Gateway and write the files directly to HDFS, or use an Oozie SSH action to copy the files into HDFS.
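For the cron approach, a crontab entry along these lines could do it. The paths and schedule are illustrative, not from your environment, and this assumes completed transfers no longer carry a ".part" suffix:

```shell
# Sketch only -- every 5 minutes, move completed zips into HDFS.
# "hdfs dfs -moveFromLocal" deletes each local file only after it has
# been copied successfully, so nothing is ingested twice.
*/5 * * * * hdfs dfs -moveFromLocal /data/ftps-landing/*.zip /ingest/zips/
```

This preserves each zip as a single HDFS file, which Flume's event-based pipeline would not.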

Flume is better suited to real-time event streaming than to file transfer.