ListFile monitoring a high throughput file copy operation.

I'm attempting to monitor a directory on Linux with NiFi 1.4.0's ListFile processor. A Windows host on the network is dumping tens of thousands of files into this directory. At the moment ListFile is missing files, and in some cases it seems to stop listing new files altogether.

I've been trying to figure out a way to make this work and have tried a few things, but I'd be interested in anyone else's opinion.

First, I ran a ListFile processor with a run schedule of 0 seconds and started dropping files into the folder: 50,000 files (40 GB) over about an hour. After a dozen files, the ListFile processor stalled. It appeared to still be running its checks, but it did not pick up any more files.

I then set the Minimum File Age to 60 seconds and the run schedule to 30 seconds for ListSFTP. This got me about 25% of the files, but as time went on it got worse and more files were left behind in the source folder.

Is there a known way to make this work? I need to move files out of a folder once their copy has completed, without missing any. I have checked the logs at the default logging level, and the ListFile processor is not reporting any errors.
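
For reference, one generic way to decide that a copy "has completed" is to poll the directory twice and only treat a file as finished when its size and modification time are unchanged between polls. The sketch below is plain Python, not NiFi; the names `snapshot_dir` and `stable_files` are hypothetical, and it assumes the copy tool keeps changing the file's size or modification time while writing, which may not hold for every tool (a stalled copy would also look "stable").

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Size and modification time of one file at a single poll."""
    size: int
    mtime: float


def snapshot_dir(path: str) -> dict[str, Snapshot]:
    """Record size and mtime of every regular file directly inside `path`."""
    result: dict[str, Snapshot] = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.path] = Snapshot(st.st_size, st.st_mtime)
    return result


def stable_files(previous: dict[str, Snapshot],
                 current: dict[str, Snapshot]) -> list[str]:
    """Files present in both polls whose size and mtime did not change.

    Assumption: an unchanged size/mtime across the poll interval means the
    writer has finished with the file.
    """
    return sorted(p for p, snap in current.items() if previous.get(p) == snap)
```

Usage would be to call `snapshot_dir` on each poll (for example every 30 seconds), compare with the previous snapshot via `stable_files`, and only move the files it returns.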

I also have a question about Minimum File Age. If I set it, does ListFile come back to a file later once the file has passed the minimum age? Or, if a file ever fails the minimum-file-age check, is it recorded in the processor's state and permanently ignored?
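
To make the question concrete, here is a hypothetical, simplified model of a lister that remembers only the newest modification time it has emitted (a "watermark"). This is an assumption for illustration, not NiFi 1.4.0's actual ListFile code; its real state handling should be checked against the NiFi documentation and source. Under this model, a file skipped only because it was younger than the minimum age is picked up on a later run, since the watermark only advances past files that were actually listed. A file that arrives with a modification time older than the watermark, which can happen if the copy tool preserves the source file's original timestamp, is never listed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WatermarkLister:
    """Hypothetical watermark-based lister (NOT NiFi's implementation).

    State is a single number: the newest mtime listed so far.
    """
    min_age: float
    watermark: float = float("-inf")

    def list_once(self, files: dict[str, float], now: float) -> list[str]:
        # A file is eligible only if it is old enough AND newer than the
        # watermark. Too-young files are skipped without touching state.
        eligible = {
            name: mtime
            for name, mtime in files.items()
            if now - mtime >= self.min_age and mtime > self.watermark
        }
        if eligible:
            # Advance the watermark only past files actually listed.
            self.watermark = max(eligible.values())
        return sorted(eligible)


lister = WatermarkLister(min_age=10)
print(lister.list_once({"a": 50, "b": 95}, now=100))            # b too young
print(lister.list_once({"a": 50, "b": 95, "c": 20}, now=110))   # c has an old mtime
```

In this model `b` is listed on the second run once it is old enough, while `c` (mtime 20, below the watermark of 50) is silently skipped forever.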