Delete files in Apache NiFi

I am using Apache NiFi to process a large number of CSV files. During processing, I determine whether each file is valid. In either case I want to delete the file, either because it is not needed or because processing has finished. For this I use the ExecuteStreamCommand processor with the following configuration:

Command Arguments|-f;${absolute.path}${filename}
Command Path|/bin/rm
Ignore STDIN|true
Working directory|not set
Argument delimiter|;
Output destination attribute|not set
Max attribute length|256

The flow does work and deletes files wherever this processor is integrated. But in the production system, with 1500 files per hour, only approximately 30% of the files get deleted. As a result the file share fills up and the system stops working because no further data can arrive. The odd thing is that I don't get any exception in the logs. Does anybody know why this is not working properly?
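For readers trying to reproduce this outside NiFi, here is a minimal Python sketch of roughly what the configuration above amounts to: the argument string is split on `;` and passed to `/bin/rm`. The helper names `build_rm_command` and `run_rm` are made up for illustration and this is not NiFi's implementation. It assumes a Unix-like system with `/bin/rm`. One thing it makes visible is that `rm -f` exits with status 0 and prints nothing when the target does not exist, so a wrong or stale path would not produce an error on the command side.

```python
import os
import subprocess
import tempfile


def build_rm_command(command_path, command_arguments, delimiter=";"):
    """Split the argument string on the delimiter, as the configuration describes."""
    return [command_path] + command_arguments.split(delimiter)


def run_rm(path):
    """Run /bin/rm -f <path> (hypothetical helper mirroring the processor settings)."""
    cmd = build_rm_command("/bin/rm", "-f;" + path)
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        existing = os.path.join(tmp, "data.csv")
        open(existing, "w").close()

        # Deleting a file that exists
        result = run_rm(existing)
        print("existing file:", result.returncode, os.path.exists(existing))

        # Deleting a file that does not exist: -f suppresses the error
        result = run_rm(os.path.join(tmp, "missing.csv"))
        print("missing file:", result.returncode, repr(result.stderr))
```

Note that a path containing `;` would be split into several arguments by this scheme, so file names with that character would need a different delimiter.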

1 REPLY

We reproduced the problem: the issue is not with deleting files but with listing them.
We are using a number of folders with a number of ListFile processors, and some files are never picked up. When we reset the state of each ListFile processor, all files still sitting untouched on the shares are identified by the ListFile processors. But after a while they again miss new files on the share.

As a workaround, we scripted clearing the state of all processors every five minutes. As long as the pipeline is fast enough, the repeated re-listing does not build up backpressure from thousands of files.

Next, we plan to try entity tracking with a Redis cache for the ListFile processors so that we can remove this hack.
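To illustrate why entity tracking might help, here is a deliberately simplified Python model of the two approaches. It is not NiFi's actual code: `list_by_timestamp` and `list_by_entity` are hypothetical functions, and the assumption that files arrive on the share with a modification time older than one already seen (for example because of a copy that preserves timestamps, or clock differences) is a guess, not something confirmed in this thread. ListFile's real "Tracking Timestamps" strategy is more involved than this.

```python
def list_by_timestamp(files, last_seen):
    """Report files whose mtime is newer than the last timestamp seen.

    files: dict mapping file name -> modification time
    Returns (new_file_names, updated_last_seen).
    """
    new = sorted(name for name, mtime in files.items() if mtime > last_seen)
    newest = max([last_seen] + [files[name] for name in new])
    return new, newest


def list_by_entity(files, seen):
    """Report files that are unknown or whose mtime changed.

    seen: dict mapping file name -> mtime already listed (updated in place)
    """
    new = sorted(name for name, mtime in files.items() if seen.get(name) != mtime)
    seen.update({name: files[name] for name in new})
    return new


if __name__ == "__main__":
    share = {"a.csv": 100}
    listed, last_seen = list_by_timestamp(share, 0)
    seen = {name: share[name] for name in listed}

    # b.csv shows up with an mtime older than the stored timestamp
    share["b.csv"] = 90
    share["c.csv"] = 110

    print("timestamp tracking:", list_by_timestamp(share, last_seen)[0])
    print("after state reset: ", list_by_timestamp(share, 0)[0])
    print("entity tracking:   ", list_by_entity(share, seen))
```

In this model the timestamp-based listing skips `b.csv` until the state is reset, which resembles the behavior described above, while the entity-based listing picks it up because it remembers individual files rather than a single timestamp.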