Hi, what are the recommended approaches for handling the following scenario? NiFi is ingesting lots of files (say, pulled from a remote system into the flow), and we care about each file only as a whole, so the flowfile content is the file itself, with no further splits or row-by-row processing. File sizes can vary from a few MBs to GBs, which is not a problem in itself, but what happens when millions of files are ingested this way? Say they end up in HDFS at the end of the dataflow.
Given that file content is recorded in the content repository to enable data provenance, disk space may become an issue. Is there any way to control this purging/expiration at a more fine-grained level (e.g. per processor or per flow) than the instance-wide repository settings?
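For context, the instance-wide knobs I mean are the `nifi.properties` entries below (values shown are illustrative, not our actual config); as far as I can tell these apply to the whole instance, not to individual flows:

```properties
# Content repository: whether claims are archived after flowfiles complete,
# and how long / how much disk the archive may consume before purging.
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%

# Provenance repository: event (journal) retention limits.
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
```

Ideally we could keep long retention for small-volume flows while aggressively expiring the archived content of this high-volume file-ingest flow.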