We're using Apache NiFi 1.0.1; I know we're way behind on upgrading.
Our use case is to pick up files from a mount local to the NiFi server and write them to HDFS, using ListFile and FetchFile. Some files are huge, so the concern is that NiFi might start fetching a file before it has been completely written to the mount, which would load a partial file into HDFS. The proposed solution is for the source system to send us an indicator file (in a different directory) with a specific name; once we receive that file, we should start fetching the data files with the FetchFile processor.
So the question is: how do we build the NiFi dataflow so that FetchFile only starts after the indicator file has been received?
Do you have any suggestions on how to achieve this?
Thanks in advance.
Please confirm that you are trying to solve a real problem.
The need to wait until files are completely available is not specific to huge files, and I have not run into this with NiFi specifically. Most filesystems should handle this well automatically.
A common pattern is that a file stays hidden, or in a different location, until it is complete.
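As a rough illustration of that hidden-until-complete pattern on the writer side, here is a minimal Python sketch. It assumes the source system can control how it writes files and that the temporary and final names live on the same filesystem; the function name `write_then_publish` and the `.part` suffix are hypothetical, not something from NiFi or from the original setup.

```python
import os
import tempfile


def write_then_publish(final_path: str, data: bytes) -> None:
    """Write data under a hidden temporary name, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    # The "." prefix makes it a hidden dot-file while it is still being written.
    fd, tmp_path = tempfile.mkstemp(prefix=".", suffix=".part", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing
        # Rename to the final name only once the content is complete.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach, a consumer that skips hidden files only ever sees the final name once the content is complete. Whether the rename is atomic depends on the filesystem behind the mount, so that is worth verifying for network mounts.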
If this answers your question, consider marking it as the answer.
The question I posted is not hypothetical; it is a real use case.
FYI, here is another thread about partial file consumption: https://stackoverflow.com/questions/45379729/nifi-how-to-avoid-copying-file-that-are-partially-writt...
That thread does not suggest that the OS takes care of this automatically.
The solution proposed there is to add a wait time between ListFile and FetchFile, but in our case the requirement is to wait for an indicator file before we start ingesting files.
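To make the requirement concrete, the gating logic we are after could be sketched in plain Python as below. This is only an illustration of the intended behavior, not a NiFi flow; the function name `files_ready_to_fetch` and its parameters are hypothetical.

```python
import os
from typing import List


def files_ready_to_fetch(data_dir: str, indicator_path: str) -> List[str]:
    """Return the data files to fetch, or an empty list until the indicator exists."""
    # No indicator file yet: the source system may still be writing, so fetch nothing.
    if not os.path.isfile(indicator_path):
        return []
    # Indicator present: every regular file in the data directory is considered complete.
    return sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )
```

In a real flow the indicator file would presumably also need to be removed or archived after each batch, so that a stale indicator does not release the next batch too early.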