We are using NiFi TailFile to tail multiple files in a directory, and the state location is set to "Remote". It looks like NiFi keeps the file state even after the file is deleted. Is there a way to remove deleted files from the state, either manually or automatically?
State can be cleared from the "View state" option on the TailFile processor while the processor is stopped, but this clears the entire state for the TailFile processor. On the subsequent run, TailFile may end up reading the same file again if that file is still present. I do not think there is any harm in keeping the state for files even after they are deleted from the source. Do you have any specific requirements or issues here?
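If clearing the whole state is acceptable, it can also be scripted rather than done through the UI. This is a minimal sketch under assumptions: it targets the NiFi REST endpoint `POST /nifi-api/processors/{id}/state/clear-requests` (the REST counterpart of the "Clear state" action), assumes the processor is already stopped, and assumes bearer-token authentication on a secured cluster. `build_clear_state_request` and `clear_state` are hypothetical helper names; the base URL and processor id are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_clear_state_request(base_url: str, processor_id: str,
                              token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the processor's state clear-requests endpoint."""
    url = f"{base_url.rstrip('/')}/nifi-api/processors/{processor_id}/state/clear-requests"
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the cluster is secured with bearer-token authentication.
        headers["Authorization"] = f"Bearer {token}"
    # Empty body: assumes the endpoint accepts a POST without a payload.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def clear_state(base_url: str, processor_id: str, token: Optional[str] = None) -> dict:
    """Send the clear request and return the JSON response.

    The processor must be stopped first, exactly as with the UI option.
    This wipes ALL TailFile state, not just entries for deleted files.
    """
    req = build_clear_state_request(base_url, processor_id, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `clear_state("https://nifi.example.com:8443", "<processor-id>", token="...")`. The same caveat applies as with the UI: files still present in the directory may be read again from the beginning on the next run.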
My guess is that because the state location is set to Remote, this state is being written to ZooKeeper's transaction log, which is filling up disk space fast: we are seeing the transaction log grow to 30 GB. Hence, we are looking for a way to clean up the state so that it covers only files that still exist.
"Remote" state should only be configured in the TailFile processor when the directory containing the file being tailed is mounted on every node in the NiFi cluster (meaning the flow running on each NiFi cluster node has access to the exact same file being tailed). If it is a shared directory/file, then TailFile must also be configured to execute on "Primary node" only.
Correct, it is currently mounted on every node. I would have thought NiFi would drop the file state once log files are deleted or archived, but that does not appear to be the case: I can see 15K file states on the tailing processor.
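To gauge how many of those entries point at files that no longer exist before deciding to clear state, the state can be read back and checked against disk. This is a sketch under assumptions: it takes the JSON returned by `GET /nifi-api/processors/{id}/state`, reads the cluster scope (where "Remote" state lives), and assumes TailFile records each tailed file under a key of the form `file.<n>.filename`. `state_entries` and `stale_files` are hypothetical helper names, and the check must run on a node where the tailed directory is mounted.

```python
import os
import re
from typing import Callable, Dict, List

# Assumption: TailFile stores per-file state under keys like "file.0.filename".
FILENAME_KEY = re.compile(r"^file\.(\d+)\.filename$")


def state_entries(response: dict, scope: str = "clusterState") -> Dict[str, str]:
    """Flatten the key/value list from a GET .../state response into a dict."""
    state_map = response.get("componentState", {}).get(scope) or {}
    return {e["key"]: e["value"] for e in state_map.get("state") or []}


def stale_files(state: Dict[str, str],
                exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return tracked filenames that no longer exist, sorted for stable output."""
    stale = []
    for key, value in state.items():
        # Only look at the filename entries; position/length/etc. are skipped.
        if FILENAME_KEY.match(key) and not exists(value):
            stale.append(value)
    return sorted(stale)
```

Because the state can only be cleared as a whole, this count mainly helps decide whether a full clear (and the possible re-read of files still present) is worth it.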