We have a problem with our NiFi cluster not clearing the content_repository. Previously we ran a NiFi cluster (version 1.8, without Ambari) and did not have this issue; the content repository was being cleared. Since moving all our flows to a NiFi 1.9 cluster (with Ambari), the content repository fills up and eats all the disk space. The configurations are exactly the same, and we are not seeing "too many open files" errors or anything similar. Content claims do not seem to be the problem either. For now we need to restart the NiFi cluster once a week to clean up the content repository...
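For reference, these are the nifi.properties settings that control content repository archiving and cleanup. This is just the stock set of properties to double-check on both clusters (the values shown are the usual defaults, not a recommendation):

```properties
# Whether archived content claims are kept after FlowFiles are dropped
nifi.content.repository.archive.enabled=true
# How long archived claims may be retained before cleanup
nifi.content.repository.archive.max.retention.period=12 hours
# Cleanup kicks in when the repository disk exceeds this usage
nifi.content.repository.archive.max.usage.percentage=50%
```

If `archive.enabled` is true but the disk keeps filling past `max.usage.percentage`, the claims are likely still referenced (or cleanup is blocked by a bug), rather than being archive retention working as intended.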
Any idea why this is happening? Suggestions welcome:)
I have exactly the same problem: I need to restart NiFi on a regular basis to get the content_repository cleaned. When I look at data provenance, I can see that all the FlowFiles are in the DROP state. My flow is really basic: syslog -> UpdateAttribute -> HDFS.
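As a quick check before restarting, it can help to see whether the space is held by the archive subdirectories or by active claims. A minimal sketch (the repository path is an assumption for a default install; point it at your `nifi.content.repository.directory.default` value):

```shell
#!/bin/sh
# Compare total content_repository size vs. the archive subdirectories.
# NIFI_CONTENT_REPO / the default path below are assumptions; adjust to your install.
REPO="${NIFI_CONTENT_REPO:-/opt/nifi/content_repository}"
if [ -d "$REPO" ]; then
  echo "Total repository size:"
  du -sh "$REPO"
  echo "Archive subdirectories only:"
  find "$REPO" -type d -name archive -exec du -ch {} + 2>/dev/null | tail -1
else
  echo "content repository not found at $REPO"
fi
```

If most of the space is outside the `archive` directories, the claims are still considered in use, which points at a leak rather than at archive retention settings.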
Please note that at the syslog level I work with batches of 1000 files.
What is the detail of your flow?
PS: Yes, I've read this: https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Arc... and also this https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated...
It is possible this is related to https://issues.apache.org/jira/browse/NIFI-6846. The fix has been merged into Apache NiFi master but has not yet been included in a release. If you're a Cloudera-supported user, please reach out to support about this.
Hello @JoeWitt ,
Thanks for your feedback. Actually, my FlowFiles are created by a syslog processor. I see no errors in the NiFi log file regarding processing, and as far as I can tell, all my data is being collected correctly.