Created on 07-25-2019 04:58 AM - last edited on 10-08-2019 06:41 AM by ask_bill_brooks
Hi,
We have a problem with our NiFi cluster not clearing the content_repository. We previously ran a NiFi cluster (version 1.8, without Ambari) and did not have this issue; the content repository was cleared as expected. Since moving all our flows to a NiFi 1.9 cluster (with Ambari), the content repository keeps filling up and eating all the disk space. The configurations are exactly the same, we are not getting "too many open files" errors, and content claims do not seem to be the problem either. For now we have to restart the NiFi cluster once a week to clean up the content repository...
Any idea why this is happening? Suggestions welcome :)
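For reference, these are the nifi.properties settings that govern when archived content claims are purged; the values below are the NiFi 1.x defaults, not necessarily what a given cluster runs, so they are worth double-checking on both the old and new cluster:

```
# nifi.properties - content repository archive settings (NiFi 1.x defaults)
# Archived claims are deleted once EITHER limit below is hit.
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
```

If archiving is enabled, claims whose FlowFiles are all dropped are moved to an archive subdirectory and only deleted once the retention period expires or the usage percentage is exceeded, so a repository that looks "full" may just be archive waiting for cleanup.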
Regards,
B
Created on 10-08-2019 01:14 AM - edited 10-08-2019 01:15 AM
Hello,
I have exactly the same problem: I need to restart NiFi on a regular basis to get the content_repository cleaned. When I go into data provenance, I can see that all the content files are in DROP state. My flow is really basic: syslog -> UpdateAttribute -> HDFS
Please note that at the syslog level I work with batches of 1000 files.
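To quantify the growth between restarts, I track the repository size with a small script along these lines (a minimal sketch; the repository path is an assumption, so point it at whatever `nifi.content.repository.directory.default` is set to in your nifi.properties):

```python
import os

def repo_usage(path):
    """Walk a content_repository directory and return (file_count, total_bytes)."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                total += os.path.getsize(fp)
                count += 1
            except OSError:
                pass  # claim file was removed while we were walking
    return count, total

if __name__ == "__main__":
    # Hypothetical path - replace with your nifi.content.repository.directory.default
    count, total = repo_usage("/opt/nifi/content_repository")
    print(f"{count} claim files, {total / 1024**3:.2f} GiB")
```

Running it from cron every hour makes it easy to see whether the repository grows steadily or only under load.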
What is the detail of your flow?
PS: Yes, I've read this: https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Arc... and also this https://community.cloudera.com/t5/Community-Articles/How-to-determine-which-FlowFiles-are-associated...
Created 11-07-2019 01:10 AM
Hello,
By any chance, have you found anything about this problem? Nothing on my side, unfortunately 😞
Created 11-07-2019 06:17 AM
It is possible this is related to https://issues.apache.org/jira/browse/NIFI-6846. The fix has been merged into the Apache NiFi master branch but is not yet in a release. If you're a Cloudera-supported user, please reach out to support about this.
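Until the fix lands in a release, one way to keep an eye on the repository without shell access to the nodes is the REST API's system-diagnostics endpoint. A minimal sketch of pulling the per-repository utilization out of the response (the field names assume the NiFi 1.x REST API and an unsecured cluster; verify both against your version):

```python
import json
import urllib.request

def content_repo_utilization(diagnostics):
    """Map content repository identifier -> utilization string
    from a /nifi-api/system-diagnostics response body."""
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    return {
        usage["identifier"]: usage["utilization"]
        for usage in snapshot["contentRepositoryStorageUsage"]
    }

def fetch_diagnostics(base_url):
    """Fetch system diagnostics from one NiFi node (no auth - unsecured cluster assumed)."""
    with urllib.request.urlopen(f"{base_url}/nifi-api/system-diagnostics") as resp:
        return json.load(resp)
```

Usage would be something like `content_repo_utilization(fetch_diagnostics("http://nifi-node:8080"))`; on a secured cluster you would need to add a token or client certificate to the request.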
Created 11-21-2019 11:48 PM
Hello @JoeWitt ,
Thanks for your feedback. In my case, the FlowFiles are created by a syslog processor. I see no errors in the NiFi log regarding processing, and as far as I can tell all my data is collected correctly.
Stéphane