Created on 03-16-2020 01:21 AM - last edited on 03-16-2020 02:20 AM by VidyaSargur
Hi Team,
I can see that the NiFi content repository has reached its maximum threshold. The current storage capacity is 195 GB, and it has been hitting that limit frequently. How can I fix this space issue? Do I need to change any properties in NiFi?
[nifi@w0lxdhdp01 conf]$ du -sh /var/lib/nifi/content_repository/
146G /var/lib/nifi/content_repository/
[nifi@w0lxdhdp01 conf]$ du -sh /var/lib/nifi/database_repository/
15M /var/lib/nifi/database_repository/
[nifi@w0lxdhdp01 conf]$ du -sh /var/lib/nifi/flowfile_repository/
103M /var/lib/nifi/flowfile_repository/
[nifi@w0lxdhdp01 conf]$ du -sh /var/lib/nifi/provenance_repository/
[nifi@w0lxdhdp01 conf]$ cat nifi.properties | grep content
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.always.sync=false
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=4 hours
nifi.content.repository.archive.max.usage.percentage=30%
nifi.content.repository.directory.default=/var/lib/nifi/content_repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.viewer.url=../nifi-content-viewer/
nifi.remote.contents.cache.expiration=30 secs
Created 03-16-2020 04:45 PM
Sorry to hear you are having space issues with your content repository. The most common cause is that active FlowFiles are still referencing content claims. A content claim cannot be moved to an archive sub-directory or deleted until no FlowFiles reference it, so even a small FlowFile still queued somewhere in a dataflow can prevent a large claim from being removed.

I recommend using the NiFi Summary UI (Global menu --> Summary) to locate connections with FlowFiles just sitting in them, not getting processed. On the Connections tab, click "Queue" to sort connections by queued FlowFiles. What you are looking for is a connection that has queued FlowFiles but shows 0 for both "In/Size" and "Out/Size": this indicates the number of queued FlowFiles in that queue has not changed in the last 5 minutes. You can use the go-to arrow on the far right to jump to that connection on the canvas. If the data is not needed (just left over in some inactive dataflow), right-click the connection and empty the queue. After clearing some queues, check whether the content repo usage drops.
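If you prefer to hunt for stalled queues from the command line instead of the Summary UI, the NiFi REST API exposes the same statistics. The sketch below is an illustration, not a supported procedure: it assumes an unsecured NiFi on localhost:8080, that jq is installed, and that the snapshot field names (flowFilesQueued, flowFilesIn, flowFilesOut, queued) match your NiFi version's status DTOs.

```shell
# Hedged sketch: list connections that hold queued FlowFiles but moved
# nothing in the last 5-minute window (flowFilesIn and flowFilesOut both 0).
# Assumes anonymous HTTP access on localhost:8080 and jq on the PATH.
curl -s 'http://localhost:8080/nifi-api/flow/process-groups/root/status?recursive=true' \
  | jq '[.. | .connectionStatusSnapshot? // empty
         | select(.flowFilesQueued > 0 and .flowFilesIn == 0 and .flowFilesOut == 0)
         | {id, name, flowFilesQueued, queuedSize: .queued}]'
```

Each object printed is a candidate stalled connection; the id can be used to find it on the canvas (or with other REST endpoints) before deciding whether to empty the queue.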
It is also possible that too few file handles exist for your NiFi service user, which can prevent clean-up from working efficiently. I recommend increasing the open files limit and process limit for your NiFi service user. Also check whether your flowfile_repository is large, or whether content claims have been moved to archive sub-directories but not yet purged. Does a restart of NiFi, which would release file handles, trigger some cleanup of the repo(s) on startup?
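As a quick way to check both of those points, something like the following can help. The limits.conf values shown are placeholders, not recommendations for your specific load, and the commands assume a Linux host with GNU coreutils and a service user named nifi:

```shell
# Hedged sketch: inspect limits for the NiFi service user and measure how
# much of the content repository is sitting in archive sub-directories.

# Current open-files and max-processes limits for the nifi user:
su - nifi -c 'ulimit -n -u'

# Example /etc/security/limits.conf entries (values are illustrative):
#   nifi  soft  nofile  50000
#   nifi  hard  nofile  50000
#   nifi  soft  nproc   10000
#   nifi  hard  nproc   10000

# Total size of archived content claims still awaiting purge:
find /var/lib/nifi/content_repository -type d -name archive -print0 \
  | du -ch --files0-from=- | tail -1
```

If the archive total is large, the archive clean-up thread may be falling behind; if it is near zero while the repo is full, the space is being held by claims that active FlowFiles still reference.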
It is also dangerous to have all your NiFi repos co-located on the same disk, because corruption of your flowfile repository can lead to data loss. The flowfile_repository should always be on its own disk, the content_repository on its own disk, and the provenance_repository on its own disk. The database repository can share a disk with other NiFi files (config files, local state, etc.)
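In nifi.properties that split might look like the fragment below. The mount points (/repos/...) are hypothetical, and you should verify the exact property names against your NiFi version's admin guide before changing them:

```shell
# Hedged sketch of nifi.properties with each repository on its own disk.
# The /repos/* mount points are placeholders for dedicated disks.
# nifi.flowfile.repository.directory=/repos/flowfile/flowfile_repository
# nifi.content.repository.directory.default=/repos/content/content_repository
# nifi.provenance.repository.directory.default=/repos/provenance/provenance_repository
# nifi.database.directory=/var/lib/nifi/database_repository
```

Stop NiFi, move the existing repository contents to the new locations, update the properties, then restart.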
https://community.cloudera.com/t5/Community-Articles/HDF-NIFI-Best-practices-for-setting-up-a-high-p...
Here are some additional articles that may help you:
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Arc...
Hope this helps,
Matt