Apache NiFi does not release Content Storage space after archive was disabled

Explorer

Hey, everyone!

Could you help me with the following problem, please? I have a test NiFi instance where I turned on Content Storage archiving. It worked well, but at one point I decided to increase the value of "nifi.content.repository.archive.max.usage.percentage" from the default 50% to 70%.

So NiFi used up to 70% of the total space, as I expected.
But after that I disabled archiving and expected NiFi to release all the space used by the archive, and that doesn't happen.
Why is that? I've seen messages saying that archived data is never cleaned up if "nifi.content.repository.archive.enabled" is set to "false" after it has been "true". Is that true?

My current settings:

nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.repository.directory.repo0=/mnt/nifi/repos/content_repository
nifi.content.repository.archive.max.retention.period=6 hours
nifi.content.repository.archive.max.usage.percentage=60%
nifi.content.repository.archive.backpressure.percentage=70%
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false

 

5 REPLIES

Master Mentor

@asand3r 

I need some more detail to provide a good answer here:

  1. What version of Apache NiFi or Cloudera Flow Management are you using?
  2. After changing "nifi.content.repository.archive.enabled" to false in the nifi.properties file, did you restart NiFi?
  3. If you manually inspect the archive sub-directories, do any of them still hold files, or are all of the archive sub-directories within the content_repository empty? If they are empty, then archive clean-up is complete.
  4. You mention seeing messages that archived data is never cleaned up. Can you share the message you are seeing, which I assume is from nifi-app.log?
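
For item 3, one way to check is sketched below. It assumes archived claims live in directories named `archive` somewhere under the content repository, and that the default path is the one from the original post; the function name `list_archive_counts` is made up for this example.

```shell
#!/usr/bin/env bash
# Print "<file count> <directory>" for every archive directory that still holds files.
# The default path is the one from the original post; pass your own as the first argument.
list_archive_counts() {
  local repo="${1:-/mnt/nifi/repos/content_repository}"
  find "$repo" -type d -name archive | sort | while IFS= read -r dir; do
    # Count regular files only; tr strips the padding some wc builds add.
    n=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$n" -gt 0 ]; then
      printf '%d %s\n' "$n" "$dir"
    fi
  done
}
```

Running `list_archive_counts /mnt/nifi/repos/content_repository` on each node prints nothing when every archive sub-directory is empty.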

Keep in mind that disabling archiving will not prevent the content_repository from filling the disk where it resides to 100%. Content claims associated with actively queued FlowFiles within your dataflows on the NiFi canvas will still exist in the content_repository.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt

Explorer

@MattWho thanks for your answer.

1. It's Apache NiFi 1.18.0.
2. Yep, NiFi was restarted; I usually restart it after any change to nifi.properties.
3. Hmm, I was confused here. I'm a newbie, and before I asked my question I thought NiFi did not move files anywhere else. But now I see 'archive' directories in the content repo. I now have a three-node cluster with archiving disabled (after it was enabled earlier): one node has no files inside its 'archive' directories, but the other two do.
4. Sorry, it's from a private chat with my colleagues. 😃

So, basically, if I set "nifi.content.repository.archive.enabled" to "false" and restart the NiFi service, should it delete all previously archived data? I disabled it about 4 hours ago, but two nodes still have files inside "*/archive/*" directories.

[user@nifi-host content_repository]$ pwd
/mnt/nifi/repos/content_repository
[user@nifi-host content_repository]$ find . -path "*/archive/*" | wc -l
3955

Master Mentor (accepted solution)

@asand3r 

Changing the following property to false turns off archiving:

nifi.content.repository.archive.enabled

NiFi does not clean up files left in the archive directories once archiving is disabled. With archiving disabled, the archive code that would scan these directories and remove old archive data is no longer executing.

You'll need to manually purge the archived content claims from the archive sub-directories after disabling content_repository archiving.

So your two nodes that still have archive data had that data present at shutdown, while the other node did not have archive data at shutdown.
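
Before purging, it can help to know how much space the leftover archive files occupy on each node. The following is a rough sketch, not an official NiFi tool: it assumes a Linux host with GNU find (for `-printf`), and `archive_bytes` is a name invented for this example.

```shell
#!/usr/bin/env bash
# Sum the size in bytes of every regular file that sits under an archive directory.
# Relies on GNU find's -printf, so this assumes a Linux host.
archive_bytes() {
  local repo="${1:-/mnt/nifi/repos/content_repository}"
  find "$repo" -type f -path '*/archive/*' -printf '%s\n' |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Calling `archive_bytes /mnt/nifi/repos/content_repository` prints a single number of bytes; files outside the archive sub-directories are not counted.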

Thank you,
Matt

Explorer

@MattWho thanks so much.
Is it OK if I simply remove the archived data while NiFi is running? Or must I stop a node before deleting?

find /mnt/nifi/repos/content_repository -path "*/archive/*" -exec rm -f {} \;

 

Master Mentor

@asand3r 

With archiving disabled, NiFi is no longer tracking the files left in the archive sub-directories. You can remove those files while NiFi is running. Just make sure you don't touch the active content_repository claims.
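
A slightly more cautious variant of the find command from the question might look like the sketch below. It only matches regular files whose path contains an `archive` directory, so claims outside those directories are left alone, and it lists the files unless deletion is explicitly requested. The function name `purge_archive` and the `--delete` switch are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Remove files that sit under an archive directory, leaving active claims alone.
# Dry run by default: pass "--delete" as the second argument to actually remove files.
purge_archive() {
  local repo="${1:-}" mode="${2:-}"
  if [ -z "$repo" ] || [ ! -d "$repo" ]; then
    echo "usage: purge_archive <content_repository> [--delete]" >&2
    return 1
  fi
  if [ "$mode" = "--delete" ]; then
    # -type f keeps the archive directories themselves in place.
    find "$repo" -type f -path '*/archive/*' -delete
  else
    find "$repo" -type f -path '*/archive/*' -print
  fi
}
```

Run `purge_archive /mnt/nifi/repos/content_repository` first to review the list, then repeat with `--delete` once it looks right.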

Matt