Created 07-06-2022 12:09 PM
Hi,
I'm getting this message when running a process with more than 10,000 FlowFiles:
unable to write flowfile content to content repository container default due to archive file size constraints
I googled and read about the content repository and content claims, but I still don't get it.
My archive folder is only about 100 MB.
Can you please help me understand how to configure NiFi to get rid of this message?
Thanks a lot.
The settings which could affect this are:
nifi.flow.configuration.archive.max.time=30 days
nifi.flow.configuration.archive.max.storage=500 MB
nifi.flow.configuration.archive.max.count=
nifi.queue.backpressure.count=10000
nifi.queue.backpressure.size=1 GB
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
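One property that seems to be missing from my nifi.properties entirely is the archive backpressure threshold. If I understand the documentation correctly, when it is not set NiFi falls back to a value slightly above the archive max usage percentage, so for my 50% it would be roughly:
# not in my file today; 52% is just my reading of the documented fallback for a 50% max usage setting
nifi.content.repository.archive.backpressure.percentage=52%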
Created 07-07-2022 08:23 AM
@Luwi
Yes, you may add it manually to the nifi.properties file in 1.16.3. NiFi will not read the new property until you restart the service.
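The addition would be a single line, for example (the 60% here is only an illustration; it just needs to sit above your archive max usage percentage):
nifi.content.repository.archive.backpressure.percentage=60%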
Did you upgrade from a previous NiFi release? Not sure why it is missing.
Matt
Created 07-08-2022 12:06 AM
@MattWho ,
thanks. I used a previous release, but I did not upgrade; I downloaded the new version, ran it on a new server, and moved only my processes from the old NiFi to the new one.
Ludvik
Created 07-07-2022 08:35 AM
@Luwi
The log output you shared implies there is not much in the archive (2 to 6 archived claims each time). So it appears that the majority of your disk usage is being consumed by active claims, other services, or other files on your system.
- Are you processing large files or a mix of large and small files?
- Are you leaving FlowFiles in connection queues sitting for long periods of time?
- Is the disk holding your content_repository also used for other things besides NiFi?
Bottom line: even adjusting "nifi.content.repository.archive.backpressure.percentage" to a higher percentage just pushes the issue further down the road. You'll hit it again if the disk continues to fill with non-archived content, whether from NiFi or from something external to NiFi. NiFi best practices strongly encourage a dedicated disk for the content_repository(s) and the flowfile_repository. The provenance_repository and database_repository may share a disk, since provenance_repository usage can be controlled and the database_repository remains relatively small.
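If you can carve out dedicated mounts, all of the repository locations are configurable in nifi.properties. A rough sketch, where the /mnt/* paths are placeholders for whatever mount points you actually create:
# each of these on its own disk where possible
nifi.flowfile.repository.directory=/mnt/nifi-flowfile/flowfile_repository
nifi.content.repository.directory.default=/mnt/nifi-content/content_repository
# provenance and database repositories can reasonably share a disk
nifi.provenance.repository.directory.default=/mnt/nifi-provenance/provenance_repository
nifi.database.directory=/mnt/nifi-provenance/database_repository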
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.
Thank you,
Matt
Created 07-08-2022 12:16 AM
@MattWho ,
I'm processing invoice data from an accounting system: I request two years of invoices (one by one), then split them down to invoice level and process each invoice separately. So it's a lot of small FlowFiles, and the whole process takes around 25 minutes. I think this is what you are referring to as active claims, right?
The content_repository resides on a server disk which is also used for other things; that is my current client's setup. If I disable archiving, this issue will be solved, but then I will not have the possibility to check content in provenance history, correct? Thanks, Ludvik
Created 07-08-2022 11:47 AM
@Luwi
An "active content claim" would be any content claim where a FlowFile exist still referencing bytes of content in that claim. A NiFi content claim file can contain the content for 1 too many FlowFiles. So all it takes is one small FlowFile still queued in some connection anywhere on your NiFi canvas to prevent a content claim from being eligible to be moved to archive. This is why the total reported content queued on yoru canvas will never match the disk usage in your content_repository.
This article is useful in understanding this process more:
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Arc...
Thank you,
Matt