Unable to write flowfile content to content repository container default due to archive file size constraints

Explorer

Hi,

 

I'm getting this message when running a process with more than 10,000 FlowFiles:

unable to write flowfile content to content repository container default due to archive file size constraints

 

I googled and read more about the content repository and content claims, but I still don't get it.

My archive folder is only about 100 MB.

 

Can you please help me understand how to configure NiFi to get rid of this message?

 

Thanks a lot.

 

The settings that could affect this are:

nifi.flow.configuration.archive.max.time=30 days

nifi.flow.configuration.archive.max.storage=500 MB

nifi.flow.configuration.archive.max.count=

 

nifi.queue.backpressure.count=10000

nifi.queue.backpressure.size=1 GB

 

nifi.content.repository.archive.max.retention.period=12 hours

nifi.content.repository.archive.max.usage.percentage=50%

nifi.content.repository.archive.enabled=true

 

nifi_archive_error.png

Master Collaborator

@Luwi ,

 

Please have a look at this recent answer by @MattWho , where he explains the error you're seeing.

 

https://community.cloudera.com/t5/Support-Questions/Problem-with-Merge-Content-Processor-after-switc...

 

Cheers,

André

 


Explorer

@araujo ,

Thanks. @MattWho mentions "nifi.content.repository.archive.backpressure.percentage", but I can't see it in my nifi.properties file (I use NiFi 1.16.3). May I add it manually?

 

Thanks

Ludvik

Master Guru

@Luwi 

Yes, you may add it manually to the nifi.properties file in 1.16.3.  NiFi will not read the new property until you restart the service.
Did you upgrade from a previous NiFi release?  I'm not sure why it is missing.
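
For anyone who wants to see which threshold applies when the property is absent, here is a rough Python sketch. It assumes that an unset "nifi.content.repository.archive.backpressure.percentage" falls back to 2% above "nifi.content.repository.archive.max.usage.percentage", which is how I read the Admin Guide; please verify against the documentation for your release. The helper names are made up for illustration and are not part of NiFi.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def percent(value):
    """Turn a value such as '50%' into the integer 50."""
    return int(value.rstrip("%"))


def effective_backpressure_percent(props):
    """Explicit backpressure percentage, else max usage + 2 (assumed default)."""
    explicit = props.get("nifi.content.repository.archive.backpressure.percentage")
    if explicit:
        return percent(explicit)
    # Assumption: 50% is used when the max usage property is also missing.
    max_usage = props.get("nifi.content.repository.archive.max.usage.percentage", "50%")
    return percent(max_usage) + 2


# The content repository settings posted in the question.
sample = """
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
"""

print(effective_backpressure_percent(parse_properties(sample)))  # 52
```

With the settings from the question, the assumed effective threshold is 52% of the disk. Adding the property explicitly would replace that value after a restart.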

Matt

Explorer

@MattWho ,

Thanks. I used a previous release, but I did not upgrade. I downloaded the new release, ran it on a new server, and moved only my processes from the old NiFi to the new one.

 

Ludvik

Master Guru

@Luwi 

The log output you shared suggests there is not much in the archive (2 to 6 archived claims each time). So it appears that the majority of your disk usage comes from active claims or from other services or files on your system.
- Are you processing large files or a mix of large and small files?
- Are you leaving FlowFiles sitting in connection queues for long periods of time?
- Is the disk that holds your content_repository used for other things besides NiFi?

The bottom line is that raising "nifi.content.repository.archive.backpressure.percentage" only pushes the issue further down the road.  You'll hit it again if the disk continues to fill with non-archived content from NiFi or with something external to NiFi.  NiFi best practices strongly encourage dedicated disks for the content_repository(s) and the flowfile_repository.  The provenance_repository and database_repository may share a disk, since your provenance_repository usage can be controlled and the database_repository remains relatively small.
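
To make the point about shared disks concrete, here is a small Python sketch of the kind of check involved. It is only a model, not NiFi's actual code: the function names are invented, the exact comparison NiFi performs is an assumption, and the path "." is a placeholder for wherever your content_repository lives. What it shows is that the percentage comes from the whole partition, so files that have nothing to do with NiFi count toward it.

```python
import shutil


def usage_percent(used_bytes, total_bytes):
    """Percentage of the partition that is in use."""
    return 100.0 * used_bytes / total_bytes


def writes_blocked(used_bytes, total_bytes, backpressure_percent):
    """True when partition usage has reached the backpressure threshold.

    Assumption: a simple >= comparison; NiFi's real check may differ.
    """
    return usage_percent(used_bytes, total_bytes) >= backpressure_percent


if __name__ == "__main__":
    # Placeholder path: point this at the disk holding content_repository.
    usage = shutil.disk_usage(".")
    pct = usage_percent(usage.used, usage.total)
    print(f"partition usage: {pct:.1f}%")
    print("would block writes at 52%:", writes_blocked(usage.used, usage.total, 52))
```

If the printed usage is already near the threshold while the archive folder is small, the space is being taken by active claims or by non-NiFi files, and raising the percentage only buys time.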

Thank you,

Matt

 

Explorer

@MattWho ,

 

I'm processing invoice data from an accounting system. I request two years' worth of invoices (one by one), split them down to invoice level, and process each invoice separately. So it's a lot of small FlowFiles, and the whole process takes around 25 minutes. I think this is what you are referring to as active claims, right?

The content_repository resides on the server disk, which is also used for other things. That is my current client's setup. If I disable archiving, this issue will be solved, but I will no longer be able to check the content in provenance history, correct? Thanks, Ludvik

Master Guru

@Luwi 
An "active content claim" is any content claim where a FlowFile still exists that references bytes of content in that claim.  A NiFi content claim file can contain the content for one to many FlowFiles.  So all it takes is one small FlowFile still queued in some connection anywhere on your NiFi canvas to prevent a content claim from being eligible to be moved to archive.  This is why the total reported content queued on your canvas will never match the disk usage in your content_repository.
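
A toy Python model may help picture this. It is a simplification under stated assumptions: the class and the "invoice-N" identifiers are hypothetical, and real content claims track references inside NiFi rather than in a set like this. The only point is that a claim stays active until the last FlowFile referencing it is gone.

```python
class ContentClaim:
    """Toy stand-in for a claim file shared by many FlowFiles."""

    def __init__(self, name):
        self.name = name
        self.flowfiles = set()  # ids of FlowFiles still referencing this claim

    def add(self, flowfile_id):
        self.flowfiles.add(flowfile_id)

    def release(self, flowfile_id):
        self.flowfiles.discard(flowfile_id)

    def is_active(self):
        """Active claims cannot be moved to archive."""
        return bool(self.flowfiles)


claim = ContentClaim("claim-1")
for i in range(1000):
    claim.add(f"invoice-{i}")

# 999 of the 1000 split invoices finish processing.
for i in range(999):
    claim.release(f"invoice-{i}")
print(claim.is_active())  # True: one queued FlowFile keeps the whole claim

claim.release("invoice-999")
print(claim.is_active())  # False: now eligible for archive
```

In a flow that splits into many small FlowFiles, a handful of stragglers in a queue can therefore pin far more disk space than their own size suggests.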

This article is useful in understanding this process more:
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Arc...

Thank you,

Matt
