What happens to FlowFiles in NiFi?

Hello,

I want to know one thing. Let's say we have a flow with a HandleHttpRequest processor, then an ExecuteSQL processor, and then a HandleHttpResponse processor. We receive about 1000 calls to this flow daily. Where do the FlowFiles get stored? I see many, many FlowFiles in provenance. Are they purged, or do they all accumulate? I have limited disk space, so I wanted to know.

@AlokKumar 

NiFi FlowFiles consist of two parts:

  1. FlowFile Metadata/Attributes - stored in the flowfile_repository. This holds metadata about the FlowFile and the attributes added to the FlowFile by processors.
  2. FlowFile Content - stored within content claims in the content_repository. A single content claim may hold the content for one to many FlowFiles. Part of a FlowFile's metadata is the location of its content claim, the starting byte of its content within that claim, and the total number of bytes. Each content claim also has a claimant count, which is incremented for every active FlowFile (a FlowFile still present within a queue on the NiFi canvas) that references content stored in that claim. Once a FlowFile reaches a point of auto-termination within a dataflow, the claimant count on the content claim it references is decremented. Once the claimant count reaches zero, the claim is eligible for archiving and removal/deletion. Content claims are immutable (they cannot be modified once created), so any NiFi processor that modifies or creates content writes that content to a new content claim.
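To make the claimant-count bookkeeping concrete, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not NiFi's actual (Java) implementation: the names ContentClaim, ContentRepository, create_flowfile and auto_terminate are hypothetical, and a real repository keeps claims as files on disk rather than Python byte strings.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of FlowFiles and content claims; not NiFi's real API.


@dataclass
class ContentClaim:
    """An immutable blob that may hold content for one or many FlowFiles."""
    claim_id: str
    data: bytes
    claimant_count: int = 0


@dataclass
class FlowFile:
    """Attributes (flowfile_repository) plus a pointer into a content claim."""
    attributes: dict
    claim: ContentClaim
    offset: int   # starting byte of this FlowFile's content within the claim
    length: int   # total number of bytes

    def content(self) -> bytes:
        return self.claim.data[self.offset:self.offset + self.length]


class ContentRepository:
    def __init__(self) -> None:
        self.claims = {}
        self.eligible_for_archive = []
        self._next_id = 0

    def write(self, data: bytes) -> ContentClaim:
        # Claims are immutable, so new or modified content always gets a new claim.
        self._next_id += 1
        claim = ContentClaim(f"claim-{self._next_id}", bytes(data))
        self.claims[claim.claim_id] = claim
        return claim

    def create_flowfile(self, claim: ContentClaim, offset: int, length: int,
                        attributes: dict = None) -> FlowFile:
        # One more active FlowFile now references this claim.
        claim.claimant_count += 1
        return FlowFile(dict(attributes or {}), claim, offset, length)

    def auto_terminate(self, ff: FlowFile) -> None:
        # The FlowFile leaves the flow: decrement the claim's claimant count.
        ff.claim.claimant_count -= 1
        if ff.claim.claimant_count == 0:
            self.eligible_for_archive.append(ff.claim.claim_id)


repo = ContentRepository()
shared = repo.write(b"requestAresponseB")
a = repo.create_flowfile(shared, 0, 8, {"filename": "a"})
b = repo.create_flowfile(shared, 8, 9, {"filename": "b"})
print(a.content(), b.content())                          # b'requestA' b'responseB'
repo.auto_terminate(a)
print(shared.claimant_count, repo.eligible_for_archive)  # 1 []
repo.auto_terminate(b)
print(shared.claimant_count, repo.eligible_for_archive)  # 0 ['claim-1']
```

Both FlowFiles point into the same claim at different offsets, and the claim only becomes eligible for archive after the second one is auto-terminated.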

Archived content claims are moved to "archive" subdirectories within the content_repository. Archiving can be disabled, in which case content claims whose claimant count reaches zero are deleted immediately. A background archive thread monitors archived content claims and deletes them based on the archive retention settings in the nifi.properties file. A common misunderstanding is how the "nifi.content.repository.archive.max.usage.percentage" property works. Let's say it is set to 80%. Once the disk where the content_repository resides reaches 80% capacity, archive will start purging archived content claims to try to bring disk usage below 80%. If all archived content claims have been deleted and usage is still above 80%, NiFi continues to allow new content claims to be created, potentially leading to the disk being 100% full. For this reason it is VERY important that the content_repository is allocated its own physical or logical disk.
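The max usage percentage behavior can be simulated in a few lines of Python. This is a simplified sketch: ContentVolume, purge_archive and write_new_claim are hypothetical names, sizes are abstract units, purging oldest-first is an assumption, and max_usage_percent only stands in for nifi.content.repository.archive.max.usage.percentage. Other archive retention settings are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; archive purged oldest-first (an assumption).


@dataclass
class ContentVolume:
    """The disk holding content_repository (abstract size units)."""
    capacity: int
    active_bytes: int = 0                         # claims still referenced by FlowFiles
    archived: list = field(default_factory=list)  # archived claim sizes, oldest first

    def used(self) -> int:
        return self.active_bytes + sum(self.archived)

    def at_or_above(self, percent: int) -> bool:
        # Integer form of used / capacity >= percent / 100 (no float rounding).
        return self.used() * 100 >= percent * self.capacity


def purge_archive(vol: ContentVolume, max_usage_percent: int) -> int:
    """Delete archived claims until usage drops below the threshold or the
    archive is empty. Returns how many claims were deleted."""
    deleted = 0
    while vol.archived and vol.at_or_above(max_usage_percent):
        vol.archived.pop(0)
        deleted += 1
    return deleted


def write_new_claim(vol: ContentVolume, size: int) -> None:
    """New content is not blocked by the archive threshold in this model;
    the write only fails once the disk itself is full."""
    if vol.used() + size > vol.capacity:
        raise OSError("No space left on device")
    vol.active_bytes += size


vol = ContentVolume(capacity=100, active_bytes=60, archived=[10, 10, 5])
print(purge_archive(vol, 80), vol.used())  # 1 75
write_new_claim(vol, 20)                   # active content grows to 80
print(purge_archive(vol, 80), vol.used())  # 2 80  (archive empty, still at 80%)
write_new_claim(vol, 20)                   # still allowed: disk is now full
print(vol.used())                          # 100
```

Once the archive is empty, purging has nothing left to free, and active content keeps growing until the disk is full. That is the reason to give content_repository its own disk.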

File System Content Repository Properties

Understanding how NiFi Content Repository Archiving works

With NiFi provenance, you are seeing provenance event data, which includes metadata about the FlowFile. If the content claim referenced by the FlowFile in a provenance event no longer exists in the content_repository (either inside or outside the archive subdirectory), you will have no option to replay or view the content. Provenance is written to its own provenance_repository directory, and its retention is also configurable in the nifi.properties file.
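The split between provenance metadata and content can be sketched as follows. Everything here is hypothetical and illustrative: ProvenanceEvent, can_view_or_replay and prune_provenance are not NiFi APIs, and the single age limit only loosely mirrors NiFi's provenance retention properties (such as nifi.provenance.repository.max.storage.time and nifi.provenance.repository.max.storage.size), which this sketch does not model faithfully.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical illustration; not NiFi's provenance API.


@dataclass
class ProvenanceEvent:
    """Event record: FlowFile metadata plus a pointer to its content claim."""
    event_id: int
    timestamp: float                 # seconds since epoch
    flowfile_uuid: str
    content_claim_id: Optional[str]  # the bytes live in content_repository


def can_view_or_replay(event: ProvenanceEvent, active: Set[str], archived: Set[str]) -> bool:
    """Content is viewable/replayable only while its claim still exists,
    either outside or inside the archive subdirectories."""
    cid = event.content_claim_id
    return cid is not None and (cid in active or cid in archived)


def prune_provenance(events: List[ProvenanceEvent], now: float,
                     max_age_s: float) -> List[ProvenanceEvent]:
    """Age-based provenance retention, independent of content retention."""
    return [e for e in events if now - e.timestamp <= max_age_s]


events = [
    ProvenanceEvent(1, 1_000.0, "ff-a", "claim-1"),  # claim still in archive
    ProvenanceEvent(2, 5_000.0, "ff-b", "claim-2"),  # claim already purged
]
active, archived = set(), {"claim-1"}
print([can_view_or_replay(e, active, archived) for e in events])  # [True, False]
kept = prune_provenance(events, now=6_000.0, max_age_s=3_600.0)
print([e.event_id for e in kept])                                 # [2]
```

Event 2 survives provenance pruning even though its content is already gone, so it would still appear in provenance but could not be replayed or viewed.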

Thank you,
Matt