
Why is only the flowfile repository disk getting full while the other repos are not?

Master Collaborator

Hello experts,

I am facing an issue on one of our NiFi servers, where we run multiple consume eventhub flows.

The flowfile repository disk is getting full, but the content and provenance repos are not.

I have attached screenshots of the usage of all repos and of the contents of the flowfile repo.

The journals folder is occupying a very large amount of disk space.

hegdemahendra_0-1715418211654.png

hegdemahendra_1-1715418702964.png

nifi.properties (settings related to the flowfile repo):
nifi.flowfile.repository.always.sync=false
nifi.flowfile.repository.checkpoint.interval=2 mins
nifi.flowfile.repository.directory=/flowfile
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.partitions=256
nifi.flowfile.repository.retain.orphaned.flowfiles=true
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
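
To put numbers on what the screenshots show, the space used by each subfolder of the repository can be totalled with a short script. This is only a rough sketch: the function name `dir_usage` is made up for illustration, and it assumes the repository lives at the `/flowfile` path from the properties above and that the script runs as a user allowed to read it.

```python
import os


def dir_usage(root):
    """Return {entry_name: total_bytes} for each immediate child of root.

    Subdirectories are walked recursively; plain files sitting directly
    in root are reported under their own file name.
    """
    usage = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        # A journal file may be rotated away while we scan.
                        pass
            usage[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            usage[entry.name] = entry.stat().st_size
    return usage


def format_usage(usage):
    """Render the usage map as lines, largest entry first."""
    ordered = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    return [f"{size / 1024 ** 2:10.1f} MiB  {name}" for name, size in ordered]
```

Calling `print("\n".join(format_usage(dir_usage("/flowfile"))))` on the NiFi host lists the largest entries first, which makes it easy to see how much of the disk the journals folder accounts for.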

 

 

Can anyone help me understand what the issue is and how to resolve it?

Thanks,

Mahendra

 

2 REPLIES

Master Collaborator

@MattWho - I would appreciate any comments you have on this issue. Thanks in advance.

Super Mentor

Hello @hegdemahendra 

It is always very helpful to include the exact version of Apache NiFi, Cloudera HDF, or Cloudera CFM being used.

My guess here would be one or both of the following:

  1. You have multiple FlowFiles, all pointing at the same content claims, queued in connections within your dataflow(s) on the canvas. As long as a FlowFile exists on the canvas, it exists in the flowfile_repository. Users should avoid leaving FlowFiles queued in connections in NiFi. Some users let FlowFiles accumulate in front of stopped processor components rather than auto-terminating them. Even if a FlowFile has no content, its attributes/metadata still consume disk space.
  2. You are extracting content from your FlowFiles into FlowFile attributes, resulting in large FlowFile attributes/metadata being stored in the flowfile_repository. Dataflow designers should avoid extracting large amounts of FlowFile content into FlowFile attributes. Instead, try to build dataflows from components that read from the FlowFile's content rather than from its attributes.
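
To get a feel for the second point, here is a minimal sketch that estimates how many bytes a FlowFile's attribute map holds and flags unusually large values. It is only an approximation: it works on a plain Python dict of attribute names and values (for example, copied by hand from a FlowFile listed in a queue), it does not reproduce NiFi's actual on-disk serialization, and the function names and the 1024-byte limit are arbitrary choices for illustration.

```python
def attribute_bytes(attributes):
    """Approximate the UTF-8 size of all attribute names and values."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in attributes.items()
    )


def oversized_attributes(attributes, limit_bytes=1024):
    """Return (name, value_size) pairs above limit_bytes, largest first.

    limit_bytes is a hypothetical threshold, not a NiFi setting.
    """
    sizes = [
        (name, len(value.encode("utf-8")))
        for name, value in attributes.items()
    ]
    large = [pair for pair in sizes if pair[1] > limit_bytes]
    return sorted(large, key=lambda pair: pair[1], reverse=True)


# Hypothetical FlowFile whose content was extracted into an attribute.
example = {
    "filename": "event-001.json",
    "payload": "x" * 5000,
}
print(attribute_bytes(example))
print(oversized_attributes(example))
```

Multiplying a per-FlowFile estimate like this by the number of FlowFiles queued on the canvas gives a rough idea of how much attribute data the flowfile_repository has to carry.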


Thank you,
Matt