Content Repository doesn't clean up automatically until restart of Nifi Server !!!!

New Contributor

Hello,
I have a problem with the content repository: it fills up until it has consumed all the available space (200 GB). I'm using the following flow to continuously ingest data from an OPC server and publish it to a Kafka server.
[Image: 109559-ingestion-archi.png]

I changed the NiFi configuration as follows to make ingestion fast, with little latency:

# FlowFile Repository
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.partitions=256
nifi.flowfile.repository.checkpoint.interval=2 mins
nifi.flowfile.repository.always.sync=false

nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=10
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=1 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/


# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
nifi.provenance.repository.debug.frequency=1_000_000
nifi.provenance.repository.encryption.key.provider.implementation=
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=
nifi.provenance.repository.encryption.key=


# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=2 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
#nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship, ContentClaimIdentifier

# FlowFile Attributes that should be indexed and made searchable.  Some examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2


# Volatile Provenance Respository Properties
nifi.provenance.repository.buffer.size=100000

# Component Status Repository
nifi.components.status.repository.implementation=org.apache.nifi.controller.status.history.VolatileComponentStatusRepository
nifi.components.status.repository.buffer.size=1440
nifi.components.status.snapshot.frequency=1 min


2 REPLIES

Re: Content Repository doesn't clean up automatically until restart of Nifi Server !!!!

New Contributor

Hi, I have the same problem. I wonder whether setting nifi.content.repository.always.sync=true would solve it?
My impression is that NiFi is out of sync with what is actually in the content repository directories: when I inspect them, they contain files older than the retention configured for the archive.

[Attachments: jdieterich_1-1590585980118.jpeg, jdieterich_0-1590585963137.png]

Is there any way to force synchronization?
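
The manual check described in this reply (looking for archived files older than the configured retention) can be scripted. The following Python sketch is only an illustration: it assumes that archived content lives under `<content_repository>/<section>/archive/`, which is how the default file-system content repository is usually laid out, and `stale_archived_files` is a hypothetical helper name. It only reports files; it does not delete anything or force NiFi to clean up.

```python
import time
from pathlib import Path


def stale_archived_files(repo_dir, retention_seconds, now=None):
    """Return (path, age_seconds) pairs for archived files older than the retention.

    Assumption: archived claims are stored in <repo_dir>/<section>/archive/.
    """
    now = time.time() if now is None else now
    stale = []
    for path in Path(repo_dir).glob("*/archive/*"):
        if path.is_file():
            # Age is measured from the file's last modification time.
            age = now - path.stat().st_mtime
            if age > retention_seconds:
                stale.append((path, age))
    # Oldest files first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # 1 hour, matching archive.max.retention.period in the original post.
    for path, age in stale_archived_files("./content_repository", 3600):
        print(f"{age / 3600:.1f} h old: {path}")
```

If this prints files well past the retention period while NiFi is running, that matches the behavior reported in this thread.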

 


Re: Content Repository doesn't clean up automatically until restart of Nifi Server !!!!

New Contributor

Hello,

I have exactly the same problem.

I just installed NiFi 1.11.4 on a Windows 10 machine, launched it with the default configuration, and created a few simple flowfiles. The flowfiles are never archived immediately, only on the next NiFi restart.

The flowfiles are no longer in use, and the flowfile repository is updated within two minutes of the flowfile being saved to the content repository, as expected, but a restart is needed for the file to be archived.

I observed the same behavior with NiFi 1.9.2 on a Mac, and with NiFi 1.9.0 on a cluster of which I am not the administrator.

Is this NiFi's default behavior? I have read the documentation and cannot find anything saying that NiFi must be restarted in order to archive files in the repository.

Thanks a lot

 

 
