Support Questions

pjagielski · 12-23-2020

Hi,

We recently experienced a NiFi cluster outage that left flowfiles stuck in various processors across the system. After taking a thread dump, we realized that many processors were waiting for "archive expiration":

at java.base@11.0.8/jdk.internal.misc.Unsafe.park(Native Method)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at java.base@11.0.8/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at java.base@11.0.8/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.FileSystemRepository$ContainerState.waitForArchiveExpiration(FileSystemRepository.java:1614)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.FileSystemRepository.create(FileSystemRepository.java:605)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.claim.ContentClaimWriteCache.getContentClaim(ContentClaimWriteCache.java:61)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2617)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.processors.standard.PartitionRecord.onTrigger(PartitionRecord.java:231)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at app//org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1174)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)

We had the standard settings: archive enabled and a max usage of 50%. Disk usage was at about 60%; however, that space was taken up by other data rather than by the archive itself, so waiting for the archive to be cleaned up took forever.
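
To make the situation concrete, here is a minimal Python sketch (not NiFi's own code) that checks whether a content repository partition is over the archive threshold and whether deleting the archive could even bring it back under. It assumes that the "max usage" above corresponds to nifi.content.repository.archive.max.usage.percentage and that archived claims live in directories named "archive" under the content repository; the function names and the repository path are hypothetical.

```python
import os
import shutil


def archive_bytes(repo_dir):
    """Sum the sizes of all files located under a directory named 'archive'.

    Assumption: archived content claims are stored in 'archive'
    subdirectories somewhere below repo_dir.
    """
    total = 0
    for root, _dirs, files in os.walk(repo_dir):
        parts = os.path.relpath(root, repo_dir).split(os.sep)
        if "archive" not in parts:
            continue
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # the file may have been removed while we were scanning
    return total


def classify(total, used, archived, max_usage_pct=50.0):
    """Compare partition usage with the archive threshold (all sizes in bytes)."""
    used_pct = 100.0 * used / total
    archive_pct = 100.0 * archived / total
    return {
        "used_pct": used_pct,
        "archive_pct": archive_pct,
        # Usage is above the configured archive max usage percentage.
        "over_threshold": used_pct > max_usage_pct,
        # Even with an empty archive the partition would stay above the
        # threshold, so waiting for archive cleanup cannot resolve it.
        "cleanup_cannot_help": (used_pct - archive_pct) > max_usage_pct,
    }


def check_repo(repo_dir, max_usage_pct=50.0):
    """Run the check against a real directory (hypothetical helper)."""
    usage = shutil.disk_usage(repo_dir)
    return classify(usage.total, usage.used, archive_bytes(repo_dir), max_usage_pct)
```

Calling check_repo("/path/to/content_repository") from a cron job or a monitoring agent and alerting when "cleanup_cannot_help" is true would flag the case described here, where other data on the same partition keeps usage above the threshold.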

How should we configure the archive and monitor it to prevent such an outage in the future?

Regards,

Piotr

MattWho · 12-23-2020

@pjagielski

It is always helpful to share the exact NiFi version you are running, as there may be known issues we can point you to.

Assuming you are running the latest Apache NiFi 1.12 release, my first thought is that this may be related to the following issue:
https://issues.apache.org/jira/browse/NIFI-7992

Even though your content repo is not filling up, I would suggest inspecting your logs to see how often content claims are being moved to the archive. A background thread then removes those archived claims according to your archive settings.
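
One rough way to do that log inspection is sketched below in Python. It counts log lines per hour that mention a keyword, assuming each line starts with a timestamp in the same "2020-12-23 11:37:15,230" format seen in the dump above. The function name and the default keyword "archive" are assumptions for illustration; adjust the keyword to whatever messages your NiFi version actually logs.

```python
import re
from collections import Counter

# Captures the date and hour from a leading "YYYY-MM-DD HH:MM:SS,mmm " timestamp.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ")


def archive_events_per_hour(lines, keyword="archive"):
    """Count timestamped lines containing keyword, bucketed by date and hour."""
    needle = keyword.lower()
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line.lower():
            counts[match.group(1)] += 1
    return dict(counts)
```

For example, opening the application log with open(path, errors="replace") and passing the file object to archive_events_per_hour gives a quick picture of how archive activity is spread over time.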

Hope this helps,
Matt

NiFi queues stuck if free disk space exceeds archive max usage %