Hi,
We recently experienced a NiFi cluster outage that left flowfiles stuck in various processes across the system. After taking a thread dump, we realized that many processors were waiting for "archive expiration":
at java.base@11.0.8/jdk.internal.misc.Unsafe.park(Native Method)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at java.base@11.0.8/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at java.base@11.0.8/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.FileSystemRepository$ContainerState.waitForArchiveExpiration(FileSystemRepository.java:1614)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.FileSystemRepository.create(FileSystemRepository.java:605)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.claim.ContentClaimWriteCache.getContentClaim(ContentClaimWriteCache.java:61)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2617)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.processors.standard.PartitionRecord.onTrigger(PartitionRecord.java:231)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at app//org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1174)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
2020-12-23 11:37:15,230 INFO [main] org.apache.nifi.bootstrap.RunNiFi at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
We were running the standard settings: archive enabled, with a max usage of 50%. Disk usage was at about 60%; however, that space was taken up by other data rather than by the archive itself, so waiting for the archive to be cleaned up took forever.
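For reference, the settings described above should correspond to the following entries in nifi.properties. This is a sketch: the property names are the ones documented for the file-system content repository, and the values are the ones stated above rather than copied from our actual file.

```properties
# Content repository archive settings as described above (values assumed, not copied from the cluster)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.usage.percentage=50%
```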
How should we configure the archive and monitor it to prevent such an outage in the future?
Regards,
Piotr