I am having problems using the ScrollElasticSearch processor in a cluster. I set the execution to "primary node only", but after a new primary node is elected, the old primary node keeps scrolling Elasticsearch while the new primary node also starts scrolling it.
There is a similar question about the GetFile processor, which does not keep state; the ScrollElasticSearch processor, however, does use state.
How can I configure my cluster, my pipeline, or my processor to stop generating duplicate flowfiles?
NiFi will always favor data duplication over data loss. In this case you have your ScrollElasticSearchHttp processor executing on the primary node only. When the primary node changes, primary-node-only processors are stopped on the old primary node and started on the new primary node. This can result in duplication because state is only updated when the currently executing thread completes. When the processor is stopped on the old primary node, any active thread is not killed. The processor goes into a "stopping" status and is only completely stopped once all active threads have completed or been terminated. Meanwhile, the newly elected primary node has been asked to start the processor, so it reads the same cluster state, which the still-executing thread on the old primary node has not yet updated.
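The timing problem described above can be illustrated with a small toy model. This is only a sketch and not NiFi code: `ClusterState`, `Node`, `begin_run`, `finish_run`, and `simulate_primary_change` are hypothetical names, and the "scroll position" is simplified to an integer offset. It assumes, as described above, that a node reads the shared state when its run begins and writes it back only when the run completes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterState:
    """Hypothetical stand-in for the state shared by all nodes."""
    position: int = 0


@dataclass
class Node:
    """Hypothetical node running a primary-node-only processor."""
    name: str
    emitted: List[int] = field(default_factory=list)
    pending: Optional[int] = None

    def begin_run(self, state: ClusterState, batch_size: int) -> None:
        # Read the shared position and emit a batch starting from it.
        start = state.position
        self.emitted.extend(range(start, start + batch_size))
        # The new position is remembered locally, not yet written back.
        self.pending = start + batch_size

    def finish_run(self, state: ClusterState) -> None:
        # State is only updated when the running thread completes.
        if self.pending is not None:
            state.position = self.pending
            self.pending = None


def simulate_primary_change(batch_size: int = 3):
    state = ClusterState()
    old_primary = Node("node-1")
    new_primary = Node("node-2")

    # The old primary has an active thread when the election happens.
    old_primary.begin_run(state, batch_size)
    # The new primary starts before the old thread has updated the state,
    # so it reads the same starting position.
    new_primary.begin_run(state, batch_size)
    # Both threads eventually complete and write back the same position.
    old_primary.finish_run(state)
    new_primary.finish_run(state)

    duplicates = sorted(set(old_primary.emitted) & set(new_primary.emitted))
    return old_primary.emitted, new_primary.emitted, duplicates


if __name__ == "__main__":
    old, new, dups = simulate_primary_change()
    print("old primary emitted:", old)
    print("new primary emitted:", new)
    print("duplicates:", dups)
```

In this model both nodes emit the same batch because the second read happens before the first write. Nothing is lost, but everything in the overlapping batch is produced twice, which matches the duplication-over-loss behavior described above.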
Typically, the election of a new primary node is the result of the old primary node becoming disconnected from the cluster or no longer communicating with ZooKeeper (ZK), which triggers the election of another node. Either way, there is some connectivity issue (which may be temporary and go unnoticed by the user), and NiFi can only assume that the node may have crashed. The new primary node has no way of knowing whether the old primary node is still executing that processor.