Context:
I have configured a multi-node NiFi cluster with three nodes. On each run, my flow creates Hive partitions named after the timestamp at which the job starts. The timestamp is set by an UpdateAttribute processor (SetTimeStamp) downstream of the GetFile processor; it stores the value in a timestamp attribute, which is later used to create the partitions in Hive.
GetFile -> MergeContent (into a single file) -> SetTimeStamp -> Create HDFS directories from timestamp -> PutHDFS -> CreateHivePartition
Problem:
Each node sets its own current timestamp and creates a partition, because the flowfiles reach the UpdateAttribute processor a few milliseconds apart on different nodes. As a result, the number of partitions created for a single scheduled ingestion equals the number of nodes.
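To illustrate the problem outside NiFi, here is a minimal Python sketch. The node names, the per-node delays, and the millisecond-precision partition format are all made up for illustration; my real flow sets the value in UpdateAttribute, not in Python.

```python
from datetime import datetime, timedelta


def partition_value(ts: datetime) -> str:
    # Hypothetical partition naming: yyyyMMddHHmmss plus 3-digit milliseconds.
    return ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 1000:03d}"


# Assumed start time of one scheduled ingestion.
job_start = datetime(2024, 1, 1, 12, 0, 0)

# Assumed delays (in ms) before each node's flowfile reaches SetTimeStamp.
node_delays_ms = {"node1": 0, "node2": 3, "node3": 7}

# Each node stamps its flowfile with its own "current time".
partitions = {
    node: partition_value(job_start + timedelta(milliseconds=delay))
    for node, delay in node_delays_ms.items()
}

# One scheduled run, but one distinct partition value per node.
distinct_partitions = set(partitions.values())
print(partitions)
print(len(distinct_partitions))
```

With these assumed delays, the three nodes produce three different partition values for what should be one ingestion.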
What I want is to capture a single timestamp, namely that of the flowfile which reaches SetTimeStamp first, and use the same value for all flowfiles across the cluster for that scheduled job. This way I will get a single partition.
For this I configured the SetTimeStamp processor to run on the primary node only. This worked fine for the primary node's flowfiles, but on the other nodes the flowfiles get queued in front of SetTimeStamp, so the ingestion is only partial.
Why do the flowfiles get queued?
How do I bypass the SetTimeStamp (UpdateAttribute) processor for flowfiles on non-primary nodes?
@Matt Clarke Any help would be appreciated.