Created 01-17-2019 11:15 PM
Hello everyone,
I have a question about the EvaluateJsonPath processor configuration.
What is the significance of the Execution setting in the Scheduling tab of EvaluateJsonPath?
I have observed that when I select Primary node instead of All nodes, the queue builds up as shown in the image and stays the same; it doesn't decrease. But if I select All nodes, the queue gradually drops to 0.
Please suggest.
Created 01-18-2019 12:32 AM
Your GetMongo processor is running on All nodes, which means the same data is pulled on every node.
If you set the EvaluateJsonPath processor to run on the Primary node only, the flowfiles on all the other nodes are left in the queue before EvaluateJsonPath, because nothing processes the flowfiles that were pulled on the non-primary nodes.
Run the GetMongo processor on the primary node only and keep the EvaluateJsonPath processor running on all nodes. The reason to keep EvaluateJsonPath on all nodes: if the NiFi primary node changes, an EvaluateJsonPath processor set to primary node only will not process the flowfiles that are still queued on the old primary node.
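To illustrate why the queue gets stuck, here is a rough toy model in Python. It is not NiFi code: the `simulate` function, the node names, and the document names are made up for illustration, and it only assumes that each node keeps its own local queue and that a processor touches only the queue on the node where it runs.

```python
# Toy model of a cluster where every node has its own local queue.
# Not NiFi code; all names here are hypothetical.

def simulate(nodes, primary, source_on, consumer_on, docs):
    """Return the number of flowfiles left queued on each node after one run."""
    def runs_on(node, execution):
        return execution == "ALL" or node == primary

    queues = {}
    for node in nodes:
        # The source processor enqueues every document on each node where it runs.
        queued = list(docs) if runs_on(node, source_on) else []
        # The downstream processor only drains the queue on nodes where it runs.
        if runs_on(node, consumer_on):
            queued = []
        queues[node] = len(queued)
    return queues

nodes = ["node1", "node2", "node3"]
docs = ["doc-a", "doc-b"]

# Source on all nodes, consumer on primary only: non-primary queues never drain.
stuck = simulate(nodes, "node1", source_on="ALL", consumer_on="PRIMARY", docs=docs)
print(stuck)   # {'node1': 0, 'node2': 2, 'node3': 2}

# Source on primary only, consumer on all nodes: nothing is left queued.
clean = simulate(nodes, "node1", source_on="PRIMARY", consumer_on="ALL", docs=docs)
print(clean)   # {'node1': 0, 'node2': 0, 'node3': 0}
```

The first case matches the queue that stays the same in the question; the second is the recommended setup.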
Created 01-18-2019 01:41 AM
That's a really good answer. I tried doing that and it works, thanks @Shu. Can you please give me more insight into "same data is pulled on all nodes"?
Created 01-18-2019 02:43 AM
Sure. In NiFi, processors that trigger the flow (for example, ones scheduled to run on cron) should run on the primary node only; running them on all nodes means the same processor is triggered n times, once on each node.
Each NiFi node works only with the data that it receives itself, so a GetMongo processor (which triggers the flow in this case) running on all nodes will pull the same data on every node.
-
Run GetMongo (the source processor) on the primary node only, then distribute the load across the cluster using Remote Process Groups or connection load balancing.
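As a rough sketch of the difference, the Python below counts how many copies of each document end up in the cluster in both setups. It is a toy model, not NiFi code: the node and document names are made up, and round-robin is assumed here as just one possible way of spreading flowfiles across nodes.

```python
from collections import Counter
from itertools import cycle

# Hypothetical cluster and query result, for illustration only.
nodes = ["node1", "node2", "node3"]
docs = ["doc-a", "doc-b", "doc-c", "doc-d"]

# Source on all nodes: each node runs the same query,
# so every document is fetched once per node.
fetched_all = [doc for _ in nodes for doc in docs]
copies = Counter(fetched_all)
print(copies["doc-a"])   # 3, one copy per node

# Source on primary only, then spread the flowfiles round-robin:
# every document exists exactly once, and the work is still shared.
assignment = {node: [] for node in nodes}
for doc, node in zip(docs, cycle(nodes)):
    assignment[node].append(doc)
print(assignment)
# {'node1': ['doc-a', 'doc-d'], 'node2': ['doc-b'], 'node3': ['doc-c']}
```

With n nodes, running the source on all nodes gives n copies of every document, which is where the duplicates come from.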
Created 01-18-2019 03:46 AM
Makes sense now why I was getting duplicate data from Mongo when I was running GetMongo on all nodes. Thanks again @Shu