Created on 10-04-201808:11 PM - edited 08-17-201906:18 AM
data movement use cases do not require a “shuffle phase” for redistributing
FlowFiles across a NiFi cluster, but there are few cases where it is useful.
each case, the flow starts with a processor that generates tasks to run (e.g.
filenames) followed by the actual execution of those tasks. To scale, tasks
need to run on each node in the NiFi cluster, but for consistency, the task
generation should only run on the primary node. The solution is to introduce a
shuffle (aka load balancing) step in between task generation and task
can be configured to run on the primary node by going to “View
Configuration”-> “Scheduling” and selecting “Primary node only” under
shuffle step is not an explicit component on the NiFi canvas, but rather the
combination of a Remote Input Port and a Remote Process Group pointing at the
local cluster. FlowFiles that are sent to the Remote Process Group will be load
balanced over Site-to-Site and come back into the flow via the Remote Input
Port. Under “Manage Remote Ports” on the Remote Process Group there are batch
settings that help control the load balancing.
are two example flows that use this design pattern: