I am working on a scenario where I need to pull data from a table in SQL Server and store it somewhere.
Currently, I have designed the workflow below (at a high level):
GenerateTableFetch --> ExecuteSQL --> ConvertRecord --> PutFile
I am working on a NiFi cluster and have observed that each node in the cluster executes this workflow individually and keeps its output files on its own local storage.
I am looking for a way to distribute the output of GenerateTableFetch, i.e. the flow files containing the queries, equally across, let's say, 4 nodes. Each node would then have a unique set of queries to execute in the ExecuteSQL processor.
For example, if I have 12 GB of data and 4 nodes, and GenerateTableFetch generates one query per flow file that pulls 1 GB of data, then the nodes should share the work, with each one pulling 3 GB.
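To illustrate the distribution I have in mind, here is a minimal Python sketch. It is not NiFi code: the function `distribute_round_robin`, the node names, and the table name `my_table` are hypothetical, and each query just stands in for one flow file that fetches roughly 1 GB.

```python
from collections import defaultdict


def distribute_round_robin(queries, nodes):
    """Assign each query to exactly one node, cycling through the nodes in order."""
    assignment = defaultdict(list)
    for i, query in enumerate(queries):
        assignment[nodes[i % len(nodes)]].append(query)
    return dict(assignment)


# 12 hypothetical paged queries, one per flow file (about 1 GB each in my example).
queries = [
    f"SELECT * FROM my_table ORDER BY id OFFSET {i * 1000} ROWS FETCH NEXT 1000 ROWS ONLY"
    for i in range(12)
]
nodes = ["node1", "node2", "node3", "node4"]

assignment = distribute_round_robin(queries, nodes)
for node, node_queries in assignment.items():
    print(node, len(node_queries))
```

With 12 queries and 4 nodes, every node ends up with 3 queries and no query is executed twice, which is the behavior I would like from the cluster.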
Can someone please help me achieve this?
Also, if each node is responsible for pulling a specific set of data, what happens when that node goes down? Is there any way for a failed node's work items to be shared among the other nodes in the cluster?
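The failover behavior I am hoping for could be sketched roughly like this in Python. Again, this is only an illustration of the desired outcome, not how NiFi works internally; `reassign_failed`, the node names, and the query labels are all hypothetical.

```python
def reassign_failed(assignment, failed_node):
    """Return a new assignment where the failed node's queries are spread
    round-robin over the surviving nodes."""
    remaining = {
        node: list(queries)
        for node, queries in assignment.items()
        if node != failed_node
    }
    if not remaining:
        raise ValueError("no surviving nodes to take over the work")
    survivors = sorted(remaining)
    for i, query in enumerate(assignment.get(failed_node, [])):
        remaining[survivors[i % len(survivors)]].append(query)
    return remaining


# Hypothetical starting point: 12 queries spread evenly over 4 nodes.
before = {
    "node1": ["q1", "q5", "q9"],
    "node2": ["q2", "q6", "q10"],
    "node3": ["q3", "q7", "q11"],
    "node4": ["q4", "q8", "q12"],
}

after = reassign_failed(before, "node4")
for node, node_queries in after.items():
    print(node, node_queries)
```

In this sketch, when node4 fails its three pending queries move to node1, node2, and node3, so each surviving node handles 4 queries and nothing is lost.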