I have a problem where a processor whose incoming connection queue is full (back pressure is applied) stops working entirely, as in the picture below.
Once I delete the queue it works fine, but this will be problematic once we move into the operational phase.
Can someone tell me why this is happening and how to fix it?
Also, are there any ways to improve the I/O throughput of PutHDFS other than assigning more concurrent tasks to it?
Change PutHDFS to run on All Nodes; right now the processor is running only on the Primary Node.
If all the queued flowfiles (1,328) are on nodes other than the Primary Node, the PutHDFS processor won't process them until the primary node changes.
If you have more nodes in the NiFi cluster, running on all nodes will give you the best result (but you need to distribute the load across the cluster using load balancing (or) a Remote Process Group).
Increasing concurrent tasks is one of the optimizations we can perform on the PutHDFS processor, but it mainly helps when you are not distributing the load across the cluster and one node is doing all the work.
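For reference, the settings involved look roughly like this (a sketch only, not exact UI text; the property values shown are illustrative, and the connection-level Load Balance Strategy requires NiFi 1.8.0 or later):

```
# PutHDFS processor -> Configure -> SCHEDULING tab
Execution:            All Nodes      # was: Primary Node only
Concurrent Tasks:     4              # example value; tune per node capacity

# Connection feeding PutHDFS -> Configure -> SETTINGS tab (NiFi 1.8.0+)
Load Balance Strategy: Round robin   # spreads queued flowfiles across nodes
```

With Execution set to All Nodes, each node only writes the flowfiles queued on that node, which is why distributing the queue (load-balanced connection or Remote Process Group) matters before adding more concurrent tasks.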
@Shu Thank you, you are right. Now it processes the files, and faster than before.
Just wondering about combining load balancing with concurrent tasks. If we assign concurrent tasks to a CPU-intensive processor like ExecuteScript or ExecuteStreamCommand, which run on all nodes, will load-balancing the queued data before the processor give a better result than simply running it with concurrent tasks alone? I thought it takes some time to distribute relatively large flowfiles (~5 MB) across the cluster.
Also, why wouldn't PutHDFS with some concurrent tasks and load balancing together work?
The above question and the entire response thread below were originally posted in the Community Help track. On Sat Jul 20 16:18 UTC 2019, a member of the HCC moderation staff moved it to the Data Ingestion & Streaming track. The Community Help Track is intended for questions about using the HCC site itself.