About G_B

shubham_sharma · ‎06-03-2025

Hi @G_B It could be a issue with your JDK version. Compare the JDK version in your working and non-working datanodes and try to upgrade/downgrade accordingly.

MattWho · ‎06-06-2024

@G_B NiFi cluster deployments expect that all nodes in the cluster have same hardware specifications. There is no option in NiFi's Load Balanced connections to customize load-balancing based on current CPU load average of some other node. Even doing so would require NiFi nodes to continuously ping all other nodes to get the current load average before sending FlowFiles which would impact performance. The only thing that would result in any form of variation in distribution would be a node receive rate being diminished, but that is out of NiFi's control. Round Robin will skip a node in rotation if the node is unable to receive FlowFiles as fast as another node. Also keep in mind that a NiFi Cluster elects a node the roles "cluster coordinator" and "primary node". Sometimes both roles get assigned to same node. The assignment of these roles can change at. anytime. The primary node is only node that will schedule "primary node" only processors to execute. So your one node lighter on CPU could also end up assigned this role adding to its CPU load average. Often CPU load average is not only impacted by volume, but also content size of the FlowFiles. The LB connections also do not take in to account FlowFile content size when distributing FlowFiles. While your best option here performance wise is to make sure all nodes have same hardware specifications, there are a few less performant options you could try to distribute your data differently. 1. Use Remote Process Group (RPG) which uses Site-To-SIte (S2S) to distribute FlowFiles across your NiFi nodes. Always recommend using RPG to push to a Remote Input port rather then pull from an Remote output port to achieve better load distribution. Issue here is you need to add RPGs and Remote ports everywhere you were previously using LB configured connections. 2. Build a smart data distribution reusable dataflow. You could build a data flow that sorts FlowFiles by their content size ranges, merges bundles via mergeContent using FlowFile Stream, v3 merge format, send bundles based on size ranges to your various nodes via invokeHTTP to listenHTTP, and then unpackContent once received to extract the FlowFile bundle. This mergeContent is going to add addition cpu load. 3. Consider using DistributeLoad (can be configured with weighted distribution allowing you to create three distribution relationships with maybe like 5 FlowFile per relationship 1 and 2, and relationship with only 1 per iteration. This allows you to send 1 to you lower core node for every 5 sent to other two nodes. You would still need to use updateAttribute (set custom target node URL), mergeContent, invokeHttp, ListenHTTP, and unpackContent in this flow. So if addressing your hardware differences is not option, Number 1 is probably your next best choice. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

Online	Offline
Last Visited	‎06-15-2025 05:02 PM

Member Since	‎06-05-2024 08:07 AM
Last Visited	‎06-15-2025 05:02 PM
Posts	2

Cloudera Community

Re: Unable to open files - HDFS - Couldn't find da...

Re: Load balancing in NiFi - Heterogenous Nodes in...