We have 2 clusters (6 instances each) running NiFi 1.1.2 + JDK 8u121 on CentOS Linux.
The traffic is divided between those 2 clusters:
1. TPS: 2700 - EAST cluster
2. TPS: 980 - WEST cluster
We have tried to migrate to NiFi 1.2.0, 1.3.0, and 1.4.0, but the cluster with the higher TPS (EAST) got stuck after 4 hours of intensive traffic, and its web console became unresponsive.
I've tried many things to fix this, but the only result was extending the time before it fails from 4 to 6 hours.
Our current instances run on AWS, and each EC2 instance has 8 CPUs (c5.2xlarge) and 16 GB RAM.
I've tried c5.4xlarge (which doubles the CPU and RAM), but I got the same outcome.
I don't have a clue what the issue is. I also have a Datadog dashboard that tracks some Java heap metrics, but everything looks normal.
What should I do to find out why those new, better instances are failing? Is it memory, disk space, or stuck threads? Why does an old NiFi cluster configuration work better than a new one?
Hope you can help me with this.
You would need to do a few things to monitor the performance. These include:
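For the "threads got stuck" possibility raised in the question, one option is to capture a JVM thread dump from a NiFi node while it is unresponsive (for example with the JDK's `jstack <pid>`) and count how many threads are in each state. The Python sketch below does only that counting. The function names and the sample thread names are hypothetical, and it assumes the dump uses the standard HotSpot `java.lang.Thread.State:` lines; it is a starting point, not a confirmed diagnosis for this cluster.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: TIMED_WAITING (sleeping)"
# as they appear in a HotSpot thread dump.
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+([A-Z_]+)", re.MULTILINE)


def count_thread_states(dump_text: str) -> Counter:
    """Return how many threads are in each state in a thread dump's text."""
    return Counter(STATE_RE.findall(dump_text))


def blocked_ratio(dump_text: str) -> float:
    """Return the fraction of threads reported as BLOCKED (0.0 if none found)."""
    counts = count_thread_states(dump_text)
    total = sum(counts.values())
    return counts["BLOCKED"] / total if total else 0.0


# Hypothetical, shortened dump used only to illustrate the parsing.
SAMPLE_DUMP = '''
"Example Worker Thread-1" #50 prio=5 os_prio=0 tid=0x1 nid=0x1 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"Example Worker Thread-2" #51 prio=5 os_prio=0 tid=0x2 nid=0x2 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"Example Web Thread-1" #60 prio=5 os_prio=0 tid=0x3 nid=0x3 runnable
   java.lang.Thread.State: RUNNABLE
'''

sample_counts = count_thread_states(SAMPLE_DUMP)
print(dict(sample_counts), round(blocked_ratio(SAMPLE_DUMP), 2))
```

Taking several dumps a few minutes apart and comparing the counts may show whether the number of BLOCKED or WAITING threads keeps growing as the cluster approaches the point where it gets stuck.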