
NiFi 1.4.0 becomes unresponsive under heavy load


We have two clusters (6 instances each) running NiFi 1.1.2 + JDK 8u121 on CentOS Linux.

Traffic is divided between the two clusters:

1. EAST cluster: 2,700 TPS

2. WEST cluster: 980 TPS

We have tried migrating to NiFi 1.2.0, 1.3.0, and 1.4.0, but the cluster with the higher TPS (EAST) got stuck after 4 hours of intensive traffic, and its web console became unresponsive as well.

I've tried many things to fix this, but all I managed was to stretch the time before failure from 4 to 6 hours.

Our current instances run on AWS; each EC2 instance (c5.2xlarge) has 8 vCPUs and 16 GB of RAM.

I've tried using c5.4xlarge (double the CPU and RAM), but I got the same outcome.

I don't have a clue what the issue is. I also have a Datadog dashboard tracking some Java heap metrics, but everything looks normal.

What should I do to find out why these newer, larger instances are failing? Is it memory, disk space, or stuck threads? Why does an old NiFi cluster configuration work better than a new one?

Hope you can help me with this. 

You would need to do a few things to monitor and diagnose the performance. These include:

  1. Watch the NiFi logs on all nodes while the issue is occurring. This also includes going back through rolled-over log files and searching for ERROR entries.
  2. Watch the nodes themselves during the issue: "top" output, Ambari metrics for RAM/CPU, garbage collection, etc.
  3. Check NiFi's min/max heap settings in bootstrap.conf, compare them with the system's available memory, and tune them higher.
  4. Tune the min/max thread pool sizes (Max Timer Driven Thread Count in the NiFi UI's Controller Settings).
  5. Watch garbage collection behavior and tune it.
  6. Consider upgrading to 1.9+ to make the most of the improved node-to-node and clustering capabilities. Upgrading alone will not address items 1-5, so sort those out first.
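Steps 1, 2, and 5 can be sketched from the command line. This is a minimal, hedged sketch: the sample log lines and the /tmp paths are fabricated so the grep pipeline is reproducible here; on a real node you would point at $NIFI_HOME/logs/nifi-app.log and the live NiFi PID instead.

```shell
# Fabricate a tiny sample log so the pipeline below is reproducible;
# on a real node use $NIFI_HOME/logs/nifi-app.log instead.
printf '%s\n' \
  '2017-11-01 12:01:02,100 ERROR [Timer-Driven Process Thread-4] failed to process session' \
  '2017-11-01 12:05:09,300 INFO  [main] NiFi has started' \
  '2017-11-01 13:12:45,001 ERROR [Timer-Driven Process Thread-7] read timeout' \
  > /tmp/nifi-app-sample.log

# Step 1: count ERROR entries, then bucket them by hour to locate the
# window in which the node started to degrade.
grep -c 'ERROR' /tmp/nifi-app-sample.log
grep 'ERROR' /tmp/nifi-app-sample.log | cut -d: -f1 | sort | uniq -c

# Steps 2 and 5, on a live node (PID from $NIFI_HOME/run/nifi.pid):
#   top -b -n 1 -H -p "$PID"              # per-thread CPU snapshot
#   jstat -gcutil "$PID" 5000             # GC sampling; a climbing FGC column
#                                         # means repeated full GCs, i.e. heap pressure
#   jstack "$PID" > /tmp/nifi-threads.txt # thread dump; grep for BLOCKED to
#                                         # spot stuck processor threads
```

A node that stops responding after hours of sustained load while dashboard averages look "normal" is classically either full-GC thrashing (jstat exposes it even when averaged heap graphs look flat) or thread exhaustion (several jstack dumps taken a minute apart showing the same threads BLOCKED).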