NIFI 1.4.0 gets unresponsive after heavy load

New Contributor

We have 2 clusters (6 instances each) running NiFi 1.1.2 + JDK 8u121 + CentOS Linux.

Traffic is divided between the 2 clusters:

1. TPS: 2700 - EAST cluster

2. TPS: 980 - WEST cluster

We have tried to migrate to NiFi 1.2.0, 1.3.0, and 1.4.0, but the cluster with the higher TPS (EAST) got stuck after 4 hours of intensive traffic. Its web console also became unresponsive.

I've tried many things to fix this, but the only improvement I got was extending the time before it fails from 4 to 6 hours.

Our current instances run on AWS, and each EC2 instance (c5.2xlarge) has 8 CPUs and 16 GB of RAM.

I've tried using c5.4xlarge (which doubles the CPU and RAM), but I got the same outcome.

I don't have a clue what the issue is. I also have a Datadog dashboard that tracks some Java heap metrics, but everything looks normal.

What should I do to find out why the newer, better instances are failing? Is it memory, disk space, or stuck threads? Why does an old NiFi cluster configuration work better than a new NiFi?

Hope you can help me with this. 

Thanks

Re: NIFI 1.4.0 gets unresponsive after heavy load

Master Collaborator

@manuel_loayza 

 

You need to do a few things to monitor performance, including:

  1. Watch the NiFi logs on all nodes while the problem is occurring. Also go back through the older log files and search for ERROR entries.
  2. Watch the nodes themselves while the problem is occurring: look at "top", and look at Ambari metrics for RAM, CPU, garbage collection, etc.
  3. Check NiFi's min/max RAM (the JVM heap settings, -Xms/-Xmx, in conf/bootstrap.conf). Compare these values with the available system resources and tune them higher.
  4. Tune the min/max threads for the NiFi UI.
  5. Watch garbage collection and tune it.
  6. Consider upgrading to 1.9+ to get the most out of the improved node-to-node and clustering capabilities. Upgrading will not address items 1-5, so sort those out first.
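
For step 1, a small script can make the log review less tedious. The following is only a minimal sketch: it assumes the log lines follow the default logback layout (date, level, [thread], logger, message), and the sample lines, logger names, and the function name `summarize_errors` are made up for illustration. If your logback.xml uses a different pattern, the regular expression needs adjusting.

```python
import re
from collections import Counter

# Assumed line layout (default NiFi logback pattern), for example:
#   2017-11-02 14:02:11,456 ERROR [Timer-Driven Process Thread-4] o.a.n.SomeLogger message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]*)\]\s+(?P<logger>\S+)\s+(?P<msg>.*)$"
)


def summarize_errors(lines, levels=("ERROR", "WARN")):
    """Count matching log lines per (level, logger) and per hour."""
    by_logger = Counter()
    by_hour = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None or m.group("level") not in levels:
            # Stack-trace continuation lines and other levels are skipped.
            continue
        by_logger[(m.group("level"), m.group("logger"))] += 1
        by_hour[m.group("ts")[:13]] += 1  # key is "YYYY-MM-DD HH"
    return by_logger, by_hour


# Hypothetical sample lines, only to show the expected input shape.
SAMPLE = [
    "2017-11-02 10:15:30,123 INFO [main] o.a.nifi.NiFi Launching NiFi...",
    "2017-11-02 14:02:11,456 ERROR [Timer-Driven Process Thread-4] "
    "o.a.n.p.standard.InvokeHTTP Routing to failure",
    "java.net.SocketTimeoutException: timeout",
    "2017-11-02 14:40:09,001 WARN [Clustering Tasks Thread-1] "
    "o.a.n.c.c.ClusterProtocolHeartbeater Failed to send heartbeat",
    "2017-11-02 14:41:10,002 ERROR [Timer-Driven Process Thread-9] "
    "o.a.n.p.standard.InvokeHTTP Routing to failure",
]

if __name__ == "__main__":
    loggers, hours = summarize_errors(SAMPLE)
    for (level, logger), count in loggers.most_common():
        print(level, logger, count)
    for hour, count in sorted(hours.items()):
        print(hour, count)
```

To run it against a real node, pass an open file instead of `SAMPLE`, for example `summarize_errors(open("logs/nifi-app.log"))` (the path is an assumption; use wherever your install writes nifi-app.log). Comparing the per-hour counts across nodes around the 4 to 6 hour mark may show which component starts failing first.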

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.


 


Thanks,



Steven
