Created 04-30-2021 12:02 PM
When running the ./nifi.sh status command, the response is occasionally that NiFi is running but not responding to ping requests. Yet when I open cmd and ping the node, I get a normal response, and the response time is not out of line with the other nodes. Why is that?
Created 05-03-2021 08:41 AM
The "nifi is running but not responding to ping requests" message does not refer to the ability to ping the host.
When NiFi is started, you are starting the NiFi bootstrap process. This bootstrap process then starts a child NiFi process, which is what runs the main NiFi that you interact with. When you run the status command, it is asking the bootstrap process to return the status of that child process. It is telling you that it sees that the PID for that child process still exists, so it reports NiFi as running, but when it tries to get a response from that child process it fails and logs that the status ping got no response.
This is almost always the result of some resource contention within NiFi. The NiFi process may have been going through Java Garbage Collection (GC) at the time you ran the status command (GC includes stop-the-world pauses, during which the JVM will not respond to anything else), or thread usage (CPU consumption) may have been high at the time, etc.
If you see this a lot, or are seeing node disconnections or UI stability issues, you may want to start monitoring resources on the host, or enable GC logging in your NiFi to see if the service is spending a lot of time doing GC. Increasing the heap allocated to NiFi, or making dataflow design changes to reduce the heap memory footprint of your current dataflow(s), would help there.
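To make the two-step check concrete, here is a minimal Python sketch of the idea. It is not NiFi's bootstrap code: the function names, the "PING" message and the use of a local TCP socket are assumptions chosen for illustration. It first checks that the PID exists, then tries to get a reply within a timeout, which is why a busy or paused process can show up as running but not responding.

```python
import os
import socket
import threading


def pid_exists(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ping(port: int, timeout: float = 1.0) -> bool:
    """Send a hypothetical PING message and wait up to `timeout` for an echo."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as conn:
            conn.sendall(b"PING\n")
            return conn.recv(16).startswith(b"PING")
    except OSError:  # covers refused connections and timeouts
        return False


def status(pid: int, port: int, timeout: float = 1.0) -> str:
    """Combine both checks the way the status message in this thread suggests."""
    if not pid_exists(pid):
        return "not running"
    if ping(port, timeout):
        return "running"
    return "running but not responding to ping requests"


def start_listener(respond: bool) -> socket.socket:
    """Stand-in for the child process: either echoes one message or stalls."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    if respond:
        def serve() -> None:
            conn, _ = server.accept()
            with conn:
                conn.sendall(conn.recv(16))
        threading.Thread(target=serve, daemon=True).start()
    return server


if __name__ == "__main__":
    healthy = start_listener(respond=True)
    stalled = start_listener(respond=False)  # accepts nothing, like a paused JVM
    print(status(os.getpid(), healthy.getsockname()[1]))
    print(status(os.getpid(), stalled.getsockname()[1], timeout=0.5))
```

In the stalled case the PID check passes and the connection is even accepted by the OS, but no reply arrives before the timeout, so the status falls through to the "not responding" branch.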
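For reference, GC logging and heap size are both set through java.arg entries in conf/bootstrap.conf. The fragment below is a rough sketch, not a recommendation: the arg numbers 20 to 22, the log path and the 4g heap size are placeholders I made up, so pick unused arg numbers and values that fit your host. In a stock bootstrap.conf the heap settings are typically java.arg.2 and java.arg.3, but check your own file. The first set of GC flags applies to Java 8; on Java 11 the unified -Xlog form is used instead.

```properties
# Heap size (example values only; size these to your host and dataflow)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g

# GC logging on Java 8 (arg numbers and log path are placeholders)
java.arg.20=-XX:+PrintGCDetails
java.arg.21=-XX:+PrintGCDateStamps
java.arg.22=-Xloggc:/path/to/nifi-gc.log

# GC logging on Java 11 (use instead of the three Java 8 lines above)
#java.arg.20=-Xlog:gc*:file=/path/to/nifi-gc.log:time,uptime
```

NiFi needs a restart for bootstrap.conf changes to take effect.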
Hope you found this information helpful.
If so, please take a moment to log in and click accept on this solution.
Matt
Created 05-07-2021 03:04 PM
Thank you. That is the case: there is a lot going on, and we are trying to determine the sizing needed for new processes. We did a POC that needed to load about 6 million records. When it came time to implement in Test, the cluster wasn't enough to handle the load, since the data had grown to 173 million records. Can't wait to see what is in Prod. 🙂 Thank you for the explanation about the bootstrap and child processes as well as the suggestions! Much appreciated.