Support Questions

Bfields · 11-30-2020

NiFi version 1.9.0.3.4.1.1-4

In my 3-node cluster, one server tends to get an excessive number of timeout errors. Whenever this node is master/coordinator, data processing is very slow. If the NiFi service is restarted while this node is master/coordinator, the server starts back up with an "unable to create native thread" error. This happens on only one node in the cluster; all other nodes work as intended.

This setup has worked for months with no issues. The only change we made, switching to round-robin load balancing, has since been reverted. The affected node shows low utilization. We are lost, and any help is greatly appreciated.

One last note: we have sensitive data flowing through this cluster, so getting full logs is not easy for us. I have attached what I could.

top - 15:08:53 up 3 days, 0 min, 1 user, load average: 0.39, 0.50, 0.63
Tasks: 410 total, 1 running, 409 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 0.8 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39604499+total, 54695292 free, 19844422+used, 14290547+buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 19644432+avail Mem

TimothySpann · 11-30-2020

Check your timeouts.

Turn off or fix any firewalls.

Test the network calls from other machines. The problem could also be the SFTP server you are reading from:

Connection timed out (Connection timed out); routing to comms.failure: java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.Connect

Increase the timeouts for the network calls. How many NICs do you have, and are they 10Gb+?

How much RAM do you have? I recommend 32GB RAM, with most of it given to the JVM, and 30-32 cores.

The best practice is to use Cloudera Flow Management with a cluster managed by Cloudera Manager; it will make sure everything is running properly.

You can also restart the nodes to get a different leading node. With SFTP you usually have only one node making the calls, which is why that one node gets the timeout errors when calling the SFTP server. Increase the timeout; your SFTP server may be slow, offline, or blocked by a firewall, gateway, proxy, or the Linux network configuration.
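To compare the network path from each node, a rough sketch like the one below can be run on the problem node and on a healthy one. It only times a plain TCP connect to the SFTP port; it does not log in or speak SSH. The function name `check_tcp`, the host `sftp.example.com`, and the default port 22 are placeholders for illustration, not values from this thread.

```python
import socket
import time


def check_tcp(host, port=22, timeout=10.0):
    """Attempt a plain TCP connect and report how long it took.

    Returns (ok, seconds_elapsed, error_text). This checks reachability
    only; it does not authenticate or perform an SSH handshake.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers timeouts, refused connections, and DNS failures.
        return False, time.monotonic() - start, str(exc)


def report(host, port=22, timeout=10.0):
    """Print a one-line result for the given target (hypothetical helper)."""
    ok, elapsed, err = check_tcp(host, port, timeout)
    status = "connected" if ok else "FAILED: " + err
    print(f"{host}:{port} {status} after {elapsed:.2f}s")
```

For example, calling `report("sftp.example.com")` on every node and comparing the results should show whether only the one node has trouble reaching the server. If the connect fails or is much slower on that node alone, that points at its firewall or network configuration rather than at NiFi.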

1 node in the cluster getting excessive timeout errors