1 node in the cluster getting excessive timeout errors
Labels: Apache NiFi
Created ‎11-30-2020 07:13 AM
NiFi version 1.9.0.3.4.1.1-4
In my 3-node cluster, one server gets an excessive number of timeout errors. Whenever this node is master/coordinator, data processing is very slow. If the NiFi service is restarted while this node is master/coordinator, the server comes back up with "unable to create native thread" errors. This happens on only one node in the cluster; all other nodes work as intended.
This setup worked for months with no issues. The only change we made, switching to round-robin load balancing, has since been reverted. The affected node shows low utilization. We are lost, and any help is greatly appreciated.
One last note: we have sensitive data flowing through this cluster, so getting full logs is not easy for us. I have attached what I could.
top - 15:08:53 up 3 days, 0 min, 1 user, load average: 0.39, 0.50, 0.63
Tasks: 410 total, 1 running, 409 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 0.8 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39604499+total, 54695292 free, 19844422+used, 14290547+buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 19644432+avail Mem
Created ‎11-30-2020 08:52 AM
Check your timeouts.
Turn off or fix any firewalls.
Test the network calls from other machines. The problem could also be the SFTP server you are reading from:
Connection timed out (Connection timed out); routing to comms.failure: java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
java.io.IOException: Failed to obtain connection to remote host due to com.jcraft.jsch.JSchException: java.net.Connect
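One way to test the network path from each node is a plain TCP connection attempt with an explicit timeout. The following is a minimal sketch, not a NiFi feature: the function name `check_tcp` and the host in the usage note are hypothetical, and it only verifies that a TCP connection opens, not that SFTP authentication works.

```python
import socket
import time
from typing import Tuple


def check_tcp(host: str, port: int, timeout_s: float = 5.0) -> Tuple[bool, float, str]:
    """Try to open a TCP connection; return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # OSError covers connect timeouts, refused connections, and DNS failures.
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
```

Running something like `check_tcp("sftp.example.internal", 22)` (placeholder host, replace with your own) on all three nodes and comparing the results should show whether only the problem node fails to reach the server, which would point at that node's network path rather than at the SFTP server itself.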
Increase the timeouts for the network calls. How many NICs do you have, and are they 10 Gb or faster?
How much RAM do the nodes have? I recommend 32 GB of RAM, with most of it given to the JVM, and 30-32 cores.
The best practice is to use Cloudera Flow Management on a cluster managed by Cloudera Manager, which will make sure everything is running properly.
You can also restart the nodes to get a different leading node. With SFTP you usually have only one node making the calls, which is why that one node gets the timeout errors when calling the SFTP server. Increase the timeout: your SFTP server may be slow, offline, or blocked by a firewall, gateway, proxy, or the Linux network configuration.
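Since full logs cannot be shared, it may help to compare only the number of timeout errors per hour on each node. The sketch below counts lines containing "Connection timed out", the text from the error quoted above. It assumes each log line starts with a timestamp such as `2020-11-30 15:08:53,123`; that layout and the function name `timeouts_per_hour` are assumptions, so adjust the pattern to your logging configuration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: lines begin with "YYYY-MM-DD HH:MM:SS,mmm"; we keep date and hour only.
TIMESTAMP_HOUR = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):")
MARKER = "Connection timed out"


def timeouts_per_hour(lines: Iterable[str]) -> Counter:
    """Count log lines containing the timeout marker, grouped by date and hour."""
    counts: Counter = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP_HOUR.match(line)
        # Lines without a leading timestamp (e.g. stack-trace lines) go to "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Passing an open log file to the function on each node gives counts that contain no flow data, so they can be shared and compared across the three nodes.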
