Support Questions


Connection failed to DataNode:50075 sometimes

Super Collaborator

Hello

We have a test HDP 2.6 cluster, and we receive an Ambari alert about the connection to a specific DataNode web UI (port 50075) several times a day (20-50 times), even when the node is idle or under minimal workload.
The alert is:

DataNode Web UI

Connection failed to http://<DN_NAME>:50075 ([Errno 104] Connection reset by peer)


I restarted the DataNode service and even the entire host, but the problem remains.
The only clue regarding this issue appears in the DataNode's log, in perfect correlation with the alerts:
"ERROR DefaultPromise.rejectedExecution (Slf4JLogger.java:error(181)) - Failed to submit a listener notification task. Event loop shut down?"
(in /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<dn_name>-drp.log)

I googled, of course, but haven't found any relevant information about this error.
The server's /var/log/messages is error-free.
Any ideas what could cause this intermittent behavior?
Thanks in advance

Adi

1 ACCEPTED SOLUTION

Super Collaborator

If anyone stumbles upon this error: the solution is to increase the maximum heap size of the DataNode.
This error can occur when there are pauses in the JVM's garbage collection.
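As a sketch of what that change might look like: in Ambari the heap is usually adjusted through the DataNode heap setting in the HDFS configs, which ultimately feeds the JVM options in hadoop-env.sh. Editing hadoop-env.sh directly could look like the fragment below; the heap values are illustrative, not a recommendation, and `-verbose:gc`/`-Xloggc` are standard HotSpot flags added here only to help confirm whether GC pauses correlate with the alerts.

```shell
# hadoop-env.sh -- illustrative values, tune for your cluster and node RAM.
# Raise the DataNode heap so long GC pauses stop stalling the web UI,
# and log GC activity so pauses can be correlated with the Ambari alerts.
export HADOOP_DATANODE_OPTS="-Xms4096m -Xmx4096m \
  -verbose:gc -Xloggc:/var/log/hadoop/hdfs/gc.log-datanode \
  ${HADOOP_DATANODE_OPTS}"
```

After changing the value (in Ambari or in the file), restart the DataNode and watch whether the alert frequency drops.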


3 REPLIES


This means a TCP RST was received and the connection is now closed. It happens when a packet is sent from your end of the connection but the other end does not recognize the connection; it replies with a packet that has the RST bit set, forcibly closing the connection.

This can happen if the other side crashes and comes back up, or if it calls close() on the socket while data from you is still in transit; it is an indication that some of the data you previously sent may not have been received.

Whether that counts as an error is up to you: if the information you were sending was only for the benefit of the remote client, it may not matter that some final data was lost. Either way, you should close the socket and free any other resources associated with the connection.
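To make the behavior described above concrete, here is a minimal, self-contained sketch that reproduces "[Errno 104] Connection reset by peer" locally: a throwaway server (the `rst_server` name is mine) closes an accepted socket with SO_LINGER set to a zero timeout, which makes close() send a TCP RST instead of the normal FIN, and the client's recv() then fails with ConnectionResetError.

```python
import socket
import struct
import threading

def rst_server(port_box, ready):
    """Accept one connection and reset it instead of closing it cleanly."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port_box.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()
    # SO_LINGER with a zero timeout makes close() send a TCP RST, not a FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()
    srv.close()

port_box, ready = [], threading.Event()
server = threading.Thread(target=rst_server, args=(port_box, ready))
server.start()
ready.wait()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port_box[0]))
try:
    client.recv(1024)          # fails once the RST arrives
    result = "clean close"
except ConnectionResetError:   # errno 104: Connection reset by peer
    result = "connection reset by peer"
finally:
    client.close()
server.join()
print(result)
```

This is exactly what the Ambari alert check experiences: it connects to port 50075, but the DataNode's web server drops the connection with an RST instead of answering.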

Super Collaborator

Hi @Abhinav Phutela,
Thank you for taking the time to respond.
The issue at hand is not produced by manual operations, so I have no control over opening or closing connections.
It seems to be an issue with this specific DataNode under normal workload. I do not receive this Ambari alert in other clusters or on other hosts; just this host.
It is definitely an error, because it is logged as such in /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<dn_name>-drp.log on that specific DataNode:

"ERROR DefaultPromise.rejectedExecution (Slf4JLogger.java:error(181)) - Failed to submit a listener notification task. Event loop shut down?"

The node resides in the same rack and uses the same switch as other nodes, which don't have this issue.

Could this be an issue with a faulty NIC, maybe?

Adi
