Connection failed to DataNode:50075 sometimes

Expert Contributor

Hello

We have a test HDP 2.6 cluster, and we receive an Ambari alert about the connection to the web UI (port 50075) of one specific DataNode several times a day (between 20 and 50 times), even when the node is idle or under minimal workload.
The alert is:

DataNode Web UI

Connection failed to http://<DN_NAME>:50075 ([Errno 104] Connection reset by peer)
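To measure how often the web UI actually drops connections, independently of Ambari, a small standalone probe can help. The Python sketch below is an assumption, not how Ambari implements its "DataNode Web UI" alert: it requests the URL a fixed number of times and records timestamped failures that can then be lined up against the DataNode log. `probe` and `watch` are hypothetical helper names.

```python
import datetime
import time
import urllib.error
import urllib.request

# Hypothetical helpers for illustration; not part of Ambari or HDP.
# An opener with an empty ProxyHandler ignores *_proxy environment variables,
# so requests go straight to the DataNode.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch `url` once and return (ok, detail)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError (a subclass of OSError) wraps connect failures such as
        # "connection refused"; errors raised while reading the response, such as
        # ConnectionResetError ([Errno 104] on Linux), can surface unwrapped.
        reason = getattr(exc, "reason", exc)
        return False, f"{type(reason).__name__}: {reason}"


def watch(url: str, attempts: int, interval: float) -> list[str]:
    """Probe `attempts` times, `interval` seconds apart; return timestamped failures."""
    failures = []
    for i in range(attempts):
        ok, detail = probe(url)
        if not ok:
            stamp = datetime.datetime.now().isoformat(sep=" ", timespec="seconds")
            failures.append(f"{stamp} {url} {detail}")
        if i < attempts - 1:
            time.sleep(interval)
    return failures
```

Run, for example, as `watch("http://<DN_NAME>:50075", attempts=720, interval=60)` to sample once a minute for half a day. Failures that cluster at the same times as the log errors point at the DataNode process itself rather than at Ambari.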


I restarted the DataNode service and even rebooted the entire host, but the problem remains.
The one clue I have found is in the DataNode's log, and it correlates perfectly with the alerts:
"ERROR DefaultPromise.rejectedExecution (Slf4JLogger.java:error(181)) - Failed to submit a listener notification task. Event loop shut down?"
(In /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<dn_name>-drp.log)

I searched online, of course, but haven't found any relevant information about this error.
The /var/log/messages of the server is error-free.
Any ideas what could cause this intermittent behavior?
Thanks in advance

Adi

Accepted solution

Re: Connection failed to DataNode:50075 sometimes

Expert Contributor

If anyone stumbles upon this error: the solution is to increase the maximum heap size of the DataNode.
This error can occur when the DataNode's JVM pauses for garbage collection.
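One way to check the garbage-collection explanation on an affected host is to look for JVM pauses logged close to each rejectedExecution error. Hadoop daemons report long pauses through JvmPauseMonitor with a message beginning "Detected pause in JVM or host machine (eg GC)". The Python sketch below assumes that message text and a log4j timestamp prefix of the form `YYYY-MM-DD HH:MM:SS,mmm`; both should be checked against the actual log. `correlate` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS,mmm LEVEL ..." (adjust if your log4j pattern differs).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")
# Assumed JvmPauseMonitor message text; verify against your own DataNode log.
PAUSE = re.compile(r"Detected pause in JVM or host machine.*?approximately (\d+)ms")
REJECTED = "Failed to submit a listener notification task"


def correlate(lines, window_s=60):
    """Return [(error_time, [pause_ms, ...]), ...] pairing each rejectedExecution
    error with JVM pauses logged within +/- window_s seconds of it."""
    pauses, errors = [], []
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue  # continuation lines (stack traces, GC pool details) have no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        pause = PAUSE.search(line)
        if pause:
            pauses.append((ts, int(pause.group(1))))
        elif REJECTED in line:
            errors.append(ts)
    window = timedelta(seconds=window_s)
    return [
        (err, [ms for ts, ms in pauses if abs(ts - err) <= window])
        for err in errors
    ]
```

Passing an open handle to the DataNode log from the question (for example `correlate(open(path))`) lists each error with any nearby pauses. Errors with consistently long pauses next to them support the heap-size fix; errors with empty lists suggest a different cause for those occurrences.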

3 replies

Re: Connection failed to DataNode:50075 sometimes

New Contributor

This means that a TCP RST was received and the connection is now closed. This occurs when a packet is sent from your end of the connection but the other end does not recognize the connection; it will send back a packet with the RST bit set in order to forcibly close the connection.

This can happen if the other side crashes and then comes back up or if it calls close() on the socket while there is data from you in transit, and is an indication to you that some of the data that you previously sent may not have been received.

It is up to you whether that is an error; if the information you were sending was only for the benefit of the remote client, it may not matter that some final data was lost. However, you should close the socket and free up any other resources associated with the connection.
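The mechanism described above can be reproduced locally. This Python sketch assumes Linux, where ECONNRESET is errno 104, the same number shown in the Ambari alert. It makes the accepting side abort the connection: enabling SO_LINGER with a zero timeout causes close() to send RST instead of a normal FIN, so the client's next recv() fails with ConnectionResetError. `provoke_reset` is a hypothetical name; it only illustrates what "Connection reset by peer" means and says nothing about why the DataNode resets connections.

```python
import errno
import socket
import struct


def provoke_reset() -> int:
    """Return the errno a client sees when its peer aborts the TCP connection (0 if none)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server_side, _ = listener.accept()
        # struct linger {l_onoff=1, l_linger=0}: close() discards unsent data
        # and sends RST instead of the usual FIN handshake.
        server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        server_side.close()
        try:
            client.recv(1024)
        except ConnectionResetError as exc:
            return exc.errno  # errno.ECONNRESET, i.e. 104 on Linux
        finally:
            client.close()
    return 0
```

On a Linux host, `provoke_reset()` returns 104, matching the `[Errno 104]` in the alert text.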

Re: Connection failed to DataNode:50075 sometimes

Expert Contributor

Hi @Abhinav Phutela
Thank you for taking the time to respond.
The issue at hand is not produced by manual operations, so I have no control over opening or closing connections.
It seems to be an issue with this specific DataNode under normal workload. I do not receive this Ambari alert in other clusters or on other hosts, only on this host.
It is definitely an error, because it is logged as such in /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<dn_name>-drp.log on that DataNode:

"ERROR DefaultPromise.rejectedExecution (Slf4JLogger.java:error(181)) - Failed to submit a listener notification task. Event loop shut down?"

The node resides in the same rack using the same switch as other nodes which don't have this issue...

Could this be caused by a faulty NIC?

Adi
