What can cause a spike in "Last Contact" of DataNodes ?

Expert Contributor

Hello

Every once in a while I receive a "Stale" alert from the DataNode Health Summary alert.
It appears that some DataNodes occasionally suffer a spike (over 30 seconds) in sending their heartbeat to the NameNode (NN), as seen in the "Last Contact" column of the DataNode Information page in the NN UI. This results in a "stale" alert.
What can cause these spikes?

Thanks in advance !

Adi



Re: What can cause a spike in "Last Contact" of DataNodes ?

Mentor

@Adi Jabkowsky

Please check if all your nodes are in the same network segment.

This intermittent problem is usually due to network issues. Check the MTU.

How to check and set the MTU for a network interface:

MTU (Maximum Transmission Unit) is the largest packet size, in bytes, that a network interface sends in one piece. It is part of TCP/IP networking on Linux.

Check the current MTU setting

$ ip link list

The default is usually 1500
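
To go one step further than reading the configured value, a rough sketch like the one below can probe whether packets of that size actually cross the network unfragmented. It assumes Linux with sysfs and the iputils `ping` (for the `-M do` option); the function names, the interface `eth0`, and the host name are only illustrative and not part of the original reply.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print the configured MTU of one interface, read from sysfs (Linux only).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Send three pings with the Don't Fragment bit set (-M do, iputils ping).
# If these fail while smaller sizes succeed, some hop on the path
# probably has a smaller MTU than the local interface.
probe_path_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Example (hypothetical host and interface names):
# probe_path_mtu namenode.example.com "$(iface_mtu eth0)"
```

With an MTU of 1500 the probe sends a 1472-byte payload. Running it from a DataNode towards the NameNode, and the other way round, is one way to spot a path that silently drops full-size packets.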

To make the setting permanent for eth0, edit the configuration file /etc/sysconfig/network-scripts/ifcfg-eth0 (Red Hat Linux; in general, ifcfg-ethX for interface ethX).

Sample

DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.1.255
HWADDR=00:0F:EA:91:04:07
IPADDR=192.168.1.111
NETMASK=255.255.255.0
NETWORK=192.168.1.0
MTU=1400
ONBOOT=yes
TYPE=Ethernet 

Save the file and restart the network service. If you are using Red Hat:

# service network restart
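
After the restart it may be worth confirming that all nodes ended up with the same MTU, since a single mismatched node can be enough to cause intermittent delays. The following is only a sketch: `mtu_mismatch` is a hypothetical helper, and the host names, the interface `eth0`, and the use of passwordless `ssh` in the usage comment are assumptions.

```shell
#!/bin/sh
# Read lines of the form "<host> <mtu>" on stdin and print every host
# whose MTU differs from the one on the first line.
mtu_mismatch() {
    awk 'NR == 1 { ref = $2; next } $2 != ref { print $1, $2 }'
}

# Example (hypothetical hosts, assumes passwordless ssh and interface eth0):
# for h in dn1.example.com dn2.example.com dn3.example.com; do
#     echo "$h $(ssh "$h" cat /sys/class/net/eth0/mtu)"
# done | mtu_mismatch
```

No output means every listed node reports the same value as the first one.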

Please let us know the result.

Re: What can cause a spike in "Last Contact" of DataNodes ?

Expert Contributor

Thank you.
I will check it.