
What can cause a spike in "Last Contact" of DataNodes ?

Expert Contributor

Hello

Every once in a while I receive a "Stale" alert from the DataNode Health Summary alert.
It appears that some DataNodes occasionally suffer a spike (over 30 seconds) in sending heartbeats to the NameNode (NN), as seen in the "Last Contact" column of the DataNode Information page in the NN UI. This results in the "stale" alert.
What can cause these spikes?

Thanks in advance !

Adi


2 REPLIES

Mentor

@Adi Jabkowsky

Please check whether all your nodes are in the same network segment.

An intermittent problem like this is usually due to network issues, so check the MTU on each node's network interface.

The MTU (Maximum Transmission Unit) is the largest packet size, in bytes, that a network interface will send without fragmenting it.

Check the current MTU setting:

 $  ip link list

The default for Ethernet is usually 1500 bytes.
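
To see whether packets of the full MTU size actually make it between a DataNode and the NameNode, a ping with fragmentation disabled can help. The following is only a rough sketch: the interface name eth0 and the host namenode.example.com are placeholders, and the helper function names are made up for illustration. It assumes IPv4 and the Linux iputils ping, where `-M do` forbids fragmentation.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
max_ping_payload() {
  echo $(( $1 - 28 ))
}

# Read the configured MTU of an interface from sysfs (Linux only).
iface_mtu() {
  cat "/sys/class/net/$1/mtu"
}

# Example usage (placeholder interface and host names):
#   ping -M do -c 3 -s "$(max_ping_payload "$(iface_mtu eth0)")" namenode.example.com
```

If this ping fails while a ping with a smaller payload succeeds, something on the path is using a smaller MTU than the interface is configured for.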

To make the setting permanent for an interface, edit its configuration file, /etc/sysconfig/network-scripts/ifcfg-ethX on Red Hat Linux (ifcfg-eth0 for eth0).

Sample:

DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.1.255
HWADDR=00:0F:EA:91:04:07
IPADDR=192.168.1.111
NETMASK=255.255.255.0
NETWORK=192.168.1.0
MTU=1400
ONBOOT=yes
TYPE=Ethernet 

Save the file and restart the network service. If you are using Red Hat:

# service network restart

Please report back with the results.

Expert Contributor

Thank you, I will check it.
