Member since
02-19-2016
06-02-2016
11:43 PM
2 Kudos
This looks like a network issue: your datanodes may be unable to keep up with the replication workload. Can you check the MTU in the ifconfig output on all the datanodes and make sure it is configured consistently? Below is a short list from a tutorial by @mjohnson on network best practices, which could help you troubleshoot.

https://community.hortonworks.com/articles/8563/typical-hdp-cluster-network-configuration-best-pra.html

"Make certain all members of the HDP cluster have passwordless SSH configured.
- Basic heartbeat (typical 3x/second) and administrative commands generated by the Hadoop cluster are infrequent and transfer only small amounts of data, except in extremely large cluster deployments.
- Keep in mind that NAS disks will require more network utilization than plain old disk drives resident on the data node.
- Make certain both fully qualified host names and host aliases are defined and referenceable by all nodes within the cluster.
- Ensure the network interface is consistently defined for all members of the Hadoop cluster (i.e. MTU settings should be consistent).
- Look into defining MTU for all interfaces on the cluster to support Jumbo Frames (typically MTU=9000). But only do this if you are certain that all nodes and switches support this functionality. Inconsistent MTU or undefined MTU configurations can produce serious problems with the network.
- Disable Transparent Huge Page compaction for all nodes on the cluster.
- Make certain all of the HDP cluster's network connections are monitored for collisions and lost packets. Have the network administration team tune the network as required to address any issues identified as part of the network monitoring."
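If you have many datanodes, comparing the ifconfig output by eye gets tedious. Here is a rough Python sketch of how the MTU check could be automated. It assumes you have already collected the ifconfig text from each node yourself (for example over SSH) into a dict keyed by hostname. The function names `parse_mtus` and `mtu_groups` are made up for this illustration, and the regexes only aim to cover the two common ifconfig layouts ("mtu 1500" and "MTU:1500").

```python
import re
from collections import defaultdict

# Start of an interface stanza, e.g. "eth0: flags=..." or "eth0      Link encap:..."
IFACE_RE = re.compile(r"^(\S+?):?\s")
# MTU field in either layout: "mtu 1500" (newer) or "MTU:1500" (older)
MTU_RE = re.compile(r"\bmtu[: ]\s*(\d+)", re.IGNORECASE)


def parse_mtus(ifconfig_text):
    """Return {interface: mtu} parsed from the text output of ifconfig."""
    mtus = {}
    current = None
    for line in ifconfig_text.splitlines():
        # A non-indented line starts a new interface stanza.
        if line and not line[0].isspace():
            match = IFACE_RE.match(line)
            current = match.group(1) if match else None
        match = MTU_RE.search(line)
        if current and match:
            mtus[current] = int(match.group(1))
    return mtus


def mtu_groups(outputs_by_host, interface):
    """Group hostnames by the MTU found for `interface`.

    More than one key in the result means the MTU is not consistent.
    A key of None lists hosts where the interface was not found.
    """
    groups = defaultdict(list)
    for host, text in outputs_by_host.items():
        groups[parse_mtus(text).get(interface)].append(host)
    return dict(groups)
```

Call `mtu_groups(outputs, "eth0")` with your own interface name. If the result has a single key, every node reports the same MTU; if it has several, the hosts listed under the odd values are the ones to look at first.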