How can we identify a network issue across Hadoop nodes? Are there any good tools, and what should we look for? Also, is there a way to alert when network speed becomes slow?
If by network issues you mean packet loss, ifconfig reports dropped packets and error counts for each interface.
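If you want to collect those drop counters from a script rather than reading ifconfig by eye, here is a minimal sketch. It assumes a Linux node, where the same counters are exposed in /proc/net/dev; the function name and the sample text are made up for illustration.

```python
# Sketch: parse per-interface drop/error counters from /proc/net/dev (Linux).
# parse_proc_net_dev and SAMPLE are illustrative names, not part of any tool.

SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 500000 4000 1 25 0 0 0 0 300000 3500 0 3 0 0 0 0
"""


def parse_proc_net_dev(text):
    """Return {interface: {"rx_errs", "rx_drop", "tx_errs", "tx_drop"}}."""
    stats = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip the line
        # Receive columns are fields 0-7, transmit columns are fields 8-15.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


if __name__ == "__main__":
    for iface, counters in parse_proc_net_dev(SAMPLE).items():
        print(iface, counters)
```

On a real node you would pass `open("/proc/net/dev").read()` instead of `SAMPLE`. The counters are cumulative since boot, so sample them twice and compare; a count that keeps growing is the thing to watch for.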
iperf is another tool; it measures throughput between two nodes and can help identify issues on the link between them.
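For the alerting part of the question, one possible approach is to run a throughput test on a schedule and flag results below a threshold. The sketch below assumes iperf3 (not classic iperf) run as a client with its `-J` JSON output, and assumes the TCP report carries the receiver-side rate under `end.sum_received.bits_per_second`. The sample report is trimmed and invented, and the threshold value is arbitrary.

```python
import json

# Trimmed, invented example of what `iperf3 -c <server> -J` might print for a
# TCP test. Real reports contain many more fields.
SAMPLE_REPORT = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 941000000.0},
        "sum_received": {"bits_per_second": 940000000.0},
    }
})


def received_mbps(iperf3_json):
    """Extract the receiver-side throughput in Mbit/s from an iperf3 JSON report."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def is_slow(iperf3_json, min_mbps):
    """Return True when measured throughput is below the chosen threshold."""
    return received_mbps(iperf3_json) < min_mbps


if __name__ == "__main__":
    mbps = received_mbps(SAMPLE_REPORT)
    # 5000 Mbit/s is an arbitrary threshold for illustration (e.g. a 10 GbE link).
    if is_slow(SAMPLE_REPORT, min_mbps=5000):
        print("ALERT: throughput %.0f Mbit/s is below threshold" % mbps)
    else:
        print("OK: throughput %.0f Mbit/s" % mbps)
```

You could feed this from cron by capturing the iperf3 output for each pair of nodes and sending the alert line to whatever notification system you already use.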
You can also take a look at https://github.com/apache/tez/tree/master/tez-tools/analyzers, which has analyzer tools that work on the output of a Tez job to give I/O and network details.
For continuous monitoring, see if bandwidthd helps.
@nyadav You can run a benchmark such as TestDFSIO, which reads data from different nodes of the cluster and can help you find network issues such as a node being down or unable to communicate over the network.
TeraSort shuffles a lot of data across the network, so if there is an issue you will see slowness or speculative execution.
This is a good reference for the different types of benchmark tests on Hadoop: http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-t...
@nyadav TestDFSIO benchmarks HDFS by reading and writing data.
To trace NodeManager performance, you can run TeraSort. TeraSort generates a MapReduce job that shuffles data across nodes and sorts it.