Is there a way to keep track of slow disks, and occasionally slow DataNodes, on a Hadoop cluster? Please suggest.
As Eyad said, there is no solution for monitoring on . You can refer to the link below, which gives a comprehensive list of commands to help you isolate disk performance problems.
You'll see the message "Slow BlockReceiver write packet to mirror took 3336ms (threshold=300ms)" in the DataNode logs, which indicates a slow disk or a network-related issue. Refer to this link for how to use the dd command to check the performance of the disk: https://www.cyberciti.biz/faq/howto-linux-unix-test-disk-performance-with-dd-command/
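If you would rather script the check than run dd by hand on each node, here is a rough Python sketch of the same idea: write a fixed amount of data to a directory on the disk under test, force it to disk with fsync, and report the throughput. The function name, the default sizes, and the way the directory is passed in are my own assumptions for illustration. It is not equivalent to dd in every respect (for example, it does not bypass the page cache for reads), so treat the numbers as a relative comparison between disks.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_bytes=64 * 1024 * 1024,
                             block_size=1024 * 1024):
    """Write total_bytes to a temporary file in `directory` and return MB/s.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    block = b"\0" * block_size
    written = 0
    # Create the scratch file on the disk being tested.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                chunk = block[: min(block_size, total_bytes - written)]
                f.write(chunk)
                written += len(chunk)
            # Flush Python's buffer, then ask the OS to push data to the device,
            # so the timing includes the actual disk write.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file.
        os.remove(path)
    # Guard against a zero interval on very small writes.
    return written / (1024 * 1024) / max(elapsed, 1e-9)
```

You would call it once per data directory, for example `measure_write_throughput("/data/1/dfs/dn")` (that path is only a placeholder, use whatever your DataNode data directories are), and compare the results across disks. A disk that is much slower than its peers is the one to look at.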
Logging of slow operations in the DataNodes was added in https://issues.apache.org/jira/browse/HDFS-6110. For disk I/O, look in particular for "Slow flushOrSync" in the logs. Additional metrics were added more recently in HDFS-10959.
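To get an overview instead of grepping by eye, something like the following Python sketch can summarize the "Slow ..." warnings per operation. It assumes every slow-operation line has the same "took Nms (threshold=Nms)" shape as the BlockReceiver message quoted in this thread; I have not checked that every message uses exactly that format, so adjust the regular expression to what your logs actually contain. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed shape: "Slow <operation> took <N>ms (threshold=<N>ms)".
SLOW_RE = re.compile(
    r"Slow (?P<op>.+?) took (?P<ms>\d+)ms \(threshold=(?P<threshold>\d+)ms\)"
)


def summarize_slow_ops(lines):
    """Return {operation: {"count": n, "max_ms": m}} for matching log lines.

    Hypothetical helper: lines that do not match the assumed shape are skipped.
    """
    stats = defaultdict(lambda: {"count": 0, "max_ms": 0})
    for line in lines:
        match = SLOW_RE.search(line)
        if match is None:
            continue
        entry = stats[match.group("op")]
        entry["count"] += 1
        entry["max_ms"] = max(entry["max_ms"], int(match.group("ms")))
    return dict(stats)
```

Feed it the lines of a DataNode log, for example `summarize_slow_ops(open(log_path))`, where `log_path` is wherever your DataNode log lives. Many "flushOrSync" entries on one node point at its disks, while entries mentioning the mirror are more likely network or downstream-node related.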