Hi,
in our CDH3 cluster (hadoop-0.20.2, yes, it's pretty old 😉) a disk failed on one node, taking the datanode down with it.
After replacing the disk and setting up the directories and permissions again, the datanode still fails to start with this error:
2014-04-15 16:14:43,165 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1025)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:416)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:303)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1643)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)
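For context, dfs.data.dir on this node lists six volumes, roughly like this (the paths below are illustrative, not our exact layout):

```xml
<!-- hdfs-site.xml on the affected datanode; example paths -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn,/data/5/dfs/dn,/data/6/dfs/dn</value>
</property>
```

The directory on the replaced disk was recreated with the same owner and permissions as the surviving volumes.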
How do I tell the datanode that the disk has been replaced, i.e. how do I "re-enable" the replaced disk?
I don't want to raise the number of tolerated volume failures to 1 just to get the datanode started.
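(For reference, that workaround would mean setting the following in hdfs-site.xml; judging from the "volume failures tolerated: 0" line in the log above, our build supports this property, but I'd rather fix the disk properly:)

```xml
<!-- the workaround I'd like to avoid -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```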
😉

br,
Gerd