Expert Contributor
Posts: 131
Registered: 08-08-2013
Accepted Solution

CDH3: disk failure, datanode doesn't start even after disk replacement

Hi,

In our CDH3 cluster (hadoop-0.20.2, yes, it's pretty old ;) ) a disk failed on one node, and as a result the datanode on that node went down.
After replacing the disk and setting up the directories/permissions again, starting the datanode still fails with this error:

2014-04-15 16:14:43,165 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1025)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:416)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:303)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1643)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)


How can I tell the datanode that the disk has been replaced, i.e. how do I "enable" the replaced disk again?
I don't want to raise the number of tolerated volume failures to 1 just to be able to start the datanode ;)
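(For reference, the "volume failures tolerated: 0" in the error above corresponds to dfs.datanode.failed.volumes.tolerated in hdfs-site.xml. Just to illustrate the knob I'd rather not touch, it would look roughly like this, the value being only an example:

    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <!-- number of dfs.data.dir volumes allowed to fail before the datanode refuses to start; default is 0 -->
      <value>1</value>
    </property>
)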

br, Gerd

Expert Contributor
Posts: 131
Registered: 08-08-2013

Re: CDH3: disk failure, datanode doesn't start even after disk replacement

Hi,

 

The issue has been solved. The problem was a mismatch between directory ownership and permissions on the new data directory: the owner had been set to "700" instead of the permissions (stupid mistake ;) ).
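In case anyone else runs into this, the fix was simply to correct owner and mode on the replaced data directory. Roughly like this (path, user/group and service name are just examples from my setup; adjust to your dfs.data.dir and the user your datanode runs as):

    ls -ld /data/6/dfs/dn                 # check current owner/group and mode of the replaced volume
    chown -R hdfs:hadoop /data/6/dfs/dn   # the datanode user has to own the directory
    chmod 700 /data/6/dfs/dn              # match the mode of the other dfs.data.dir volumes
    service hadoop-0.20-datanode start    # then start the datanode again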

Nevertheless, the error message is somewhat misleading; it would preferably state that the owner/permissions of the volume are incorrect instead of just reporting a failed volume.

 

Gerd