Expert Contributor
Posts: 131
Registered: ‎08-08-2013
Accepted Solution

CDH3: disk failure, datanode doesn't start even after disk replacement


In our CDH3 cluster (hadoop-0.20.2, yes, it's pretty old ;) ) a disk failed on one node and the datanode on that host went down as a result.
After replacing the disk and recreating the directories and permissions, the datanode still fails to start with this error:

2014-04-15 16:14:43,165 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(

How do I tell the datanode that the disk has been replaced, or how do I "enable" the replaced disk?
I don't want to raise the tolerated volume failures to 1 just to get the datanode started ;)
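I.e. I'd rather not add something like this to hdfs-site.xml just to mask the dead volume (property name as in stock Hadoop; whether our CDH3 build honours it from here is an assumption on my part):

```xml
<!-- hdfs-site.xml: let the datanode start despite one failed volume.
     Not what I want - it only hides the broken disk. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```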

br, Gerd


Re: CDH3: disk failure, datanode doesn't start even after disk replacement



The issue has been solved. The problem was a mismatch between directory permissions and ownership: the ownership of the new directory was wrong, not the permission mode (that was already 700, stupid thing ;) ).

Nevertheless, the error message is somewhat misleading; it would be better if it reported that the ownership/permissions are incorrect instead of just counting the directory as a failed volume.
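For anyone else hitting this: a quick sanity check of owner and mode on each configured data directory before starting the datanode would have caught it. A rough sketch (the path and the hdfs:hadoop user/group are assumptions here, CDH typically runs the datanode as hdfs; substitute your own dfs.data.dir entries):

```shell
# Sketch: check owner and mode of a (stand-in) data directory.
# In real life, run the check against each dfs.data.dir entry, e.g. /data/6/dfs/dn.
DIR=$(mktemp -d)               # stand-in for the replaced disk's data dir
chmod 700 "$DIR"               # datanode data dirs are expected to be mode 700
stat -c '%U %a' "$DIR"         # prints "<owner> 700"; owner must be the datanode user
# Fix-up on the real directory would be something like:
#   chown -R hdfs:hadoop /data/6/dfs/dn && chmod 700 /data/6/dfs/dn
```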