Created on 02-22-2016 06:02 AM - edited 08-18-2019 05:51 AM
Some disks failed in my HDFS cluster, and the NodeManager cannot start on those nodes.
How can I fix this?
Created 02-22-2016 06:50 AM
In the NameNode UI, check and ensure that there are no missing or corrupt blocks. If that is the case, you can safely remove the failed disk from the DataNode.
Refer to this for details.
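If you prefer the command line, a quick check (just a sketch, assuming the HDFS client is installed on the node and you can run commands as the hdfs user) looks like this:

    # Cluster-wide summary; look for the "Missing blocks" count near the top
    sudo -u hdfs hdfs dfsadmin -report | head -20

    # Full filesystem check; the summary at the end lists corrupt and missing blocks
    sudo -u hdfs hdfs fsck / | tail -20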
Created 02-22-2016 06:26 AM
Are the failed disks all on the same node?
If yes, I think you can decommission the node from Ambari -> Hosts -> DataNode -> you will find Decommission in the drop-down.
Created 02-22-2016 07:11 AM
Thanks for the quick reply.
Created 02-22-2016 06:41 AM
Take a look at this question, maybe it is helpful => https://community.hortonworks.com/questions/3012/what-are-the-steps-an-operator-should-take-to-repl....
Created 02-22-2016 09:21 AM
Can you check the setting for
dfs.datanode.failed.volumes.tolerated
in your environment? The default is 0, which is a bit restrictive. Normally 1 or even 2 (on DataNodes with high disk density) makes more operational sense.
Then your DataNode will start and you can take care of the disks.
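For example, in hdfs-site.xml (or under HDFS -> Configs in Ambari) the setting would look roughly like this; the value of 1 is just an illustration, pick what fits your disk density:

    <property>
      <!-- how many data volumes may fail before the DataNode refuses to start; default is 0 -->
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>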
Created 02-22-2016 12:33 PM
If you put this machine in a separate config group and remove the references to the failed directories, you can keep the machine up. Removing a disk and not replacing it will mean your data is written to the OS filesystem, because the old mount point is then just a directory on the root volume. Also do what Benjamin suggests and increase the tolerance.
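As a rough sketch of that config-group override (the mount points below are placeholders, not your actual data directories), you would drop the failed volume from dfs.datanode.data.dir for this one host:

    <!-- before: three data volumes, /grid/2 sits on the failed disk -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data</value>
    </property>

    <!-- after (config group containing only this host): the failed volume is removed -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data</value>
    </property>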