Support Questions


How to remove risky disks from a Hadoop cluster?

avatar
Rising Star

Some disks have failed in my HDFS cluster, and the NodeManager cannot start on these nodes.

(Screenshots attached: 2308-ll.jpg, 2309-nm.jpg)

How can I fix it?

1 ACCEPTED SOLUTION

avatar
Super Collaborator

In the NameNode UI, check and ensure that there are no missing or corrupt blocks. If that is the case, you can safely remove the failed disk from the DataNode.

Refer to this for details.
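
For reference, the same check can be done from the command line. A minimal sketch using the standard HDFS fsck tool (run as the hdfs user):

hdfs fsck / | tail -n 20                # overall health summary, including missing/corrupt block counts
hdfs fsck / -list-corruptfileblocks     # lists any files that currently have corrupt blocks

If fsck reports the filesystem as healthy with zero missing and corrupt blocks, it should be safe to proceed with removing the failed disk.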


7 REPLIES

avatar

Are there multiple failed disks on the same node?

If yes, I think you can decommission the node from Ambari -> Hosts -> select the DataNode host -> you will find Decommission in the drop-down menu.
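
If you prefer the command line, a rough sketch of the manual equivalent is below. The exclude-file path and hostname are assumptions (the real path comes from dfs.hosts.exclude in hdfs-site.xml), and on an Ambari-managed cluster the UI is the safer route because Ambari manages these files itself:

echo "worker03.example.com" >> /etc/hadoop/conf/dfs.exclude   # hostname and path are examples only
hdfs dfsadmin -refreshNodes                                   # NameNode begins decommissioning the node
hdfs dfsadmin -report -decommissioning                        # watch until the node shows as Decommissioned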

avatar
Rising Star

Thanks for the quick reply.

avatar
Super Collaborator

In the NameNode UI, check and ensure that there are no missing or corrupt blocks. If that is the case, you can safely remove the failed disk from the DataNode.

Refer to this for details.
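
Before pulling the disk, it can also help to confirm from the CLI that the cluster is healthy enough to re-replicate any affected blocks. A minimal sketch using standard dfsadmin commands (run as the hdfs user):

hdfs dfsadmin -report | head -n 20    # cluster summary: capacity plus under-replicated / corrupt / missing block counts
hdfs dfsadmin -report -dead           # should list no dead DataNodes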

avatar
Rising Star

Thanks for the quick reply.

avatar
Master Guru

Can you check the setting for

dfs.datanode.failed.volumes.tolerated

in your environment? The default is 0, which is a bit restrictive. Normally 1, or even 2 on DataNodes with high disk density, makes more operational sense.

Then your DataNode will start and you can take care of the disks.
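
For reference, a quick way to check and change this setting. The config path shown is a common HDP default and is an assumption; adjust for your install:

grep -A 1 "dfs.datanode.failed.volumes.tolerated" /etc/hadoop/conf/hdfs-site.xml   # shows the current value, if set
# In Ambari: HDFS -> Configs -> search for the property, set it to 1 (or 2 on
# high-density nodes), save, and restart the affected DataNodes.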

avatar
Master Mentor

If you put this machine in a separate config group and remove the reference to the failed directories, you can keep the machine up. If you remove a disk and do not replace it, your data will end up being written to the OS filesystem. Also, do what Benjamin suggests and increase the tolerance.
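
To make the config-group idea concrete, a hypothetical example (all paths below are made up for illustration): if the host's dfs.datanode.data.dir is currently

/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data

and the failed disk is the one mounted at /grid/2, the per-host override in the config group would simply drop that entry:

/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data

df -h /grid/2/hadoop/hdfs/data    # confirm which mount actually backs the directory before editing
# After saving the config group, restart the DataNode on that host from Ambari and
# check its log for volume-failure messages before declaring it healthy.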