
mass removal of datanodes impact!


New Contributor

Hi All,

What is the impact on an existing cluster if we remove 50 of its 200 DataNodes? Space is not an issue, as HDFS filesystem usage is only 30%. How long does it take to rebalance the complete cluster?

Regards

Srinivas S

1 ACCEPTED SOLUTION

Re: mass removal of datanodes impact!

Don't just remove the DataNodes. Even with rack awareness, removing more than two nodes from different racks can lead to data loss. Instead, you should decommission them first as described here:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_administration/content/ref-a179736c-eb7c...

You may know this already, but I want to make it clear for others who read this discussion in the future.
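For readers on a vanilla (non-Ambari) setup, the decommissioning flow in the linked doc roughly looks like the sketch below. The exclude-file path and hostname are examples, not fixed locations; on an Ambari-managed HDP cluster you would drive this from the Ambari UI instead.

```shell
# 1. Add each host to retire, one per line, to the file referenced by
#    the dfs.hosts.exclude property in hdfs-site.xml (path is an example):
echo "datanode051.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read its include/exclude files:
hdfs dfsadmin -refreshNodes

# 3. Watch the node states; only stop a DataNode once it shows
#    "Decommissioned", not "Decommission in progress":
hdfs dfsadmin -report
```

Decommissioning 50 nodes means re-replicating every block they hold, so expect this to take a while; doing it in batches keeps the replication load manageable.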

3 REPLIES

Re: mass removal of datanodes impact!

Guru

Hi @srinivas s, if you're using rack awareness you should be able to retire the 50 DataNodes by decommissioning them without losing any blocks; otherwise you probably will lose some.

Rebalance time depends on your network and cluster utilization. You can tune a few parameters to speed it up if necessary, basically:

hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>

or within your HDFS params (example) :

dfs.balance.bandwidthPerSec=100000000        # bytes/sec per DataNode
dfs.datanode.max.transfer.threads=16384
dfs.datanode.balance.max.concurrent.moves=500
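Putting those together, a typical balancer run might look like the sketch below; the bandwidth and threshold values are just examples to adjust for your cluster.

```shell
# Raise the per-DataNode balancer bandwidth to ~100 MB/s at runtime
# (takes effect without a DataNode restart; value is bytes per second):
hdfs dfsadmin -setBalancerBandwidth 104857600

# Run the balancer until every DataNode's utilization is within
# 5 percentage points of the cluster average (example threshold):
hdfs balancer -threshold 5
```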

Please click Accept if you're satisfied with my answer.

Re: mass removal of datanodes impact!

@Laurent Edel this answer is incorrect. Please consider editing it to mention decommissioning. Else someone may assume it's OK to just remove nodes if they have rack awareness.

