Do I need to delete all data from a datanode before recommissioning it, or does it not matter because the namenode will not pick up stale data from the datanode?
Created 02-22-2021 01:08 PM
Created 02-24-2021 12:30 AM
@guido I think so. Check whether this doc helps:
https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_mc_host_maint.html#recomm_host
Created 02-24-2021 01:14 AM
Thanks for your answer.
I cannot find anything in the documentation that clarifies whether I need to delete all the data.
Also, I'm not using Cloudera Manager but Ambari (it's Hortonworks 2.6.5, Hadoop 2.7.3).
From what I understand, deleting all the data would make it easier to balance data across the node's disks (I had to replace a faulty disk). As I'm stuck on Hadoop 2.7.3, there is no built-in facility for balancing disks within a node.
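To make the disk-balance concern concrete, here is a minimal Python sketch that measures how unevenly a node's data directories are filled. The function names and the `/grid/0`, `/grid/1` paths are hypothetical, chosen only for illustration; the real directories are whatever `dfs.datanode.data.dir` lists on the node.

```python
import shutil


def imbalance(usage_by_dir):
    """Return the gap between the fullest and emptiest disk as a fraction.

    usage_by_dir maps a data directory to a (used_bytes, total_bytes) pair.
    """
    ratios = [used / total for used, total in usage_by_dir.values() if total]
    return max(ratios) - min(ratios) if ratios else 0.0


def measure(data_dirs):
    """Collect (used, total) bytes for each directory on the local node."""
    usage = {}
    for path in data_dirs:
        stats = shutil.disk_usage(path)
        usage[path] = (stats.used, stats.total)
    return usage


# Hypothetical numbers: one disk is 80% full, the replaced disk is empty.
example = {"/grid/0": (80, 100), "/grid/1": (0, 100)}
print(f"imbalance: {imbalance(example):.0%}")
```

On a real node, `imbalance(measure([...]))` with the actual data directories would give the same figure from live disk statistics.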
Created 04-12-2021 11:52 PM
Hi @JGUI, there is no requirement to delete the data from a datanode that is going to be decommissioned. Once the DN has been decommissioned, all the blocks on that DN will have been replicated to other DNs.
Are you encountering any errors while decommissioning?
Typically, HDFS self-heals and re-replicates the blocks that become under-replicated because of the decommissioned DN. The NN starts replicating those blocks from the other two replicas that are present in HDFS.
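One way to confirm that re-replication has finished is to look at the under-replicated block count in the summary printed by `hdfs fsck`. Below is a minimal Python sketch, assuming the summary contains an `Under-replicated blocks:` line as it does in Hadoop 2.7; the helper names are hypothetical and the sample text is illustrative, not real cluster output.

```python
import re
import subprocess

# Matches the summary line, e.g. " Under-replicated blocks:\t12 (0.5 %)"
_UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s*(\d+)")


def under_replicated_count(fsck_output):
    """Return the under-replicated block count, or None if the line is absent."""
    match = _UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def run_fsck(path="/"):
    """Run `hdfs fsck` on the given path and return its standard output."""
    result = subprocess.run(
        ["hdfs", "fsck", path], capture_output=True, text=True, check=False
    )
    return result.stdout


# Illustrative sample only; on a cluster, pass run_fsck("/") instead.
sample = " Over-replicated blocks:\t0 (0.0 %)\n Under-replicated blocks:\t12 (0.5 %)\n"
print(under_replicated_count(sample))
```

A count that drops to 0 over time suggests the NN has finished restoring the missing replicas.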