Cloudera Employee

How to remove a data directory from the DataNodes

1. Open the NameNode UI and check the status. It should be healthy, i.e. no missing, corrupt, or under-replicated blocks.

 

2. Go to Ambari > Services > HDFS > Configs, remove the data directory that is no longer required from the DataNode directories setting, and save the change.

Note: Make only the config change at this point. DON'T delete the directories or any files yet.


3. IMPORTANT: Ambari will now show "Restart Required". Be very careful NOT to restart all the services at once; otherwise the NameNode will report missing blocks and the change will have to be reverted.

4. Go to any one of the DataNodes and restart the DataNode service on it.

5. Log in to the same DataNode over SSH (for example with PuTTY) and trigger a block report with the command below:

hdfs dfsadmin -triggerBlockReport <datanode_host:ipc_port>

You can find the DataNode IPC port in Ambari > Services > HDFS > Configs by searching for dfs.datanode.ipc.address.

Here is sample output from the command run on the DataNode:

[hdfs@xxxx ~]$ hdfs dfsadmin -triggerBlockReport xxxx.openstacklocal:8010
Triggering a full block report on xxxx.openstacklocal:8010.
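The host:port argument can also be assembled on the command line instead of being looked up in Ambari. The following is a minimal sketch, not part of the original procedure: `ipc_target` is a hypothetical helper name, and it assumes the configured value has the usual `address:port` form (for example `0.0.0.0:8010`).

```shell
# Build the <datanode_host:ipc_port> argument from a host name and the
# value of dfs.datanode.ipc.address (assumed to look like 0.0.0.0:8010).
# ipc_target is a hypothetical helper, not an HDFS command.
ipc_target() {
    host="$1"
    addr="$2"
    port="${addr##*:}"          # keep everything after the last colon
    printf '%s:%s\n' "$host" "$port"
}
```

On the DataNode, assuming the hdfs client there reads the same hdfs-site.xml that Ambari manages, it could be used like this: `hdfs dfsadmin -triggerBlockReport "$(ipc_target "$(hostname -f)" "$(hdfs getconf -confKey dfs.datanode.ipc.address)")"`.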

6. Open the NameNode UI again. It will now show under-replicated blocks, and the count will decrease as the data is re-replicated.

7. Wait until the under-replicated block count drops to 0.
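The wait in step 7 can be scripted rather than watched in the UI. This is a rough sketch under assumptions: it relies on `hdfs dfsadmin -report` printing a line of the form `Under replicated blocks: N` (the label may differ between Hadoop versions), and the function names are hypothetical.

```shell
# Extract the under-replicated block count from `hdfs dfsadmin -report`
# output supplied on stdin. Assumes a line like "Under replicated blocks: 12".
under_replicated_count() {
    awk -F': *' '/Under replicated blocks:/ { print $2; exit }'
}

# Poll until the count reaches 0. The first argument is the sleep interval
# in seconds; the remaining arguments are the command that prints the report,
# so the loop can be tried against canned output first.
# Note: this loops forever if the count never reaches 0 or cannot be parsed.
wait_for_replication() {
    interval="$1"; shift
    while :; do
        count="$("$@" | under_replicated_count)"
        [ "$count" = "0" ] && return 0
        echo "Under replicated blocks: ${count:-unknown}"
        sleep "$interval"
    done
}
```

For example, `wait_for_replication 60 hdfs dfsadmin -report` would check once a minute and return when the count is 0.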

8. Once the under-replicated block count reaches 0, repeat steps 4-7 on the second DataNode, and so on for all the DataNodes (the config change from step 2 is already in place).

9. Once all the DataNodes have been restarted and the NameNode UI is back to a healthy state (i.e. no missing, corrupt, or under-replicated blocks), it is safe to restart the NameNode.

10. Check the NameNode UI again to confirm the health status after the restart.

11. If all is good, it is safe to restart the other services that require a restart.
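The health check used in steps 1, 9, and 10 can also be run from the command line as a cross-check of the NameNode UI. The sketch below is an illustration only: `hdfs_blocks_healthy` is a hypothetical helper, and the three labels it looks for are assumed from typical `hdfs dfsadmin -report` output and may vary between Hadoop versions.

```shell
# Succeed only if the report on stdin shows zero missing, corrupt, and
# under-replicated blocks. Fails if any of the three counters is absent,
# so an unexpected report format is not mistaken for a healthy cluster.
hdfs_blocks_healthy() {
    awk -F': *' '
        /Under replicated blocks:/      { under = $2; seen++ }
        /Blocks with corrupt replicas:/ { corrupt = $2; seen++ }
        /Missing blocks:/               { missing = $2; seen++ }
        END { exit !(seen >= 3 && under + corrupt + missing == 0) }
    '
}
```

A possible use is `hdfs dfsadmin -report | hdfs_blocks_healthy && echo "HDFS blocks healthy"`, run as the hdfs user before step 4 and again after the NameNode restart.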
