Created 06-29-2017 01:32 PM
We're in the process of decommissioning some of our older DataNodes. Today, after decommissioning a node, HDFS started reporting a bunch of missing blocks. Checking HDFS, it looks like the files in question have a replication factor of 1; I'm assuming someone manually set them that way for some reason.
Since we're decommissioning, the actual blocks are still available in the data directories on the old node. So I happily copied one of them, and its meta file, over to an active node. They're in a different data directory, but the "subdirs" underneath "finalized" are the same. The NameNode still can't see the block, though. Is there a way for me to tell the NameNode "Hey, that block's over here now!" without actually restarting it?
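For reference, the copy looked roughly like this (just a sketch; the block/meta file names, data directories, and block pool ID below are made-up examples, so the real paths will differ on your cluster):

# On the decommissioned node, a finalized replica and its meta file live under
# <dfs.datanode.data.dir>/current/<block pool ID>/current/finalized/subdirX/subdirY
OLD_DIR=/data1/dfs/dn/current/BP-000000000-10.0.0.1-1400000000000/current/finalized/subdir0/subdir12
NEW_DIR=/data3/dfs/dn/current/BP-000000000-10.0.0.1-1400000000000/current/finalized/subdir0/subdir12

# Copy the block file and its matching meta file into the same subdirs on an active DN
scp "$OLD_DIR"/blk_1073741825 "$OLD_DIR"/blk_1073741825_1001.meta active-dn:"$NEW_DIR"/

# Ownership has to match the user the DataNode process runs as (usually hdfs:hdfs)
ssh active-dn chown hdfs:hdfs "$NEW_DIR"/blk_1073741825 "$NEW_DIR"/blk_1073741825_1001.meta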
I know I can probably recommission the node I took down, fix the replication factor on the files, and then decommission it again, but these are big nodes (each holds about 2 TB of HDFS data) and decommissioning takes several hours.
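For anyone hitting the same thing, this is roughly how I'd confirm which files are affected and then raise the replication factor once the blocks are visible again (the path and the target RF of 3 are just examples):

# List files that currently have missing/corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Show replication factor, block IDs, and replica locations for a suspect file
hdfs fsck /path/to/suspect/file -files -blocks -locations

# Raise the replication factor and wait for the new replicas to be created
hdfs dfs -setrep -w 3 /path/to/suspect/file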
Created 06-29-2017 02:51 PM
Have you tried restarting the DN you copied the blocks to?
Also, try forcing a full block report: hdfs dfsadmin -triggerBlockReport <datanode_host:ipc_port>
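For example (hostname and port are placeholders; the DataNode IPC port comes from dfs.datanode.ipc.address in your config, commonly 50020 on Hadoop 2.x):

# Ask one specific DataNode to send a full block report to the NameNode now
hdfs dfsadmin -triggerBlockReport datanode01.example.com:50020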
Created 06-30-2017 02:01 AM
If your cluster is managed by Cloudera Manager, I would use it for decommissioning rather than doing it manually; it's safer and the recommended approach.
Created 06-30-2017 06:34 AM
Interesting story.
The decommissioning process will not complete until every block has at least one good replica on other DNs (a good replica being one that is not stale and sits on a DataNode that is neither being decommissioned nor already decommissioned).
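If it helps, you can watch that from the NameNode side while decommissioning runs; a small example, assuming your Hadoop release supports the report filter flags (present in stock Hadoop 2.7+):

# List only the DataNodes currently being decommissioned, with their status
hdfs dfsadmin -report -decommissioning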
The DirectoryScanner in a DataNode periodically scans the entire data directory, reconciling inconsistencies between the in-memory block map and the on-disk replicas, so it will eventually pick up the added replica; it's just a matter of time.
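The scan interval is controlled by dfs.datanode.directoryscan.interval (in seconds, default 21600, i.e. 6 hours), so with defaults it could be a while before the replica is noticed. A quick way to see what your cluster is configured with:

# Print the configured DirectoryScanner interval (seconds) from hdfs-site.xml / defaults
hdfs getconf -confKey dfs.datanode.directoryscan.interval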