Tell the NameNode where to find a "MISSING" block?
Labels: HDFS
Created 06-29-2017 01:32 PM
We're in the process of decommissioning some of our older datanodes. Today, after decommissioning a node, HDFS started reporting a bunch of missing blocks. Checking HDFS, it looks like the files in question have a replication factor of 1 (RF1); I'm assuming someone manually set them that way for some reason.
Since we're decommissioning, the actual blocks are still available in the data directories on the old node. So I happily copied one of them, and its meta file, over to an active node. They're in a different data directory, but the "subdirs" underneath "finalized" are the same. The NameNode still can't see the block, though. Is there a way for me to tell the NameNode "Hey, that block's over here now!" without actually restarting it?
I know I can probably recommission the node I took down, fix the RF on the files, and then decom it again, but these are big nodes (each holds about 2 TB of HDFS data), and decommissioning takes several hours.
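For reference, this is roughly what the copy looked like; the mount points, block pool ID, subdirs, and block ID below are made up for illustration:

# On the decommissioning node: locate the block and its meta file by block ID (ID is illustrative)
find /data/1/dfs/dn -name 'blk_1073742025*'
# /data/1/dfs/dn/current/BP-1234567890-10.0.0.1-1490000000000/current/finalized/subdir0/subdir12/blk_1073742025
# /data/1/dfs/dn/current/BP-1234567890-10.0.0.1-1490000000000/current/finalized/subdir0/subdir12/blk_1073742025_1201.meta

# Copy both files into the same block pool / subdir path on an active node (different mount point)
scp /data/1/dfs/dn/current/BP-1234567890-10.0.0.1-1490000000000/current/finalized/subdir0/subdir12/blk_1073742025* \
  activenode:/data/3/dfs/dn/current/BP-1234567890-10.0.0.1-1490000000000/current/finalized/subdir0/subdir12/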
Created 06-29-2017 02:51 PM
Have you tried restarting the DN you copied the blocks to?
Also, try forcing a full block report: hdfs dfsadmin -triggerBlockReport <datanode_host:ipc_port>
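For example, with a hypothetical hostname and assuming the default DataNode IPC port of 50020 (dfs.datanode.ipc.address in Hadoop 2.x):

# Ask the DataNode to send a full block report to the NameNode immediately
hdfs dfsadmin -triggerBlockReport datanode07.example.com:50020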
Created 06-29-2017 06:03 PM
The safe approach is to recommission the old node, change the replication factor, and then decommission it again.
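For the replication factor fix, something along these lines should work once the node is back in service (the path is illustrative):

# Raise the replication factor to 3 and wait (-w) for re-replication to finish
hdfs dfs -setrep -w 3 /path/to/rf1/files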
Created 06-30-2017 05:08 AM
In the end, HDFS actually copied the missing blocks over from the decommissioned node. It's just annoying that an fsck reports those blocks as "MISSING" when it knows where they are and that it's going to copy them eventually.
Created 06-30-2017 05:06 AM
To be fair, the Cloudera UI only reported under-replicated blocks; it never mentioned the missing blocks, and I was able to "hdfs dfs -cat" one of the files that was reported as corrupt. The only thing that mentioned the missing blocks was an "hdfs fsck /". I'm assuming that HDFS is aware of the decom process and will look for the blocks on the decommissioning server, but it doesn't note that in the fsck output, which is pretty annoying.
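For anyone hitting the same thing, these are the kinds of checks I mean (the file path is illustrative):

# Report filesystem health; lists per-file block status, including MISSING blocks and replica locations
hdfs fsck / -files -blocks -locations
# Try to actually read a file that fsck flags, to see whether the data is still reachable
hdfs dfs -cat /data/some/file.csv > /dev/null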
Created 06-30-2017 02:01 AM
If your cluster is managed by Cloudera Manager, I would use it for decommissioning rather than doing it manually; it's safer and the recommended approach.
Created 06-30-2017 06:34 AM
Interesting story.
The decommission process would not complete until all blocks have at least one good replica on other DNs (a good replica is one that is not stale and is on a DataNode that is not being decommissioned or already decommissioned).
The DirectoryScanner in a DataNode scans the entire data directory, reconciling inconsistencies between the in-memory block map and the on-disk replicas, so it would eventually have picked up the added replica; it's just a matter of time.
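If you want to see how often that scan runs, it's controlled by dfs.datanode.directoryscan.interval; if I remember correctly the default is 21600 seconds (every 6 hours). For example:

# Print the configured directory scan interval, in seconds
hdfs getconf -confKey dfs.datanode.directoryscan.interval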