New Contributor
Posts: 3
Registered: ‎01-07-2016

Namenode still has references to blocks on decommissioned datanode, can't recommission

I'm using CDH 5.3.2 (Hadoop 2.5.0). A datanode failed and I decommissioned it. The namenode still has references to blocks on that node (blocks that do not correspond to any files), though, and that is causing issues when I try to recommission the datanode. From the hdfs dfsadmin -metasave output, the active namenode says there are blocks to be invalidated on that node, but the standby namenode does not. From an fsimage dump, I can see with the offline image viewer that there are inode references for that node, but they have no other references.
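For reference, the inspection was done roughly along these lines (the output file name and fsimage path are placeholders):

# Dump the NameNode's internal structures, including blocks awaiting deletion, to a file under hadoop.log.dir
hdfs dfsadmin -metasave metasave-active.txt

# Convert the fsimage to XML with the offline image viewer to inspect inode and block references
hdfs oiv -p XML -i /path/to/fsimage_0000000000000 -o fsimage.xml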

 

How would I go about resolving this situation? Is it safe to restart or fail over the namenode in this state?

 

Thanks and have a great day!

Posts: 1,491
Kudos: 246
Solutions: 226
Registered: ‎07-31-2013

Re: Namenode still has references to blocks on decommissioned datanode, can't recommission

The NameNode metadata does not persist block locations. When a DN starts, be it a recommissioned one, a restarted one or a brand-new one, it sends a full block report carrying all the block IDs found on its local disks at the time of startup.

The NameNode then computes whether each of these replicas must continue to be present, or whether it already exists elsewhere / has no file reference and should be deleted (invalidated).

From my reading of your message, it appears you've become concerned about the invalidation messages for block IDs that no longer correspond to any current file, which appeared when you recommissioned a DN. If that is correct, then there is nothing to worry about; this is normal behaviour.

When your DN is off-cluster and some files get deleted, those block IDs are removed from the NN metadata too. When the DN comes back later, it still carries those old blocks on its disks (unless the disks were wiped), and they make it back to the NameNode as part of the block report. The NameNode then safely asks the DN to remove these blocks once it realises they do not belong to any currently known file.
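If you want to watch this happen, a metasave dump taken shortly after the DN re-registers should show those stale replicas queued for deletion. A rough sketch (the output file name is arbitrary, and the triggerBlockReport command only exists on releases newer than the 2.5.0 discussed in this thread):

# Write NameNode structures, including blocks waiting for deletion per DataNode,
# to a file under the NameNode's hadoop.log.dir
hdfs dfsadmin -metasave after-recommission.txt

# On Apache Hadoop 2.7+ you can also force a full block report from a specific
# DataNode (50020 is the default DN IPC port)
hdfs dfsadmin -triggerBlockReport datanode-host:50020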

Please let me know if I've misunderstood your question, and share any log messages that look odd to you.
Backline Customer Operations Engineer
New Contributor
Posts: 3
Registered: ‎01-07-2016

Re: Namenode still has references to blocks on decommissioned datanode, can't recommission

Thanks very much for your reply. What is sticking out as odd is that the fsck output shows some files as under-replicated, and the block info for those files says the replicas are on the decommissioned node. However, the dfsadmin report output shows no under-replicated or corrupt blocks. When I try to recommission the node, I get the following error (hostnames/IPs removed):

 

2016-01-07 20:18:29,807 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1321958480-Z-1396651111514 (Datanode Uuid 4ed95789-d007-4383-8d94-c2bd7c347b58) service to X/Y:8022 java.lang.ArrayIndexOutOfBoundsException
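For reference, the checks I ran were roughly along these lines (the path is just an example):

# List files with their blocks and the DataNodes holding each replica
hdfs fsck / -files -blocks -locations

# Cluster-wide summary, including under-replicated block counts and per-DataNode state
hdfs dfsadmin -report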

 

Thanks!

New Contributor
Posts: 3
Registered: ‎01-07-2016

Re: Namenode still has references to blocks on decommissioned datanode, can't recommission

Just an update on this: I restarted the NameNode this morning and it immediately detected the under-replicated blocks and resolved them. I was also able to successfully recommission the affected node.

Explorer
Posts: 6
Registered: ‎05-09-2017

Re: Namenode still has references to blocks on decommissioned datanode, can't recommission


I had another query on the same topic. If a datanode goes down in the middle of the night, the blocks on that node are re-replicated to the other datanodes in the cluster to maintain a replication factor of 3.

 

What would happen if the datanode is added back after, say, the following durations?

 

10 days -

20 days -

1 month - 

3 months - 

6 months - 

 

From what I understand, the NameNode then computes whether these replicas must continue to be present, or whether they already exist elsewhere / have no file references and should be deleted (invalidated). Is this true for all of the above cases?

 

Would any data be orphaned, or end up counted as non-HDFS space usage, if a datanode is added back after, say, 3 months or more?

 

What would be the best practice in such a scenario?

 

- Decommission the datanode that went down 

- Complete maintenance tasks on the DN

- Recommission the node

 

Or should I just add the node back to the cluster?

Posts: 1,491
Kudos: 246
Solutions: 226
Registered: ‎07-31-2013

Re: Namenode still has references to blocks on decommissioned datanode, can't recommission

It's worth starting a new topic for independent questions (this topic pertained to an older HDFS bug in 5.3.x that caused an ArrayIndexOutOfBoundsException).

To answer your question, though: unless you are planning to repurpose a DN host from one logical cluster to another, you do not have to worry about the older contents it carries. When the recommissioned node's stale blocks report back into the cluster, the NameNode automatically detects the over-replication (or blocks with no file references) and issues deletes while maintaining the placement policy. In the worst case, you may need to run the HDFS balancer after the recommission to bring the returning DN back to average utilization, as sketched below.
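A rough sketch, assuming the include/exclude-file workflow (the exact steps differ if you manage hosts through Cloudera Manager, and the threshold value is just an example):

# After removing the host from the dfs.hosts.exclude file, make the NameNode re-read its host lists
hdfs dfsadmin -refreshNodes

# Optionally rebalance; the threshold is the allowed deviation (in percent) from average DataNode utilization
hdfs balancer -threshold 10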

And yes, this occurs regardless of how long the DN has been out of action.
Backline Customer Operations Engineer