01-07-2016 02:30 PM
I'm using CDH 5.3.2 (Hadoop 2.5.0). A datanode failed and I decommissioned it. The namenode still has references to blocks on that node (which do not correspond to files), and that is causing issues when I try to recommission the datanode. From hdfs dfsadmin -metasave output, the active namenode says there are blocks to be invalidated on that node, but that is not the case on the standby namenode. From an fsimage dump, I can see with the offline image viewer that there are inode references for that node, but they have no other references.
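For reference, this is roughly the sequence of commands used for that diagnosis; the output file name and fsimage file name are placeholders, and exact paths will differ per cluster:

# Dump the NameNode's block-level state (under-replicated blocks, blocks
# awaiting deletion/invalidation, per-datanode details) to a file in the
# NameNode's log directory (to compare active vs standby, run it against
# each NameNode):
hdfs dfsadmin -metasave metasave-active.txt

# Fetch the latest fsimage and dump it to XML with the offline image viewer
# for inspection (the fetched file name includes the transaction id):
hdfs dfsadmin -fetchImage /tmp
hdfs oiv -p XML -i /tmp/fsimage_0000000000000012345 -o /tmp/fsimage.xml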
How would I go about resolving this situation? Is it safe to restart or fail over the namenode in this state?
Thanks and have a great day!
01-08-2016 12:52 AM
01-08-2016 06:49 AM
Thanks very much for your reply. What sticks out as odd is that, in the fsck output, there are files flagged as under-replicated, and the block info for those files says the replicas are on the decommissioned node. There are no under-replicated or corrupt blocks indicated in the dfsadmin report output. Related to recommissioning the node, I get the following error (hostnames/IPs removed):
2016-01-07 20:18:29,807 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1321958480-Z-1396651111514 (Datanode Uuid 4ed95789-d007-4383-8d94-c2bd7c347b58) service to X/Y:8022 java.lang.ArrayIndexOutOfBoundsException
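For context, these are roughly the commands used to cross-check the two views (a sketch, not exact output):

# Full fsck with block locations, to match under-replicated files against
# the decommissioned node's hostname:
hdfs fsck / -files -blocks -locations > /tmp/fsck.txt
grep -c "Under replicated" /tmp/fsck.txt

# Cluster-wide summary, which in this case reported no under-replicated
# or corrupt blocks:
hdfs dfsadmin -report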
01-08-2016 01:48 PM
Just an update on this: I restarted the NameNode this morning and it immediately detected the under-replicated blocks and resolved them. I was also able to successfully recommission the affected node.
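For anyone hitting the same thing, a rough sketch of the recommission steps on a cluster that uses a dfs.hosts.exclude file (the path below is a placeholder; with Cloudera Manager the equivalent is the Recommission action on the host):

# 1. Remove the datanode's hostname from the file referenced by
#    dfs.hosts.exclude:
vi /etc/hadoop/conf/dfs.hosts.exclude
# 2. Ask the NameNode(s) to re-read the include/exclude lists:
hdfs dfsadmin -refreshNodes
# 3. Confirm the node shows up as live with "Decommission Status : Normal":
hdfs dfsadmin -report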
05-09-2017 08:07 PM - edited 05-09-2017 08:25 PM
I had another query on the same topic: if a datanode goes down in the middle of the night, the blocks on that node are re-replicated to the other datanodes in the cluster to maintain a replication factor of 3.
What would happen if the datanode is added back after, say, the following durations?
- 10 days
- 20 days
- 1 month
- 3 months
- 6 months
From what I understand, the NameNode then computes whether these replicas must continue to be present, or whether they now exist elsewhere / have no file references and should be deleted (invalidated). Is this true for all the above cases?
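One way to observe this (sketched with placeholder file names; exact metasave wording varies by Hadoop version): after the old datanode re-registers and sends its block report, any excess or file-less replicas should show up as pending deletions in a metasave dump.

hdfs dfsadmin -metasave metasave-after-rejoin.txt
# The dump is written to the NameNode's log directory (path below is an
# assumption, e.g. /var/log/hadoop-hdfs on CDH) and lists blocks waiting
# for deletion per datanode:
grep -i "deletion" /var/log/hadoop-hdfs/metasave-after-rejoin.txt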
Would data be orphaned, or end up counted as non-HDFS space usage, if a datanode is added back after, say, 3 months or more?
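As a rough check (a sketch only), the per-datanode breakdown from dfsadmin shows how each node's capacity is split between DFS and non-DFS usage after it rejoins:

hdfs dfsadmin -report | grep -E "Name:|DFS Used:|Non DFS Used:|DFS Remaining:"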
What would be the best practice in such a scenario?
- Decommission the datanode that went down (see the status check sketched after this list)
- Complete maintenance tasks on the DN
- Recommission the node back
Or should we just add the node back to the cluster?
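If going the decommission route, a small sketch of verifying the node has fully decommissioned before taking it down for maintenance (file paths and report wording will vary per cluster):

# Add the host to the file referenced by dfs.hosts.exclude, then:
hdfs dfsadmin -refreshNodes
# Wait until the node reports "Decommission Status : Decommissioned"
# before stopping the DataNode process:
hdfs dfsadmin -report | grep -E "Name:|Decommission Status"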
05-09-2017 09:56 PM