Explorer
Posts: 14
Registered: ‎08-01-2014
Accepted Solution

UnregisteredDatanodeException on same node with same storage id

On HDFS 0.20.2 (yes, it's old), 2 datanodes in our prod cluster can no longer start up.

The namenode says:

2017-02-15 09:24:52,861 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
2017-02-15 09:24:52,862 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 9000, call register(DatanodeRegistration(cernsrchhadoop504.cernerasp.com:50010, storageID=DS-1574636665-44.128.6.253-50010-1461251397876, infoPort=50075, ipcPort=50020)) from 44.128.6.253:51326: error: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.

The kicker, though, is that the namenode says datanode cernsrchhadoop504 can't serve that storage because it's expected to be served by 44.128.6.253, which is actually cernsrchhadoop504.
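To make the mismatch concrete, here is a minimal sketch that pulls the address out of the storage ID from the log. It assumes the ID follows the DS-<random>-<ip>-<port>-<timestamp> layout that the logged value appears to have; the function name parse_storage_id is just for illustration.

```python
import re

# Assumed layout, inferred from the logged value:
# DS-<random int>-<ip>-<port>-<creation time in ms>
STORAGE_ID_RE = re.compile(
    r"^DS-(?P<rand>\d+)-(?P<ip>\d{1,3}(?:\.\d{1,3}){3})-(?P<port>\d+)-(?P<ts>\d+)$"
)


def parse_storage_id(storage_id):
    """Split a storage ID into the address parts it seems to embed."""
    match = STORAGE_ID_RE.match(storage_id)
    if match is None:
        raise ValueError("unexpected storage ID format: %r" % storage_id)
    return {
        "ip": match.group("ip"),
        "port": int(match.group("port")),
        "created_ms": int(match.group("ts")),
    }


if __name__ == "__main__":
    info = parse_storage_id("DS-1574636665-44.128.6.253-50010-1461251397876")
    # The embedded address is the same one the namenode says it expects.
    print(info["ip"], info["port"])
```

The embedded IP and port match the "expected" node in the error, so the storage ID itself is not pointing at a different machine.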

 

From the namenode:

root@cernsrchhadoop388.cernerasp.com:~ ( cernsrchhadoop388.cernerasp.com )
09:28:10 $ nslookup 44.128.6.253
Server:		127.0.0.1
Address:	127.0.0.1#53

Non-authoritative answer:
253.6.128.44.in-addr.arpa	name = cernsrchhadoop504.cernerasp.com.
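The nslookup above only covers the reverse direction. A rough sketch of checking both directions from the namenode host, using Python's standard socket module, might look like the following; dns_round_trip is a hypothetical helper name, and the resolver arguments exist only so the check can be exercised without a live DNS server.

```python
import socket


def dns_round_trip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, then forward-resolve the resulting name.

    Returns (hostname, True) if the forward lookup of the name
    includes the original ip, otherwise (hostname, False).
    """
    hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    return hostname, ip in addresses


if __name__ == "__main__":
    name, consistent = dns_round_trip("44.128.6.253")
    print(name, "forward/reverse consistent:", consistent)
```

If both directions agree, as they did here, DNS is unlikely to be the reason the namenode rejects the registration.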

The datanode logs on 504 report the same error:

2017-02-15 09:24:52,866 ERROR datanode.DataNode (DataNode.java:main(1372)) - org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.

So, the question: how can I get the namenode to realize that the node it expects to have that storage is the same node that's attempting to serve it?


Re: UnregisteredDatanodeException on same node with same storage id

Also, to go over what we've attempted: we've cycled the datanode (or at least tried to), rebooted the node, and, since we found HDFS-1106 where someone had the same issue, did a refresh, but it still won't start.


Re: UnregisteredDatanodeException on same node with same storage id

It turned out that the nodes were listed in an excludes file, just not the host.exclude file we use in CDH5, so we missed it.
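For anyone hitting the same thing, here is a small sketch of how one might find which excludes file the namenode actually reads and check whether a host is in it. It assumes the file is configured through the dfs.hosts.exclude property in the namenode's HDFS configuration and that the file lists one host per line; the helper names and the sample path are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_exclude_path(conf_xml, prop="dfs.hosts.exclude"):
    """Return the configured value of prop from Hadoop-style XML, or None."""
    root = ET.fromstring(conf_xml)
    for node in root.iter("property"):
        if (node.findtext("name") or "").strip() == prop:
            return (node.findtext("value") or "").strip()
    return None


def is_excluded(host, exclude_text):
    """True if host appears as a whitespace-separated entry in the file text."""
    entries = {entry.lower() for entry in exclude_text.split()}
    return host.lower() in entries


if __name__ == "__main__":
    # Hypothetical config snippet; the path is an assumption, not our real one.
    sample_conf = """
    <configuration>
      <property>
        <name>dfs.hosts.exclude</name>
        <value>/etc/hadoop/conf/excludes</value>
      </property>
    </configuration>
    """
    print(find_exclude_path(sample_conf))
    print(is_excluded("cernsrchhadoop504.cernerasp.com",
                      "cernsrchhadoop504.cernerasp.com\n"))
```

After removing the host from whichever file turns up, the namenode has to re-read it (for example with hadoop dfsadmin -refreshNodes) before the datanode can register again.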
