Created 02-15-2017 07:34 AM
On HDFS 0.20.2 (yes, it's old), two datanodes in our prod cluster can no longer start up.
The namenode says:
2017-02-15 09:24:52,861 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
2017-02-15 09:24:52,862 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 9000, call register(DatanodeRegistration(cernsrchhadoop504.cernerasp.com:50010, storageID=DS-1574636665-44.128.6.253-50010-1461251397876, infoPort=50075, ipcPort=50020)) from 44.128.6.253:51326: error: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
The kicker, though, is that it's saying datanode cernsrchhadoop504 can't serve that storage because it's expected to be served by 44.128.6.253, which is actually cernsrchhadoop504.
From the namenode:
root@cernsrchhadoop388.cernerasp.com:~ ( cernsrchhadoop388.cernerasp.com ) 09:28:10 $ nslookup 44.128.6.253
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
253.6.128.44.in-addr.arpa       name = cernsrchhadoop504.cernerasp.com.
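(For anyone comparing against their own cluster: it's worth running the forward lookup too, to confirm both directions resolve consistently; the hostname below is just our affected node.)

$ nslookup cernsrchhadoop504.cernerasp.com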
The datanode logs on 504 are saying something similar:
2017-02-15 09:24:52,866 ERROR datanode.DataNode (DataNode.java:main(1372)) - org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
So, the question: how can I get the namenode to realize that the node it expects to have that storage is the same node that's attempting to serve it?
Created 02-15-2017 07:43 AM
Also, to go over what we've attempted: we've cycled the datanode (or at least tried to), rebooted the node, and, since we found HDFS-1106 where someone hit the same issue, ran a refresh, but we still can't get it to start.
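For the record, the refresh was the namenode host-list refresh (assuming the standard 0.20-era admin CLI, this is the command we ran):

$ hadoop dfsadmin -refreshNodes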
Created 02-15-2017 11:57 AM
Turned out the nodes were in the exclude file, just not in one named host.exclude like we use in CDH5, so it was missed.
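For anyone who hits this later: the file isn't necessarily named host.exclude; it's whatever the dfs.hosts.exclude property points at in the namenode's config. A quick way to check (the config path below is an example for a typical layout, adjust to your install):

# Find the exclude file the namenode is actually using
$ grep -A1 dfs.hosts.exclude /etc/hadoop/conf/hdfs-site.xml

# Check whether the affected datanodes are listed in it
$ grep cernsrchhadoop504 /path/to/that/exclude/file

# After removing the entries, have the namenode re-read the host lists
$ hadoop dfsadmin -refreshNodes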