DatanodeRegistration(172.31.4.192, datanodeUuid=5d7a5533-df53-454e-bfb3-2dfcdbfb7b1b, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=cluster2;nsid=725965767;c=0):Got exception while serving BP-1423177047-172.31.4.192-1492091038346:blk_1118810958_45073263 to /172.31.10.74:44406
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1423177047-172.31.4.192-1492091038346:blk_1118810958_45073263
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:466)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:241)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:745)
Although my datanodes are working fine, I see this error in Diagnostics. Could this be a problem?
Created 05-21-2018 03:42 AM
I hope you have more than 3 datanodes.
Generally, two types of "data missing" issues are possible, and each can have many causes:
a. ReplicaNotFoundException
b. BlockMissingException
If your issue is a BlockMissingException and you have a backup of the data in your DR environment, you are fine; otherwise it might be a problem. For a ReplicaNotFoundException, please make sure all your datanodes are healthy and in the commissioned state. The namenode is supposed to handle this automatically whenever that data is accessed. If it does not, an HDFS rebalance or an NN restart may fix the issue, but you don't need to try either unless a user reports a problem with that particular data. In your case no one has reported anything and you found it yourself, so you can ignore it for now.
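If you do want to check rather than ignore it, here is a minimal Python sketch that pulls the block pool and block ID out of the log line you posted and lists some commands you could run by hand. The helper names parse_block and suggested_checks are made up for illustration. The commands are the standard hdfs dfsadmin -report and hdfs fsck; the -blockId option is an assumption about your release, since older Hadoop versions may not have it. The script only prints the commands and does not run anything against the cluster.

```python
import re

# Matches block identifiers as they appear in the DataNode log:
#   <block pool id>:blk_<block id>_<generation stamp>
BLOCK_RE = re.compile(r"(BP-[\w.-]+):(blk_\d+)_(\d+)")


def parse_block(log_line):
    """Return (block_pool, block_id, generation_stamp), or None if no match.

    Hypothetical helper, not part of any Hadoop API.
    """
    match = BLOCK_RE.search(log_line)
    return match.groups() if match else None


def suggested_checks(block_id):
    """Return commands to run manually; nothing is executed here.

    The -blockId option is assumed to exist in your Hadoop release.
    """
    return [
        ["hdfs", "dfsadmin", "-report"],                  # datanode health
        ["hdfs", "fsck", "/", "-list-corruptfileblocks"],  # corrupt/missing blocks
        ["hdfs", "fsck", "/", "-blockId", block_id],       # details for one block
    ]


if __name__ == "__main__":
    line = ("ReplicaNotFoundException: Replica not found for "
            "BP-1423177047-172.31.4.192-1492091038346:blk_1118810958_45073263")
    parsed = parse_block(line)
    if parsed is not None:
        pool, block_id, gen_stamp = parsed
        print("block pool:", pool)
        print("block id:", block_id, "generation stamp:", gen_stamp)
        for cmd in suggested_checks(block_id):
            print(" ".join(cmd))
```

If fsck reports no corrupt or missing blocks and the report shows all datanodes live, the exception most likely came from a read that hit a stale replica location, which fits the advice above to leave it alone.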