I have a problem with putting a host into the maintenance state using Cloudera Manager. The problem is related to the replication of some blocks. Most files in HDFS have a replication factor of 3, but for a few of them it was changed to 2. When maintenance starts, the first of the blocks with a replication factor of 2 tries to replicate to another DataNode and fails. The value of dfs.namenode.maintenance.replication.min is 1. The NameNode log contains this message (it keeps repeating):
INFO BlockStateChange: Block: blk_2355920052_1283317658, Expected Replicas: 2, live replicas: 0, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 0, maintenance replicas: 1, live entering maintenance replicas: 1, excess replicas: 1, Is Open File: false, Datanodes having this block: ... ... , Current Datanode: ..., Is current datanode decommissioning: false, Is current datanode entering maintenance: true
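For reference, the setting and the replication factor of a given file can be checked like this (the path is only a placeholder for one of the affected files, not the real one):

hdfs getconf -confKey dfs.namenode.maintenance.replication.min   # client-side view of the setting, returns 1 here
hdfs dfs -stat %r /path/to/affected/file                         # replication factor of the file, returns 2 for the affected files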
At the same time, fsck for that block (hdfs fsck / -blockId blk_2355920052) shows:
No. of Expected Replicas: 2
No. of live Replica: 1
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupt Replica: 0
Block replica on datanode ... is HEALTHY
Block replica on datanode ... is HEALTHY
On the one hand, the NameNode reports live replicas: 0 and excess replicas: 1, but on the other hand fsck reports live Replica: 1 and excess Replica: 0.
Why does this problem occur?
Does anyone know how to deal with this problem?
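If it helps, the only workaround I can think of is to set the replication factor of the affected files back to 3 before starting the maintenance, something like (placeholder path again):

hdfs dfs -setrep -w 3 /path/to/affected/file

but I am not sure whether that would only hide the underlying issue.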
Thank you in advance!
Cloudera Manager shows these messages: