Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Fixing Over-replicated Blocks

Expert Contributor

CentOS 6.6
CDH 5.1.2

 

Due to space pressure, I need to reduce replication factor of existing files from 3 to 2.

 

I executed a command like the following:

[hdfs]$ hdfs dfs -setrep -R -w 2  /path/of/files

It printed a warning that the "waiting time may be long for DECREASING the number of replications".
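As a side note, one quick way to confirm the new replication factor was at least recorded in the file metadata is `hdfs dfs -stat` with the `%r` format (the path below is a placeholder, matching the one in the question):

```shell
# Print the stored replication factor and name of each file.
# The metadata value changes as soon as setrep returns; deletion of
# the physical excess replicas happens asynchronously afterwards.
hdfs dfs -stat '%r %n' /path/of/files/*
```

If this already prints 2 while fsck still reports over-replication, the NameNode has accepted the change and only the replica cleanup is lagging.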

 

After tens of minutes I am still waiting, and fsck still shows over-replication.

 

[hdfs]$ hdfs fsck /path/of/files
16/11/02 12:04:42 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
Connecting to namenode via http://namenode1:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.88.38 for path /path/of/files at Wed Nov 02 12:04:43 HKT 2016
....Status: HEALTHY
 Total size:	129643323 B
 Total dirs:	1
 Total files:	4
 Total symlinks:		0
 Total blocks (validated):	4 (avg. block size 32410830 B)
 Minimally replicated blocks:	4 (100.0 %)
 Over-replicated blocks:	4 (75.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		6
 Number of racks:		1
FSCK ended at Wed Nov 02 12:04:43 HKT 2016 in 1 milliseconds


The filesystem under path ' /path/of/files' is HEALTHY

 

Is this normal? How long should the wait be?

 

 

1 ACCEPTED SOLUTION

Expert Contributor

Answering my own question...

 

The source code of org.apache.hadoop.hdfs.server.blockmanagement.BlockManager explains why:

 

 

...
    if (numCurrentReplica > expectedReplication) {
      if (num.replicasOnStaleNodes() > 0) {
        // If any of the replicas of this block are on nodes that are
        // considered "stale", then these replicas may in fact have
        // already been deleted. So, we cannot safely act on the
        // over-replication until a later point in time, when
        // the "stale" nodes have block reported.
        return MisReplicationResult.POSTPONE;
      }
...

 

So the key point is whether any DataNode holding a replica is considered "stale", i.e. the NameNode has not received a recent block report from it. I did not know how to force the nodes to send block reports other than restarting them, so I restarted all the DataNodes, and the over-replicated blocks were gone.
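For anyone finding this later: newer Hadoop releases (2.7.0 and up, via HDFS-7278) add a dfsadmin subcommand that asks a DataNode to send a block report immediately, avoiding the restart. It is likely not available on the CDH 5.1.2 used here, which is based on an older Hadoop line, so treat this as an option for newer clusters only:

```shell
# Request an immediate full block report from one DataNode
# (Hadoop 2.7+ only; host and IPC port below are placeholders).
hdfs dfsadmin -triggerBlockReport datanode1:50020
```

Once the stale nodes have reported, the NameNode can safely schedule deletion of the excess replicas and fsck stops showing over-replication.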


2 REPLIES 2

Expert Contributor

The setrep command has just completed, but fsck still shows over-replication.
