Explorer
Posts: 29
Registered: 04-08-2016
Accepted Solution

after -setrep from 3 to 2, Over-replicated blocks are not being freed after 24 hours

CDH 5.13.1
Red Hat 6.9

We wish to change the replication factor from the default of 3 copies to 2 on one particular folder in HDFS.

After running this on one cluster:

$ hdfs dfs -setrep -R 2 /backups

and then running a

$ hdfs dfs -du /

we saw that the excess replicas were freed very quickly, and the output of fsck shows no "Over-replicated blocks":

Status: HEALTHY
 Total size:    149514016589 B
 Total dirs:    27440
 Total files:    128746
 Total symlinks:        0
 Total blocks (validated):    126355 (avg. block size 1183285 B)
 Minimally replicated blocks:    126355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.3367577
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1
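
As a spot check, the replication factor of an individual file can also be printed directly with the stat command (a sketch; the path here is just a placeholder):

$ # %r prints the replication factor; files under /backups should now report 2:
$ hdfs dfs -stat "%r" /backups/some/file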


However, on a bigger test system we ran the same command, and even a day later there is still no change.

$ hdfs fsck /

shows "Over-replicated blocks"

Status: HEALTHY
 Total size:    56614841380 B
 Total dirs:    7222
 Total files:    113731
 Total symlinks:        0
 Total blocks (validated):    110143 (avg. block size 514012 B)
 Minimally replicated blocks:    110143 (100.0 %)
 Over-replicated blocks:    37439 (33.991264 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.9921465
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        8
 Number of racks:        1

The number of over-replicated blocks dropped slightly at first but now seems stuck at 37439.

I've manually restarted each DataNode service, and later restarted the entire cluster.

Still stuck at 37439.

I found this comment from Harsh J:

|Then monitor the over-replicated blocks in Cloudera Manager via the below chart tsquery:
|
|SELECT excess_blocks WHERE roleType = NAMENODE
|
|This should show a spike and then begin a slow but steady drop back to zero over time, which you can monitor.


but when I run this query, it reports "excess_blocks" as 0.
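
For reference, a tsquery can also be issued outside the CM charts UI via the Cloudera Manager API's timeseries endpoint (a sketch; the host cm-host:7180, the admin credentials, and the v19 API version are all assumptions for a CM 5.13-era cluster; /api/version reports the actual version of a given install):

$ # URL-encoded form of: SELECT excess_blocks WHERE roleType = NAMENODE
$ curl -s -u admin:admin \
    'http://cm-host:7180/api/v19/timeseries?query=SELECT+excess_blocks+WHERE+roleType+%3D+NAMENODE'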


$ hdfs dfs -du /
22987202359  69376013863  /backups


shows that 3 copies are still on disk: the second column (raw space consumed across all replicas) is still about 3x the first column (logical size), since 69376013863 / 22987202359 ≈ 3.02.

How can we get this space cleared?

Running a rebalance did nothing, either.

thanks.

Posts: 1,664
Kudos: 325
Solutions: 262
Registered: 07-31-2013

Re: after -setrep from 3 to 2, Over-replicated blocks are not being freed after 24 hours

Do you perchance have any snapshots, under the target path (/backups), that were taken before the 'hdfs dfs -setrep 2' command was executed?

If you do have a snapshot and the over-replicated count is stuck, this behaviour is explained: the replication factor is a per-file attribute, and the older snapshot still references the higher replication factor, which disallows deletion of the now-excess blocks.

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/
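
If one of those snapshots predates the setrep and is no longer needed, deleting it should let the NameNode schedule the excess replicas for removal (a sketch; $DIR and $SNAPSHOT_NAME are placeholders taken from the listing above):

~> # Only do this if the snapshot itself can be discarded:
~> hdfs dfs -deleteSnapshot $DIR $SNAPSHOT_NAME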
Explorer
Posts: 29
Registered: 04-08-2016

Re: after -setrep from 3 to 2, Over-replicated blocks are not being freed after 24 hours

Yes!

There was a snapshot.

Thank you!
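
For anyone hitting the same issue: once the stale snapshot is deleted, the cleanup can be confirmed by re-running (a sketch):

$ # The over-replicated count should drop back toward 0:
$ hdfs fsck / | grep -i 'over-replicated'
$ # Raw usage (second column) should fall toward 2x the logical size:
$ hdfs dfs -du -s /backups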
