Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

After -setrep from 3 to 2, over-replicated blocks are not being freed even after 24 hours

avatar
Contributor

CDH 5.13.1
Redhat 6.9

We wish to change the replication factor from the default of 3 copies to 2 on one particular folder in HDFS.

After running this on one cluster:

$ hdfs dfs -setrep -R 2 /backups

and then doing a

$ hdfs dfs -du /

we saw that the excess replicas were freed very quickly, and the output of fsck shows no over-replicated blocks:

Status: HEALTHY
 Total size:    149514016589 B
 Total dirs:    27440
 Total files:    128746
 Total symlinks:        0
 Total blocks (validated):    126355 (avg. block size 1183285 B)
 Minimally replicated blocks:    126355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.3367577
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1
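For reference, -setrep also accepts a -w flag that blocks until the replication change has completed, e.g.:

$ hdfs dfs -setrep -R -w 2 /backups

though on a large tree this can wait for a long time.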


However, on a bigger test system we ran the same command, and even a day later there was still no change.

$ hdfs fsck /

still shows over-replicated blocks:

Status: HEALTHY
 Total size:    56614841380 B
 Total dirs:    7222
 Total files:    113731
 Total symlinks:        0
 Total blocks (validated):    110143 (avg. block size 514012 B)
 Minimally replicated blocks:    110143 (100.0 %)
 Over-replicated blocks:    37439 (33.991264 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.9921465
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        8
 Number of racks:        1

The number of over-replicated blocks decreased slightly but now seems stuck at 37439.

I've manually restarted each datanode service, and later restarted the entire cluster.

Still stuck at 37439.
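In case it helps, rather than re-reading the full fsck report each time, I've been watching just the summary line from the command line (a rough sketch; the grep pattern matches the fsck summary shown above):

$ hdfs fsck / 2>/dev/null | grep 'Over-replicated blocks'
$ # or poll every 5 minutes:
$ while true; do hdfs fsck / 2>/dev/null | grep 'Over-replicated blocks'; sleep 300; done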

I found this comment from Harsh J:

|Then monitor the over-replicated blocks in Cloudera Manager via the below chart tsquery:
|
|SELECT excess_blocks WHERE roleType = NAMENODE
|
|This should show a spike and then begin a slow but steady drop back to zero over time, which you can monitor.


but when I run this query, it reports that excess_blocks is 0.


$ hdfs dfs -du /
22987202359  69376013863  /backups


shows the raw space consumed (second column) is still roughly three times the logical size, i.e. 3 copies.
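(The second column of -du is the raw space consumed across all replicas, so dividing it by the first column, the logical size, gives the effective replication factor. A quick sanity check on the numbers above:)

```shell
# effective replication factor = raw space consumed / logical size
echo "22987202359 69376013863 /backups" |
  awk '{ printf "%.2f\n", $2 / $1 }'
# → 3.02, i.e. still roughly 3 replicas
```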

How can we get this space cleared?

Running the balancer did nothing, either.

thanks.

1 ACCEPTED SOLUTION

avatar
Mentor
Do you perchance have any snapshots held from before the 'hdfs dfs -setrep 2' command was executed, under the target path (/backups)?

If you do have a snapshot and the over-replicated count is still stuck, this behaviour is expected: the replication factor is a per-file attribute, and the older snapshot still references the higher replication factor, preventing deletion of the now-excess blocks.

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/


5 REPLIES 5

avatar
Mentor
Do you perchance have any snapshots held from before the 'hdfs dfs -setrep 2' command was executed, under the target path (/backups)?

If you do have a snapshot and the over-replicated count is still stuck, this behaviour is expected: the replication factor is a per-file attribute, and the older snapshot still references the higher replication factor, preventing deletion of the now-excess blocks.

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/
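If it helps, the two steps above can be combined into one loop (a sketch; it assumes the directory path is the last field of each lsSnapshottableDir output line):

~> for DIR in $(hdfs lsSnapshottableDir | awk '{print $NF}'); do
~>   hdfs dfs -ls "$DIR/.snapshot/"
~> done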

avatar
Contributor

Yes! There was a snapshot.

Thank you!

avatar
Explorer

@Harsh J

@ScottChris

Could you kindly explain this in a bit more detail?

 

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/

 

After this step, what do we need to do? Do we need to delete the older snapshot (which was created when the replication factor was 3) and create a new snapshot now that it is 2?

Thanks in advance.

avatar
Mentor
> Do we need to delete older snapshot (which was created when rep is 3) and
> create a new snapshot at this time when rep is 2.

Yes, that is correct.
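The relevant commands are below (substitute your own snapshot name for 's0', which is just a placeholder here):

~> hdfs dfs -deleteSnapshot /backups s0
~> hdfs dfs -createSnapshot /backups

With no name argument, -createSnapshot generates one from the current timestamp. Once the old snapshot is gone, the NameNode should begin deleting the excess replicas.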

avatar
Explorer

@Harsh J

 

Thanks for the reply.