
after -setrep from 3 to 2, Over-replicated blocks are not being freed after 24 hours

Contributor

CDH 5.13.1
Red Hat 6.9

We wish to change the replication factor from the default of 3 copies to 2 on one particular folder in HDFS.

After running this on one cluster:

$ hdfs dfs -setrep -R 2 /backups

and then doing a

$ hdfs dfs -du /

we saw that the excess replicas were freed very quickly, and the output of fsck showed no "Over-replicated blocks":

Status: HEALTHY
 Total size:    149514016589 B
 Total dirs:    27440
 Total files:    128746
 Total symlinks:        0
 Total blocks (validated):    126355 (avg. block size 1183285 B)
 Minimally replicated blocks:    126355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.3367577
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1


However, on a bigger test system we ran the same command, and even a day later there was still no change.

$ hdfs fsck /

shows "Over-replicated blocks"

Status: HEALTHY
 Total size:    56614841380 B
 Total dirs:    7222
 Total files:    113731
 Total symlinks:        0
 Total blocks (validated):    110143 (avg. block size 514012 B)
 Minimally replicated blocks:    110143 (100.0 %)
 Over-replicated blocks:    37439 (33.991264 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.9921465
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        8
 Number of racks:        1

The number of over-replicated blocks has dropped slightly but now seems stuck at 37439.

I've manually restarted each datanode service, and later restarted the entire cluster.

Still stuck at 37439.

I found this comment from Harsh J:

|Then monitor the over-replicated blocks in Cloudera Manager via the below chart tsquery:
|
|SELECT excess_blocks WHERE roleType = NAMENODE
|
|This should show a spike and then begin a slow but steady drop back to zero over time, which you can monitor.


but when I run this query, it reports excess_blocks as 0.


$ hdfs dfs -du /
22987202359  69376013863  /backups


shows that 3 copies are still being held for /backups.
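(For reference, the second column of -du is the raw space consumed across all replicas: 69376013863 / 22987202359 ≈ 3.0, so roughly three replicas' worth of space is still in use.)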

How can I get this space cleared?

Rebalance did nothing.

thanks.

1 ACCEPTED SOLUTION

Mentor
Do you perchance have any snapshots held from before the 'hdfs dfs -setrep 2' command was executed, under the target path (/backups)?

If you do have a snapshot and the over-replicated count is still stuck, this behaviour is explained: the replication factor is a per-file attribute, and the older snapshot still references the higher replication factor, which prevents deletion of the now-excess blocks.

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/
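
As a rough illustration (the snapshot name "s0" below is hypothetical; substitute whatever names the .snapshot listing actually shows), comparing the snapshot path against the live path should make the pinned replication factor visible:

~> # Illustrative check only; "s0" is a hypothetical snapshot name
~> hdfs dfs -ls /backups/.snapshot/s0/   # replication column is expected to still show 3
~> hdfs dfs -ls /backups/                # live files should show 2 after the -setrep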


5 REPLIES


Contributor

Yes!

There was a snapshot.

Thank you!

Explorer

@Harsh J
@ScottChris

Could you kindly explain this in a bit more detail?

You can run the below to discover existing snapshots, as the 'hdfs' user (or equivalent superuser):

~> hdfs lsSnapshottableDir
~> # For every directory printed above as $DIR:
~> hdfs dfs -ls $DIR/.snapshot/

After this step, what do we need to do? Do we need to delete the older snapshot (which was created when the replication factor was 3) and create a new snapshot now that the replication factor is 2?

Thanks in advance.

Mentor
> Do we need to delete the older snapshot (which was created when the replication factor was 3) and create a new snapshot now that the replication factor is 2?

Yes, that is correct.
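
A minimal sketch of that sequence, assuming the directory is /backups and using hypothetical snapshot names "s_old" and "s_new" (substitute the real names from the .snapshot listing), run as the 'hdfs' user or the snapshot owner:

~> # Hypothetical snapshot names, for illustration only
~> hdfs dfs -deleteSnapshot /backups s_old
~> hdfs dfs -createSnapshot /backups s_new
~> hdfs fsck /backups | grep -i 'over-replicated'

Once the old snapshot is gone, the NameNode can schedule the excess replicas for deletion, so the over-replicated count reported by fsck should drain back down over time.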

Explorer

@Harsh J

Thanks for the reply.