Created 04-29-2020 02:49 AM
Hello,
Is there a way to check the replication factor of a particular folder in HDFS?
While we have the default replication factor set to 3 in Cloudera Manager (CM), for some reason files uploaded to a particular folder show up with a replication factor of 1.
Regards
Wert
Created on 04-29-2020 05:14 AM - edited 04-29-2020 05:16 AM
@wert_1311 You can use the HDFS command line to ls the file.
The second column of the output will show the replication factor of the file.
For example,
$ hdfs dfs -ls /usr/GroupStorage/data1/test
-rw-r--r-- 3 hadoop test 11906625598 2020-04-29 17:31 /usr/GroupStorage/data1/test
Here the replication factor is 3.
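To check a whole folder rather than a single file, a minimal sketch (assuming a recursive listing is acceptable; the folder path is a placeholder, and note that paths containing spaces will be truncated by awk's field splitting):
# Recursively list everything and print the replication factor (column 2) next to each path.
# Directories show '-' in that column, so filter them out by the permission string.
$ hdfs dfs -ls -R /usr/GroupStorage/data1 | awk '$1 !~ /^d/ {print $2, $8}'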
Created 04-29-2020 11:35 PM
Thanks for your reply. What I am trying to zero in on is why some recently written files are being created with RF 1 rather than RF 3. I have checked multiple sites but failed to find an answer and am hitting a wall. I would appreciate any suggestions or pointers I could check to fix this issue.
My Issue is as below:
/User 1/logs/User1Logs >>> files under this folder have a replication factor of 1.
/User 2/logs/User2Logs >>> files under this folder have a replication factor of 1.
/User 3/logs/User3Logs >>> files under this folder have a replication factor of 3.
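Since the replication factor is stamped on each file by the client that writes it, not by the folder, one thing I could still check (sketched below) is the effective client-side default on the hosts that upload into the RF-1 folders; the local file name and target path are placeholders:
$ hdfs getconf -confKey dfs.replication    # effective default replication on this client/gateway
$ hdfs dfs -D dfs.replication=3 -put app.log '/User 1/logs/User1Logs/'    # force RF 3 for one upload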
Regards
Wert
Created 07-16-2024 03:25 PM
@GangWar @wert_1311 I have found HDFS files that are persistently under-replicated, despite being over a year old. They are rare, but vulnerable to loss with one disk failure.
To be clear, 'hdfs dfs -ls filename' shows the replication target, not the actual replica count.
The actual count can be found with 'hdfs fsck filename -files -blocks'.
In theory, this situation should be transient, but I have found cases where it persists. See the example below, where a file is 3 blocks long and one of the blocks has only one live replica.
# hdfs fsck -blocks -files /tmp/part-m-03752 OUTPUT:
/tmp/part-m-03752: Under replicated BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
/tmp/part-m-03752: Replica placement policy is violated for BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Block should be additionally replicated on 1 more rack(s).
0. BP-955733439-1.2.3.4-1395362440665:blk_1967769089_1100461809406 len=134217728 Live_repl=3
1. BP-955733439-1.2.3.4-1395362440665:blk_1967769276_1100461809593 len=134217728 Live_repl=3
2. BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792 len=40324081 Live_repl=1
Status: HEALTHY
Total size: 308759537 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 102919845 B)
Minimally replicated blocks: 3 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (33.333332 %)
Mis-replicated blocks: 1 (33.333332 %)
Default replication factor: 3
Average block replication: 2.3333333
Corrupt blocks: 0
Missing replicas: 2 (22.222221 %)
Number of data-nodes: 30
Number of racks: 3
The filesystem under path '/tmp/part-m-03752' is HEALTHY
# hadoop fs -ls /tmp/part-m-03752 OUTPUT:
-rw-r--r-- 3 wuser hadoop 308759537 2021-12-11 16:58 /tmp/part-m-03752
Presumably, the file was under-replicated when it was written because of some failure, and the defaults for the dfs.client.block.write.replace-datanode-on-failure.* properties were such that new DataNodes were not obtained at write time to replace the ones that failed. The puzzling thing is why it has not been re-replicated after all this time.
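A hedged workaround, assuming the NameNode simply never re-queued the block: re-setting the replication target with setrep generally kicks off re-replication (if re-setting to the same value turns out to be a no-op, bumping it up and back down is the commonly cited variant). The path below is from the example above:
# Find files with under-replicated blocks under a directory (fsck prints one line per affected file).
$ hdfs fsck /tmp -files -blocks | grep 'Under replicated'
# Re-set the target replication and wait (-w) until the blocks actually reach it.
$ hdfs dfs -setrep -w 3 /tmp/part-m-03752
# Bump-and-restore variant, in case re-setting the same value is ignored.
$ hdfs dfs -setrep 4 /tmp/part-m-03752 && hdfs dfs -setrep -w 3 /tmp/part-m-03752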