Created 04-29-2020 02:49 AM
Hello,
Is there a way to check the replication factor of a particular folder in HDFS?
While we have the default replication factor set to 3 in Cloudera Manager (CM), for some reason files uploaded to a particular folder show up with a replication factor of 1.
Regards
Wert
Created on 04-29-2020 05:14 AM - edited 04-29-2020 05:16 AM
@wert_1311 You can use the HDFS command line to ls the file.
The second column of the output will show the replication factor of the file.
For example,
$ hdfs dfs -ls /usr/GroupStorage/data1/test
-rw-r--r-- 3 hadoop test 11906625598 2020-04-29 17:31 /usr/GroupStorage/data1/test
Here the replication factor is 3.
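To check a whole folder rather than a single file, a minimal sketch (assuming a recursive listing is acceptable; the folder path is a placeholder, and note that paths containing spaces will be truncated by awk's field splitting):
# Recursively list everything and print the replication factor (column 2) next to each path.
# Directories show '-' in that column, so filter them out by the permission string.
$ hdfs dfs -ls -R /usr/GroupStorage/data1 | awk '$1 !~ /^d/ {print $2, $8}'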
Created 04-29-2020 11:35 PM
Thanks for your reply. What I am trying to zero in on is why some recently written files are being created with RF 1 rather than RF 3. I have checked multiple sites but failed to find an answer and am hitting a wall. I would appreciate any suggestions or pointers I could check to fix this issue.
My Issue is as below:
/User 1/logs/User1Logs >>> files under this folder have a replication factor of 1.
/User 2/logs/User2Logs >>> files under this folder have a replication factor of 1.
/User 3/logs/User3Logs >>> files under this folder have a replication factor of 3.
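Since the replication factor is stamped on each file by the client that writes it, not by the folder, one thing I could still check (sketched below) is the effective client-side default on the hosts that upload into the RF-1 folders; the local file name and target path are placeholders:
$ hdfs getconf -confKey dfs.replication    # effective default replication on this client/gateway
$ hdfs dfs -D dfs.replication=3 -put app.log '/User 1/logs/User1Logs/'    # force RF 3 for one upload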
Regards
Wert
Created 07-16-2024 03:25 PM
@GangWar @wert_1311 I have found HDFS files that are persistently under-replicated, despite being over a year old. They are rare, but vulnerable to loss with one disk failure.
To be clear, 'hdfs dfs -ls filename' shows the replication target, not the actual replica count.
The actual count can be found with 'hdfs fsck filename -files -blocks'.
In theory, this situation should be transient, but I have found cases where it persists. See the example below, where a file is 3 blocks long and one of the blocks has only one live replica.
# hdfs fsck -blocks -files /tmp/part-m-03752 OUTPUT:
/tmp/part-m-03752: Under replicated BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
/tmp/part-m-03752: Replica placement policy is violated for BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Block should be additionally replicated on 1 more rack(s).
0. BP-955733439-1.2.3.4-1395362440665:blk_1967769089_1100461809406 len=134217728 Live_repl=3
1. BP-955733439-1.2.3.4-1395362440665:blk_1967769276_1100461809593 len=134217728 Live_repl=3
2. BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792 len=40324081 Live_repl=1
Status: HEALTHY
Total size: 308759537 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 102919845 B)
Minimally replicated blocks: 3 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (33.333332 %)
Mis-replicated blocks: 1 (33.333332 %)
Default replication factor: 3
Average block replication: 2.3333333
Corrupt blocks: 0
Missing replicas: 2 (22.222221 %)
Number of data-nodes: 30
Number of racks: 3
The filesystem under path '/tmp/part-m-03752' is HEALTHY
# hadoop fs -ls /tmp/part-m-03752 OUTPUT:
-rw-r--r-- 3 wuser hadoop 308759537 2021-12-11 16:58 /tmp/part-m-03752
Presumably, the file was under-replicated when it was written because of some failure, and the defaults for the dfs.client.block.write.replace-datanode-on-failure.* properties were such that new DataNodes were not obtained at write time to replace the ones that failed. The puzzling thing is why it has not been re-replicated after all this time.
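A hedged workaround, assuming the NameNode simply never re-queued the block: re-setting the replication target with setrep generally kicks off re-replication (if re-setting to the same value turns out to be a no-op, bumping it up and back down is the commonly cited variant). The path below is from the example above:
# Find files with under-replicated blocks under a directory (fsck prints one line per affected file).
$ hdfs fsck /tmp -files -blocks | grep 'Under replicated'
# Re-set the target replication and wait (-w) until the blocks actually reach it.
$ hdfs dfs -setrep -w 3 /tmp/part-m-03752
# Bump-and-restore variant, in case re-setting the same value is ignored.
$ hdfs dfs -setrep 4 /tmp/part-m-03752 && hdfs dfs -setrep -w 3 /tmp/part-m-03752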