Support Questions

Find answers, ask questions, and share your expertise

Check replication factor for a directory in HDFS

Expert Contributor

Hello,

Is there a way to check the replication factor of a particular folder in HDFS?

While we have the default replication factor set to 3 in Cloudera Manager, for some reason files uploaded to one particular folder show up with a replication factor of 1.

 

Regards

Wert

1 ACCEPTED SOLUTION

Master Guru
The files were either written with fewer replicas by the client, or someone changed the replication factor after they were written. For example, Solr Tlogs are, I believe, written with a replication factor of 1.
Each DFSClient can control the number of replicas it writes. As noted, Solr uses 1 for Tlogs, and MapReduce uses (or used to use) 10 for job files for a better chance of data locality. It's a decision made by whoever creates the client, so it is expected that any file can have a different replication factor, within the limits of dfs.namenode.replication.min and dfs.replication.max, which are enforced by the NameNode.
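If files were already written with too few replicas, you can raise their replication factor after the fact. A sketch (the path is a placeholder, and this needs a live HDFS client to run):

```shell
# Raise the replication factor of already-written files; for a directory this
# applies recursively. -w waits until the blocks are actually re-replicated:
#   hdfs dfs -setrep -w 3 /path/to/folder
# Inspect the bounds the NameNode enforces on requested replication factors:
#   hdfs getconf -confKey dfs.namenode.replication.min
#   hdfs getconf -confKey dfs.replication.max
```

Note that -setrep only fixes existing files; new files will still be written with whatever replication factor the writing client requests.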

Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.


3 REPLIES

Master Guru

@wert_1311 You can use the HDFS command line to ls the file.

The second column of the output will show the replication factor of the file.

For example,

$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r--   3 hadoop test 11906625598 2020-04-29 17:31 /usr/GroupStorage/data1/out.txt

Here the replication factor is 3.
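A couple of related sketches, reusing the hypothetical path above: `hdfs dfs -stat %r` prints just the replication factor, and if you are scripting around `hdfs dfs -ls`, the factor is the second field of each file line:

```shell
# Print only the replication factor (requires a live HDFS client):
#   hdfs dfs -stat %r /usr/GroupStorage/data1/out.txt
# When parsing `hdfs dfs -ls` output instead, field 2 is the replication factor.
# Demonstrated here against a sample line of ls output:
sample='-rw-r--r--   3 hadoop test 11906625598 2020-04-29 17:31 /usr/GroupStorage/data1/out.txt'
rep=$(echo "$sample" | awk '{print $2}')
echo "$rep"   # prints 3
```

For directories, `hdfs dfs -ls` shows a replication factor of 0, since replication applies to files, not folders; check the files inside instead.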


Cheers!

Expert Contributor

@GangWar 

Thanks for your reply. What I am trying to zero in on is why some recently written files are getting created with a replication factor of 1 rather than 3. I have checked multiple sites but failed to find an answer and am hitting a wall. I would appreciate any suggestions or pointers I could check to fix this issue.

 

My Issue is as below:

/User 1/logs/User1Logs >>> files under this folder have a replication factor of 1.

/User 2/logs/User2Logs >>> files under this folder have a replication factor of 1.

/User 3/logs/User3Logs >>> files under this folder have a replication factor of 3.
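One thing worth checking (a sketch; the config fragment below is invented for illustration, not taken from this cluster): the replication factor of each new file is chosen by the client that writes it, so inspect dfs.replication in the hdfs-site.xml used by whatever process writes into the RF-1 folders:

```shell
# On a live client you could run:  hdfs getconf -confKey dfs.replication
# Here we simulate inspecting a client's hdfs-site.xml with an invented fragment:
cat > /tmp/hdfs-site-sample.xml <<'EOF'
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
EOF
# Pull out the value such a client would apply to every file it writes:
rep=$(grep -A1 '<name>dfs.replication</name>' /tmp/hdfs-site-sample.xml \
      | grep -o '[0-9][0-9]*')
echo "$rep"   # prints 1
```

If the processes writing to the User1/User2 folders run with a client configuration like this, their files would come out with RF 1 regardless of the cluster-wide default.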

 

Regards

Wert

 
