
Folks often ask about best practices for setting the HDFS replication factor, evidently wondering whether the default value of 3 is supported by factual data. The cool answer is: yes, it is!

Rob Chansler, an excellent engineering manager and contributor to Hadoop at Yahoo for several years, posted the best material in 2011. The hard-core math is in a spreadsheet attached to Apache Jira https://issues.apache.org/jira/browse/HDFS-2535, "A Model for Data Durability". Using reasonable assumptions and experience from Yahoo, he calculates the probable rate of data loss events at a single site due to node failures, with replication set to 3, as 0.021 events per century. See "Attachments" : "LosingBlocks.xlsx" in the Jira ticket.
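To put the 0.021 events per century figure in perspective, here is a small Python sketch. It is not taken from the spreadsheet: it simply assumes that loss events arrive at a constant rate (a Poisson process), which is a simplification made for this illustration, and the function names are hypothetical.

```python
import math

# Rate quoted in the article for replication factor 3 (HDFS-2535).
EVENTS_PER_CENTURY = 0.021


def mean_years_between_events(events_per_century):
    """Average number of years between loss events at a constant rate."""
    return 100.0 / events_per_century


def prob_at_least_one_event(events_per_century, years):
    """Probability of one or more events in the given number of years.

    ASSUMPTION (not from the source): events follow a Poisson process,
    so P(at least one) = 1 - exp(-expected_events).
    """
    expected_events = events_per_century * years / 100.0
    return 1.0 - math.exp(-expected_events)


if __name__ == "__main__":
    print(mean_years_between_events(EVENTS_PER_CENTURY))
    print(prob_at_least_one_event(EVENTS_PER_CENTURY, 10))
```

Under that assumption, the quoted rate corresponds to roughly one data loss event every few thousand years at a single site.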

Rob also published an article in Usenix titled Data Availability and Durability with the Hadoop Distributed File System, and did a related presentation at Hadoop Summit 2011, Data Integrity and Availability of HDFS (video).

Sanjay Radia summarized this information in the Hortonworks blog post Data Integrity and Availability in Apache Hadoop HDFS, but that article did not focus on the replication factor specifically.

Comments

We know Hadoop is used in a clustered environment: each cluster has multiple racks, and each rack has multiple DataNodes.

So, to make HDFS fault tolerant in your cluster, you need to consider the following failures:

  1. DataNode failure
  2. Rack failure

The chance of an entire cluster failing is fairly low, so let's not think about it. For the above cases, you need to make sure that:

  1. If one DataNode fails, you can get the same data from another DataNode
  2. If the entire Rack fails, you can get the same data from another Rack

So that's why I think the default replication factor is set to 3: no two replicas go to the same DataNode, and at least one replica goes to a different rack, which fulfills the fault-tolerance criteria mentioned above. Hope this helps.
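The two criteria above can be checked with a short Python sketch. This is an illustration only, not HDFS code: the `Replica` type, the function names, and the rack and node labels are all hypothetical. The example placement follows what I understand to be the default HDFS policy (one replica on one rack, the other two on different nodes of a second rack).

```python
from collections import namedtuple

# Hypothetical record: which DataNode and rack hold one replica of a block.
Replica = namedtuple("Replica", ["rack", "node"])


def survives_datanode_failure(replicas):
    """True if losing any single DataNode still leaves a replica."""
    # Identify a DataNode by (rack, node) so names may repeat across racks.
    distinct_nodes = {(r.rack, r.node) for r in replicas}
    return len(distinct_nodes) >= 2


def survives_rack_failure(replicas):
    """True if losing any single rack still leaves a replica."""
    distinct_racks = {r.rack for r in replicas}
    return len(distinct_racks) >= 2


# Replication factor 3: one replica on rack-a, two on rack-b.
three_replicas = [
    Replica("rack-a", "dn1"),
    Replica("rack-b", "dn4"),
    Replica("rack-b", "dn5"),
]

# Two replicas kept on the same rack: node-safe but not rack-safe.
same_rack = [Replica("rack-a", "dn1"), Replica("rack-a", "dn2")]

if __name__ == "__main__":
    print(survives_datanode_failure(three_replicas),
          survives_rack_failure(three_replicas))
    print(survives_datanode_failure(same_rack),
          survives_rack_failure(same_rack))
```

Note that two replicas on different racks would also pass both checks; the third replica adds margin against a second failure occurring before re-replication completes, which is the case the durability model in the article quantifies.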

Last update:
‎02-13-2016 06:06 AM