
Sandbox HDFS Replication Set to 3 - Why?


While running the latest Sandbox (HDP 2.4 on Hortonworks Sandbox), I noticed in Ambari that HDFS had 500+ under-replicated blocks. Opening /etc/hadoop/conf/hdfs-site.xml, I found dfs.replication=3 (the default; see http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml).

Does anyone know why the Sandbox uses an HDFS replication factor of 3, aside from the fact that it's the HDFS default? I'd assume most Sandbox users are running a single virtual machine representing one node. If that is the case, dfs.replication should be set to 1 in the Sandbox to prevent under-replicated blocks. Is my assumption incorrect?
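To check which replication factor a node is actually configured with, the property can be read straight out of hdfs-site.xml. The following is a minimal Python sketch, assuming the file uses the standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout; `HDFS_SITE`, `get_property`, and `missing_replicas_per_block` are hypothetical names used only for illustration, and the XML is a trimmed-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, trimmed-down hdfs-site.xml; the real file has many more properties.
HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""


def get_property(xml_text: str, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the <value> of the Hadoop-style <property> whose <name> matches."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return default


def missing_replicas_per_block(xml_text: str, live_datanodes: int) -> int:
    """How many replicas each block falls short when only `live_datanodes` nodes exist."""
    # Fall back to "3", the HDFS default, if the property is not set in this file.
    replication = int(get_property(xml_text, "dfs.replication", "3"))
    # At most one replica of a given block is placed on each DataNode.
    return max(replication - live_datanodes, 0)


print(get_property(HDFS_SITE, "dfs.replication"))  # 3
print(missing_replicas_per_block(HDFS_SITE, 1))    # 2
```

On a single-node Sandbox this reports a shortfall of two replicas per block, which is consistent with Ambari flagging blocks as under-replicated.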

Accepted Solution

@rcicak

Yes, you are right: 3x replication does not make sense on a single-node Sandbox. It is set to 3 only because that is the default. I have thought about this too.

That said, replication has another benefit, even though it does not really apply here: if a query goes after a table and the node holding its data is busy, the same query can run on another node that holds a replica.

I would leave it at 3, so that if someone adds more nodes to the VM setup, the data gets replicated correctly.
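The reasoning about adding nodes can be made concrete with a simplified model: each block has a target replication factor, and since HDFS places at most one replica of a given block on each DataNode, a block stays under-replicated until there are at least as many live DataNodes as its target. This Python sketch is not HDFS code; `Block`, `placeable_replicas`, `count_under_replicated`, and the 500-block figure (echoing the question) are hypothetical, and the model ignores racks, decommissioning, and node failures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Block:
    block_id: int
    target_replication: int  # replication factor the file was written with


def placeable_replicas(block: Block, live_datanodes: int) -> int:
    # At most one replica of a block per DataNode, so the achievable
    # replica count is capped by the number of live DataNodes.
    return min(block.target_replication, live_datanodes)


def count_under_replicated(blocks: Iterable[Block], live_datanodes: int) -> int:
    """Count blocks whose achievable replicas fall short of their target."""
    return sum(
        1 for b in blocks
        if placeable_replicas(b, live_datanodes) < b.target_replication
    )


# Hypothetical: 500 blocks written with the default factor of 3.
blocks = [Block(i, 3) for i in range(500)]
for nodes in (1, 2, 3):
    print(nodes, count_under_replicated(blocks, nodes))
# 1 500
# 2 500
# 3 0
```

Under this model, keeping the factor at 3 means the warnings clear on their own once a third DataNode joins, whereas blocks written with a factor of 1 are never flagged on a single node but also gain no extra copies when nodes are added.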


Master Mentor

I will escalate this. Thank you for bringing it up.

Expert Contributor

@Ryan Cicak The Sandbox ships with many of the defaults used in a normal installation. You can change the 3x replication in the configs, but the Sandbox is mainly intended for working through the tutorials.
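For the "change it in the configs" route, the edit itself amounts to updating one property. Below is a minimal Python sketch of that edit on hdfs-site.xml text, assuming the standard Hadoop property layout; `set_property` is a hypothetical helper, not part of any Hadoop or Ambari tooling. Note that ElementTree drops the XML declaration and any comments when it re-serializes, and on an Ambari-managed Sandbox a hand-edited file may be overwritten by Ambari, so changing dfs.replication through Ambari's HDFS configuration is likely the safer path. Either way, the new value only applies to files written afterwards; existing blocks keep the replication factor they were created with unless it is changed explicitly (for example with `hdfs dfs -setrep`).

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with the Hadoop-style property `name` set to `value`."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # <property> without a <value> child
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Serializes without the XML declaration; comments were already dropped by the parser.
    return ET.tostring(root, encoding="unicode")


sample = (
    "<configuration><property><name>dfs.replication</name>"
    "<value>3</value></property></configuration>"
)
print(set_property(sample, "dfs.replication", "1"))
# <configuration><property><name>dfs.replication</name><value>1</value></property></configuration>
```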