
Is it possible to define a replication plan among datanodes?


Hi,

To clarify the question, I will illustrate the case:

Let's name the datanodes [dnode1, dnode2, dnode3, dnode4, dnode5, dnode6, dnode7, dnode8, dnode9].

I don't want any block to be replicated only among dnode1, dnode2, and dnode3, because I have to turn off those three at once for maintenance. Is there a replication setting in HDFS that lets me specify replication targets instead of leaving the choice to random node selection? Something like a replication group definition?

1 ACCEPTED SOLUTION

Hi @Sedat Kestepe, take a look at rack awareness.

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html

Here's how you can configure racks using Ambari

https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Users_Guide/content/ch03s11.html

HDFS avoids placing all replicas of a block in the same rack, so that a single rack failure cannot cause data loss. You may be able to use this to achieve what you want: if dnode1, dnode2, and dnode3 are assigned to one "rack", HDFS will not place all replicas of a block on those three nodes.
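As a minimal sketch of what such a setup could look like: HDFS resolves node-to-rack mappings through a topology script named by the `net.topology.script.file.name` property in core-site.xml. The script receives one or more datanode hostnames or IPs as arguments and must print one rack path per argument. The hostnames and rack names below are hypothetical, matching the dnode1–dnode9 example from the question, with the three maintenance-coupled nodes grouped into one rack:

```shell
#!/bin/bash
# Hypothetical HDFS topology script (pointed to by
# net.topology.script.file.name in core-site.xml).
# Maps each datanode hostname to a rack path; dnode1-3 share a rack
# so HDFS will not place all replicas of a block on just those nodes.

map_rack() {
  case "$1" in
    dnode1|dnode2|dnode3) echo "/rack1" ;;
    dnode4|dnode5|dnode6) echo "/rack2" ;;
    dnode7|dnode8|dnode9) echo "/rack3" ;;
    *)                    echo "/default-rack" ;;  # fallback for unknown hosts
  esac
}

# HDFS may pass several hosts in one invocation; answer each in order.
for host in "$@"; do
  map_rack "$host"
done
```

With this in place (and the NameNode restarted), the default block placement policy puts replicas on at least two distinct racks, which is exactly the "don't put everything on dnode1–3" behavior you are after.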




Thank you 🙂