Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Is it possible to define a replication plan among datanodes?

Expert Contributor

Hi,

To clarify the question, I will illustrate the case:

Let's name the datanodes [dnode1, dnode2, dnode3, dnode4, dnode5, dnode6, dnode7, dnode8, dnode9].

I don't want all replicas of a block placed on dnode1, dnode2, and dnode3, because I have to turn these three nodes off at once for maintenance. Is there any replication setting in HDFS that lets me specify replication targets instead of random nodes, something like a replication group definition?

1 ACCEPTED SOLUTION


Hi @Sedat Kestepe, take a look at rack awareness.

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html

Here's how you can configure racks using Ambari:

https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Users_Guide/content/ch03s11.html
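
If you are not using Ambari, rack awareness can also be enabled by hand by pointing the NameNode at a topology script via the `net.topology.script.file.name` property in `core-site.xml`. The script path below is an assumption for illustration:

```xml
<!-- core-site.xml: register a topology script (path is an assumption) -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology_script.py</value>
</property>
```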

The default block placement policy in HDFS avoids putting all replicas of a block in the same rack, so that a single rack failure cannot cause data loss. You may be able to use this to achieve what you want: map the nodes you have to take down together onto the same "rack".
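
As a sketch of how that mapping might look, here is a minimal topology script, assuming the dnode1..dnode9 hostnames from the question and hypothetical rack names. Hadoop invokes the configured script with one or more datanode hostnames/IPs as arguments and expects one rack path per line on stdout:

```python
#!/usr/bin/env python
"""Minimal HDFS topology script (hostnames and rack names are illustrative).

Hadoop calls this script with datanode hostnames/IPs as arguments and
reads one rack path per line from stdout.
"""
import sys

# Nodes that must be taken down together are mapped to the same "rack",
# so the default block placement policy will not put all replicas of a
# block on them.
RACK_MAP = {
    "dnode1": "/rack1", "dnode2": "/rack1", "dnode3": "/rack1",
    "dnode4": "/rack2", "dnode5": "/rack2", "dnode6": "/rack2",
    "dnode7": "/rack3", "dnode8": "/rack3", "dnode9": "/rack3",
}

# Unknown hosts fall back to a default rack.
DEFAULT_RACK = "/default-rack"


def resolve_rack(node):
    """Return the rack path for a datanode hostname/IP."""
    return RACK_MAP.get(node, DEFAULT_RACK)


if __name__ == "__main__":
    for node in sys.argv[1:]:
        print(resolve_rack(node))
```

With this mapping, dnode1, dnode2, and dnode3 appear to HDFS as one rack, so a block's replicas will not all land on the three nodes that share a maintenance window.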


Expert Contributor

Thank you 🙂