I need your help testing and setting up the following requirement.
1. I have 3 racks, with DataNodes distributed across all 3 of them.
My requirement is that the 3rd copy of every block must go to the third rack in every scenario. Is there any way I can set this up?
I have read many blogs, but none of them confirmed that configuring the rack topology ensures that all 3 copies of the data go to 3 different racks.
- Vijay M
Can you please tell us what documentation you have reviewed? The rack location assigned to each host is normally what determines block placement. If HDFS, for example, is aware of your topology, it should ensure that at least one replica is on another rack.
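As a rough sketch of how rack locations are usually supplied: Hadoop can run a topology script named by the `net.topology.script.file.name` property in core-site.xml, passing it one or more host names or IP addresses and reading back one rack path per argument. The hosts, IP addresses and rack names below are hypothetical placeholders for a three-rack layout like the one in the question.

```python
#!/usr/bin/env python3
"""Sketch of an HDFS rack topology script (hypothetical mapping).

Hadoop runs the script named by net.topology.script.file.name, passing one or
more host names or IP addresses as arguments, and reads one rack path per
argument from standard output.
"""
import sys
from typing import Sequence

# Hypothetical inventory: three racks, one DataNode per rack shown here.
# Replace these entries with your real hosts and IP addresses.
RACK_MAP = {
    "10.0.1.11": "/rack1",
    "dn1.example.com": "/rack1",
    "10.0.2.11": "/rack2",
    "dn2.example.com": "/rack2",
    "10.0.3.11": "/rack3",
    "dn3.example.com": "/rack3",
}

# Rack reported for any host the mapping does not know about.
DEFAULT_RACK = "/default-rack"


def resolve(host: str) -> str:
    """Return the rack path for a host, or DEFAULT_RACK if it is unknown."""
    return RACK_MAP.get(host, DEFAULT_RACK)


def main(hosts: Sequence[str]) -> None:
    # Emit one rack path per argument, in the order Hadoop passed them.
    for host in hosts:
        print(resolve(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, running the script with `10.0.1.11 dn3.example.com` prints `/rack1` and `/rack3` on separate lines. A script like this only tells HDFS where each node lives; the block placement policy still decides which racks receive the replicas.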
Please review the Hortonworks community documentation. It covers rack awareness better than our documentation currently does, and it is accurate. The behaviour you are describing is exactly how rack awareness works.
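When HDFS is rack aware, its default policy places 2 replicas on one rack and the 3rd on another: the first replica goes on the writer's local node when the writer runs on a DataNode, and the second and third go on two different nodes of a single remote rack. That is because nodes within the same rack are preferable, both for HDFS itself and for most job schedulers, since traffic within a rack is cheaper than traffic between racks. As a result, with a replication factor of 3 a block's replicas span only 2 racks, so HDFS will not place a replica on every rack.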
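To make the rack count concrete, here is a simplified Python model of the default policy for a replication factor of 3. It is an illustration only, not the real HDFS implementation: it ignores disk space, load and the other checks HDFS applies, and it assumes every rack holds at least two DataNodes. The rack and node names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Topology = Dict[str, List[str]]  # rack path -> DataNode names


def place_replicas(topology: Topology, writer: str,
                   rng: random.Random) -> List[Tuple[str, str]]:
    """Simplified model of HDFS default placement for replication factor 3.

    Returns three (rack, node) pairs. Assumes each rack has >= 2 nodes.
    """
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = node_rack[writer]

    # Replica 1: the writer's own DataNode.
    first = (local_rack, writer)

    # Replica 2: a random node on some rack other than the writer's.
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    second_node = rng.choice(topology[remote_rack])

    # Replica 3: a different node on the same remote rack as replica 2.
    third_node = rng.choice(
        [n for n in topology[remote_rack] if n != second_node])

    return [first, (remote_rack, second_node), (remote_rack, third_node)]


# Hypothetical three-rack cluster with three DataNodes per rack.
topology = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
    "/rack3": ["dn7", "dn8", "dn9"],
}
replicas = place_replicas(topology, writer="dn1", rng=random.Random(7))
racks_used = {rack for rack, _ in replicas}
print(len(racks_used))  # → 2: one rack is always left without a replica
```

Whatever the random choices, the three replicas land on exactly two racks, so declaring the rack topology alone does not put a copy on every rack.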