Created on 07-02-2016 06:17 PM - edited 08-17-2019 11:49 AM
Rack Awareness:
Rack awareness is knowledge of the cluster topology, or more specifically of how the data nodes are distributed across the racks of a Hadoop cluster. This knowledge matters because of one assumption: data nodes collocated in the same rack have more bandwidth and lower latency between them, whereas two data nodes in separate racks have comparatively less bandwidth and higher latency.
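To make the bandwidth and latency assumption concrete, here is a minimal Python sketch of a tree-based distance between nodes, loosely modeled on the way Hadoop treats the network as a tree of racks and nodes. The location paths (such as `/rack1/DN1`) and the `distance` helper are illustrative assumptions, not Hadoop's API.

```python
def distance(path_a: str, path_b: str) -> int:
    """Hops from each node up to their closest common ancestor, summed."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    # Count the leading path components the two locations share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


print(distance("/rack1/DN1", "/rack1/DN1"))  # same node: 0
print(distance("/rack1/DN1", "/rack1/DN2"))  # same rack: 2
print(distance("/rack1/DN1", "/rack2/DN4"))  # different racks: 4
```

A smaller distance stands in for "more bandwidth, less latency", which is why a rack-aware cluster prefers nearby replicas when it has a choice.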
The main purpose of Rack awareness is:
Let us assume the cluster has 9 Data Nodes with replication factor 3.
Let us also assume that there are 3 physical racks where these machines are placed:
Rack1: DN1;DN2;DN3
Rack2: DN4;DN5;DN6
Rack3: DN7;DN8;DN9
The following diagram depicts an example block placement when HDFS and YARN are not rack aware:
The following diagram depicts an example block placement when HDFS and YARN are rack aware:
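Rack awareness therefore increases data availability: when the replicas of a block are spread across racks, the loss of a single rack does not make the block unavailable. The HDFS balancer and the decommissioning of data nodes are also rack-aware operations.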
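The availability argument can be sketched in Python using the 9-node, 3-rack example above. The sketch assumes the commonly documented HDFS default policy for replication factor 3 (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). The function names, and the random placement used for the non-rack-aware case, are hypothetical and only for illustration.

```python
import random

# Rack layout from the example above.
TOPOLOGY = {
    "Rack1": ["DN1", "DN2", "DN3"],
    "Rack2": ["DN4", "DN5", "DN6"],
    "Rack3": ["DN7", "DN8", "DN9"],
}


def rack_of(node):
    """Return the rack that hosts the given data node."""
    for rack, nodes in TOPOLOGY.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_rack_aware(writer, rng):
    """Assumed default policy: local node, remote rack, same remote rack."""
    first = writer
    remote_rack = rng.choice([r for r in TOPOLOGY if r != rack_of(first)])
    second = rng.choice(TOPOLOGY[remote_rack])
    third = rng.choice([n for n in TOPOLOGY[remote_rack] if n != second])
    return [first, second, third]


def place_rack_unaware(rng):
    """Without topology knowledge, any three distinct nodes may be chosen."""
    all_nodes = [n for nodes in TOPOLOGY.values() for n in nodes]
    return rng.sample(all_nodes, 3)


def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack."""
    return any(rack_of(n) != failed_rack for n in replicas)


rng = random.Random(0)
replicas = place_rack_aware("DN1", rng)
print(replicas, all(survives_rack_failure(replicas, r) for r in TOPOLOGY))
```

Under this policy the replicas always span two racks, so any single rack can fail without losing the block. A placement that ignores racks can end up as, say, DN1, DN2 and DN3, in which case losing Rack1 loses every copy.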
What about performance?
Series 2:
How can you set up Rack Awareness through Ambari within a few minutes?
https://community.hortonworks.com/articles/43164/rack-awareness-series-2.html
Created on 06-04-2018 10:48 PM
Is it necessary to have the same number of hosts in all the racks?