Rack awareness is knowledge of the cluster's network topology, i.e. the rack in which each DataNode of the Hadoop cluster is located. When a client reads or writes data in HDFS, the NameNode prefers a DataNode in the same rack as the client or, if none is available, one in a nearby rack. To make this possible, the NameNode maintains the rack ID of every DataNode. This process of choosing nearby DataNodes based on rack ID is called rack awareness. By default, Hadoop assumes that all DataNodes belong to the same rack.
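The selection described above can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop's actual implementation: the `RACK_OF` table, the host names, and the functions `rack_of`, `distance`, and `closest_datanode` are all hypothetical. The distance values (0 for the same node, 2 for the same rack, 4 for different racks) follow the convention Hadoop uses for a two-level topology, and unknown hosts fall back to `/default-rack`, in line with the default single-rack assumption.

```python
# Hypothetical rack table: DataNode host -> rack ID.
RACK_OF = {
    "dn1": "/rack1",
    "dn2": "/rack1",
    "dn3": "/rack2",
    "dn4": "/rack3",
}

# Hosts missing from the table are treated as living in one default rack.
DEFAULT_RACK = "/default-rack"


def rack_of(host):
    """Return the rack ID of a host, or the default rack if it is unknown."""
    return RACK_OF.get(host, DEFAULT_RACK)


def distance(a, b):
    """Two-level topology distance: 0 same node, 2 same rack, 4 otherwise."""
    if a == b:
        return 0
    if rack_of(a) == rack_of(b):
        return 2
    return 4


def closest_datanode(client, candidates):
    """Pick the candidate DataNode with the smallest distance to the client."""
    return min(candidates, key=lambda dn: distance(client, dn))
```

For a client running on `dn1`, `closest_datanode("dn1", ["dn3", "dn2", "dn4"])` returns `"dn2"`, because `dn2` shares `/rack1` with the client while the other candidates sit in different racks.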
Rack awareness is important for the following reasons:
• It ensures high data availability and reliability.
• It makes better use of network bandwidth, because traffic within a rack is preferred over traffic between racks.
• It improves cluster performance.
• It helps recover data if a rack fails. If the rack ID information is known, a backup node holding a replica can be located easily when a rack fails.
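The recovery point in the last bullet can be illustrated with a small sketch. It assumes a hypothetical table of rack IDs (`RACKS`) and a hypothetical list of DataNodes holding replicas of one block. `surviving_replicas` and `spans_multiple_racks` are illustrative helpers, not Hadoop APIs.

```python
# Hypothetical rack table: DataNode host -> rack ID.
RACKS = {"dn1": "/rack1", "dn2": "/rack2", "dn3": "/rack2"}


def surviving_replicas(replica_hosts, failed_rack, racks=RACKS):
    """Return the replica holders that are not located on the failed rack."""
    return [
        host for host in replica_hosts
        if racks.get(host, "/default-rack") != failed_rack
    ]


def spans_multiple_racks(replica_hosts, racks=RACKS):
    """True if the replicas sit on at least two racks, so the loss of a
    single rack cannot take out every copy of the block."""
    used = {racks.get(host, "/default-rack") for host in replica_hosts}
    return len(used) >= 2
```

With replicas on `dn1`, `dn2`, and `dn3`, a failure of `/rack2` still leaves the copy on `dn1` reachable. If the replicas were only on `dn2` and `dn3`, the same failure would make the block unavailable, which is why knowing the rack IDs matters.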