Suppose I set incorrect rack details for one of the DataNodes. What will happen? Will the DN be registered with the NN, or will it fail to register?
DN1: correct rack: /DC01/Hall01/row4a/cab12.13
DN2: incorrect rack: /DC01/Hall01/row4b/cab12.13
DN2 has an incorrect rack. What will happen here? If I enter the wrong rack details and the DN still gets registered with the NN, what is the use of rack awareness?
Note: None of my DNs are in '/default-rack'.
There are several things to take into consideration here.
Rack awareness serves two purposes:
1) Rack awareness is configured according to the physical rack each node sits in. Based on that, we distribute components across racks so that the failure of one rack does not take down the cluster. Suppose the Active NameNode is on Rack1 and the Standby is on Rack2: if Rack1 goes down, the NameNode on Rack2 becomes active. If both NameNodes are installed on the same rack and that whole rack goes down, your whole cluster goes down.
2) For DataNodes, reads and writes to HDFS prefer the closest node: if a node on the same rack is free/available, it serves the request. The same applies to job execution: a task is scheduled on the same rack if any of that rack's nodes are available, and otherwise on another rack. (This also reduces network traffic between the nodes.)
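To make the "closer node" idea concrete, here is a minimal sketch of a tree-style network distance using the rack paths from the question. It is not Hadoop's actual code, only an illustration in the same spirit: the distance is the number of hops from each node up to their closest common ancestor. The host names dn1/dn2 are hypothetical, and the sketch assumes, purely for illustration, that DN2 physically sits in the same cabinet as DN1.

```python
def network_distance(path_a: str, path_b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


# Rack paths taken from the question.
RACK_CORRECT = "/DC01/Hall01/row4a/cab12.13"
RACK_INCORRECT = "/DC01/Hall01/row4b/cab12.13"

# Hypothetical host names; DN2 is ASSUMED to really be in DN1's cabinet.
dn1 = f"{RACK_CORRECT}/dn1"
dn2_physical = f"{RACK_CORRECT}/dn2"
dn2_configured = f"{RACK_INCORRECT}/dn2"

print(network_distance(dn1, dn1))             # same node
print(network_distance(dn1, dn2_physical))    # same rack
print(network_distance(dn1, dn2_configured))  # appears to be in another row
```

Under these assumptions, the mislabeled DN2 looks farther away from DN1 than it physically is, so any "prefer the closest node" decision is made against the wrong picture of the cluster.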