RACK Awareness: How is performance impacted by uneven distribution of nodes across racks ?

New Contributor

I have 11 servers with the following rack setup:

Rack 1 => node1, node2, node3

Rack 2 => node4, node5, node6

Rack 3 => node7, node8

Rack 4 => node9, node10

Rack 5 => node11

  1. How will this uneven distribution impact the performance of HDFS, HBase, Kafka, MapReduce, and YARN?
  2. Are there any potential issues with having just one server in Rack 5?

Thank you.

Re: RACK Awareness: How is performance impacted by uneven distribution of nodes across racks ?

Mentor

@Kashyap Magadi
Hadoop components are rack-aware. HDFS block placement uses rack awareness for fault tolerance by placing at least one replica of each block on a different rack. This keeps data available if a network switch fails or the cluster is partitioned. A typical layout gives each rack its own layer 3 network with a /24 subnet, where each rack has its own switch with uplinks to a central core router.

This is important: you should work with your network team to build redundancy at the network level within the datacenter.
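To get a rough feel for how rack-aware placement interacts with your uneven racks, here is a minimal simulation sketch. It assumes a replication factor of 3 and a simplified version of the default HDFS rule (first replica on the writer's node, second on a different rack, third on the same rack as the second). The uniformly random writer and the fallback used for the single-node rack are my assumptions, not the exact HDFS behaviour, so treat the numbers as indicative only.

```python
import random
from collections import Counter

# Rack layout copied from the question above.
RACKS = {
    "rack1": ["node1", "node2", "node3"],
    "rack2": ["node4", "node5", "node6"],
    "rack3": ["node7", "node8"],
    "rack4": ["node9", "node10"],
    "rack5": ["node11"],
}
NODE_TO_RACK = {node: rack for rack, nodes in RACKS.items() for node in nodes}


def place_block(rng):
    """Pick three replica nodes using a simplified rack-aware rule."""
    nodes = list(NODE_TO_RACK)
    # Replica 1: the writer's node (assumed uniformly random here).
    first = rng.choice(nodes)
    # Replica 2: any node on a different rack than replica 1.
    second = rng.choice(
        [n for n in nodes if NODE_TO_RACK[n] != NODE_TO_RACK[first]]
    )
    # Replica 3: another node on replica 2's rack, if that rack has one.
    same_rack = [n for n in RACKS[NODE_TO_RACK[second]] if n != second]
    if same_rack:
        third = rng.choice(same_rack)
    else:
        # Single-node rack: fall back to any remaining node (assumption).
        third = rng.choice([n for n in nodes if n not in (first, second)])
    return [first, second, third]


def simulate(num_blocks=100_000, seed=0):
    """Count how many replicas each node receives over many blocks."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_blocks):
        counts.update(place_block(rng))
    return counts


if __name__ == "__main__":
    counts = simulate()
    for node in NODE_TO_RACK:
        print(f"{NODE_TO_RACK[node]:6s} {node:7s} {counts[node]}")
```

Running it prints a replica count per node, which you can compare across racks to see whether the smaller racks end up carrying a noticeably different share of the data.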

Example below:

              +-----------+
              |core router|
              +-----------+
             /             \
+-----------+               +-----------+
|rack switch|               |rack switch|
+-----------+               +-----------+
| data node |               | data node |
+-----------+               +-----------+
| data node |               | data node |
+-----------+               +-----------+

In your setup, have you thought through how the components (NN, DN, RM, ZK, the Kafka cluster, etc.) map to racks? The routers should also be redundant.

You will need to sit down and draw a topology of your cluster and map its components to racks (see attached rack.jpg).
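Once the mapping is drawn, Hadoop learns it through a topology script referenced by the `net.topology.script.file.name` property in core-site.xml. The script receives one or more hosts as arguments and prints one rack path per argument. Below is a minimal sketch for the layout in the question; the rack names (`/rack1` and so on) are hypothetical, and since Hadoop may pass IP addresses rather than hostnames, a real table would need entries for both.

```python
#!/usr/bin/env python3
"""Rack topology script sketch: prints one rack path per argument."""
import sys

# Hypothetical host-to-rack table based on the layout in the question.
# Add the nodes' IP addresses as keys too if Hadoop passes IPs.
RACK_BY_HOST = {
    "node1": "/rack1", "node2": "/rack1", "node3": "/rack1",
    "node4": "/rack2", "node5": "/rack2", "node6": "/rack2",
    "node7": "/rack3", "node8": "/rack3",
    "node9": "/rack4", "node10": "/rack4",
    "node11": "/rack5",
}
# Rack reported for hosts that are missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve(host):
    """Return the rack path for a host, or the default rack if unknown."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


if __name__ == "__main__":
    # One rack path per input argument, in the same order.
    for host in sys.argv[1:]:
        print(resolve(host))
```

For example, running the script with the arguments `node1 node11` would print `/rack1` and `/rack5` on separate lines.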

Hope that helps


rack.jpg

Re: RACK Awareness: How is performance impacted by uneven distribution of nodes across racks ?

New Contributor
 

Thank you for replying.

Below is the physical rack layout. I am trying to determine whether this architecture will cause any performance issues. If so, I plan to ask the network/data center team to re-rack the servers to minimize them.

70401-rack-layout.jpg

Re: RACK Awareness: How is performance impacted by uneven distribution of nodes across racks ?

Mentor

@Kashyap Magadi

For example, the active and standby NN should be on two different racks with a redundant route (core switch). The same goes for the RM, ZK, DN, and Kafka nodes. The Hadoop network setup goes much deeper than that; it is the core of a successful Hadoop HA setup. Some of the levels at which to consider redundancy are:

- ISP
- Rack
- Core switch
- Router

You will need to provide your logical requirements to the data center network team and they should be able to replicate that physically.

One requirement is that if any one of these components (NN, DN, RM, ZK, Kafka cluster) fails, the cluster should still function normally. That depends on both the software (Hadoop) and hardware (network) redundancy setup. I would advise you to meet with the network team.
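As a quick sanity check before that meeting, you could test a proposed role placement against single-rack failures. The sketch below uses the rack layout from the question; the role-to-node assignment in `ROLES` is purely hypothetical and should be replaced with your own. It only checks whether a role loses every instance when a rack fails, and does not model quorum requirements such as ZooKeeper majorities.

```python
# Rack layout copied from the question.
NODE_TO_RACK = {
    "node1": "rack1", "node2": "rack1", "node3": "rack1",
    "node4": "rack2", "node5": "rack2", "node6": "rack2",
    "node7": "rack3", "node8": "rack3",
    "node9": "rack4", "node10": "rack4",
    "node11": "rack5",
}

# Hypothetical role placement; replace with your own plan.
ROLES = {
    "NameNode (active/standby)": ["node1", "node4"],
    "ResourceManager": ["node2", "node5"],
    "ZooKeeper": ["node3", "node7", "node9"],
    "Kafka brokers": ["node8", "node10", "node11"],
}


def roles_lost_if_rack_fails(roles, rack):
    """Return the roles whose instances all sit on the given rack."""
    return [
        role
        for role, nodes in roles.items()
        if all(NODE_TO_RACK[node] == rack for node in nodes)
    ]


if __name__ == "__main__":
    for rack in sorted(set(NODE_TO_RACK.values())):
        lost = roles_lost_if_rack_fails(ROLES, rack)
        print(rack, "->", lost if lost else "no role fully lost")
```

Any role reported as fully lost for some rack is a candidate for moving one instance to another rack.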