Top of the Rack switch maintenance

Expert Contributor

In my PROD environment, the Infrastructure team is going to patch the Top of the Rack switches, and I have learned that we do not have HA enabled (on the switch side). My understanding is that my cluster would not function while the switch is down. Am I correct?

Also, what are the services I need to stop?

Thanks

Kumar

1 ACCEPTED SOLUTION

Master Mentor

@Kumar Veerappan

Unfortunately, yes: your cluster won't function while the switch is down. You will have to shut the cluster down gracefully and wait for the patching to finish. In an HDP HA setup, the master nodes (NameNode, ResourceManager) should be on two distinct racks/switches.
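If the cluster is managed by Ambari (an assumption, the thread does not say so), a graceful stop of all services can be requested through the Ambari REST API by setting every service to the INSTALLED state. The sketch below only builds the request and does not send it; the Ambari URL and cluster name are hypothetical placeholders, and authentication is left out on purpose.

```python
import json
import urllib.request


def build_stop_all_request(ambari_url, cluster_name):
    """Build (but do not send) a PUT request asking Ambari to stop all services.

    In Ambari, a service in state INSTALLED is installed but not running,
    so requesting that state for all services stops them.
    """
    body = {
        "RequestInfo": {"context": "Stop all services for TOR switch patching"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }
    return urllib.request.Request(
        url=f"{ambari_url}/api/v1/clusters/{cluster_name}/services",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Ambari rejects modifying requests that lack this header.
        headers={"X-Requested-By": "ambari"},
    )


if __name__ == "__main__":
    # Hypothetical host and cluster name, for illustration only.
    req = build_stop_all_request("http://ambari.example.com:8080", "mycluster")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send it you would add credentials for an Ambari admin user and pass the request to urllib.request.urlopen. Once the patching is done, the same request with the state set to STARTED brings the services back up.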

Here are some considerations

  • Machines should be on an isolated network from the rest of the data center. This means that no other applications or nodes should share network I/O with the Hadoop infrastructure. This is recommended as Hadoop is I/O intensive, and all other interference should be removed for a performant cluster.
  • Machines should have static IPs. This keeps the network configuration stable. If the network were configured with dynamic IPs, a machine's IP address could change on reboot or when its DHCP lease expires, and this would cause the Hadoop services to malfunction.
  • Reverse DNS should be set up. Reverse DNS ensures that a node’s hostname can be looked up through the IP address. Certain Hadoop functionalities utilize and require reverse DNS.
  • Dedicate “Top of Rack” (TOR) switches to Hadoop
  • Use dedicated core switching blades or switches
  • Ensure application servers are “close” to Hadoop
  • Consider Ethernet bonding for increased capacity
  • All clients and cluster nodes require network access, with firewall ports open for each service so the servers can communicate.
  • If deployed to a cloud environment, then make certain all Hadoop cluster Master and Data nodes are on the same network zone (this is especially important when utilizing cloud services such as AWS and Azure).
  • If deployed to a physical environment, then make certain to place the cluster in a VLAN.
  • DataNodes and client nodes should have at minimum 2 x 1 Gb Ethernet; the typically recommended network controller is 1 x 10 Gb Ethernet.
  • For the switch communicating between the racks, you will want to establish the fastest Ethernet connections possible with the most capacity.
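As a quick way to verify the reverse DNS point above once the switch is back, here is a minimal Python sketch that resolves a hostname to an IP and then checks that the IP resolves back to the same name. The function name and the injectable resolver parameters are my own, added only so the check can be tried without a live DNS server.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail): ok is True when hostname -> IP -> hostname round-trips."""
    try:
        ip = forward(hostname)
        # gethostbyaddr returns (primary_name, alias_list, ip_list).
        reverse_name = reverse(ip)[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False, f"lookup failed: {exc}"
    ok = reverse_name.lower() == hostname.lower()
    return ok, f"{hostname} -> {ip} -> {reverse_name}"


if __name__ == "__main__":
    # Checks the local machine's fully qualified name; replace with your node names.
    print(check_forward_reverse(socket.getfqdn()))
```

Running this against each master and worker hostname after the maintenance window gives a fast sanity check before restarting services. It compares only the primary name returned by the reverse lookup, so a host that is known only under an alias will be reported as a mismatch.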

Hope that helps
