05-24-2021
10:16 PM
Overview:
This article describes how to replace old Namenode hardware with new hardware that keeps the same hostname/IP address, on a live production Hadoop cluster in a large-scale environment.
Problem:
When hardware reaches End of Service Life (EOSL), the vendor no longer provides service support for it, such as repairs, technical support, and parts availability.
At that point the servers have to be replaced with new systems. No Hadoop vendor has published a straightforward method for an in-place swap of hardware that keeps the same hostname and IP address without cluster downtime or a restart of Hadoop services. The documented options are:
Option #1 - Moving highly available NN, failover controller & JN roles using the Migrate Roles wizard
The Migrate Roles wizard offers the capability to move roles of a highly available HDFS service from one host to another. It can be used to move Namenode, Journal Node, and Failover Controller roles to new hardware.
Option #2 - Moving a Namenode to a different host using Cloudera Manager
Cloudera Manager can be used to manually move a Namenode from one host to another.
Impact – Both of these options require cluster downtime.
Solution:
We developed our own procedure to perform this task with less impact on cluster services, and we tested and implemented it on a live cluster of 1500+ nodes.
Select the standby Namenode and do a graceful decommission of the server and roles.
Replace the hardware, giving the new server the same hostname/IP address as the old Namenode server, one node at a time. There is no need to restart the HDFS service.
Install the required Hadoop parcels/ packages as needed on the new hardware.
Copy the required files, such as fsimage/edits and OS-related configurations, to the new hardware. Make sure the new server keeps the same UUID as the old one.
Rename and shut down the old Namenode.
Make sure that the new server is attached to KDC/AD. Recommission the new hardware with the same hostname/IP.
Start all the roles on the server.
Monitor the startup progress in the Namenode web UI, or check the logs as needed.
Test the NN failover process and make the new hardware the active Namenode.
Repeat the same procedure for the other Namenodes.
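Before the failover test in the steps above, it can help to confirm which Namenode is active and which is standby. The following is a minimal Python sketch of that check, not part of the original procedure. It assumes the standard `hdfs haadmin -getServiceState` and `hdfs haadmin -failover` commands are available on the host. The service IDs `nn1` and `nn2` and all function names are hypothetical; the real IDs come from `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml. On a cluster managed by Cloudera Manager, the failover itself may be better triggered from the Cloudera Manager UI.

```python
import subprocess
from typing import Callable, List

# Hypothetical Namenode service IDs, used for illustration only.
CURRENT_ACTIVE = "nn1"      # Namenode still on old hardware (assumed ID)
REPLACED_STANDBY = "nn2"    # Namenode now on new hardware (assumed ID)

# A runner takes a command as a list of strings and returns its stdout.
Runner = Callable[[List[str]], str]


def run_cli(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def service_state(service_id: str, runner: Runner = run_cli) -> str:
    """Return the HA state ('active' or 'standby') reported by hdfs haadmin."""
    out = runner(["hdfs", "haadmin", "-getServiceState", service_id])
    return out.strip().lower()


def failover_command(current_active: str, target: str) -> List[str]:
    """Build (but do not run) the command that fails over to `target`."""
    return ["hdfs", "haadmin", "-failover", current_active, target]


def check_and_plan_failover(current_active: str, target: str,
                            runner: Runner = run_cli) -> List[str]:
    """Verify the expected HA roles, then return the failover command."""
    states = {sid: service_state(sid, runner)
              for sid in (current_active, target)}
    if states[current_active] != "active" or states[target] != "standby":
        raise RuntimeError(f"unexpected HA states: {states}")
    return failover_command(current_active, target)
```

Calling `check_and_plan_failover(CURRENT_ACTIVE, REPLACED_STANDBY)` returns the command list only if the roles are as expected, so an operator can review it before running it. The runner is passed in as a parameter so the logic can be exercised without a cluster.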
Reach out to me if you need more details.
Disclaimer from Cloudera: This article is contributed by an external user. Steps may not be verified by Cloudera and may not be applicable for all use cases or for a particular distribution. Follow with caution and at your own risk. If needed, raise a support case to get confirmation.