Our production Hadoop cluster (mainly HDFS and YARN) is deployed with Cloudera Manager 5.5.0, and all nodes are physical servers.
Two machines act as master nodes, each running both the NameNode and ResourceManager roles, and HA for the cluster is configured through CM.
One of these servers, which hosts both a NameNode and a ResourceManager, has a hardware failure and needs to be replaced. Every approach I have come up with requires stopping the cluster.
However, the Hadoop cluster serves production and should be interrupted as little as possible, so what should I do?
In addition, the Hadoop clients are deployed manually, so the CM Gateway role cannot be used to push configuration to them. The client configuration should therefore stay unchanged, or change as little as possible.
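To estimate how much the manually deployed clients would have to change, a small script along these lines could list the properties in a client `hdfs-site.xml` that mention the failed server's hostname. This is only a sketch: the hostnames, the nameservice name `nameservice1`, the NameNode IDs `nn1`/`nn2`, and the sample XML are made up for illustration, and a real client may also reference the host in `yarn-site.xml` or `core-site.xml`.

```python
import xml.etree.ElementTree as ET
from typing import List


def properties_referencing_host(site_xml: str, hostname: str) -> List[str]:
    """Return the names of <property> entries whose value contains hostname.

    Plain substring match, so pass the fully qualified hostname to avoid
    matching similarly named hosts.
    """
    root = ET.fromstring(site_xml)
    hits = []
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if hostname in value:
            hits.append(name)
    return hits


# Hypothetical excerpt of a client-side hdfs-site.xml with HA enabled.
SAMPLE = """<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>master1.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>master2.example.com:8020</value></property>
</configuration>"""

# Properties a client would need updated if master2 got a new hostname.
affected = properties_referencing_host(SAMPLE, "master2.example.com")
print(affected)
```

If the replacement machine can keep the old hostname, this list suggests the clients might not need any change at all, though I have not verified that on this cluster.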
The options I have come up with are as follows:
1. Disable the existing HA configuration for HDFS and YARN, remove the failed server from the cluster and from CM, then add the new machine to CM and the cluster and reconfigure HA.
2. Use CM's role migration function to move the roles on the failed server to another machine, remove the failed server from the cluster and from CM, add the replacement machine to CM and the cluster, and then migrate the roles back to the new server.
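With either option, the failed server should only be taken out while the other master holds the active role. A rough pre-check could look like the sketch below. It assumes the `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` commands are run on a node with a valid client configuration; the service IDs `nn1`, `nn2`, `rm1`, `rm2` are placeholders, since CM generates its own IDs.

```python
import subprocess
from typing import Dict, List


def get_service_state(cmd: List[str]) -> str:
    """Run an HA admin command and return the last non-empty stdout line
    (expected to be 'active' or 'standby')."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def safe_to_remove(states: Dict[str, str], failing_id: str) -> bool:
    """True if some node other than failing_id currently reports 'active'."""
    return any(
        state == "active"
        for service_id, state in states.items()
        if service_id != failing_id
    )


def query_namenode_states(service_ids: List[str]) -> Dict[str, str]:
    """Query each NameNode ID (placeholder IDs; CM-generated IDs will differ)."""
    return {
        sid: get_service_state(["hdfs", "haadmin", "-getServiceState", sid])
        for sid in service_ids
    }


def query_resourcemanager_states(service_ids: List[str]) -> Dict[str, str]:
    """Query each ResourceManager ID in the same way."""
    return {
        sid: get_service_state(["yarn", "rmadmin", "-getServiceState", sid])
        for sid in service_ids
    }
```

For example, `safe_to_remove(query_namenode_states(["nn1", "nn2"]), "nn2")` would be checked before stopping the roles on the server that hosts `nn2`, and likewise for the ResourceManager IDs.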
I personally prefer option 1. Could you please advise?