Hi, I would appreciate some professional advice. Thank you very much.
I need to reinstall the operating system on 10 hosts, each running the DataNode, NodeManager, and RegionServer roles and holding about 10 TB of HDFS data. The cloudera-scm-agent is installed on the system disk, while the HDFS data lives on dedicated data disks. The Cloudera Manager version is 5.7.2 and the CDH version is 5.7.6.
Ideally I would reinstall two hosts at a time, redeploy the cloudera-scm-agent, NodeManager, RegionServer, and DataNode roles on each host, and reuse the existing HDFS data on the data disks. That way the cluster can keep serving clients normally, and the HDFS triple-replica recovery mechanism will not be triggered, so cluster I/O stays normal.
But how do I implement this operation through Cloudera Manager?
The key points are as follows:
1. Do I need to decommission the DataNode, RegionServer, and NodeManager roles on each host?
2. How can I avoid triggering the HDFS three-replica recovery mechanism, so as to reduce cluster I/O?
I hope to get some more detailed suggestions on how to operate. Thank you!
Hi, I have done it.
I would like to mention a few points.
1. Before taking a host offline, back up its disk mount information (the partition UUID mount information is very important), as well as its /etc/passwd and /etc/group files.
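The backup step above can be sketched as follows. This is a minimal example, not the author's exact commands; the backup directory is an arbitrary placeholder, and you should copy it off the host before reinstalling:

```shell
# Example backup of mount info and user databases before the reinstall.
# BACKUP_DIR is an arbitrary example location; copy it off the host.
BACKUP_DIR="/tmp/host-backup"
mkdir -p "$BACKUP_DIR"

# Partition UUIDs and the current mount layout (blkid may need root,
# so failures are tolerated here)
blkid > "$BACKUP_DIR/blkid.txt" 2>/dev/null || true
cp /etc/fstab "$BACKUP_DIR/fstab.bak"
df -h > "$BACKUP_DIR/df.txt"

# UID/GID mappings that HDFS data-directory ownership depends on
cp /etc/passwd "$BACKUP_DIR/passwd.bak"
cp /etc/group "$BACKUP_DIR/group.bak"
```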
2. When deleting all roles on the host from the cluster, do not select the decommission option. Otherwise you will have to wait for block re-replication, and the whole operation becomes slow.
3. Shortly after a DataNode goes offline, HDFS will detect the insufficient replica count and start re-replicating. Some configurations can be adjusted to throttle this, but since I did not want to restart the cluster I left them unchanged, and it did not have much impact on cluster operation.
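For reference, these are the standard HDFS properties that throttle re-replication work (shown at their usual defaults; the author intentionally left them alone because changing them requires a NameNode restart):

```xml
<!-- hdfs-site.xml: re-replication throttles, shown for reference only -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>2</value> <!-- blocks scheduled per live DataNode per heartbeat -->
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>2</value> <!-- max concurrent replication streams per DataNode -->
</property>
```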
4. It is best to build local private repositories for CM, CDH, and the OS. This greatly speeds up node installation.
5. After the operating system is reinstalled, mount each disk back at its original location using the previously backed-up partition UUIDs, so that the data can be reused.
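A sketch of rebuilding the fstab entries from the saved UUIDs. The UUID and mount point below are placeholders; substitute the real values from the blkid backup. Staging the file first lets you review it before replacing /etc/fstab:

```shell
# Rebuild /etc/fstab entries for the data disks from the UUIDs saved
# before the reinstall. UUID and mount point are placeholders.
FSTAB_STAGING="/tmp/fstab.new"
cp /etc/fstab "$FSTAB_STAGING"

# One line per data disk, exactly as it was mounted before
echo 'UUID=0f1e2d3c-placeholder /data/1 ext4 defaults,noatime 0 0' >> "$FSTAB_STAGING"

# After reviewing the staged file, install it and mount everything:
#   cp "$FSTAB_STAGING" /etc/fstab && mkdir -p /data/1 && mount -a
```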
6. Before adding the host back, it is very important to manually re-create the backed-up users and groups so that the UID/GID mappings are identical to the originals. Only then can HDFS correctly recognize the ownership and permissions of the existing data.
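One way to do this is to read each line of the backed-up passwd file and re-create the account with its original numeric ids. The sample line below is illustrative, not from a real cluster:

```shell
# Parse one backed-up passwd line and print the commands that would
# re-create the account with its original UID/GID. Sample values only.
line='hdfs:x:496:496:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash'

name=$(echo "$line" | cut -d: -f1)
uid=$(echo "$line" | cut -d: -f3)
gid=$(echo "$line" | cut -d: -f4)
home=$(echo "$line" | cut -d: -f6)

# Printed rather than executed, so the mapping can be reviewed first
echo "groupadd -g $gid $name"
echo "useradd -u $uid -g $gid -d $home $name"
```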
7. Once everything is back up, running `hdfs fsck /` quickly shows whether the replica count is sufficient, so the earlier three-replica recovery actions can be stopped in time.
My English is not good, so my wording may not be very accurate. If you run into problems, you can continue to discuss them with me.