CM machine crashes, re-install CM on new machine and add existing Hosts(datanodes)
Labels: Cloudera Manager
Created on ‎10-21-2018 10:52 PM - edited ‎10-21-2018 10:56 PM
We have a cluster that was installed and is managed by CM6/CDH6: one machine for CM and four other machines for CDH. The embedded DB is not used; MySQL is deployed as an external DB. It ran well, but then the CM machine crashed due to a hardware failure. Is there a way to replace the hardware, reinstall the same version of CM, and add the existing hosts (datanodes) to the same cluster again?
If there is a way to reinstall the CM machine after it crashes and then add host machines to an existing cluster that was previously installed and managed by the same version of CM, that would be sufficient for us.
I tried to add the existing hosts (datanodes), but the installation stopped with the message below at Cluster Installation -> Install Parcels:
Src file /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel does not exist
Any suggestions? Am I doing this the right way, or is there another, correct way to achieve this?
Created ‎10-22-2018 09:47 AM
If you lost your database and then reinstalled CM, the agents will not complete the heartbeat to the new CM since the cm_guid does not match the value in CM.
To correct this, on all hosts with agents running:
- # rm /var/lib/cloudera-scm-agent/cm_guid
- # service cloudera-scm-agent restart
I think the reason you are seeing those errors on the parcels page is that the agents are in bad health because of the cm_guid mismatch.
The cm_guid is generated by CM, and the agent stores it to make sure it does not communicate with an unexpected CM / database. Removing the file lets the agent accept communication with the new CM server/db that you have.
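For anyone repeating this on several hosts, the two steps can be wrapped in a small shell function. This is only a rough sketch: `reset_cm_guid`, its two arguments, and the `.bak` suffix are names made up for illustration, not part of Cloudera Manager. It also moves the file aside instead of deleting it with `rm`, which is a small variation so the old value can be restored if needed.

```shell
#!/bin/sh
# Sketch only: clear the agent's stored cm_guid, then restart the agent.
# reset_cm_guid, its arguments, and the .bak suffix are hypothetical names
# introduced for this example; they are not a Cloudera tool.

reset_cm_guid() {
    # Arg 1: path to the stored GUID (default is the path from the answer above).
    guid_file="${1:-/var/lib/cloudera-scm-agent/cm_guid}"
    # Arg 2: command used to restart the agent (default is the command from the answer above).
    restart_cmd="${2:-service cloudera-scm-agent restart}"

    if [ -f "$guid_file" ]; then
        # Move the file aside rather than deleting it (variation on the plain rm).
        mv "$guid_file" "$guid_file.bak" || return 1
        echo "moved $guid_file to $guid_file.bak"
    else
        echo "no cm_guid at $guid_file, nothing to remove"
    fi

    # Restart the agent so it no longer uses the old GUID.
    # Intentionally unquoted so the command string splits into words.
    $restart_cmd
}
```

Run as root on each agent host, calling `reset_cm_guid` with no arguments uses the path and restart command given above. Passing a different path or command as the first and second argument is useful for trying it out safely first.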
Created ‎10-22-2018 10:58 AM
