
CM single point of failure ?


New Contributor

Hi,

 

Since I do not know which of the CM forums fits best, I am posting this message here as a duplicate of the one in "Community => Managing the platform".

 

I'm curious about what happens to the cluster if the Cloudera Manager (CM) node dies:

a) How can services be (re-)started or stopped manually?

b) How can a newly created node, running a freshly installed Cloudera Manager, be added to the cluster?

For b), I assume it is just a matter of starting the wizard and providing the names/IPs of the cluster nodes, which CM will then scan, and double-checking the role assignments afterwards. Right?

 

b.2) If CM is used with the embedded PostgreSQL database, and assuming there is a backup of that database, how do I set up a new CM node and restore the backup onto it?

 

All of this stems from the concern of how to manage the cluster once CM is unavailable.

 

Are there any other recommendations or thoughts on high availability for CM?


Re: CM single point of failure ?

New Contributor

Thanks to Allan Wilson, who answered on the mailing list how to re-integrate a CM node into the cluster (see below).

 

Still open is the question of how to send commands to the agents without a running CM.

 

 

On Tuesday, August 6, 2013, Allan Wilson wrote:

Hi Gerd,
 
The cluster continues to run without issue; however, you do lose the GUI. If you lose the SCM database, you will also lose your service configurations unless you have backed up that database and kept a copy in another location.
 
If the CM server is lost and you have a good backup of the SCM database, you can reinstall CM on a different server and restore the SCM database to get the service configs back. Each agent has the name of the CM server specified in its config.ini file, which is located under the /etc/cloudera-scm-agent directory. If the host name isn't reused, this file will need to be updated on every agent to point to the host where CM was reinstalled.
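
To illustrate the config.ini change described above, here is a minimal Python sketch. It assumes the agent config uses a `server_host=<name>` line (as in a typical install); the function name `set_server_host` and the example host names are hypothetical, so check your own file before relying on it.

```python
import re


def set_server_host(config_text: str, new_host: str) -> str:
    """Return config_text with the active server_host entry set to new_host.

    Commented-out lines (starting with '#') are left untouched.
    Assumption: the entry has the form 'server_host=<hostname>'.
    """
    pattern = re.compile(r"^([ \t]*server_host[ \t]*=[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in new_host.
    updated, count = pattern.subn(lambda m: m.group(1) + new_host, config_text)
    if count == 0:
        raise ValueError("no server_host entry found in agent config")
    return updated


if __name__ == "__main__":
    # Hypothetical path usage: read, rewrite, and print the result for review.
    sample = (
        "[General]\n"
        "# Hostname of the CM server.\n"
        "server_host=old-cm.example.com\n"
        "server_port=7182\n"
    )
    print(set_server_host(sample, "new-cm.example.com"))
```

After writing the changed file back on each node, the agent most likely has to be restarted (for example with `service cloudera-scm-agent restart`) before it starts heartbeating to the new server.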
 
The /etc/cloudera-scm-server/db.properties file should also be updated with the scm user's password and database type.  The db.properties file would be a good backup candidate too.
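
As a rough aid for the db.properties step, the following Python sketch checks that a restored file contains the database settings the server needs. The key names in `REQUIRED_KEYS` are what I would expect from a typical install and are an assumption here; compare them with your own backed-up db.properties. The helper names are hypothetical.

```python
# Assumed key names; verify against a db.properties file from your own install.
REQUIRED_KEYS = (
    "com.cloudera.cmf.db.type",
    "com.cloudera.cmf.db.host",
    "com.cloudera.cmf.db.name",
    "com.cloudera.cmf.db.user",
    "com.cloudera.cmf.db.password",
)


def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the file text."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    with open("/etc/cloudera-scm-server/db.properties") as fh:
        print(missing_keys(fh.read()) or "all expected keys present")
```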
 
In addition to the scm database backup you can also export the service configs as json using the CM API.
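
The JSON export mentioned above could look roughly like the Python sketch below. It assumes the CM API's `/cm/deployment` endpoint, HTTP basic authentication, and the default web port 7180; the API version string depends on your CM release and must be supplied by you. Host name, credentials, and function names are placeholders.

```python
import base64
import json
import urllib.request


def deployment_url(host, api_version, port=7180):
    """Build the URL of the full-deployment export (assumed endpoint)."""
    return f"http://{host}:{port}/api/{api_version}/cm/deployment"


def export_deployment(host, api_version, user, password, out_path, port=7180):
    """Fetch the deployment description as JSON and save it to out_path."""
    request = urllib.request.Request(deployment_url(host, api_version, port))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    with open(out_path, "w") as fh:
        json.dump(data, fh, indent=2)
    return out_path


if __name__ == "__main__":
    # Placeholder values; replace them with your own CM host and credentials.
    export_deployment("cm.example.com", "v1", "admin", "admin", "cm-deployment.json")
```

Running this from cron and copying the resulting file off the CM host would give a second, human-readable copy of the service configs alongside the database backup.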
 
I can't answer the question about how to send commands to the agents until CM is restored, but restoring CM shouldn't take very long if you have another server on hand for it.
 
Hope this helps...
 
Allan

Re: CM single point of failure ?

Cloudera Employee

The agents get commands via heartbeats to the Cloudera Manager Server. Without the Cloudera Manager Server running, you won't have this heartbeat mechanism, so you'll need to get the Cloudera Manager Server going again. If you need to restart a daemon during Cloudera Manager Server downtime, you can do so by logging in to that machine and restarting it manually.