I'm curious about what happens to the cluster if the Cloudera Manager node dies.
a) How do I start, stop, or restart services manually?
b) How do I attach a newly created node, running a freshly installed Cloudera Manager, to the existing cluster?
I assume for b) it is just a matter of starting the wizard and providing the names/IPs of the cluster nodes, after which CM will scan them. Afterwards I would double-check the role assignments. Right?
b.2) If CM is used with the embedded PostgreSQL database, and assuming there is a backup of that database, how do I set up a new CM node and give it the backup?
This all comes from the concern of how to handle the cluster once CM is unavailable.
Are there other recommendations or thoughts about high availability for CM?
When the Cloudera Manager Server dies (or stops), your cluster will continue to operate normally (you can keep using HDFS, MapReduce, Hue, Hive, Oozie, etc.), but you will be unable to use CM to start or stop services, change configuration, and so on. Monitoring services will generally continue to capture audit and activity information if they are still running.
You should always take regular backups of your CM database. If your main CM server is truly unrecoverable, you can set up a new CM server on another host and configure it to talk to the same database, then update your agents with the new CM location (or use DNS tricks to make the new server look like the old one). If your CM database dies, you can restore it from backup.
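To make the "update your agents" step a bit more concrete, here is a minimal Python sketch. It assumes the agent reads the server location from a `server_host` entry in `/etc/cloudera-scm-agent/config.ini`, which is where it normally lives, but check your own install. The function name and the example hostnames are made up for illustration, and the sketch only transforms text; writing the file back and restarting the agent is left to you.

```python
import re

# Matches an uncommented "server_host = value" line (assumed agent config key).
_SERVER_HOST = re.compile(r"^([ \t]*server_host[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_server_host(config_text: str, new_host: str) -> str:
    """Return config_text with every server_host entry pointing at new_host."""
    updated, count = _SERVER_HOST.subn(lambda m: m.group(1) + new_host, config_text)
    if count == 0:
        # Refuse to guess where the entry should go if it is missing.
        raise ValueError("no server_host entry found")
    return updated


# Hypothetical config contents, for illustration only.
sample = """[General]
# Hostname of the CM server.
server_host=old-cm.example.com
server_port=7182
"""

print(set_server_host(sample, "new-cm.example.com"))
```

You would run something like this against the agent config on each host and then restart the agent so it picks up the new server. The DNS approach avoids touching the agents at all.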
You can add a newly created node through the Hosts tab, clicking the Add New Hosts to Cluster button. There's a wizard that will guide you through the process. Alternatively, you can use "Path B" to install the CM packages yourself. You can then leverage the Host Template system to make your new host pick up the same roles and configs as some other host, which is very useful when adding slave nodes.
Whether CM is using the embedded PostgreSQL or some other database, you'll need to make sure you restore the database correctly and then configure CM to talk to that database, which is usually set in the /etc/cloudera-scm-server/db.properties file. If you are using the embedded database, also be sure to restore the /etc/cloudera-scm-server/db.mgmt.properties file. An alternative to relying on database backups is to use a database that is itself configured for HA.
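Before starting a rebuilt CM server, it can help to sanity-check that the restored db.properties actually names a database. The Python sketch below parses a properties file and reports missing connection settings. The `com.cloudera.cmf.db.*` key names are an assumption based on what this file typically contains, so compare them with your own copy. The sample values and function names are hypothetical.

```python
# Connection keys assumed to be present in db.properties; verify against your file.
REQUIRED_KEYS = (
    "com.cloudera.cmf.db.type",
    "com.cloudera.cmf.db.host",
    "com.cloudera.cmf.db.name",
    "com.cloudera.cmf.db.user",
    "com.cloudera.cmf.db.password",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_db_keys(props: dict) -> list:
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Hypothetical file contents, for illustration only.
sample = """# restored from backup
com.cloudera.cmf.db.type=postgresql
com.cloudera.cmf.db.host=db.example.com:7432
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=changeme
"""

props = parse_properties(sample)
print("missing keys:", missing_db_keys(props))
```

In practice you would read the text from /etc/cloudera-scm-server/db.properties on the new host. An empty list only means the settings are present, not that the database is reachable or correctly restored.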