Created 05-14-2020 09:52 AM
Hi Community members,
We have maintenance planned on a host and want to bring it down for an hour or so.
I found the document below:
https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_host_maint.html
So I want to ask what the proper way to proceed is:
- Select the host --> Stop all roles on the host, then stop cloudera-scm-agent on the host, and bring both back up after maintenance
or, as the document mentions:
- Select the host --> Decommission Host(s) --> Take DataNode Offline, and after maintenance Recommission Host(s) --> Bring hosts online and start all roles
Please suggest what the difference between these two approaches is, and which one is best in which scenario.
Created on 05-15-2020 12:20 AM - edited 05-15-2020 02:37 AM
Hello @cyborg ,
Thank you for reaching out to Community!
There are two ways to place a node in maintenance mode.
1) Select the host --> Select Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission).
The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens, with the role instances running on the host displayed at the top. Deselect the Decommission Host(s) option and click Begin Maintenance.
The Host Decommission Command dialog box opens and displays the progress of the command.
To exit maintenance: Select the host --> Select Actions for Selected > End Maintenance > deselect the Recommission Host(s) option and click End Maintenance. This re-enables alerts for the host.
With the first option, maintenance mode does not prevent events from being logged; it only suppresses the alerts that those events would otherwise generate. You can still see a history of all the events recorded for entities while they were in maintenance mode. This is useful when you need to take actions in your cluster (make configuration changes and restart various elements) and do not want to see the alerts those actions would generate.
For more details, refer to https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_maint_mode.html#cmug_topic_14...
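If you would rather script the alert-suppression step than click through the UI, here is a rough Python sketch. It assumes the Cloudera Manager REST API exposes the host commands enterMaintenanceMode and exitMaintenanceMode under /api/<version>/hosts/<hostId>/commands/, which is how I recall them; please verify the paths and the API version against the API documentation for your Cloudera Manager release. The function name maintenance_request and all values in the usage comment are hypothetical. The function only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def maintenance_request(base_url, api_version, host_id, user, password, enter=True):
    """Build (but do not send) a POST that toggles maintenance mode for one host.

    Assumption: the endpoint layout below matches your Cloudera Manager
    API version; check the API docs for your release before using it.
    """
    command = "enterMaintenanceMode" if enter else "exitMaintenanceMode"
    url = "{}/api/{}/hosts/{}/commands/{}".format(
        base_url.rstrip("/"),
        api_version,
        urllib.parse.quote(host_id, safe=""),  # host ID as shown by Cloudera Manager
        command,
    )
    # Cloudera Manager accepts HTTP basic authentication.
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical usage (placeholder URL, version, host ID, and credentials):
# req = maintenance_request("http://cm.example.com:7180", "v19", "host-1", "admin", "secret")
# urllib.request.urlopen(req)  # actually sends the command
```

Note that this only covers suppressing alerts; stopping the roles and the cloudera-scm-agent on the host is still a separate step.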
2) Select the host --> Select Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission).
The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens, with the role instances running on the host displayed at the top. Select Decommission Host(s). If the selected host runs a DataNode role, you can specify whether or not to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor. If the host is not running a DataNode role, you will only see the Decommission Host(s) option. Click Begin Maintenance. The Host Decommission Command dialog box opens and displays the progress of the command.
To exit maintenance: Select the host --> Select Actions for Selected > End Maintenance > select Recommission Host(s) > choose either to bring hosts online and start all roles, or to bring hosts online and start roles later > click End Maintenance.
With the second option, you can perform minor maintenance on cluster hosts, such as adding memory or changing network cards or cables, where the maintenance window is expected to be short.
In your case, you can suppress alerts and follow the first path that you described in the question. That is fine for taking down a single node for a few hours, provided there are no under-replicated blocks and your replication factor is greater than 1.
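To make that rule of thumb explicit, here is a small Python sketch. The function name choose_maintenance_path, its parameters, and the returned labels are all hypothetical; it only encodes the conditions mentioned above (single host, no under-replicated blocks, replication factor greater than 1) and is not an official Cloudera rule.

```python
def choose_maintenance_path(replication_factor, under_replicated_blocks, hosts_going_down=1):
    """Pick a maintenance approach from the conditions described in this answer.

    Returns "suppress_alerts_only" (stop roles, no decommission) when a single
    host goes down, no blocks are under-replicated, and the replication factor
    is greater than 1. Otherwise returns "decommission".
    """
    safe_without_decommission = (
        hosts_going_down == 1
        and under_replicated_blocks == 0
        and replication_factor > 1
    )
    return "suppress_alerts_only" if safe_without_decommission else "decommission"


# Example: healthy cluster with replication factor 3, one host down for an hour.
print(choose_maintenance_path(replication_factor=3, under_replicated_blocks=0))
```

You can read the under-replicated block count from the HDFS service page in Cloudera Manager before deciding.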
Madhuri Adipudi, Technical Solutions Manager