Bring down host for maintenance

New Contributor

Hi Community members,

 

We have maintenance planned on a host and want to bring it down for an hour or so.

I found the document below:

https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_host_maint.html

 

So I want to ask: what is the proper way to proceed?

- Select the host --> stop all roles on the host, then bring down cloudera-scm-agent on the host, and bring both back up after maintenance

or, as the document mentions:

- Select the host --> Decommission Host(s) --> Take DataNode Offline, and after maintenance Recommission Host(s) --> bring hosts online and start all roles

 

Please suggest what the difference between these two approaches is and which one is best in which scenario.

 

 

1 ACCEPTED SOLUTION

Moderator

Hello @cyborg  ,

 

Thank you for reaching out to Community!  

There are two ways to place a node in maintenance mode. 

 

1) Select the host --> Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission).

The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens, with the role instances running on the host displayed at the top. Deselect the Decommission Host(s) option and click Begin Maintenance.

The Host Decommission Command dialog box opens and displays the progress of the command.

To exit maintenance: select the host --> Actions for Selected > End Maintenance, deselect the Recommission Host(s) option, and click End Maintenance. This re-enables alerts for the host.

 

With the first option, events are still logged; only the alerts that those events would otherwise generate are suppressed. You can see a history of all the events recorded for entities during the period they were in maintenance mode. This is useful when you need to take actions in your cluster (make configuration changes and restart various elements) and do not want to see the alerts those actions would generate.

For more details, refer to https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_maint_mode.html#cmug_topic_14...
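
If you would rather script this step than click through the UI, the suppress-alerts-only path can be sketched against the Cloudera Manager REST API. This is only a sketch and is not taken from the page linked above: the base URL, the API version (`v19`) and the host ID are placeholders, and the `enterMaintenanceMode` / `exitMaintenanceMode` host command paths should be checked against the API reference for your Cloudera Manager version. The code only builds the requests; sending them with your credentials is left out on purpose.

```python
import urllib.request

# Placeholder: replace with your own Cloudera Manager URL and API version.
CM_BASE_URL = "http://cm-host.example.com:7180/api/v19"


def maintenance_request(host_id: str, enter: bool) -> urllib.request.Request:
    """Build (but do not send) a POST that toggles maintenance mode on one host.

    The command names are assumed from the Cloudera Manager API; verify them
    for your version before use.
    """
    command = "enterMaintenanceMode" if enter else "exitMaintenanceMode"
    url = f"{CM_BASE_URL}/hosts/{host_id}/commands/{command}"
    # An empty body is enough here: the command takes no arguments.
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # "host-id-placeholder" is hypothetical; use the host ID shown by your CM.
    for enter in (True, False):
        req = maintenance_request("host-id-placeholder", enter=enter)
        print(req.get_method(), req.full_url)
```

Keeping the request building separate from the sending makes it easy to review exactly what will be called before anything touches the cluster.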

 

2) Select the host --> Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission).

The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens, with the role instances running on the host displayed at the top. Select Decommission Host(s). If the selected host runs a DataNode role, you can specify whether or not to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor. If the host is not running a DataNode role, you will see only the Decommission Host(s) option. Click Begin Maintenance. The Host Decommission Command dialog box opens and displays the progress of the command.

To exit maintenance: select the host --> Actions for Selected > End Maintenance, select Recommission Host(s), choose either to bring hosts online and start all roles or to bring hosts online and start roles later, and click End Maintenance.

 

With the second option, you can perform minor maintenance on cluster hosts, such as adding memory or changing network cards or cables, where the maintenance window is expected to be short.
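
The decommission and recommission steps can also be sketched against the Cloudera Manager REST API. Treat this as an assumption-laden sketch rather than the documented procedure: the base URL, API version (`v19`) and hostname are placeholders, the `hostsDecommission` / `hostsRecommission` command names and the `{"items": [...]}` body should be verified against the API reference for your version, and the Take DataNode Offline choice from the dialog is not covered here. As before, the code builds the requests without sending them.

```python
import json
import urllib.request
from typing import List

# Placeholder: replace with your own Cloudera Manager URL and API version.
CM_BASE_URL = "http://cm-host.example.com:7180/api/v19"


def host_command_request(command: str, hostnames: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST for a cluster-wide host command."""
    # The body is a list of hostnames wrapped in "items" (assumed format).
    body = json.dumps({"items": hostnames}).encode("utf-8")
    return urllib.request.Request(
        f"{CM_BASE_URL}/cm/commands/{command}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decommission_request(hostnames: List[str]) -> urllib.request.Request:
    return host_command_request("hostsDecommission", hostnames)


def recommission_request(hostnames: List[str]) -> urllib.request.Request:
    return host_command_request("hostsRecommission", hostnames)


if __name__ == "__main__":
    # "worker1.example.com" is a hypothetical hostname.
    req = decommission_request(["worker1.example.com"])
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```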

 

In your case, you can suppress alerts and follow the first path you described in your question. That is appropriate for taking down a single node for a few hours, provided there are no under-replicated blocks and your replication factor is greater than 1.
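
The conditions in that recommendation can be written down as a small pre-check to run before the host goes down. This is a sketch only: the function names and the sample report text are made up for illustration, and the "Under replicated blocks" summary line is assumed to match what `hdfs dfsadmin -report` prints on your Hadoop version, so check the exact wording on your cluster.

```python
import re
from typing import Optional


def under_replicated_blocks(report_text: str) -> Optional[int]:
    """Extract the under-replicated block count from dfsadmin report text.

    Returns None if the expected summary line is not present.
    """
    match = re.search(
        r"^\s*Under replicated blocks:\s*(\d+)", report_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def suppress_alerts_only_is_reasonable(
    replication_factor: int, under_replicated: int, hosts_down: int = 1
) -> bool:
    """Apply the rule of thumb from the answer: a single host, replication
    factor greater than 1, and no under-replicated blocks."""
    return hosts_down == 1 and replication_factor > 1 and under_replicated == 0


# Illustrative text only, not captured from a real cluster.
SAMPLE_REPORT = """\
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
"""

if __name__ == "__main__":
    count = under_replicated_blocks(SAMPLE_REPORT)
    if count is None:
        print("Could not find the under-replicated block count")
    else:
        print(suppress_alerts_only_is_reasonable(replication_factor=3,
                                                 under_replicated=count))
```

If the check fails, the decommission path (option 2) is the safer choice, since it lets HDFS deal with the host's blocks before the host goes away.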


Madhuri Adipudi, Technical Solutions Manager

