
OS patching


Explorer

I can find very little on OS patching methodology. We are required to apply patches monthly.

I have been using the Cloudera Manager (CM) API to stop the roles on a host before patching and rebooting it, and to start them again afterwards.

  1. Put the host into maintenance mode
  2. Stop all roles on the host
  3. Apply patches
  4. Reboot
  5. Start the roles
  6. Take the host out of maintenance mode
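
For reference, the six steps above might be sketched with the cm_api Python client roughly as follows. This is only a sketch under assumptions: `api` is taken to be a `cm_api.api_client.ApiResource`, `patch_and_reboot` is a hypothetical callable you supply that applies the patches, reboots the host, and returns only once the host is back, and roles without a `clusterName` (such as the mgmt roles) are simply skipped.

```python
from collections import defaultdict


def run_role_command(api, role_refs, action):
    """Run 'stop' or 'start' for the given role refs, one bulk call per service."""
    by_service = defaultdict(list)
    for rref in role_refs:
        if rref.clusterName:  # assumption: roles outside a cluster (e.g. mgmt) have no clusterName
            by_service[(rref.clusterName, rref.serviceName)].append(rref.roleName)
    for (cluster_name, service_name), role_names in by_service.items():
        service = api.get_cluster(cluster_name).get_service(service_name)
        bulk = service.stop_roles if action == "stop" else service.start_roles
        for cmd in bulk(*role_names):
            # wait() blocks until the command finishes and returns its final state
            if not cmd.wait().success:
                raise RuntimeError("%s failed for a role in %s" % (action, service_name))


def patch_host(api, host_id, patch_and_reboot):
    """Walk one host through the maintenance, stop, patch, reboot, start sequence."""
    host = api.get_host(host_id)
    role_refs = list(host.roleRefs)             # remember the roles before the reboot
    host.enter_maintenance_mode()               # step 1
    run_role_command(api, role_refs, "stop")    # step 2
    patch_and_reboot(host_id)                   # steps 3 and 4 (hypothetical helper)
    run_role_command(api, role_refs, "start")   # step 5, regardless of reported state
    host.exit_maintenance_mode()                # step 6
```

Starting the remembered roles unconditionally in step 5 sidesteps the question of what state CM reports for them after the reboot.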

 

The problem appears after the reboot: CM marks every role that was stopped beforehand as FATAL. Roles that were running before the reboot restart on their own, but roles that were stopped first do not come back in a stopped state. As a result, my post-reboot health check reports poor health, because the roles show as FATAL rather than stopped/exited.

 

What are others doing to patch on a regular basis?


Re: OS patching

Champion

Did you check the status of the Cloudera Manager management services, such as Activity Monitor, Host Monitor, and Service Monitor?

 

Re: OS patching

Explorer

The mgmt service roles run on their own host (not the one being patched) and are fine. The issue is that if you reboot a host whose roles are stopped, you are guaranteed to have services with health issues after the reboot. If the roles are running before the reboot, the services recover. CM does not seem to keep track of the fact that the roles were stopped. Stopping roles does not cause health issues, but rebooting with stopped roles does. For now I am skipping the health check after the reboot and just starting all roles on the host regardless of their state.

 

My method works on a small test cluster, but I am leery of using it on our production cluster, where OS patching would be done in a rolling manner without taking any downtime.
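
One cautious way to do the rolling part is to patch a single host at a time and stop at the first host that does not look healthy afterwards, so that at most one host is affected. The sketch below is generic Python, not a CM API feature; `patch_one` and `is_healthy` are hypothetical callables that you would supply (for example, the per-host procedure and whatever post-start check you trust).

```python
def rolling_patch(host_ids, patch_one, is_healthy):
    """Patch hosts one at a time and stop at the first unhealthy host.

    Returns the list of hosts that were patched and passed the check.
    """
    done = []
    for host_id in host_ids:
        patch_one(host_id)
        if not is_healthy(host_id):
            raise RuntimeError(
                "host %s unhealthy after patching; %d host(s) completed before it"
                % (host_id, len(done))
            )
        done.append(host_id)
    return done
```

Because the loop raises before touching the next host, the remaining hosts stay unpatched and in service until the failure has been looked at.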

 

To veer a little from the topic, I have not found a way to access the mgmt service via the API. The mgmt service roles are returned in a host's host.roleRefs, but they are not accessible because they have no handle (that I can find via the API).

 

'cluster.get_service(rref.serviceName).stop_roles(rref.roleName)' does not work as the mgmt service is not part of 'cluster'. I can get a ref to the role but I cannot actually access the role. Maybe someone knows what I am missing and can point to a link or share the secret.

 

Service: mgmt

Roles:

mgmt-HOSTMONITOR-a9de710bc2672ee1ba304933bdf0b946

mgmt-REPORTSMANAGER-a9de710bc2672ee1ba304933bdf0b946

mgmt-EVENTSERVER-a9de710bc2672ee1ba304933bdf0b946

mgmt-ACTIVITYMONITOR-a9de710bc2672ee1ba304933bdf0b946

mgmt-SERVICEMONITOR-a9de710bc2672ee1ba304933bdf0b946
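
A possible way to reach these roles: in the cm_api Python client, the management service is exposed through the Cloudera Manager object rather than through a cluster, via `api.get_cloudera_manager().get_service()`. The sketch below assumes `api` is a `cm_api.api_client.ApiResource` and that a mgmt role ref arrives with an empty `clusterName`; `resolve_service` is a hypothetical helper name, not part of the library.

```python
def resolve_service(api, rref):
    """Return the service object that owns the role referenced by rref.

    Cluster roles are looked up through their cluster; roles with no
    clusterName are assumed to belong to the Cloudera Management Service.
    """
    if rref.clusterName:
        return api.get_cluster(rref.clusterName).get_service(rref.serviceName)
    # Assumption: no clusterName means the role is a mgmt role.
    return api.get_cloudera_manager().get_service()
```

With that in place, `resolve_service(api, rref).stop_roles(rref.roleName)` should work for both kinds of role, though it is worth trying on a test cluster first.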
