A custom add-on service is stuck in the starting state, and now I cannot start, stop, or delete the service itself.
When I try to start I get the error "Command Start is not currently available for execution."
When I try to stop the service I get the error "At least one role must be started."
When I try to delete the service I get the error "The following roles roles need to be stopped before deleted."
I am copying the solution here in case you don't have access to the KB article.
Please make sure you have a Cloudera Manager database backup.
Perform the following actions:
1. Log into the Cloudera Manager database and query the ROLES table for roles that are in a transitional state.
select role_id, name, configured_status from ROLES where configured_status = 'STOPPING'; -- Replace STOPPING with STARTING depending on the use case.
2. Note any roles that you know for certain are stopped but are indicated as "STOPPING", or any roles you know for certain are started but are indicated as "STARTING".
3. Execute an update statement to set the role to the correct state.
update ROLES set configured_status = 'STOPPED' where role_id = #; -- Replace STOPPED with the status that matches the role's actual state, depending on the use case. Replace # with the role_id obtained in step 1.
4. Re-execute the select statement to confirm that configured_status now indicates the correct state.
5. Start or stop the role as required.
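The select / update / re-check sequence above can be sketched as a small script. This is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the real Cloudera Manager database, the three-column ROLES table is a reduced, hypothetical schema, and the helper names (roles_in_status, fix_stuck_role, make_demo_db) and demo role names are made up for this example. The one idea worth carrying over is the extra "AND configured_status = ?" guard, which makes the update a no-op if the role is no longer in the stuck state.

```python
import sqlite3


def roles_in_status(conn, status):
    """Steps 1 and 4: list roles whose configured_status equals `status`."""
    rows = conn.execute(
        "SELECT role_id, name, configured_status FROM ROLES "
        "WHERE configured_status = ? ORDER BY role_id",
        (status,),
    )
    return rows.fetchall()


def fix_stuck_role(conn, role_id, stuck_status, correct_status):
    """Step 3: update one role, but only if it is still in the stuck state.

    Returns the number of rows changed (0 or 1).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE ROLES SET configured_status = ? "
            "WHERE role_id = ? AND configured_status = ?",
            (correct_status, role_id, stuck_status),
        )
        return cur.rowcount


def make_demo_db():
    """Build a throwaway table; names and schema are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ROLES ("
        "role_id INTEGER PRIMARY KEY, name TEXT, configured_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO ROLES VALUES (?, ?, ?)",
        [
            (1, "demo-role-a", "STOPPING"),
            (2, "demo-role-b", "STARTING"),
            (3, "demo-role-c", "STOPPING"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(roles_in_status(conn, "STOPPING"))              # step 1
    print(fix_stuck_role(conn, 1, "STOPPING", "STOPPED"))  # step 3
    print(roles_in_status(conn, "STOPPING"))              # step 4
```

Against a real Cloudera Manager database you would run the equivalent statements with that database's own client, and only after taking the backup mentioned above.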
Hope this helps.
Thanks a lot for sharing this information with us! This is odd behavior; could you tell me what the root cause is, please? Which CDH versions are impacted? So far I have only encountered this problem with CDH 5.8.
Many thanks in advance for the kind cooperation/availability.
The issue was mostly seen during cluster upgrades. Rarely, it can also occur in the scenarios below:
- backend CM database connectivity going down during command execution
- agent stopping while the command was running
- some unknown reasons
We have some internal JIRAs associated with this issue. Based on the discussions there, it looks like a couple of bugs involving state transitions were responsible. Both bugs were resolved in CM 5.12.0 and higher. This may explain why you are seeing the issue in your cluster (assuming your CM version is 5.8.x, matching your CDH version).
Thanks and hope this helps.