The Cloudera Manager service sometimes enters a bad health state because one of its roles is in a bad state.
For example, Activity Monitor reported bad health because of the error below, and then its health state returned to good. Is there a way to investigate what could cause such issues? Most of the time Cloudera Manager and its roles are healthy, but bad health happens from time to time. I went over the metrics and don't see anything suspicious except occasional garbage collection.
The health test result for ACTIVITY_MONITOR_SCM_DESCRIPTOR_FETCH has become bad: The Cloudera Manager descriptor was refreshed 2 minute(s), 40 second(s) ago. Critical threshold: 2 minute(s).
@rok The threshold value looks low here. To avoid the reported health alerts, we recommend increasing the corresponding configuration values:
CM -> Cloudera Management Services -> Configuration -> Search for "Descriptor Fetch Max Tries"
CM -> Cloudera Management Services -> Configuration -> Event Server -> Search for "Descriptor Fetch Tries Interval" and change this value for all the roles.
As a first step, we recommend increasing the "Descriptor Fetch Max Tries" value from the default of 5 to 10.
If you are still facing the issue after that, then you should look at the agent logs and start from there.
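Before and after changing the values, it can help to know how far past the threshold each alert actually was. Here is a minimal sketch in Python that parses a health test message like the one quoted in the question and returns the descriptor age and the critical threshold in seconds. The function names are hypothetical, and the message format is assumed to match the quoted alert exactly (the "hour(s)" unit is an assumption, it does not appear in the quoted message).

```python
import re

# Matches duration parts such as "2 minute(s)" or "40 second(s)".
# "hour" is an assumption; only minute(s) and second(s) appear in the quoted alert.
_PART = re.compile(r"(\d+)\s+(hour|minute|second)\(s\)")
_UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}

# Assumed layout: "... was refreshed <age> ago. Critical threshold: <limit>."
_MESSAGE = re.compile(
    r"was refreshed (?P<age>.+?) ago\. Critical threshold: (?P<limit>.+?)\.?$"
)


def to_seconds(text):
    """Convert a duration like '2 minute(s), 40 second(s)' to seconds."""
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in _PART.findall(text))


def parse_descriptor_alert(message):
    """Return (age_seconds, threshold_seconds), or None if the text does not match."""
    match = _MESSAGE.search(message.strip())
    if match is None:
        return None
    return to_seconds(match.group("age")), to_seconds(match.group("limit"))


if __name__ == "__main__":
    alert = (
        "The health test result for ACTIVITY_MONITOR_SCM_DESCRIPTOR_FETCH has "
        "become bad: The Cloudera Manager descriptor was refreshed 2 minute(s), "
        "40 second(s) ago. Critical threshold: 2 minute(s)."
    )
    age, limit = parse_descriptor_alert(alert)
    print(f"age={age}s threshold={limit}s overshoot={age - limit}s")
```

If the overshoot is consistently small, as in the quoted alert, brief stalls such as the garbage collection pauses mentioned in the question are a plausible thing to check next. Large overshoots would point more toward connectivity or load problems worth tracing in the logs.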
Cheers!