Created on 04-11-2019 10:52 AM - edited 08-17-2019 04:02 PM
We have a highly scalable Kafka cluster with 11 nodes and another highly scalable NiFi cluster with 11 nodes that we are currently testing. Both were created from our custom blueprint and are deployed in Azure.
We find that every time we stop and restart the cluster, a couple of the nodes go unhealthy and we are unable to recover them.
Please let us know how to resolve this issue.
Created 04-11-2019 01:30 PM
Just manually restart the stopped services and then run a cluster sync in Cloudbreak. That should resolve the issue. This matches a known issue:
BUG-99581
Problem: The Event History in the Cloudbreak web UI displays the following message: "Manual recovery is needed for the following failed nodes: []". This message is displayed when the Ambari agent does not send its heartbeat and Cloudbreak therefore considers the host unhealthy. However, if all services are green and healthy in the Ambari web UI, the status displayed by Cloudbreak is likely incorrect.
Workaround: If all services are green and healthy in the Ambari web UI, syncing the cluster should fix the problem.
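For reference, the "restart stopped services" half of the workaround can also be scripted against the Ambari REST API rather than clicked through the Ambari UI. This is a minimal sketch, assuming a default Ambari install on port 8080; the host name, credentials, and cluster name (ambari-host, admin/admin, my-cluster) are placeholders, not values from this thread.

```python
import requests

AMBARI = "http://ambari-host:8080/api/v1"   # assumption: default Ambari endpoint
CLUSTER = "my-cluster"                      # placeholder cluster name
AUTH = ("admin", "admin")                   # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}      # Ambari requires this header on write calls

# Ask Ambari to move every stopped service (state INSTALLED) to STARTED.
resp = requests.put(
    f"{AMBARI}/clusters/{CLUSTER}/services",
    auth=AUTH,
    headers=HEADERS,
    params={"ServiceInfo/state": "INSTALLED"},  # predicate: only target stopped services
    json={
        "RequestInfo": {"context": "Restart stopped services after cluster restart"},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    },
)
resp.raise_for_status()
print(resp.status_code)  # 202 means Ambari accepted and queued the start request
```

Once all services are green in Ambari, trigger a sync from Cloudbreak (the Sync action on the cluster's details page) so Cloudbreak re-reads the host state and clears the incorrect "unhealthy" status.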