
Cloudbreak Kafka and NiFi cluster nodes going unhealthy after stop and restart

Explorer

We have a highly scalable Kafka cluster with 11 nodes and another highly scalable NiFi cluster with 11 nodes that we are currently testing. Both were created from our custom blueprint and are deployed in Azure.

We find that every time we stop and restart the cluster, a couple of the nodes go unhealthy and we are unable to recover them.

Do let us know how to resolve the issue.

[Attachment: 107746-unhelthy-nodes-issue.png]

1 ACCEPTED SOLUTION

Explorer

Just manually restart the stopped services in Ambari and then run a cluster sync from Cloudbreak. That should resolve the issue.
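
For reference, the "restart the stopped services" step can also be scripted against the Ambari REST API instead of clicking through the UI. The sketch below is a minimal example, assuming placeholder values for the Ambari host, cluster name, and credentials; the _CLIENT skip is only a heuristic, since client components have no running state.

    import requests

    AMBARI = "http://<ambari-host>:8080/api/v1"   # placeholder Ambari server
    CLUSTER = "<cluster-name>"                    # placeholder cluster name
    AUTH = ("admin", "admin")                     # placeholder credentials
    HEADERS = {"X-Requested-By": "ambari"}        # required by Ambari on write calls

    # Find host components that are installed but not running.
    resp = requests.get(
        f"{AMBARI}/clusters/{CLUSTER}/host_components?HostRoles/state=INSTALLED",
        auth=AUTH, headers=HEADERS)
    resp.raise_for_status()

    for item in resp.json().get("items", []):
        host = item["HostRoles"]["host_name"]
        comp = item["HostRoles"]["component_name"]
        if comp.endswith("_CLIENT"):
            continue  # client components stay INSTALLED; nothing to start
        # Ask Ambari to bring the stopped component back to STARTED.
        requests.put(
            f"{AMBARI}/clusters/{CLUSTER}/hosts/{host}/host_components/{comp}",
            auth=AUTH, headers=HEADERS,
            json={"RequestInfo": {"context": f"Start {comp} via API"},
                  "Body": {"HostRoles": {"state": "STARTED"}}})
        print(f"Requested start of {comp} on {host}")

Once everything is green in Ambari, trigger the sync from Cloudbreak so its status catches up.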


BUG-99581: The Event History in the Cloudbreak web UI displays the following message:

Manual recovery is needed for the following failed nodes: []

This message is displayed when the Ambari agent doesn't send a heartbeat and Cloudbreak therefore thinks the host is unhealthy. However, if all services are green and healthy in the Ambari web UI, then the status displayed by Cloudbreak is likely incorrect.

In that case, syncing the cluster from Cloudbreak should fix the problem.
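
To confirm you are in this situation before syncing, you can check how Ambari itself sees each host, i.e. whether the agents are heartbeating. A minimal sketch, assuming the same placeholder server, cluster name, and credentials as above:

    import requests

    AMBARI = "http://<ambari-host>:8080/api/v1"   # placeholder Ambari server
    CLUSTER = "<cluster-name>"                    # placeholder cluster name
    AUTH = ("admin", "admin")                     # placeholder credentials

    # Ask Ambari for each host's state and the time of the last agent heartbeat.
    resp = requests.get(
        f"{AMBARI}/clusters/{CLUSTER}/hosts"
        "?fields=Hosts/host_state,Hosts/last_heartbeat_time",
        auth=AUTH)
    resp.raise_for_status()

    for item in resp.json().get("items", []):
        info = item["Hosts"]
        # HEALTHY means the agent is heartbeating normally; HEARTBEAT_LOST or
        # UNHEALTHY points at a real agent problem that a sync will not fix.
        print(info["host_name"], info["host_state"], info["last_heartbeat_time"])

If every host reports HEALTHY here while Cloudbreak still flags failed nodes, that matches the BUG-99581 symptom and a cluster sync should clear it.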
