We have a highly scalable Kafka cluster with 11 nodes and a separate highly scalable NiFi cluster with 11 nodes, both of which we are currently testing. They are created from our custom blueprint and deployed in Azure.
We find that every time we stop and restart the cluster, a couple of the nodes become unhealthy and we are unable to recover them.
Please let us know how to resolve this issue.