We are running CM and CDH 5.7.0 on a cluster of 500+ nodes. We are seeing
"XX messages dropped by the role stage in the service monitor pipeline
over 5 minutes" messages in the Service Monitor. This usually happens when
Role Stage Queue Size is around 2K.
How can I increase Role Stage Queue Size? It currently never goes
above 2K in the graphs, so I assume there is a configuration parameter
that caps it?
Is there any way to investigate why Role Stage Queue Size keeps growing?
Restarting service manager fixes the issue, but only until Role Stage Queue Size hits 2K again.
How can I stop the Service Monitor from checking HBase regions? I unchecked the canary option in the HBase service monitoring role, but that did not prevent the Service Monitor from checking HBase regions. We have a lot of regions (200K), and that might be the problem.