05-08-2018 06:40 AM
We have a CDH 5.13 cluster, and in the last few days our Service Monitor has started reporting a role pipeline (SERVICE_MONITOR_ROLE_PIPELINE) problem.
Cloudera Manager reports that around 6000 messages were dropped in the last 5 minutes.
Whenever this happens, the Service Monitor seems to be unavailable and Cloudera Manager gets very slow.
There is no clear error in the log and there is no good explanation on the internet. The problem is not permanent but comes and goes.
What can cause this issue, and how can we fix it?
05-09-2018 08:04 AM
Try pinging the host where the Service Monitor is deployed and see whether you get any packet loss. If you do, check with your network team. We had a similar issue, and it turned out to be a problem with a network card (NIC).
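Since the problem comes and goes, a single ping may not catch it. Below is a rough Python sketch of how the check could be repeated and the loss percentage extracted. It assumes a Linux or macOS `ping` that accepts `-c` and prints a summary line containing "% packet loss"; the function names and the host name in the usage comment are made up for illustration and are not part of Cloudera Manager.

```python
import re
import subprocess
from typing import Optional


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Return the packet-loss percentage from ping's summary line, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def check_packet_loss(host: str, count: int = 20) -> Optional[float]:
    """Ping `host` `count` times and return the reported loss percentage.

    Assumes the system `ping` supports `-c <count>` (Linux/macOS style).
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when packets are lost; still parse the output
    )
    return parse_packet_loss(result.stdout)


# Hypothetical usage, run from the Cloudera Manager server or an agent host:
#   loss = check_packet_loss("service-monitor-host.example.com")
#   print("no summary line found" if loss is None else f"{loss}% packet loss")
```

Running this from cron every few minutes and logging the result would make it easier to see whether packet loss lines up with the times the dropped-message alert fires.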