Support Questions


Service monitor keeps crashing


Hi, 

 

I've been using Cloudera Manager for the last year on a cluster of 20+ nodes. Recently I started seeing heap memory issues in the Service Monitor role. I've increased the heap from 3 GB to 4 GB, then 4 to 5, and then 5 to 6 GB, but the Service Monitor still crashes and restarts from time to time. While it is down, the entire dashboard looks broken. What do I need to do to fix this?

 

The logs show:

2021-04-26 16:10:34,938 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20583ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=182ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20877ms
2021-04-26 16:11:34,862 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19870ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=131ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20228ms
2021-04-26 16:12:35,132 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20427ms: GC pool 'G1 Young Generation' had collection(s): count=3 time=149ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20733ms
2021-04-26 16:13:36,415 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19008ms: GC pool 'G1 Young Generation' had collection(s): count=1 time=104ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=19381ms

 

Could you please help me with this?


5 REPLIES

ACCEPTED SOLUTION
Expert Contributor

Hello

 

Depending on the cluster size and the number of monitored entities, certain resources are recommended.

 

Please refer to the link below and check whether your resource allocation is in line with the recommendations:

 

https://docs.cloudera.com/cdp-private-cloud/latest/release-guide/topics/cdpdc-service-monitor-requir...
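
For reference, the same settings can also be read over the Cloudera Manager REST API rather than the UI. The commands below are only a rough sketch: the host, port, credentials, API version (v19), and the role config group name mgmt-SERVICEMONITOR-BASE are assumptions to verify against your own deployment (GET /api/version reports the highest API version your CM accepts).

# Check which API version this Cloudera Manager accepts (placeholder host/credentials)
curl -s -u admin:admin "http://cm-host:7180/api/version"

# List the role config groups of the Cloudera Management Service,
# then dump the Service Monitor group's configuration
curl -s -u admin:admin "http://cm-host:7180/api/v19/cm/service/roleConfigGroups"
curl -s -u admin:admin "http://cm-host:7180/api/v19/cm/service/roleConfigGroups/mgmt-SERVICEMONITOR-BASE/config?view=full"

The full config view should include the Java heap and non-Java memory entries for the Service Monitor, which you can compare against the sizing table in the linked documentation.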

avatar

Thank you @Daming Xue. This is a perfect lead.

 

Upon checking, I can see we are monitoring around 3,819,926 entities in Cloudera Manager across three different Kafka clusters. Can you help us avoid creating so many entities? That would free us from the alerts that keep triggering and from the Service Monitor role crashing.

 

Right now the heap size is 10 GB and the non-heap size is 12 GB, but we still hit heap memory issues. We have increased the heap from 3 GB to 10 GB so far. What would be an ideal solution to fix this?

 

We even added the following parameters to the Service Monitor and restarted the service, but that didn't help:

-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC
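
As a rough way to confirm that these pauses come from old-generation collections rather than host-level stalls, GC logging can be enabled alongside the options above. The lines below are only a sketch: the log path is an example, the first line assumes the Service Monitor runs on JDK 8, and the second is the JDK 9+ unified-logging equivalent.

JDK 8 style:
-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/cloudera-scm-firehose/smon-gc.log

JDK 9+ style:
-XX:+UseG1GC -Xlog:gc*:file=/var/log/cloudera-scm-firehose/smon-gc.log:time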

The error from the Service Monitor log is:

2021-05-02 08:24:22,106 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 33651ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=195ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=33878ms

Expert Contributor

Hello

 

Please look out for the Cloudera Manager 7.4.1 update, which includes a fix for the issue you are facing.

 

The issue is caused by producer client IDs that are not configured correctly, which generates a huge volume of metrics from anonymous producers.

 

The solution proposed in CM 7.4.1 is to make the Kafka producer metric whitelist configurable in the CM UI
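
To illustrate the point about client IDs: producers that never set client.id get auto-generated names (producer-1, producer-2, ...), and each of these can surface as a separate anonymous entity in the monitoring metrics. Giving every producer application a stable, explicit client.id keeps its metrics under one name. client.id and bootstrap.servers are standard Kafka producer properties; the values below are placeholders.

# producer.properties (example values)
bootstrap.servers=kafka-broker-1:9092,kafka-broker-2:9092
# Use a stable, descriptive client.id per application instead of the
# auto-generated producer-N names
client.id=orders-ingest-producer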


Thanks @Daming Xue. I'll certainly look into upgrading the CM version.

 

But for the time being, is there any other solution, since this is production? We don't want to take risks by upgrading the production Cloudera Manager service directly.

 

There are so many entities being monitored (e.g. Impala, YARN, etc.), but we only use Kafka and MirrorMaker. Is there any way to disable the others, or are those services necessary for running Kafka clusters?
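
As a quick way to see which services Cloudera Manager actually has deployed (and is therefore monitoring) per cluster, the clusters/services endpoints of the CM API can be queried. The commands below are only a sketch; the host, credentials, API version, and cluster name are placeholders.

# List the clusters known to this Cloudera Manager (placeholder host/credentials)
curl -s -u admin:admin "http://cm-host:7180/api/v19/clusters"

# List the services deployed on one cluster, e.g. to confirm whether
# Impala or YARN are actually installed or only show up as monitored entities
curl -s -u admin:admin "http://cm-host:7180/api/v19/clusters/KafkaCluster1/services"

If Impala or YARN turn out not to be deployed at all, their entities are unlikely to be the main driver here; given the numbers quoted earlier in the thread, the anonymous Kafka producer metrics are the more likely source of the load.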

Expert Contributor

Hello

 

If the cluster is critical to your business, you should consider getting a subscription from Cloudera. For the issue you are facing, Cloudera can then create a patch by backporting the fix to an earlier CM version.