Created 04-26-2021 09:18 AM
Hi,
I've been running Cloudera Manager for the past year on a cluster of 20+ nodes. Recently I started seeing heap memory issues in the Service Monitor role. I've increased the heap from 3 GB to 4 GB, then to 5 GB, and then to 6 GB, but the Service Monitor still crashes and restarts from time to time. While that happens, the entire dashboard looks bad. What do I need to do to fix this?
The logs are:
2021-04-26 16:10:34,938 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20583ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=182ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20877ms
2021-04-26 16:11:34,862 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19870ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=131ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20228ms
2021-04-26 16:12:35,132 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 20427ms: GC pool 'G1 Young Generation' had collection(s): count=3 time=149ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=20733ms
2021-04-26 16:13:36,415 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 19008ms: GC pool 'G1 Young Generation' had collection(s): count=1 time=104ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=19381ms
Could you please help me with this?
Created 04-26-2021 06:58 PM
Hello
The recommended resources depend on the cluster size and the number of monitored entities.
Please refer to the link below and check whether your resource allocation is in line with the recommendations.
Created on 05-02-2021 01:24 AM - edited 05-02-2021 01:25 AM
Thank you @Daming Xue. This is a perfect lead.
Upon checking, I can see that we are monitoring around 3819926 entities in Cloudera Manager from three different Kafka clusters. Can you help us avoid creating so many entities, so that we are free of the alerts that keep triggering and of the Service Monitor role crashing?
Right now, the heap size is 10 GB and the non-heap memory size is 12 GB, but we still get the heap memory issue. What would be an ideal solution for us? So far we have increased the heap size from 3 GB to 10 GB.
We also added the following parameters to the Service Monitor and restarted the service, but this didn't help:
-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC
And the error from the Service Monitor log is:
2021-05-02 08:24:22,106 WARN com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 33651ms: GC pool 'G1 Young Generation' had collection(s): count=2 time=195ms, GC pool 'G1 Old Generation' had collection(s): count=1 time=33878ms
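To see how the pauses develop as the heap size changes, the JvmPauseMonitor warnings can be summarized with a small script. This is only a rough sketch: it assumes the log lines have exactly the format shown above, and the function names (parse_pause_line, summarize) are made up for illustration and are not part of Cloudera Manager.

```python
import re

# Patterns match the JvmPauseMonitor WARN format seen in the log excerpts above.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")
PAUSE_RE = re.compile(r"paused approximately (\d+)ms")
POOL_RE = re.compile(
    r"GC pool '([^']+)' had collection\(s\): count=(\d+) time=(\d+)ms"
)


def parse_pause_line(line):
    """Return a dict describing one pause warning, or None if the line is not one."""
    pause = PAUSE_RE.search(line)
    if pause is None:
        return None
    stamp = TIMESTAMP_RE.match(line)
    pools = {
        name: {"count": int(count), "time_ms": int(time_ms)}
        for name, count, time_ms in POOL_RE.findall(line)
    }
    return {
        "timestamp": stamp.group(1) if stamp else None,
        "pause_ms": int(pause.group(1)),
        "pools": pools,
    }


def summarize(lines):
    """Aggregate pause warnings: how many, the longest pause, and GC time per pool."""
    records = [r for r in map(parse_pause_line, lines) if r is not None]
    time_per_pool = {}
    for record in records:
        for name, stats in record["pools"].items():
            time_per_pool[name] = time_per_pool.get(name, 0) + stats["time_ms"]
    return {
        "warnings": len(records),
        "max_pause_ms": max((r["pause_ms"] for r in records), default=0),
        "gc_time_ms_per_pool": time_per_pool,
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python pause_summary.py < service-monitor.log
    print(summarize(sys.stdin))
```

In the lines posted so far, almost all of the pause time comes from the 'G1 Old Generation' pool, which is what the summary makes easy to track over a longer log.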
Created 05-02-2021 06:50 PM
Hello
Please look out for the Cloudera Manager 7.4.1 update, which comes with a fix for the issue you are facing.
The issue occurs because producer client IDs are not configured correctly, so anonymous producers generate a huge number of metrics.
The solution proposed in CM 7.4.1 is to make the Kafka producer metric whitelist configurable in the CM UI.
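On the client side, a rough sketch of what a correctly configured producer could look like is below. It assumes the extra entities come from producers that do not set a fixed client.id, and that giving each application one stable ID keeps the number of distinct producer entities small; this is an assumption and has not been verified against your cluster. The helper name producer_config and the app and host names are made up for illustration. "bootstrap.servers" and "client.id" are the standard Kafka client configuration keys.

```python
import socket


def producer_config(bootstrap_servers, app_name, instance=None):
    """Build a Kafka producer config dict with an explicit, stable client.id.

    The ID is derived from the application name and an instance label
    (the local hostname by default), so it stays the same across restarts
    instead of changing with every new producer object.
    """
    if not app_name:
        raise ValueError("app_name must be set so the producer is not anonymous")
    instance = instance or socket.gethostname()
    return {
        "bootstrap.servers": bootstrap_servers,
        "client.id": f"{app_name}-{instance}",
    }


if __name__ == "__main__":
    # Hypothetical broker address and application name.
    print(producer_config("broker1:9092", "orders-ingest"))
```

A dict like this can be passed to a client that accepts Kafka-style configuration keys, for example confluent_kafka.Producer(config). For Java producers the same key, client.id, goes into the producer properties.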
Created 05-02-2021 09:36 PM
Thanks @Daming Xue. I'll certainly look into upgrading the CM version.
But for the time being, is there any other solution, since this is in production? We don't want to take risks by directly upgrading the production manager service.
Many entities are being monitored, e.g. Impala, YARN, etc., but we use only Kafka and MirrorMaker. Is there any way to disable the others, or are those services necessary for running Kafka clusters?
Created 05-02-2021 09:56 PM
Hello
If the cluster is critical to your business, you should consider getting a subscription from Cloudera. For the issue you are facing, Cloudera can create a patch by backporting the fix to an earlier CM version.