Service Monitor Timeout

New Contributor

Hi!

My cluster (CDH 5.6) is made up of 11 hosts with about 100 entities.

Cloudera Manager's Service Monitor always shows: Service monitor query timeout!

My Service Monitor memory configuration:

Java heap = 1 GB

firehose_non_java_memory_bytes = 4 GB

But the GC time and pause time of my Service Monitor are very high.

Why?

 

My Service Monitor's log always shows:

Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 3553ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=3556ms
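To see how often and how long these pauses occur, the log can be scanned for lines of this shape. The following is a minimal sketch that assumes every pause line follows the format of the single sample quoted above; the function name `parse_pause` and the regular expressions are illustrative and not part of Cloudera Manager.

```python
import re

# Sample line copied from the Service Monitor log quoted above.
LOG_LINE = (
    "Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM "
    "not scheduled): paused approximately 3553ms: GC pool "
    "'ConcurrentMarkSweep' had collection(s): count=1 time=3556ms"
)

# Assumption: the wording matches the sample; other releases may differ.
PAUSE_RE = re.compile(r"paused\s+approximately\s+(\d+)ms")
POOL_RE = re.compile(
    r"GC pool '([^']+)' had collection\(s\): count=(\d+) time=(\d+)ms"
)


def parse_pause(line):
    """Return the pause length and per-pool GC details, or None if no match."""
    pause = PAUSE_RE.search(line)
    if pause is None:
        return None
    pools = [
        {"pool": name, "count": int(count), "time_ms": int(time_ms)}
        for name, count, time_ms in POOL_RE.findall(line)
    ]
    return {"pause_ms": int(pause.group(1)), "pools": pools}


print(parse_pause(LOG_LINE))
```

Running `parse_pause` over every line of the log and summing `pause_ms` per minute would show what share of wall-clock time the JVM spends paused.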

 

The GC info:

Firehose_SERVICE_MONITORING (com.cloudera.cmon.firehose.Firehose)
Started Wed Sep 13 17:31:20 CST 2017

TimeSeriesEntityCache (com.cloudera.cmon.tstore.db.TimeSeriesEntityCache)

Currently not running.
62 runs so far (0 slow), of them 0 reported exceptions.
Last duration: 11253 ms. Total duration: 671389 ms.
Last start: September 13, 2017 6:33:31 PM +08:00. Last end: September 13, 2017 6:33:42 PM +08:00.

com.cloudera.cmon.tstore.leveldb.LDBResourceManager (com.cloudera.cmon.tstore.leveldb.LDBResourceManager)


Max File Descriptors: 2048

File Descriptors Available: 1228

Last partition used: 2017-09-13T09:32:30.345Z


com.cloudera.cmon.tstore.AggregatingTimeSeriesStore (com.cloudera.cmon.tstore.AggregatingTimeSeriesStore)

Currently not running.
62 runs so far (0 slow), of them 0 reported exceptions.
Last duration: 29210 ms. Total duration: 1848179 ms.
Last start: September 13, 2017 6:33:31 PM +08:00. Last end: September 13, 2017 6:34:00 PM +08:00.

com.cloudera.cmon.firehose.PeriodicCounterWriter (com.cloudera.cmon.firehose.PeriodicCounterWriter)

Currently not running.
62 runs so far (0 slow), of them 0 reported exceptions.
Last duration: 2 ms. Total duration: 222 ms.
Last start: September 13, 2017 6:33:31 PM +08:00. Last end: September 13, 2017 6:33:31 PM +08:00.
Write frequency seconds: 60
Metrics written: 3416
Counter map size: 4
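The run counts and total durations in this status output give a mean duration per run, which can be compared with the last run to see whether the latest figure is typical. This is only arithmetic on the numbers quoted above; the helper name `mean_duration_ms` is made up for illustration.

```python
# (runs, last duration ms, total duration ms) copied from the status output above.
STATS = {
    "TimeSeriesEntityCache": (62, 11253, 671389),
    "AggregatingTimeSeriesStore": (62, 29210, 1848179),
    "PeriodicCounterWriter": (62, 2, 222),
}


def mean_duration_ms(runs, total_ms):
    """Average duration of one run in milliseconds."""
    return total_ms / runs


for name, (runs, last_ms, total_ms) in STATS.items():
    avg = mean_duration_ms(runs, total_ms)
    print(f"{name}: last={last_ms} ms, mean={avg:.0f} ms")
```

For both TimeSeriesEntityCache and AggregatingTimeSeriesStore the last run is close to the mean, so the long durations look like the steady state since startup rather than a one-off spike.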

 

 

3 REPLIES

Champion
Most likely you simply have enough service entities to need a larger heap. Increase the Java heap from its 1 GB default.

New Contributor

But the documentation shows:

Number of Monitored Entities   Number of Hosts   Required Java Heap Size   Recommended Non-Java Heap Size
0-2,000                        0-100             1 GB                      6 GB
2,000-4,000                    100-200           1.5 GB                    6 GB
4,000-8,000                    200-400           1.5 GB                    12 GB
8,000-16,000                   400-800           2.5 GB                    12 GB
16,000-20,000                  800-1,000         3.5 GB                    12 GB

 

(https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_storage.html#concept_uyn_brk_n...)

My roughly 100 service entities are far fewer than 2,000.
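The table lookup described here can be written out as a small function. This is a sketch that only encodes the rows quoted above. Because the ranges in the table share their boundaries (2,000 appears in two rows), it assumes each upper bound is inclusive; `documented_sizing` is a hypothetical helper, not a Cloudera tool.

```python
# Rows copied from the sizing table quoted above:
# (max entities, max hosts, Java heap in GB, non-Java memory in GB)
SIZING = [
    (2_000, 100, 1.0, 6),
    (4_000, 200, 1.5, 6),
    (8_000, 400, 1.5, 12),
    (16_000, 800, 2.5, 12),
    (20_000, 1_000, 3.5, 12),
]


def documented_sizing(entities, hosts):
    """Return (heap_gb, non_java_gb) from the first row that fits, else None."""
    for max_entities, max_hosts, heap_gb, non_java_gb in SIZING:
        # Assumption: upper bounds are inclusive.
        if entities <= max_entities and hosts <= max_hosts:
            return heap_gb, non_java_gb
    return None  # larger than anything the quoted table covers


heap_gb, non_java_gb = documented_sizing(entities=100, hosts=11)
print(f"Java heap: {heap_gb} GB, non-Java memory: {non_java_gb} GB")
```

For 100 entities on 11 hosts the first row applies. The configured 1 GB Java heap matches that row, while the configured 4 GB for firehose_non_java_memory_bytes is below the 6 GB in the same row.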

Champion
Real-world experience may differ from the docs. Right now you are seeing a large number of GC pauses, and the likely cause is running out of heap space. You can also try tuning the GC settings.