Member since
12-02-2014
7
Posts
1
Kudos Received
1
Solution
04-16-2021
06:19 PM
Update: I moved SM to a host that has a typical load of 7-8 instead of 24. After a day on the new machine, no alerts about SM being slow have been generated and there are no gaps in the charts. Conclusion: the problem was host load; SM works best on a machine with low load.
04-14-2021
06:05 PM
Update: The load went down to a reasonable level (24), so CPU starvation is not happening, but Service Monitor is still losing data from time to time, with gaps of 5-30 minutes. The disk it uses is striped RAID and is not used by YARN, so I don't think disk performance is the issue.
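To put the load figures in context, here is a minimal sketch that normalizes a load average by core count. The 24-core figure and the load of 90 come from the original question; the helper name `load_per_core` is made up for illustration. A value near or above 1.0 means that, on average, every core already has a runnable task.

```python
import os

def load_per_core(load_average, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores

# Figures quoted in this thread, all on the 24-core machine.
for label, load in [("peak load", 90), ("after load dropped", 24)]:
    print(f"{label}: {load_per_core(load, 24):.2f} per core")

# Optional: the same ratio for the machine this script runs on (Unix only).
if hasattr(os, "getloadavg") and os.cpu_count():
    one_minute, _, _ = os.getloadavg()
    print(f"this host: {load_per_core(one_minute, os.cpu_count()):.2f} per core")
```

By this measure a load of 24 on 24 cores is about 1.0 per core, which leaves little headroom for a latency-sensitive process even though it is far better than 90.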
04-13-2021
02:30 PM
Some more info: I see WARNs like:

JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 28577ms

but gcutil is:

S0 | S1 | E | O | M | CCS | YGC | YGCT | FGC | FGCT | GCT |
---|---|---|---|---|---|---|---|---|---|---|
0.00 | 63.51 | 80.24 | 7.41 | 97.94 | 94.86 | 5073 | 347.717 | 6 | 1.950 | 349.668 |

which shows old gen is only 7.41% used, so it is not out of heap. That means "JVM not scheduled" must be the condition.
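The same reasoning can be checked with a little arithmetic on the gcutil counters. This is a rough sketch, not a diagnostic tool: it uses only the numbers quoted above, and the function names are made up for illustration. The counters are cumulative, so the averages cannot rule out a single outlier, but the total full GC time is a hard upper bound on any one full GC.

```python
# Header and values copied from the gcutil output quoted above.
HEADER = "S0 S1 E O M CCS YGC YGCT FGC FGCT GCT"
VALUES = "0.00 63.51 80.24 7.41 97.94 94.86 5073 347.717 6 1.950 349.668"

def parse_gcutil(header, values):
    """Pair each gcutil column name with its numeric value."""
    return {name: float(v) for name, v in zip(header.split(), values.split())}

def average_pauses(stats):
    """Return (avg young GC seconds, avg full GC seconds) from cumulative counters."""
    young = stats["YGCT"] / stats["YGC"] if stats["YGC"] else 0.0
    full = stats["FGCT"] / stats["FGC"] if stats["FGC"] else 0.0
    return young, full

stats = parse_gcutil(HEADER, VALUES)
avg_young, avg_full = average_pauses(stats)
reported_pause_s = 28577 / 1000  # from the JvmPauseMonitor WARN

print(f"avg young GC:   {avg_young * 1000:.1f} ms")
print(f"avg full GC:    {avg_full * 1000:.1f} ms")
print(f"total full GC:  {stats['FGCT']:.3f} s")
print(f"reported pause: {reported_pause_s:.3f} s")
```

All six full GCs together took under 2 seconds, and the average young GC is well under 100 ms, so a 28.5 second pause is hard to attribute to garbage collection alone.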
04-13-2021
12:49 PM
I am seeing frequent Cloudera Manager Service Monitor outages:

SERVICE_MONITOR_PAUSE_DURATION has become bad: Average time spent paused was 39.5 second(s) (65.76%) per minute over the previous 5 minute(s).

This happens despite increasing the heap size to 7g and the 'off-heap' size to 24g. The machine often sees a high load (a NodeManager is also on the same machine), like 90 on a 24 core machine, so I suspect SM might be starved of CPU when doing aggregation. The process regularly has 700+ files open.

I am motivated to fix this because it causes data loss in the time series: SM pulls data, so when it stalls it misses data points, at times for 15+ minutes. The following WARN is correlated with the above:

AggregatingTimeSeriesStore: run duration exceeded desired period

Is there a documented procedure to move Service Monitor to another machine while keeping existing data? Perhaps like:

0. stop SM to quiesce changes to /var/lib/cloudera-service-monitor/ts/
1. using CM, redefine SM on another host
2. move /var/lib/cloudera-service-monitor/ts/ contents before starting SM
3. start SM

SM uses LevelDB, but I don't know its internals or whether /var/lib/cloudera-service-monitor/ts/ can simply be moved. I don't want to lose the 1 month of history I have.
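For step 2, one cautious approach is to copy the directory rather than move it, and to confirm the copy is byte-identical before starting SM on the new host. The sketch below is an illustration under assumptions, not a documented Cloudera procedure: it assumes SM is stopped (so nothing is writing to the directory), that the destination is reachable as a local path (for example a mounted share), and the function names are hypothetical. Whether SM accepts a relocated store is exactly the open question above; this only guards against a damaged copy.

```python
import hashlib
import shutil
from pathlib import Path

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large store files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests

def copy_and_verify(src, dst):
    """Copy src to dst (dst must not exist yet) and check every file matches."""
    shutil.copytree(src, dst)
    return tree_digests(src) == tree_digests(dst)
```

Calling `copy_and_verify("/var/lib/cloudera-service-monitor/ts", destination)` returns `True` only if every file arrived intact. If the hosts share no filesystem, `tree_digests` can be run on each side after an rsync or scp and the two results compared. Keeping the original directory until the new SM shows the old history gives a way back.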
Labels:
- Cloudera Manager