<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: "SERVICE_MONITOR_PAUSE_DURATION has become bad " despite heap increase in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314697#M226141</link>
    <description>&lt;P&gt;Update: The load went down to a reasonable level (24), so CPU starvation should no longer be happening, but Service Monitor is still losing data from time to time, with gaps of 5-30 minutes. The disk it uses is a striped RAID volume that YARN does not use, so I don't think disk performance is the issue.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Apr 2021 01:05:22 GMT</pubDate>
    <dc:creator>pbaclace</dc:creator>
    <dc:date>2021-04-15T01:05:22Z</dc:date>
    <item>
      <title>"SERVICE_MONITOR_PAUSE_DURATION has become bad " despite heap increase</title>
      <link>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314588#M226115</link>
      <description>&lt;P&gt;I am seeing frequent Cloudera Manager Service Monitor (SM) outages:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN&gt;SERVICE_MONITOR_PAUSE_DURATION has become bad: Average time spent paused was 39.5 second(s) (65.76%) per minute over the previous 5 minute(s).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;This happens despite increasing the heap size to 7g and the 'off-heap' size to 24g. The machine also hosts a NodeManager and often sees a high load, such as 90 on a 24-core machine, so I suspect SM is starved of CPU while doing aggregation. The process regularly has more than 700 files open.&lt;/P&gt;&lt;P&gt;I want to fix this because it causes data loss in the time series: SM pulls data, so when it stalls it misses data points, at times for more than 15 minutes.&lt;/P&gt;&lt;P&gt;The WARN:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;AggregatingTimeSeriesStore: run duration exceeded desired period&lt;/P&gt;&lt;P&gt;is correlated with the above.&lt;/P&gt;&lt;P&gt;Is there a documented procedure to move Service Monitor to another machine while keeping existing data?&lt;/P&gt;&lt;P&gt;Perhaps something like:&lt;/P&gt;&lt;P&gt;0. Stop SM to quiesce changes to &lt;SPAN class="s1"&gt;/var/lib/cloudera-service-monitor/ts/&lt;/SPAN&gt;&lt;BR /&gt;1. Using CM, redefine SM on another host&lt;BR /&gt;2. Move the contents of &lt;SPAN class="s1"&gt;/var/lib/cloudera-service-monitor/ts/&lt;/SPAN&gt; to the new host before starting SM&lt;BR /&gt;3. Start SM&lt;/P&gt;&lt;P&gt;&lt;SPAN class="s1"&gt;SM uses LevelDB, but I don't know its internals or whether /var/lib/cloudera-service-monitor/ts/ can simply be moved. I don't want to lose the 1 month of history I have.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Apr 2021 19:49:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314588#M226115</guid>
      <dc:creator>pbaclace</dc:creator>
      <dc:date>2021-04-13T19:49:32Z</dc:date>
    </item>
    <item>
      <title>Re: "SERVICE_MONITOR_PAUSE_DURATION has become bad " despite heap increase</title>
      <link>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314591#M226118</link>
      <description>&lt;P&gt;Some more info:&lt;/P&gt;&lt;P&gt;I see WARNs like:&lt;/P&gt;&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;SPAN class="s1"&gt;JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 28577ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="p1"&gt;but the gcutil output is:&lt;/P&gt;&lt;PRE&gt;
    S0     S1      E      O      M    CCS    YGC     YGCT   FGC    FGCT      GCT
  0.00  63.51  80.24   7.41  97.94  94.86   5073  347.717     6   1.950  349.668&lt;/PRE&gt;&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;which shows the old generation is only 7.41% used, so the JVM is not running out of heap. That points to "JVM not scheduled" as the condition.&lt;/SPAN&gt;&lt;/P&gt;</description>
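A quick arithmetic check of the gcutil figures quoted above, offered as a minimal sketch and not as a diagnosis. It assumes the columns have their usual jstat -gcutil meaning (YGC and FGC are collection counts, YGCT and FGCT are cumulative seconds); the function and variable names are illustrative only. Averages can hide a single long collection, so this only shows that the reported 28577 ms pause is far outside the typical GC pause for this process.

```python
# Figures copied from the gcutil output and the JvmPauseMonitor WARN in the post.
YGC, YGCT = 5073, 347.717   # young collections: count, cumulative seconds
FGC, FGCT = 6, 1.950        # full collections: count, cumulative seconds
OBSERVED_PAUSE_MS = 28577   # pause reported by JvmPauseMonitor


def mean_pause_ms(total_seconds, count):
    """Average duration of one collection, in milliseconds."""
    return total_seconds / count * 1000.0


young_ms = mean_pause_ms(YGCT, YGC)
full_ms = mean_pause_ms(FGCT, FGC)
# Compare the observed pause against the larger of the two averages.
ratio = OBSERVED_PAUSE_MS / max(young_ms, full_ms)

print(f"mean young GC pause: {young_ms:.1f} ms")
print(f"mean full GC pause:  {full_ms:.1f} ms")
print(f"observed pause is about {ratio:.0f}x the larger mean")
```

All six full collections together account for under two seconds, so no single full GC can explain a pause of roughly 28.5 seconds, which is consistent with the "JVM not scheduled" reading.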
      <pubDate>Tue, 13 Apr 2021 21:30:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314591#M226118</guid>
      <dc:creator>pbaclace</dc:creator>
      <dc:date>2021-04-13T21:30:04Z</dc:date>
    </item>
    <item>
      <title>Re: "SERVICE_MONITOR_PAUSE_DURATION has become bad " despite heap increase</title>
      <link>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314697#M226141</link>
      <description>&lt;P&gt;Update: The load went down to a reasonable level (24), so CPU starvation should no longer be happening, but Service Monitor is still losing data from time to time, with gaps of 5-30 minutes. The disk it uses is a striped RAID volume that YARN does not use, so I don't think disk performance is the issue.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Apr 2021 01:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314697#M226141</guid>
      <dc:creator>pbaclace</dc:creator>
      <dc:date>2021-04-15T01:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: "SERVICE_MONITOR_PAUSE_DURATION has become bad " despite heap increase</title>
      <link>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314798#M226196</link>
      <description>&lt;P&gt;Update: I moved SM to a host that has a typical load of 7-8 instead of 24. After a day on the new machine, there have been no alerts about SM being slow and no gaps in the charts.&lt;/P&gt;&lt;P&gt;Conclusion: The problem was host load; SM works best on a machine with low load.&lt;/P&gt;</description>
      <pubDate>Sat, 17 Apr 2021 01:19:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/quot-SERVICE-MONITOR-PAUSE-DURATION-has-become-bad-quot/m-p/314798#M226196</guid>
      <dc:creator>pbaclace</dc:creator>
      <dc:date>2021-04-17T01:19:46Z</dc:date>
    </item>
  </channel>
</rss>

