Created 05-23-2019 06:23 PM
Hello,
We have been facing issues with our CM Service Monitor. For the past couple of days it has been failing and then starting to work again, and we repeatedly get the alerts below:
Concerning : The last metrics aggregation run duration is 25.4 second(s). Warning threshold: 10 second(s).
Concerning : Average time spent paused was 27.9 second(s) (46.47%) per minute over the previous 5 minute(s). Warning threshold: 30.00%.
No configuration changes have been made on this node. The SM log file is attached.
5:48:14.372 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2543ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2718ms
5:48:14.373 AM ERROR SafeAvroResponderServlet Error procesing Avro request
org.mortbay.jetty.EofException
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
    at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
    at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
    at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:623)
    at com.cloudera.enterprise.SafeAvroHttpTransceiver.writeLength(SafeAvroHttpTransceiver.java:128)
    at com.cloudera.enterprise.SafeAvroHttpTransceiver.writeBuffers(SafeAvroHttpTransceiver.java:120)
    at com.cloudera.enterprise.SafeAvroResponderServlet.doPost(SafeAvroResponderServlet.java:57)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
    at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
    at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
    ... 21 more
5:48:14.375 AM WARN log Committed before 500 Error processing POST request. Check the system logs for more information.
5:48:14.375 AM ERROR log /
java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
    at org.mortbay.jetty.Response.sendError(Response.java:240)
    at com.cloudera.enterprise.SafeAvroResponderServlet.logAndSuppressException(SafeAvroResponderServlet.java:69)
    at com.cloudera.enterprise.SafeAvroResponderServlet.doPost(SafeAvroResponderServlet.java:59)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
5:48:32.030 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 3051ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=3431ms
5:48:36.128 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2597ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2962ms
5:48:39.909 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2781ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2863ms
5:48:42.965 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2049ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2480ms
5:48:46.620 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 3155ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=3204ms
5:48:49.569 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2448ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2534ms
5:48:53.360 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2790ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2807ms
Any help / guidance is appreciated
Created 05-24-2019 09:49 AM
Hi @wert_1311 ,
Thanks for posting the logs. As we can see from the JVM pause messages in the log snippet:
5:48:49.569 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2448ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2534ms
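To get a rough feel for how much time pauses like this add up to, here is a minimal Python sketch that sums the "paused approximately N ms" values from JvmPauseMonitor lines and expresses them as a share of a time window. The function names and the abbreviated sample lines are only illustrative, and this is not necessarily how Cloudera Manager computes its "time spent paused" alert.

```python
import re

# Matches the "paused approximately <N>ms" part of a JvmPauseMonitor message.
PAUSE_RE = re.compile(r"JvmPauseMonitor .*?paused approximately (\d+)ms")

def total_pause_ms(log_text):
    """Sum the pause durations (in ms) reported by JvmPauseMonitor entries."""
    return sum(int(ms) for ms in PAUSE_RE.findall(log_text))

def paused_percent(log_text, window_seconds):
    """Share of a window (in percent) covered by the logged pauses."""
    return 100.0 * total_pause_ms(log_text) / (window_seconds * 1000.0)

# Two entries from the posted log, abbreviated after the pause duration.
sample = (
    "5:48:49.569 AM INFO JvmPauseMonitor Detected pause in JVM or host machine "
    "(e.g. a stop the world GC, or JVM not scheduled): paused approximately 2448ms: ...\n"
    "5:48:53.360 AM INFO JvmPauseMonitor Detected pause in JVM or host machine "
    "(e.g. a stop the world GC, or JVM not scheduled): paused approximately 2790ms: ...\n"
)

print(total_pause_ms(sample))                # total pause time in ms
print(round(paused_percent(sample, 60), 2))  # percent of a 60-second window
```

Running it over the full SMON log for a known time window gives a number you can compare before and after any memory or GC change.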
Depending on how big your cluster is, you may need to increase the memory assigned to the Service Monitor (SMON) role. Please see the documentation on HMON and SMON memory configuration.
You can also consider tuning garbage collection by enabling G1GC for SMON:
In the Cloudera Manager UI, go to Cloudera Management Services > Configuration, select "Service Monitor" under SCOPE and Advanced under CATEGORY, then set the following in "Java Configuration Options for Service Monitor":
-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC
Then restart SMON.
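After the restart, it can be worth confirming which collector the SMON JVM actually picked up, for example by looking at the flags of the running process (such as the output of `jcmd <pid> VM.flags`). A minimal sketch, assuming you already have the flags as text; the helper name is hypothetical:

```python
import re

def enabled_collectors(flags_text):
    """Return the GC selection flags that are switched on (-XX:+Use...GC)."""
    return set(re.findall(r"-XX:\+(Use\w*GC)\b", flags_text))

# The option string suggested above; only G1 should be enabled.
suggested = "-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC"
print(enabled_collectors(suggested))
```

If the result still contains UseConcMarkSweepGC or UseParNewGC, the new options were probably not applied to the role.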
For more information about tuning G1, see the Oracle documentation on tuning garbage collection:
https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm#autoId0
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager
Created 06-02-2019 09:06 PM
Hi Li,
Thanks for your help
Regards
Wert