Service Monitor restarts repeatedly

Expert Contributor

Hello,

 

We have been facing issues with our CM Service Monitor. For the past couple of days it has been failing and then starting to work again, and we also get the alerts below repeatedly:

 

Concerning : The last metrics aggregation run duration is 25.4 second(s). Warning threshold: 10 second(s).

 

Concerning : Average time spent paused was 27.9 second(s) (46.47%) per minute over the previous 5 minute(s). Warning threshold: 30.00%.

 

No configuration changes have been made on this node. The Service Monitor (SM) log is attached:

5:48:14.372 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2543ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2718ms

5:48:14.373 AM	ERROR	SafeAvroResponderServlet	
Error procesing Avro request
org.mortbay.jetty.EofException
	at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
	at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
	at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
	at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:623)
	at com.cloudera.enterprise.SafeAvroHttpTransceiver.writeLength(SafeAvroHttpTransceiver.java:128)
	at com.cloudera.enterprise.SafeAvroHttpTransceiver.writeBuffers(SafeAvroHttpTransceiver.java:120)
	at com.cloudera.enterprise.SafeAvroResponderServlet.doPost(SafeAvroResponderServlet.java:57)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
	at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
	at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
	at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
	... 21 more
5:48:14.375 AM	WARN	log	Committed before 500 Error processing POST request. Check the system logs for more information.
5:48:14.375 AM	ERROR	log	
/
java.lang.IllegalStateException: Committed
	at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
	at org.mortbay.jetty.Response.sendError(Response.java:240)
	at com.cloudera.enterprise.SafeAvroResponderServlet.logAndSuppressException(SafeAvroResponderServlet.java:69)
	at com.cloudera.enterprise.SafeAvroResponderServlet.doPost(SafeAvroResponderServlet.java:59)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
5:48:32.030 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 3051ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=3431ms
5:48:36.128 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2597ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2962ms
5:48:39.909 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2781ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2863ms
5:48:42.965 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2049ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2480ms
5:48:46.620 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 3155ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=3204ms
5:48:49.569 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2448ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2534ms
5:48:53.360 AM	INFO	JvmPauseMonitor	Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2790ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2807ms


Any help / guidance is appreciated

1 ACCEPTED SOLUTION

Guru

Hi @wert_1311 ,

 

Thanks for posting the logs. The JVM pause messages in the log snippet show long ConcurrentMarkSweep collections, for example:

 

5:48:49.569 AM INFO JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 2448ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2534ms
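
To gauge how much time these pauses add up to, the durations in the JvmPauseMonitor lines can be totaled. The following is a minimal Python sketch, not a Cloudera tool: the helper names `total_pause_ms` and `pause_fraction` are made up for illustration, and the sample lines are shortened copies of the eight pause messages posted in the question.

```python
import re

# Matches the "paused approximately <N>ms" part of a JvmPauseMonitor line.
PAUSE_RE = re.compile(r"paused approximately (\d+)ms")


def total_pause_ms(lines):
    """Sum the pause durations (in ms) reported across the given log lines."""
    total = 0
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


def pause_fraction(paused_ms, window_ms):
    """Return the fraction of a time window that was spent paused."""
    return paused_ms / window_ms


# Shortened copies of the eight JvmPauseMonitor lines from the question.
sample_lines = [
    "5:48:14.372 AM INFO JvmPauseMonitor ... paused approximately 2543ms: ...",
    "5:48:32.030 AM INFO JvmPauseMonitor ... paused approximately 3051ms: ...",
    "5:48:36.128 AM INFO JvmPauseMonitor ... paused approximately 2597ms: ...",
    "5:48:39.909 AM INFO JvmPauseMonitor ... paused approximately 2781ms: ...",
    "5:48:42.965 AM INFO JvmPauseMonitor ... paused approximately 2049ms: ...",
    "5:48:46.620 AM INFO JvmPauseMonitor ... paused approximately 3155ms: ...",
    "5:48:49.569 AM INFO JvmPauseMonitor ... paused approximately 2448ms: ...",
    "5:48:53.360 AM INFO JvmPauseMonitor ... paused approximately 2790ms: ...",
]

paused = total_pause_ms(sample_lines)
print(paused)  # 21414 ms of pauses, logged within roughly 40 seconds

# The alert reported 27.9 seconds paused per minute; as a fraction of 60 s:
print(pause_fraction(27_900, 60_000))
```

More than 21 seconds of pauses in well under a minute of log time is consistent with the "time spent paused" alert exceeding its 30% warning threshold.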

 

Depending on how big your cluster is, you may need to increase the memory assigned to the Service Monitor (SMON) role. Please see the documentation on Host Monitor (HMON) and SMON memory configuration:

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_storage.html#concept_ixl_hrk_n...

 

You can also consider tuning garbage collection by enabling G1GC for SMON:

In the Cloudera Manager UI, go to Cloudera Management Services > Configuration. Under SCOPE select "Service Monitor", under CATEGORY select "Advanced", and in "Java Configuration Options for Service Monitor" set the following:
-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC

Then restart SMON.
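
Before pasting an option string into that field, it can be worth checking that it enables G1 and explicitly disables the other two collectors, as suggested above. Here is a small Python sketch of such a check; the function names `parse_bool_flags` and `matches_suggested_g1_flags` are hypothetical, and the script only inspects the text of the options, it does not talk to Cloudera Manager or to the JVM.

```python
import re

# Matches boolean HotSpot flags of the form -XX:+Name or -XX:-Name.
BOOL_FLAG_RE = re.compile(r"^-XX:([+-])(\w+)$")


def parse_bool_flags(options):
    """Map each boolean -XX flag in a whitespace-separated string to True/False."""
    flags = {}
    for token in options.split():
        match = BOOL_FLAG_RE.match(token)
        if match:
            flags[match.group(2)] = match.group(1) == "+"
    return flags


def matches_suggested_g1_flags(options):
    """True if G1 is enabled and CMS and ParNew are explicitly disabled."""
    flags = parse_bool_flags(options)
    return (
        flags.get("UseG1GC") is True
        and flags.get("UseConcMarkSweepGC") is False
        and flags.get("UseParNewGC") is False
    )


suggested = "-XX:+UseG1GC -XX:-UseConcMarkSweepGC -XX:-UseParNewGC"
print(matches_suggested_g1_flags(suggested))       # True
print(matches_suggested_g1_flags("-XX:+UseG1GC"))  # False: CMS/ParNew not disabled
```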

 

For more information about tuning G1, see the Oracle documentation on tuning garbage collection:

https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm#autoId0

 

Thanks and hope this helps,

Li

Li Wang, Technical Solution Manager



2 REPLIES

Expert Contributor

Hi Li,

 

Thanks for your help

 

Regards

Wert