Oozie server going into bad health: OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer

We have two Oozie instances running, and one of them goes into bad health about once a day with the message below.

 

OOZIE_SERVER_WEB_METRIC_COLLECTION

Role health test bad

Critical

The health test result for OOZIE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

 

 

2017-10-31 13:12:08,816 INFO com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not access Oozie Server oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67aMetricsInstrumentationService. Will attempt to access Instrumentation Service end-point.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3335)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.loadMore(UTF8StreamJsonParser.java:174)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2489)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:626)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:192)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:1627)
at com.cloudera.cmon.JsonMetricsExtractor.extractMetrics(JsonMetricsExtractor.java:227)
at com.cloudera.cmon.firehose.polling.oozie.OozieMetricsServiceFetcher.fetch(OozieMetricsServiceFetcher.java:259)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.tryFetchFromBothEndPoints(OozieServerStateFetcher.java:311)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.updateOozieMetrics(OozieServerStateFetcher.java:247)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:198)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:54)
at com.cloudera.cmon.firehose.polling.CdhTask$InstrumentedWork.doWork(CdhTask.java:230)
at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-10-31 13:12:08,818 WARN com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not retrieve oozie metrics for oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67a
java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:11000/oozie/v2/admin/instrumentation
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at com.cloudera.enterprise.UrlUtil.readUrlWithTimeouts(UrlUtil.java:69)
at com.cloudera.cmon.firehose.polling.oozie.OozieInstrumentationServiceFetcher.getInputStream(OozieInstrumentation

 

 

Please let me know how to fix this.

6 REPLIES

Super Guru
Have you checked what errors appear in the Oozie server log?

It looks like CM was not able to access Oozie for some reason; the Oozie server log might give you a clue.
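To see roughly what CM sees, it can help to time a request against the same endpoint that appears in the stack trace. Below is a minimal Python sketch, assuming the URL from the log above is reachable from the host where the script runs. The `probe` helper name and the 20-second timeout are illustrative choices, not Cloudera defaults.

```python
import socket
import time
import urllib.error
import urllib.request

# URL taken from the stack trace above; adjust host/port for your cluster.
URL = "http://localhost:11000/oozie/v2/admin/instrumentation"


def probe(url, timeout_s=20.0, opener=urllib.request.urlopen):
    """Fetch url once and return (outcome, elapsed_seconds).

    outcome is "ok", "http <code>" for an HTTP error status,
    "timeout" for a connect/read timeout, or "error: <reason>".
    `opener` is injectable (hypothetical hook) so the function
    can be exercised without a live server.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            # Read the full body: the timeout in the trace happened mid-read.
            resp.read()
        outcome = "ok"
    except urllib.error.HTTPError as exc:
        outcome = "http %d" % exc.code
    except socket.timeout:
        outcome = "timeout"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.timeout):
            outcome = "timeout"
        else:
            outcome = "error: %s" % exc.reason
    except OSError as exc:
        outcome = "error: %s" % exc
    return outcome, time.monotonic() - start
```

Calling `probe(URL)` on each Oozie host every minute or so and recording the outcome and elapsed time would show whether the endpoint is slow only around the time of the alert or all the time.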

Explorer

Hi Eric,

 

Thanks for your prompt response. I couldn't find any errors or warnings in the Oozie logs. Could you please suggest the next steps to identify the root cause?

Super Guru
How much heap does Oozie have? Have you noticed any GC hangs in the Oozie server? That could hang the Oozie process and potentially cause the timeout on the client connection from CM.

It is worth checking this to rule it out.
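If GC logging is not enabled, a rough way to spot a stalled process is to look for unusually long silences between consecutive lines in the Oozie server log. The Python sketch below assumes the log lines start with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp like the CM log excerpt above (the Oozie log layout may differ) and uses an arbitrary 30-second threshold. A gap only shows that nothing was logged during that window, not that GC was the cause.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "2017-10-31 13:12:08,816".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    m = TS_RE.match(line)
    if not m:
        return None  # continuation lines such as stack frames
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def find_gaps(lines, threshold_s=30.0):
    """Yield (previous_ts, current_ts, gap_seconds) for each silence
    between consecutive timestamped lines longer than threshold_s."""
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                yield prev, ts, gap
        prev = ts
```

Passing an open log file to `find_gaps` and comparing the reported gaps with the times of the CM alerts would help confirm or rule out a stall; the log file location depends on the installation.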

Explorer
We have given Oozie a 4 GB heap. During the alert, heap usage reaches 1.3 GB on both Oozie server instances.

As I said earlier, we have two Oozie instances, and we are getting WEB_SERVER_STATUS_BAD on only one of them.

Explorer

<property>
  <name>oozie.poller.timeout.millis</name>
  <value>20000</value>
</property>

Should I add the above configuration property to cmon.conf? Cloudera mentioned that the issue was fixed in CDH 5.4.5, and we are using CDH 5.8.3. Please advise.

Explorer
FYI, we are getting alerts for both Oozie servers. Please let me know if there is anything else that needs to be checked.