
Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer

We have two Oozie server instances running. One of them goes into bad health about once a day with the message below.

 

OOZIE_SERVER_WEB_METRIC_COLLECTION

Role health test bad

Critical

The health test result for OOZIE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

 

 

2017-10-31 13:12:08,816 INFO com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not access Oozie Server oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67aMetricsInstrumentationService. Will attempt to access Instrumentation Service end-point.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3335)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.loadMore(UTF8StreamJsonParser.java:174)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2489)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:626)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:192)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:1627)
at com.cloudera.cmon.JsonMetricsExtractor.extractMetrics(JsonMetricsExtractor.java:227)
at com.cloudera.cmon.firehose.polling.oozie.OozieMetricsServiceFetcher.fetch(OozieMetricsServiceFetcher.java:259)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.tryFetchFromBothEndPoints(OozieServerStateFetcher.java:311)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.updateOozieMetrics(OozieServerStateFetcher.java:247)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:198)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:54)
at com.cloudera.cmon.firehose.polling.CdhTask$InstrumentedWork.doWork(CdhTask.java:230)
at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-10-31 13:12:08,818 WARN com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not retrieve oozie metrics for oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67a
java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:11000/oozie/v2/admin/instrumentation
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at com.cloudera.enterprise.UrlUtil.readUrlWithTimeouts(UrlUtil.java:69)
at com.cloudera.cmon.firehose.polling.oozie.OozieInstrumentationServiceFetcher.getInputStream(OozieInstrumentation

 

 

Please let me know how to fix this.

6 REPLIES

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Guru
Have you checked what errors appear in the Oozie server log?

It looks like CM was unable to reach Oozie for some reason; the Oozie server log might give you a clue.
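
If the server log shows nothing, it may help to time the same endpoint that CM polls and see when it slows down or fails. The following is only a rough sketch: the URL is the one from the stack trace above, while the function names `probe` and `poll`, the 20-second timeout, and the 60-second interval are illustrative assumptions, not anything CM itself uses.

```python
import http.client
import time
import urllib.error
import urllib.request

# URL copied from the stack trace in the question; adjust host and port as needed.
INSTRUMENTATION_URL = "http://localhost:11000/oozie/v2/admin/instrumentation"


def probe(url=INSTRUMENTATION_URL, timeout_s=20.0, opener=urllib.request.urlopen):
    """Fetch the URL once and return (outcome, elapsed_seconds).

    outcome is "ok", "http <code>", or "error: <ExceptionName>".
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            resp.read()  # read the full body, since the CM timeout happened mid-read
        outcome = "ok"
    except urllib.error.HTTPError as exc:
        outcome = f"http {exc.code}"  # e.g. the 503 seen in the CM log
    except (OSError, http.client.HTTPException) as exc:
        # Covers read timeouts, refused connections, and truncated responses.
        outcome = f"error: {type(exc).__name__}"
    return outcome, time.monotonic() - start


def poll(count, interval_s=60.0):
    """Probe the endpoint `count` times and print one timestamped line per attempt."""
    for _ in range(count):
        outcome, elapsed = probe()
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {outcome} {elapsed:.2f}s", flush=True)
        time.sleep(interval_s)
```

Running something like `poll(1440)` on the Oozie host for a day would give timestamps that can be lined up against the alert time and the Oozie server log.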

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer

Hi Eric,

 

Thanks for your prompt response. I couldn't find any errors or warnings in the Oozie logs. Could you please suggest the next steps to identify the root cause?

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Guru
How much heap does Oozie have? Have you noticed GC pauses in the Oozie server? Long pauses can hang the Oozie process and could cause the timeout on the client connection from CM.

It's worth checking this to rule it out.
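
One possible way to check, assuming the Oozie server JVM writes a GC log with the JDK 8 HotSpot flag `-XX:+PrintGCApplicationStoppedTime` enabled (whether that is set on your cluster is an assumption), is to scan the log for long stop-the-world pauses. This is a minimal sketch; the name `long_pauses` and the 5-second threshold are illustrative choices.

```python
import re

# Line written by JDK 8 HotSpot when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9]+[.,][0-9]+) seconds"
)


def long_pauses(lines, threshold_s=5.0):
    """Return (line_number, pause_seconds) for each pause of at least threshold_s."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if not match:
            continue
        # Some locales print a decimal comma; normalise it before parsing.
        seconds = float(match.group(1).replace(",", "."))
        if seconds >= threshold_s:
            hits.append((number, seconds))
    return hits
```

It can be fed an open file, for example `long_pauses(open(gc_log_path))`, where `gc_log_path` is wherever your Oozie GC log is written. Pauses close to the CM read timeout around the alert time would point towards GC; no hits would help rule it out.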

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer
We have given Oozie a 4 GB heap. During the alert, heap usage reaches 1.3 GB on both Oozie server instances.

As I said earlier, we have two Oozie instances, and only one of them reports WEB_SERVER_STATUS_BAD.

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer

<property>
<name>oozie.poller.timeout.millis</name>
<value>20000</value>
</property>

Should I add the above configuration property to cmon.conf? Cloudera mentioned that the issue was fixed in CDH 5.4.5, and we are using CDH 5.8.3. Please advise.

Re: Oozie server is getting in bad health- OOZIE_SERVER_WEB_METRIC_COLLECTION

Explorer
FYI, we are getting alerts for both Oozie servers. Please let me know if there is anything that needs to be checked.