Created on 11-14-2017 01:30 AM - edited 09-16-2022 05:31 AM
We have two Oozie instances running, and one of them goes bad about once a day with the message below.
OOZIE_SERVER_WEB_METRIC_COLLECTION | Role health test bad | Critical | The health test result for OOZIE_SERVER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server. |
2017-10-31 13:12:08,816 INFO com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not access Oozie Server oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67aMetricsInstrumentationService. Will attempt to access Instrumentation Service end-point.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3335)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.loadMore(UTF8StreamJsonParser.java:174)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2489)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:626)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:192)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:197)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:1627)
at com.cloudera.cmon.JsonMetricsExtractor.extractMetrics(JsonMetricsExtractor.java:227)
at com.cloudera.cmon.firehose.polling.oozie.OozieMetricsServiceFetcher.fetch(OozieMetricsServiceFetcher.java:259)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.tryFetchFromBothEndPoints(OozieServerStateFetcher.java:311)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.updateOozieMetrics(OozieServerStateFetcher.java:247)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:198)
at com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher.doWork(OozieServerStateFetcher.java:54)
at com.cloudera.cmon.firehose.polling.CdhTask$InstrumentedWork.doWork(CdhTask.java:230)
at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-10-31 13:12:08,818 WARN com.cloudera.cmon.firehose.polling.oozie.OozieServerStateFetcher: Could not retrieve oozie metrics for oozie-OOZIE_SERVER-b28b7b48e807ce7c78f0ea0a52c0f67a
java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:11000/oozie/v2/admin/instrumentation
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at com.cloudera.enterprise.UrlUtil.readUrlWithTimeouts(UrlUtil.java:69)
at com.cloudera.cmon.firehose.polling.oozie.OozieInstrumentationServiceFetcher.getInputStream(OozieInstrumentation
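One way to narrow this down is to poll the instrumentation end-point from the WARN line above yourself and record when it returns 503 or stops answering. This is a minimal sketch, not a Cloudera tool. It assumes Python 3 is available on the Oozie host. It uses the URL from the log, and the 20-second timeout is an arbitrary choice. The names probe and INSTRUMENTATION_URL are hypothetical. The first end-point that Service Monitor tries is not shown in the log, so only the instrumentation URL is probed here.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Tuple

# Instrumentation end-point copied from the Service Monitor WARN line above.
# Assumption: the script runs on the Oozie host; change host/port otherwise.
INSTRUMENTATION_URL = "http://localhost:11000/oozie/v2/admin/instrumentation"


def probe(url: str, timeout_s: float = 20.0) -> Tuple[str, float]:
    """Fetch `url` once and classify the outcome.

    Returns (status, elapsed_seconds), where status is "ok",
    "http-<code>" (e.g. "http-503"), "timeout" or "error: <reason>".
    The 20 s default is an assumption, not Service Monitor's own setting.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # read the full body; the original failure was a read timeout
        status = "ok"
    except urllib.error.HTTPError as err:
        # Non-2xx responses, such as the 503 in the log, land here.
        status = f"http-{err.code}"
    except (socket.timeout, TimeoutError):
        # The server accepted the connection but did not answer in time.
        status = "timeout"
    except urllib.error.URLError as err:
        # Connection-level failures; a connect timeout arrives wrapped here.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            status = "timeout"
        else:
            status = f"error: {err.reason}"
    except OSError as err:
        # Anything else at the socket level, e.g. a connection reset mid-read.
        status = f"error: {err}"
    return status, time.monotonic() - start
```

To use it, call probe(INSTRUMENTATION_URL) in a loop (for example once a minute) and log each result with a timestamp. If the 503s or timeouts line up with the daily health alert, that suggests the Oozie server's web tier is failing to answer rather than the monitoring side misbehaving.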
Please let me know how to fix this.
Created 11-14-2017 02:44 AM
Created 11-15-2017 08:29 PM
Hi Eric,
Thanks for your prompt response. I couldn't find any errors or warnings in the Oozie logs either. Could you please suggest a next step for identifying the root cause?
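As one possible next step, the Service Monitor log can be scanned to see whether the failures cluster at a particular hour. A cluster like that would hint at a time-of-day trigger such as a scheduled load. This is a minimal sketch. It assumes the log lines keep the format of the WARN line quoted in the question. FAILURE_RE and failures_per_hour are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the WARN line Service Monitor wrote when it could not get Oozie metrics, e.g.
# "2017-10-31 13:12:08,818 WARN ...OozieServerStateFetcher: Could not retrieve oozie metrics for <role>"
FAILURE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2},\d{3} WARN \S*OozieServerStateFetcher: "
    r"Could not retrieve oozie metrics for (\S+)"
)


def failures_per_hour(lines: Iterable[str]) -> Counter:
    """Count metric-collection failures per (date, hour, role) bucket.

    INFO lines and stack-trace lines do not match and are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            date, hour, role = match.groups()
            counts[(date, hour, role)] += 1
    return counts
```

For example, open the Service Monitor log file (its location depends on the installation) and print sorted(failures_per_hour(f).items()) to see which hours the failures fall into.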
Created 11-15-2017 10:10 PM
Created 11-16-2017 01:11 AM
Created on 11-23-2017 01:08 AM - edited 11-23-2017 01:09 AM
<property>
  <name>oozie.poller.timeout.millis</name>
  <value>20000</value>
</property>
Should I add the above configuration property to cmon.conf? Cloudera mentioned that this issue was fixed in CDH 5.4.5, and we are using CDH 5.8.3. Please advise.
Created 11-23-2017 12:55 AM