YARN alert "Connection failed to http" on port 8088 when using Ambari Hadoop on Kubernetes

We have a Hadoop cluster managed by Ambari and installed with HDP version 2.6.5. All machines in the cluster run RHEL 7.9.

The cluster includes the YARN service with two ResourceManager instances, running on the master1 and master2 nodes.

We are facing the following alert in Ambari:

Connection failed to http://master2.start.com:8088 (timed out)

We tested the alert with `wget`. While the alert is shown in Ambari, the following `wget` test hangs, and sometimes it takes a long time before `wget` finishes with a result:

[root@master2 yarn]# wget http://master2.start.com:8088
--2022-10-28 08:12:49-- http://master2.start.com:8088/
Resolving master2.start.com (master2.start.com)... 172.3.45.68
Connecting to master2.start.com (master2.start.com)|172.3.45.68|:8088... connected.
HTTP request sent, awaiting response... 307 TEMPORARY_REDIRECT
Location: http://master1.start.com:8088/ [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/
Resolving master1.start.com (master1.start.com)... 172.3.45.61
Connecting to master1.start.com (master01.start.com)|172.3.45.61|:8088... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://master1.start.com:8088/cluster [following]
--2022-10-28 08:12:50-- http://master1.start.com:8088/cluster
Reusing existing connection to master1.start.com:8088.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.35’

[ <=> ] 5,419,141 9.46MB/s in 0.5s

2022-10-28 08:12:52 (9.46 MB/s) - ‘index.html.35’ saved [5419141]
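To see how often the request is slow or hangs, the same `wget` test can be repeated under a time limit. The following is only a minimal sketch: the helper name `timed_probe` is hypothetical, and it assumes GNU `timeout` from coreutils is available on the node.

```shell
#!/usr/bin/env bash
# Run a command under a time limit and report how long it took.
# Usage: timed_probe LIMIT_SECONDS COMMAND [ARGS...]
# Prints "<status> <elapsed_seconds>", where status is ok, timeout, or fail.
timed_probe() {
  local limit=$1; shift
  local start end rc=0
  start=$(date +%s)
  # GNU timeout exits with 124 when the limit is reached.
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  end=$(date +%s)
  case $rc in
    0)   echo "ok $((end - start))" ;;
    124) echo "timeout $((end - start))" ;;
    *)   echo "fail $((end - start))" ;;
  esac
}

# Example (not executed here): probe master2 every 30 seconds.
# while true; do
#   echo "$(date '+%F %T') $(timed_probe 10 wget -q -O /dev/null http://master2.start.com:8088)"
#   sleep 30
# done
```

Comparing the timestamps of the `timeout` lines with the times the Ambari alert fires should show whether the two coincide. The 10-second limit and 30-second interval are arbitrary choices.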

Port 8088 is listening on both nodes:

ps -ef | grep `lsof -i :8088 | grep -i listen | awk '{print $2}'`
yarn 1977 1 16 Oct27 ? 02:37:32 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_resourcemanager

We also checked with `jps`:

jps -l | grep -i resourcemanager


1977 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
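Since the process is up on both nodes, it may also help to record which ResourceManager is active when the alert fires (the 307 redirect in the `wget` output above suggests master2 was the standby at that moment). A minimal sketch, assuming the ResourceManager REST endpoint `/ws/v1/cluster/info` answers on port 8088 and includes an `haState` field in its JSON; the helper name `ha_state_from_json` is hypothetical:

```shell
#!/usr/bin/env bash
# Read cluster-info JSON on stdin and print the value of "haState"
# (for example ACTIVE or STANDBY). Prints nothing if the field is absent.
ha_state_from_json() {
  sed -n 's/.*"haState"[[:space:]]*:[[:space:]]*"\([A-Za-z_]*\)".*/\1/p'
}

# Example (not executed here): query both ResourceManagers.
# for host in master1.start.com master2.start.com; do
#   state=$(curl -s --max-time 10 "http://$host:8088/ws/v1/cluster/info" | ha_state_from_json)
#   echo "$host ${state:-no-answer}"
# done
```

If this request also times out on master2 while the alert is shown, the problem is in the web endpoint itself and not only in the redirect to the active node.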

We also reviewed the ResourceManager logs and found the following:


2022-10-27 08:04:30,071 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.api.records.ContainerId.toString(ContainerId.java:196)
at org.apache.hadoop.yarn.util.ConverterUtils.toString(ConverterUtils.java:165)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:603)
at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)


2022-10-27 08:04:32,056 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.api.records.ContainerId.toString(ContainerId.java:196)
at org.apache.hadoop.yarn.util.ConverterUtils.toString(ConverterUtils.java:165)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:603)
at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResource

2022-10-27 08:05:43,170 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(992)) - State store operation failed
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1213)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:1042)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:639)


2022-10-27 08:05:49,584 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted


2022-10-28 08:43:31,259 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(549)) - Error when publishing entity [YARN_APPLICATION,application_1664925617878_1896]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream closed.
at com.sun.jersey.api.client.ClientResponse.bufferEntity(ClientResponse.java:583)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:157)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:348)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:536)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationACLsUpdatedEvent(SystemMetricsPublisher.java:392)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:257)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:564)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:559)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Stream closed.
at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:470)
at java.net.SocketInputStream.available(SocketInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)


It is still not clear from the logs above why the ResourceManager raises the alert `Connection failed to http://master2.start.com:8088 (timed out)`.
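To see which of these messages dominate around the alert times, the WARN and ERROR lines can be counted per logging component. A minimal sketch, assuming the log lines follow the `date time LEVEL component ...` layout shown above; the helper name `summarize_rm_log` is hypothetical:

```shell
#!/usr/bin/env bash
# Count WARN/ERROR/FATAL lines per component in a ResourceManager log.
# Usage: summarize_rm_log LOGFILE
# Prints "<count> <LEVEL> <component>", most frequent first.
summarize_rm_log() {
  awk '$3 ~ /^(WARN|ERROR|FATAL)$/ { count[$3 " " $4]++ }
       END { for (key in count) print count[key], key }' "$1" | sort -rn
}
```

It would be called as `summarize_rm_log /path/to/resourcemanager.log`, where the path is a placeholder for the actual ResourceManager log file on master2. Stack trace lines are ignored because their third field is not a log level.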

Michael-Bronson