Created 07-30-2018 02:05 PM
MapReduce service check fails with ipc.Client connection timed out error
2018-07-30 14:39:43,127 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount /user/ambari-qa/mapredsmokeinput /user/ambari-qa/mapredsmokeoutput'] {'logoutput': True, 'try_sleep': 5, 'environment': {}, 'tries': 1, 'user': 'ambari-qa', 'path': ['/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-client/bin']}
18/07/30 14:39:45 INFO impl.TimelineClientImpl: Timeline service address: http://hostname:8188/ws/v1/timeline/
18/07/30 14:39:45 INFO client.RMProxy: Connecting to ResourceManager at hostname/ip:8050
18/07/30 14:39:45 INFO client.AHSProxy: Connecting to Application History server at hostname/ip:10200
18/07/30 14:40:49 INFO ipc.Client: Retrying connect to server: hostname/ip:8050. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/07/30 14:41:53 INFO ipc.Client: Retrying connect to server: hostname/ip:8050. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/07/30 14:42:57 INFO ipc.Client: Retrying connect to server: hostanme/ip:8050. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/07/30 14:44:01 INFO ipc.Client: Retrying connect to server: hostname/ip:8050. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
Logs:
WARN ipc.Client (Client.java:handleConnectionFailure(886)) - Failed to connect to server: ResourceManager-Hostname/ResourceManager-ip-address:8050: retries get failed due to exceeded maximum allowed retries number: 50
java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
    at org.apache.hadoop.ipc.Client.call(Client.java:1449)
    at org.apache.hadoop.ipc.Client.call(Client.java:1396)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy77.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:191)
    at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
    at com.sun.proxy.$Proxy78.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.isApplicationTerminated(AggregatedLogDeletionService.java:155)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.deleteOldLogDirsFrom(AggregatedLogDeletionService.java:101)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:85)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
Port 8050 is open and listening:
[root@bhwx24hwxworker2 yarn]# netstat --listen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:8188 *:* LISTEN
tcp 0 0 *:8030 *:* LISTEN
tcp 0 0 *:8670 *:* LISTEN
tcp 0 0 *:8191 *:* LISTEN
tcp 0 0 *:sqlexec *:* LISTEN
tcp 0 0 *:10020 *:* LISTEN
tcp 0 0 *:eforward *:* LISTEN
tcp 0 0 *:40070 *:* LISTEN
tcp 0 0 localhost:40071 *:* LISTEN
tcp 0 0 *:8040 *:* LISTEN
tcp 0 0 *:40072 *:* LISTEN
tcp 0 0 *:7337 *:* LISTEN
tcp 0 0 *:fs-agent *:* LISTEN
tcp 0 0 *:8141 *:* LISTEN
tcp 0 0 *:45454 *:* LISTEN
tcp 0 0 *:19888 *:* LISTEN
tcp 0 0 bhwx24hwxworke:ciphire-serv *:* LISTEN
tcp 0 0 *:10033 *:* LISTEN
tcp 0 0 *:8050 *:* LISTEN
tcp 0 0 *:39987 *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 *:7447 *:* LISTEN
tcp 0 0 *:trisoap *:* LISTEN
tcp 0 0 *:radan-http *:* LISTEN
tcp 0 0 *:irisa *:* LISTEN
tcp 0 0 *:ca-audit-da *:* LISTEN
tcp 0 0 localhost:8089 *:* LISTEN
tcp 0 0 localhost:metasys *:* LISTEN
tcp 0 0 localhost:smtp *:* LISTEN
tcp 0 0 *:13562 *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
udp 0 0 bhwx24hwxworker2.cse-int:ntp *:*
udp 0 0 localhost:ntp *:*
udp 0 0 *:ntp *:*
udp 0 0 *:bootpc *:*
udp 0 0 *:ntp *:*
Created 07-30-2018 08:43 PM
Check whether you can telnet to RM:8050. Also check the netstat output on the RM machine to see whether there are any connections from the node on which the service check is running.
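If telnet is not installed on the node, a small script can do the same connectivity check. The following is only a rough sketch, not part of any Hadoop or Ambari tooling: `probe_tcp` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. It also separates "refused" from "timeout", which is worth knowing here because the log above reports "Connection timed out" rather than a refused connection; a timeout usually suggests the packets are being dropped or misrouted somewhere, while a refusal usually means nothing is listening on that address.

```python
import socket
import sys


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect and report 'open', 'refused', 'timeout' or 'error: ...'.

    Hypothetical helper for illustration; roughly what `telnet host port` tells you.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that address/port
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: packets are likely dropped or misrouted
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and similar problems
        return "error: {}".format(exc)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python probe.py <rm-host> 8050
    print(probe_tcp(sys.argv[1], int(sys.argv[2])))
```

Run it against the exact host name and IP that the client log prints after "Connecting to ResourceManager at", since that is the address the job client actually uses.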
Created 07-31-2018 11:00 AM
@schhabra: Thanks for the response. The service check is being fired from the same host where the RM is installed.
18/07/31 11:11:34 INFO impl.TimelineClientImpl: Timeline service address: http://RM-host:8188/ws/v1/timeline/
18/07/31 11:11:34 INFO client.RMProxy: Connecting to ResourceManager at RM-host/RM-ip:8050
18/07/31 11:11:35 INFO client.AHSProxy: Connecting to Application History server at RM-host/RM-ip:10200
18/07/31 11:12:39 INFO ipc.Client: Retrying connect to server: RM-host/RM-ip:8050. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
18/07/31 11:13:43 INFO ipc.Client: Retrying connect to server: RMhost/RM-ip:8050. Already tried 1 time(s); retry policy is