Created on 06-30-2017 10:55 PM - edited on 02-05-2020 12:23 AM by gzigldrum
PROBLEM:
The following messages appear on the initial runs:
2017-02-28 19:37:48,681 INFO [main] impl.TimelineClientImpl: Timeline service address: http://<timeline-server-hostname>:8188/ws/v1/timeline/
2017-02-28 19:37:48,823 INFO [main] client.AHSProxy: Connecting to Application History server at <history-server-hostname>/<history-server-ip>:10200
2017-02-28 19:37:49,016 WARN [main] ipc.Client: Failed to connect to server: <resource-manager-A>/<resource-manager-A-ip>:8032: retries get failed due to exceeded maximum allowed retries number: 0
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
ROOT CAUSE:
In YARN configs -> custom yarn-site.xml, rm1 is probably configured as resource-manager-A. The client always makes its first connection attempt to the ResourceManager specified in rm1. When resource-manager-A is not the active ResourceManager, that first attempt is refused and the warning above is logged; the client then connects to the active ResourceManager, which is resource-manager-B. The warning is not seen when resource-manager-A is active, because rm1 points to that server and the first connection attempt succeeds.
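
The connection order described above can be illustrated with a small toy model. This is only a sketch, not Hadoop code: the function name `connect_in_configured_order`, the `rm_hosts` mapping, and the warning text are hypothetical, and it assumes, as described above, that a ResourceManager that is not active refuses the connection.

```python
from typing import Dict, List, Tuple


def connect_in_configured_order(
    rm_ids: List[str],
    rm_hosts: Dict[str, str],
    active_host: str,
) -> Tuple[str, List[str]]:
    """Toy model: try each ResourceManager in configured order.

    Returns the host that accepted the connection and the warnings
    collected for every host that was tried before it.
    """
    warnings: List[str] = []
    for rm_id in rm_ids:
        host = rm_hosts[rm_id]
        if host == active_host:
            # The active ResourceManager accepts the connection.
            return host, warnings
        # Assumption: a ResourceManager that is not active refuses the connection.
        warnings.append(f"Failed to connect to server: {host}:8032 (Connection refused)")
    raise ConnectionError("no active ResourceManager reachable")


# Hypothetical mapping mirroring the placeholders used in this article.
rm_hosts = {"rm1": "resource-manager-A", "rm2": "resource-manager-B"}

# resource-manager-B is active: rm1 is tried first, so one warning is collected.
host, warnings = connect_in_configured_order(["rm1", "rm2"], rm_hosts, "resource-manager-B")
print(host, len(warnings))

# resource-manager-A is active: the first attempt succeeds, so no warning.
host, warnings = connect_in_configured_order(["rm1", "rm2"], rm_hosts, "resource-manager-A")
print(host, len(warnings))
```

In this model the warning depends only on which server rm1 points to and which ResourceManager is currently active, which matches the behavior seen in the log.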