Created 12-12-2016 10:28 AM
Hi all,
I am using HDP 2.5. When I try to run a spark job or context (using a Jupyter notebook or pyspark shell), I always obtain the following error:
WARN Client: Failed to connect to server: mycluster.at/111.11.11.11:8032: retries get failed due to exceeded maximum allowed retries number: 0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745) at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618) at org.apache.hadoop.ipc.Client.call(Client.java:1449) at org.apache.hadoop.ipc.Client.call(Client.java:1396) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy15.getNewApplication(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:221) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176) at com.sun.proxy.$Proxy16.getNewApplication(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:225) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:233) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:157) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) at org.apache.spark.SparkContext.<init>(SparkContext.scala:500) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:236) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:211) at java.lang.Thread.run(Thread.java:745)
Then the job is running fine, but that warning is always there. I have another cluster with HDP 2.4 and I don't see that warning.
Any ideas?
Thanks in advance,
Created 12-12-2016 10:10 PM
Your standby RM (rm1) must be the first RM in the configured list of RMs. So its tried first and that results in exceptions.
Created 12-12-2016 10:29 AM
Do you have resource manager HA cofigured? by looking at this error, it looks like your rm1 is standby and rm2 works fine.
Can you please check?
Created 12-12-2016 11:54 AM
Yes, Resource manager HA is configured, but both are working fine, just rm1 is in standby mode and rm2 is active.
Created 12-12-2016 10:10 PM
Your standby RM (rm1) must be the first RM in the configured list of RMs. So its tried first and that results in exceptions.
Created 12-13-2016 08:19 AM
Hi @bikas, ok I understood. So I should not worry about. Thanks