Created 07-11-2017 12:31 AM
I have a kerberized HDP 2.5.3 cluster. An express upgrade from 2.5.0 to 2.5.3 had already completed successfully in the past using the Ambari wizard.
The upgrade to 2.6.1 fails when trying to restart the HDFS NameNode, with a "connection refused" error (see below). I tried choosing "Ignore and Proceed" after every service restart failure, but in the end the wizard offers only two options: retry (the last step) or downgrade.
The same error appears when downgrading back to 2.5.3: the wizard cannot restart the NameNode, and I always have to choose "Ignore and Proceed". However, after completing the downgrade, I can start all services from Ambari.
I noticed the same behavior in the logs when starting the cluster every morning: exactly the same "connection refused" lines appear before the lines showing the attempts to leave safe mode. However, starting the HDFS service always completes successfully and the NameNode works in the end (which is why I only noticed the error now).
It seems that only the upgrade process cannot properly restart the NameNode.
Now my questions:
Regards,
Nicola
java.net.ConnectException: Call From hostname/IP to hostname:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy10.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:711)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy11.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2657)
	at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1340)
	at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1324)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.setSafeMode(DFSAdmin.java:611)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:1916)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2107)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
	at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
	at org.apache.hadoop.ipc.Client.call(Client.java:1449)
	... 20 more

safemode: Call From hostname/IP to hostname:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

2017-07-10 10:55:33,095 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://hostname:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
17/07/10 10:55:32 WARN ipc.Client: Failed to connect to server: hostname/IP:8020: try once and fail.
Created 07-11-2017 12:33 AM
Can you please check the NameNode logs to see why it is not listening on port 8020? It probably crashed, and that is why you are unable to connect to it.
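Before digging through the logs, a plain TCP connect is a quick way to confirm whether anything is listening on the NameNode RPC port at all (a "connection refused" from `hdfs dfsadmin` usually means nothing is). This is just a generic reachability sketch; `"hostname"` and `8020` stand in for your actual NameNode host and RPC port:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False

# Example (replace "hostname" with your NameNode host):
# if not port_open("hostname", 8020):
#     print("NameNode RPC port unreachable; check the NameNode logs.")
```

If the port is closed even though the process is shown as started, the NameNode most likely exited during startup, and the reason will be near the end of its log.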
It would not be a good idea to use "Ignore and Proceed" for this kind of issue, even though the option is there.
Created 07-11-2017 02:24 AM
In my case, I ignored the warning and later found that the NameNode simply could not start. After increasing the NameNode Java heap size to 4 GB, everything worked. I think there is a calculation behind the scenes: if your figure doesn't meet the number it comes up with, you get a connection refused error.
"Connection refused" is such a generic error that it can be caused by all kinds of things.
Try increasing your NameNode Java heap size, then save the change and restart the NameNode.
I have attached a screenshot showing where to change the NameNode Java heap size.
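For reference, this is roughly what the change looks like in hadoop-env.sh, which Ambari renders from its hadoop-env template (HDFS > Configs). This is only a sketch; the exact template and property names vary by HDP version, and the 4096m value simply mirrors the 4 GB suggestion above:

```shell
# hadoop-env.sh fragment (sketch; in Ambari, edit the NameNode heap setting
# under HDFS > Configs rather than editing this file by hand).
# Sets both initial and maximum NameNode JVM heap to 4 GB.
export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m ${HADOOP_NAMENODE_OPTS}"
```

Changing it through the Ambari UI is preferable, since Ambari will otherwise overwrite manual edits on the next config push.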
Created 07-19-2017 07:08 AM
After increasing the Java heap size to 4 GB, I can start the NameNode without errors from the service menu.
However, the issue persists during the upgrade process.
It seems that the issue is specific to the upgrade wizard.