Created on 11-22-2019 01:33 AM - last edited on 11-22-2019 02:11 AM by VidyaSargur
Good morning guys,
I have a serious problem that I am struggling to figure out.
I installed Cloudera 6.1.1 on a set of AWS EC2 instances (5 hosts in total).
My /etc/hosts is the following:
127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
On the Master, I am receiving the following error for each of the 4 slave hosts in my cluster:
2019-11-22 08:22:41,355 WARN New I/O boss #15:com.cloudera.server.cmf.HeartbeatRequester: Error requesting heartbeat of host id 072beea9-4ba3-4018-8b7b-fa11fd9eac25
java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
... 8 more
I investigated the problem; here is what I found so far.
After checking that everything appeared to be set up correctly, I ran this command on slave1:
telnet localhost 9000
Result:
[ec2-user@slave1 ~]$ telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...
I think this is the problem.
I googled it, followed this doc, and simply tried:
hdfs namenode -format
I also tried
lsof -i :9000
and got no output.
Therefore I ran:
[ec2-user@slave1 ~]$ nc -vz localhost 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to 127.0.0.1 failed: Connection refused.
Ncat: Trying next address...
Socket troubles: Address family not supported by protocol
Ncat: Address family not supported by protocol.
[ec2-user@slave1 ~]$ nc -vz 127.0.0.1 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
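For reference, the same reachability check the telnet and nc commands perform can be scripted. A minimal sketch in Python (standard library only), roughly equivalent to `nc -vz host port`; the host and port below are taken from the probes above:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS failures
        return False

# Probe the agent port that the CM server cannot reach
print(port_open("127.0.0.1", 9000))
```

"Connection refused" (as opposed to a timeout) means the host is reachable but nothing is listening on that port, which matches the empty lsof output.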
Has anyone had the same problem?
Another thing I did was check which service should be listening on port 9000, so I looked at /etc/cloudera-scm-agent/config.ini:
[General]
# Hostname of the CM server.
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
# server_host=master.sysdatadigital.it
# Port that the CM server is listening on.
server_port=7182
## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000
# IP Address that the CM agent should listen on.
# listening_ip=
# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
# python -c 'import socket; \
# print socket.getfqdn(), \
# socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=
...
And I noticed that the listening_port=9000 line is commented out, while server_port is set to 7182. (Per the file's own comments, 7182 is the port the CM server listens on; 9000 is the port the agent itself listens on, and the commented line just means the default value of 9000 is still in effect.)
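A commented-out option in config.ini simply means the built-in default applies, it does not disable the port. A minimal sketch of that fallback behaviour (the config text below is abridged from the file above; the fallback value of 9000 matches the agent's documented default):

```python
import configparser
import io

# Abridged copy of the agent config above: listening_port is commented out.
CONFIG = """\
[General]
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
server_port=7182
# listening_port=9000
"""

parser = configparser.ConfigParser()
parser.read_file(io.StringIO(CONFIG))

# fallback= is used when the key is absent (commented lines are ignored)
server_port = parser.getint("General", "server_port")
agent_port = parser.getint("General", "listening_port", fallback=9000)
print(server_port, agent_port)  # -> 7182 9000
```

So the CM server really is expected to reach each agent on port 9000, which is exactly the connection being refused in the heartbeat error.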
What can I do to solve this problem?
Thanks,
M
Created 11-22-2019 02:06 AM
Can you confirm these are the public IPs of the five hosts?
13.48.140.49
13.48.181.38
13.48.185.39
13.53.62.160
13.48.18.0
Can you try pointing both /etc/hosts and the hostname in config.ini to the private IPs within the cluster, then restarting the agent?
I suspect the issue is with the public DNS.
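The public-vs-private distinction is easy to check directly. A short sketch using Python's ipaddress module, with the two addresses that appear for slave4 in the heartbeat error above (the public one from /etc/hosts, the private one EC2's internal DNS resolves to):

```python
import ipaddress

# slave4 as listed in /etc/hosts (public) vs. as resolved by EC2 DNS (private)
for addr in ("13.48.18.0", "172.31.16.104"):
    ip = ipaddress.ip_address(addr)
    print(addr, "private" if ip.is_private else "public")
# -> 13.48.18.0 public
# -> 172.31.16.104 private
```

Inside a VPC, cluster hosts should resolve each other to the private 172.31.x.x addresses; traffic to the public 13.x.x.x addresses likely never reaches port 9000 (the security group or the agent's interface binding would block it).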
Created 11-22-2019 05:22 AM
Hello,
You are right indeed!
I put the internal (private) IPs in /etc/hosts on each node and the error disappeared:
127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
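As a quick sanity check after editing, one can verify that every entry now uses a private address. A minimal sketch (the hosts text below is copied from the list above):

```python
import ipaddress

HOSTS = """\
127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
"""

for line in HOSTS.splitlines():
    ip_str, *names = line.split()
    ip = ipaddress.ip_address(ip_str)
    if not ip.is_loopback:
        assert ip.is_private, f"{names[0]} still points at a public IP"
print("all cluster entries use private IPs")
```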
Many thanks,
M