Support Questions


Error requesting heartbeat of host java.net.ConnectException: Connection refused: IP.compute.amazonaws.com/IP:9000

Explorer

Good morning guys, 

 

I have a serious problem that I am struggling to figure out.

 

I installed Cloudera 6.1.1 on a set of AWS EC2 instances (5 hosts in total).

 

My /etc/hosts is the following:

 

127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
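For reference, the check that the Cloudera agent's own config.ini suggests for verifying which hostname and IP each host reports about itself is (run on every host):

python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'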

 

 

The master is reporting the following error for all 4 slave hosts in my cluster:

 

 

2019-11-22 08:22:41,355 WARN New I/O boss #15:com.cloudera.server.cmf.HeartbeatRequester: Error requesting heartbeat of host id 072beea9-4ba3-4018-8b7b-fa11fd9eac25
java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
        at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
        at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
        at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        ... 8 more

 

 

I investigated the problem, and this is what I found:

  1. The firewall is disabled on all hosts
  2. SELinux is disabled
  3. iptables is disabled
  4. IPv6 is disabled
  5. The AWS-level firewall (security groups) allows the traffic

(For reference, the host-level checks behind points 1-4 are along the lines shown below.)
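Assuming RHEL/CentOS-style hosts (point 5 is verified in the AWS console rather than on the host itself), the checks look roughly like this:

sudo systemctl status firewalld                 # 1. firewalld should be inactive/disabled
getenforce                                      # 2. should print Disabled (or Permissive)
sudo iptables -L -n                             # 3. no rules, default ACCEPT policies
cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # 4. prints 1 when IPv6 is disabled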

After verifying that everything was set up correctly, I ran this command on slave1:

telnet localhost 9000

 

Result:

 

[ec2-user@slave1 ~]$ telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...

 

 

I think this is a problem.

 

I googled the problem, followed a doc I found, and simply tried this:

 

hdfs namenode -format
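(That doc was probably written for a plain Hadoop setup, where port 9000 is the usual fs.defaultFS NameNode RPC port; on a CM-managed cluster, the address HDFS actually uses can be checked with:)

hdfs getconf -confKey fs.defaultFS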

 

 

I also tried running

 

lsof -i :9000

 

and it returned nothing.
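(If lsof is not conclusive, an equivalent check with ss should show whether anything at all is bound to port 9000:)

sudo ss -ltnp | grep ':9000'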

 

Therefore I ran:

 

[ec2-user@slave1 ~]$ nc -vz localhost 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to 127.0.0.1 failed: Connection refused.
Ncat: Trying next address...
Socket troubles: Address family not supported by protocol
Ncat: Address family not supported by protocol.
[ec2-user@slave1 ~]$ nc -vz 127.0.0.1 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.

 

 

Has anyone had the same problem?

 

Another thing I did was check which service should be running on port 9000, so I looked at /etc/cloudera-scm-agent/config.ini:

 

[General]
# Hostname of the CM server.
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
# server_host=master.sysdatadigital.it

# Port that the CM server is listening on.
server_port=7182

## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000

# IP Address that the CM agent should listen on.
# listening_ip=

# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
#   python -c 'import socket; \
#              print socket.getfqdn(), \
#                    socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=

...

 

 

And I noticed that the listening_port=9000 line is commented out; only server_port=7182 is set explicitly.
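(If I read the file correctly, a commented listening_port just means the agent falls back to its default of 9000, while 7182 is the port of the CM server rather than the agent. A quick way to check whether the agent is actually running and bound to 9000, assuming systemd, would be:)

sudo systemctl status cloudera-scm-agent
sudo ss -ltnp | grep ':9000'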

 

What can I do to solve this problem?

 

Thanks,

M

 

 

1 ACCEPTED SOLUTION

Expert Contributor

Hi @m4x1m1li4n,

 

Can you confirm -

 

  • Are the IPs below, which are defined in /etc/hosts, public or private IP addresses?
13.48.140.49
13.48.181.38
13.48.185.39
13.53.62.160
13.48.18.0
  • The hostname defined in the Cloudera agent config.ini seems to be a public hostname: "ec2-13-48-140-49.eu-north-1.compute.amazonaws.com".

Can you try pointing both /etc/hosts and the config.ini hostname to the private IPs within the cluster, and then restarting the agent?

I suspect the issue is with the public DNS.
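For example, something along these lines on each host (using the internal name for your master; adjust to your actual private hostnames):

# /etc/cloudera-scm-agent/config.ini
server_host=master.sysdatadigital.it    # must resolve to the master's private IP

# then restart the agent so it re-registers with the CM server
sudo systemctl restart cloudera-scm-agent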

 


2 REPLIES

Explorer

Hello, 

 

You are right indeed!

I put the internal (private) IP addresses in /etc/hosts on each node and the error disappeared:

 

127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
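(In case it is useful to others, after changing /etc/hosts the agent can be restarted and checked on each node like this; the log path below is the standard CM agent location:)

sudo systemctl restart cloudera-scm-agent
sudo systemctl status cloudera-scm-agent
tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log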

 

Many thanks,