Created on 11-22-2019 01:33 AM - last edited on 11-22-2019 02:11 AM by VidyaSargur
Good morning guys,
I have a serious problem that I am struggling to figure out.
I installed Cloudera 6.1.1 on a set of AWS EC2 instances (5 hosts in total).
My /etc/hosts is the following:
127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
On the Master, I am receiving the following error for each of the 4 slave hosts in my cluster:
2019-11-22 08:22:41,355 WARN New I/O boss #15:com.cloudera.server.cmf.HeartbeatRequester: Error requesting heartbeat of host id 072beea9-4ba3-4018-8b7b-fa11fd9eac25
java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
... 8 more
I investigated the problem; here is what I found so far.
After checking that everything appeared to be set up correctly, I ran this command on slave1:
telnet localhost 9000
Result:
[ec2-user@slave1 ~]$ telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...
I think this is the problem.
I googled it, followed this doc, and simply tried:
hdfs namenode -format
I also tried
lsof -i :9000
and got no output.
Therefore I ran:
[ec2-user@slave1 ~]$ nc -vz localhost 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to 127.0.0.1 failed: Connection refused.
Ncat: Trying next address...
Socket troubles: Address family not supported by protocol
Ncat: Address family not supported by protocol.
[ec2-user@slave1 ~]$ nc -vz 127.0.0.1 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
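For reference, the same reachability check the telnet and nc commands perform can be scripted. A minimal sketch in Python (standard library only), roughly equivalent to `nc -vz host port`; the host and port below are taken from the probes above:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS failures
        return False

# Probe the agent port that the CM server cannot reach
print(port_open("127.0.0.1", 9000))
```

"Connection refused" (as opposed to a timeout) means the host is reachable but nothing is listening on that port, which matches the empty lsof output.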
Has anyone had the same problem?
Another thing I did was check which service should be listening on port 9000, so I looked at /etc/cloudera-scm-agent/config.ini:
[General]
# Hostname of the CM server.
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
# server_host=master.sysdatadigital.it
# Port that the CM server is listening on.
server_port=7182
## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000
# IP Address that the CM agent should listen on.
# listening_ip=
# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
# python -c 'import socket; \
# print socket.getfqdn(), \
# socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=
...
And I noticed that the listening_port=9000 line is commented out, while server_port is set to 7182. (Per the file's own comments, 7182 is the port the CM server listens on; 9000 is the port the agent itself listens on, and the commented line just means the default value of 9000 is still in effect.)
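A commented-out option in config.ini simply means the built-in default applies, it does not disable the port. A minimal sketch of that fallback behaviour (the config text below is abridged from the file above; the fallback value of 9000 matches the agent's documented default):

```python
import configparser
import io

# Abridged copy of the agent config above: listening_port is commented out.
CONFIG = """\
[General]
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
server_port=7182
# listening_port=9000
"""

parser = configparser.ConfigParser()
parser.read_file(io.StringIO(CONFIG))

# fallback= is used when the key is absent (commented lines are ignored)
server_port = parser.getint("General", "server_port")
agent_port = parser.getint("General", "listening_port", fallback=9000)
print(server_port, agent_port)  # -> 7182 9000
```

So the CM server really is expected to reach each agent on port 9000, which is exactly the connection being refused in the heartbeat error.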
What can I do to solve this problem?
Thanks,
M
Created 11-22-2019 02:06 AM
Can you confirm these are the public IPs of the five hosts?
13.48.140.49
13.48.181.38
13.48.185.39
13.53.62.160
13.48.18.0
Can you try pointing both /etc/hosts and the hostname in config.ini to the private IPs within the cluster, then restarting the agent?
I suspect the issue is with the public DNS.
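The public-vs-private distinction is easy to check directly. A short sketch using Python's ipaddress module, with the two addresses that appear for slave4 in the heartbeat error above (the public one from /etc/hosts, the private one EC2's internal DNS resolves to):

```python
import ipaddress

# slave4 as listed in /etc/hosts (public) vs. as resolved by EC2 DNS (private)
for addr in ("13.48.18.0", "172.31.16.104"):
    ip = ipaddress.ip_address(addr)
    print(addr, "private" if ip.is_private else "public")
# -> 13.48.18.0 public
# -> 172.31.16.104 private
```

Inside a VPC, cluster hosts should resolve each other to the private 172.31.x.x addresses; traffic to the public 13.x.x.x addresses likely never reaches port 9000 (the security group or the agent's interface binding would block it).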
Created 11-22-2019 05:22 AM
Hello,
You are right indeed!
I put the internal (private) IPs in /etc/hosts on each node and the error disappeared:
127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
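As a quick sanity check after editing, one can verify that every entry now uses a private address. A minimal sketch (the hosts text below is copied from the list above):

```python
import ipaddress

HOSTS = """\
127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
"""

for line in HOSTS.splitlines():
    ip_str, *names = line.split()
    ip = ipaddress.ip_address(ip_str)
    if not ip.is_loopback:
        assert ip.is_private, f"{names[0]} still points at a public IP"
print("all cluster entries use private IPs")
```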
Many thanks,
M