Member since: 07-11-2016
Posts: 12
Kudos Received: 0
Solutions: 0
11-22-2019
05:22 AM
Hello, you are right indeed! I put the internal IP addresses in /etc/hosts on each node and the error disappeared:
127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4
Many thanks,
M
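For anyone checking their own file, here is a minimal Python sketch of the same check. It assumes the hosts file content is available as a string; `parse_hosts` and `public_entries` are illustrative helper names, not part of any Cloudera tool. It flags entries whose address is neither loopback nor private, which is what the earlier file with the Elastic IPs contained.

```python
import ipaddress

def parse_hosts(text):
    """Return (ip, [names]) pairs from /etc/hosts-style text, skipping comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        entries.append((ip, names))
    return entries

def public_entries(text):
    """Return the entries whose address is neither loopback nor private."""
    flagged = []
    for ip, names in parse_hosts(text):
        addr = ipaddress.ip_address(ip)
        if not (addr.is_loopback or addr.is_private):
            flagged.append((ip, names))
    return flagged

# Shortened copies of the two files from this thread.
OLD = """127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.18.0 slave4.sysdatadigital.it slave4"""

NEW = """127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.16.104 slave4.sysdatadigital.it slave4"""

print(public_entries(OLD))  # the two public (Elastic IP) entries
print(public_entries(NEW))  # nothing flagged
```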
11-22-2019
01:33 AM
Good morning guys,
I have a serious problem that I am struggling to figure out.
I installed Cloudera 6.1.1 on a set of AWS hosts (5 in total).
My /etc/hosts is the following:
127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
I am receiving the following error on the master for all 4 other hosts in my cluster:
2019-11-22 08:22:41,355 WARN New I/O boss #15:com.cloudera.server.cmf.HeartbeatRequester: Error requesting heartbeat of host id 072beea9-4ba3-4018-8b7b-fa11fd9eac25
java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
... 8 more
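The exception message packs the hostname, the resolved address, and the port into a single `host/ip:port` token. A small Python sketch, assuming that message shape holds (the regular expression and the `parse_refused` name are illustrative, not taken from Cloudera Manager), pulls the three apart so it is easy to see which address the server actually dialed:

```python
import ipaddress
import re

# Matches the "Connection refused: <host>/<ip>:<port>" shape seen in the trace.
REFUSED = re.compile(
    r"Connection refused: (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
)

def parse_refused(line):
    """Return host, ip, port and a private-address flag, or None if no match."""
    m = REFUSED.search(line)
    if m is None:
        return None
    ip = ipaddress.ip_address(m.group("ip"))
    return {
        "host": m.group("host"),
        "ip": str(ip),
        "port": int(m.group("port")),
        "private": ip.is_private,
    }

line = ("java.net.ConnectException: Connection refused: "
        "ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000")
print(parse_refused(line))
```

For the line above, the public EC2 name was resolved to the private address 172.31.16.104, and the refused port is 9000.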
I investigated the problem, and this is what I can report:
The firewall is disabled on all hosts
SELinux is disabled
iptables is disabled
IPv6 is disabled
The AWS firewall is disabled
After confirming that everything is set up correctly, I ran this command on slave1:
telnet localhost 9000
Result:
[ec2-user@slave1 ~]$ telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...
I think this is a problem.
I googled this problem, followed this doc, and simply tried this:
hdfs namenode -format
I also tried
lsof -i :9000
and received no output.
Then I ran:
[ec2-user@slave1 ~]$ nc -vz localhost 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to 127.0.0.1 failed: Connection refused.
Ncat: Trying next address...
Socket troubles: Address family not supported by protocol
Ncat: Address family not supported by protocol.
[ec2-user@slave1 ~]$ nc -vz 127.0.0.1 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
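The same probe can be scripted. The following is a minimal Python sketch of what `nc -vz` checks, namely whether a TCP connection to a host and port can be opened; `port_open` is an illustrative name, and the two-second timeout is an arbitrary choice:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

if __name__ == "__main__":
    state = "open" if port_open("127.0.0.1", 9000) else "closed or unreachable"
    print("127.0.0.1:9000 is", state)
```

A refused connection on the loopback address only says that nothing is listening on 127.0.0.1:9000; a service bound to a different local address would not show up in this test.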
Has anyone had the same problem?
I also checked which service should be listening on port 9000, so I looked at /etc/cloudera-scm-agent/config.ini:
[General]
# Hostname of the CM server.
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
# server_host=master.sysdatadigital.it
# Port that the CM server is listening on.
server_port=7182
## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000
# IP Address that the CM agent should listen on.
# listening_ip=
# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
# python -c 'import socket; \
# print socket.getfqdn(), \
# socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=
...
I noticed that the listening_port=9000 line is commented out, while port 7182 is set as server_port.
What can I do to solve this problem?
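According to the comments in the file itself, the two ports serve different purposes: server_port is where the CM server listens, and listening_port is where the agent listens. A commented-out key usually means a built-in default applies. The Python sketch below reads a shortened copy of the file with the standard `configparser` module and falls back to 9000 for the agent port; that fallback is an assumption taken from the commented-out line, and `agent_ports` is an illustrative name.

```python
import configparser

# Shortened copy of /etc/cloudera-scm-agent/config.ini from this post.
SAMPLE = """\
[General]
# Hostname of the CM server.
server_host=master.sysdatadigital.it
# Port that the CM server is listening on.
server_port=7182
# Port that the CM agent should listen on.
# listening_port=9000
"""

# Assumption: the value in the commented-out line is the default in effect.
ASSUMED_DEFAULT_AGENT_PORT = 9000

def agent_ports(text):
    """Return the server port and the effective agent listening port."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["General"]
    return {
        "server_port": general.getint("server_port"),
        "listening_port": general.getint(
            "listening_port", fallback=ASSUMED_DEFAULT_AGENT_PORT
        ),
    }

print(agent_ports(SAMPLE))
```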
Thanks,
M
Labels: Cloudera Manager
11-06-2019
01:29 AM
Hello, thanks again for following my case.
I use Elastic IPs in AWS, so my IP addresses are fixed and do not change. I created a subnet in AWS where each node can communicate with the others, and I disabled the firewall (in my VPC).
Question: Was your cluster deployed using a CloudFormation template?
Answer: No. I followed the Cloudera installation guide.
Question: Apart from that strange port, can your hostname master.sysdatadigital.it resolve to the AWS IP?
Answer: Yes. I can prove it because when I turn on the cluster (AWS), I can type http://master.sysdatadigital.it:7180/ in the browser on my laptop and reach the Cloudera Manager home page.
Thanks again,
M
11-05-2019
09:34 PM
Hello, thanks for the reply. Yes, it's strange and I will change it, but I still can't do anything because I can't connect to the hosts... Does anyone have an idea of what I could have done wrong in the network configuration? I also changed the config.ini file under the Cloudera folder and replaced the master's host name on each node with master.sysdatadigital.it rather than the long AWS address.
Thanks,
M
11-05-2019
08:15 AM
Also, I checked some log files around my cluster when I try to perform operations. For example, when I try to restart a service from a single node, I get an error that points to the file /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-slave4.sysdatadigital.it.log.out
When I open that file on my node, the errors are the following:
2019-11-05 16:05:48,449 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address slave3.sysdatadigital.it/13.53.62.160:4181
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:554)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:530)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:396)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:368)
at java.lang.Thread.run(Thread.java:748)
2019-11-05 16:05:48,453 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE=[zookeeper-SERVER-ca49cdfef04a282cc441985b3ebaf2c9], HOSTS=[slave4.sysdatadigital.it], ROLE_TYPE=[SERVER], CATEGORY=[LOG_MESSAGE], EVENTCODE=[EV_LOG_EVENT], SERVICE=[zookeeper], SERVICE_TYPE=[ZOOKEEPER], LOG_LEVEL=[WARN], HOST_IDS=[688609d5-69de-4dc2-9b8c-b4360de93ec6], SEVERITY=[IMPORTANT]}, content=Non-optimial configuration, consider an odd number of servers., timestamp=1572969948380}
I do not know where I should look.
Thanks,
M
11-05-2019
08:08 AM
Hello everyone,
I can't complete the first installation of my Cluster on AWS.
When I try to install my instances on my nodes, the error given by Cloudera Manager is the following:
HTTP ERROR 502
Problem accessing /cmf/process/287/logs. Reason:
Connection refused (Connection refused)
Could not connect to host.
I know the error looks very simple, but I am struggling to find the solution.
My first thought was the /etc/hosts file, which is the following on each host in my cluster:
127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
Moreover, I checked the hostname on each node with the following command:
python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
The result for each host is the following:
Node Master:
master.sysdatadigital.it
13.48.140.49
Node Slave1:
slave1.sysdatadigital.it
13.48.181.38
etc.
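A Python 3 variant of the same check, as a sketch: it reports the FQDN and the address it resolves to, and additionally says whether that address is private. The `describe_local_host` name is illustrative. The extra flag is useful on EC2 because an instance's network interface typically carries only the private address, while an Elastic IP is mapped to it from outside the instance.

```python
import ipaddress
import socket

def describe_local_host():
    """Return (fqdn, resolved_ip or None, is_private or None) for this machine."""
    fqdn = socket.getfqdn()
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError:
        # The name does not resolve (no DNS record and no /etc/hosts entry).
        return fqdn, None, None
    return fqdn, ip, ipaddress.ip_address(ip).is_private

print(describe_local_host())
```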
Every host in the cluster uses the following template for /etc/sysconfig/network:
NETWORKING=yes
NOZEROCONF=yes
HOSTNAME=master.sysdatadigital.it
The firewall is disabled:
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
iptables is disabled on each node:
● iptables.service - IPv4 firewall with iptables
Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Each host can communicate with the others over ssh since my private key is correctly installed in ~/.ssh/id_rsa; in fact, when I run "ssh slave3" from any node in the cluster, I can connect to the other node.
As I understand it, this connection error is why no command works, and I cannot start, restart, or fix anything in the cluster.
Does anyone have an idea or clue on what I might have missed out?
Thanks a lot for your help.
M
Labels: Cloudera Manager
11-05-2019
07:09 AM
Hi, I have a similar issue. My hosts file is the following on all hosts in my cluster:
127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4
What did you change exactly?
Many thanks,
M
10-31-2019
02:16 AM
Hi Li, first, thank you so much for your answer! Much appreciated! Good spot!
I registered a subdomain in my Amazon VPC and the /etc/hosts looks like this:
Therefore, my hostname for the master is the following:
However, when I re-run the following command:
sudo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera /opt/cloudera/cm-agent/bin/certmanager --location /opt/cloudera/CMCA setup --configure-services
I receive the following:
I guess I should remove the generated key and re-run the command. Do you know how I can do that?
Many thanks,
M
10-30-2019
10:28 AM
Hello everyone,
I have a question about enabling TLS communication between the hosts in my cluster.
The installation procedure recommends enabling TLS across the cluster, but when I try to run the following command:
sudo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera /opt/cloudera/cm-agent/bin/certmanager setup --configure-services
I receive the following errors:
The first warning is:
could not generate CSR
When I check the log, I get two errors:
req failed for /var/lib/cloudera-scm-server/certmanager/CMCA/private/ca_key.pem. Exit code: 1 Output:
problems making Certificate Request
139727014807440:error:0D07A097:asn1 encoding routines:ASN1_mbstring_ncopy:string too long:a_mbstr.c:158:maxsize=64
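The `maxsize=64` in the OpenSSL message matches the 64-character upper bound that X.509 places on subject fields such as commonName. Assuming the certificate request derives its common name from the host's FQDN (an assumption; the log does not show which field was too long), a quick Python check of the name length could look like this. `common_name_fits` is an illustrative name.

```python
import socket

# Upper bound for the commonName attribute in RFC 5280.
X509_COMMON_NAME_MAX = 64

def common_name_fits(name, limit=X509_COMMON_NAME_MAX):
    """Return True if name is short enough to be used as an X.509 common name."""
    return len(name) <= limit

fqdn = socket.getfqdn()
print(fqdn, len(fqdn), common_name_fits(fqdn))
```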
Has anyone had the same problem? I am struggling to figure out this issue, and after googling it I have not found much...
If I skip this step, unfortunately, I will not be able to pass the Inspect Network Performance step.
I already tried skipping this step, but at the end of the installation the file "cert.py" is used to test the connection between the nodes, and it fails with the error "unable to reach the hosts".
In any case, I am able to connect over ssh from the master to the other nodes; here is an example:
Thanks,
M
Labels: Cloudera Manager