About m4x1m1li4n

m4x1m1li4n · ‎11-22-2019

Hello, you are right indeed! I put the internal IP addresses in /etc/hosts on each node and the error disappeared:

127.0.0.1 localhost.localdomain localhost
172.31.16.164 master.sysdatadigital.it master
172.31.19.139 slave1.sysdatadigital.it slave1
172.31.25.187 slave2.sysdatadigital.it slave2
172.31.28.223 slave3.sysdatadigital.it slave3
172.31.16.104 slave4.sysdatadigital.it slave4

Many thanks, M
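The change described in this post maps each cluster hostname to its VPC-internal address rather than to the public one. A minimal sketch for spotting public addresses in a hosts-style file, assuming the file content is passed in as text; the helper name public_entries is hypothetical and not part of any Cloudera tooling:

```python
import ipaddress


def public_entries(hosts_text):
    """Return (ip, first_hostname) pairs whose address is neither private nor loopback."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            # Not an IP address in the first column; skip the line.
            continue
        if not (addr.is_private or addr.is_loopback):
            flagged.append((fields[0], fields[1]))
    return flagged
```

Calling public_entries(open("/etc/hosts").read()) on each node would list any entries that still point at a public address.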

m4x1m1li4n · ‎11-22-2019

Good morning guys, I have a serious problem that I am struggling to figure out. I installed Cloudera 6.1.1 on a set of AWS instances (5 hosts in total). My /etc/hosts is the following:

127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4

I am receiving the following error on the master for all 4 other hosts in my cluster:

2019-11-22 08:22:41,355 WARN New I/O boss #15:com.cloudera.server.cmf.HeartbeatRequester: Error requesting heartbeat of host id 072beea9-4ba3-4018-8b7b-fa11fd9eac25
java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
    at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
    at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: ec2-13-48-18-0.eu-north-1.compute.amazonaws.com/172.31.16.104:9000
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    ... 8 more

I investigated the problem, and this is what I can tell you:

- The firewall is disabled on all hosts
- SELinux is disabled
- iptables is disabled
- IPv6 is disabled
- The AWS firewall is disabled

After confirming that everything was set up correctly, I ran this command on slave1:

[ec2-user@slave1 ~]$ telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...

I think this is a problem. I googled it, followed this doc, and naively tried this:

hdfs namenode -format

I also tried lsof -i :9000 and got no output. Then I ran:

[ec2-user@slave1 ~]$ nc -vz localhost 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to 127.0.0.1 failed: Connection refused.
Ncat: Trying next address...
Socket troubles: Address family not supported by protocol
Ncat: Address family not supported by protocol.
[ec2-user@slave1 ~]$ nc -vz 127.0.0.1 9000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.

Did anyone have the same problem?

Another thing I did was check which service should run on port 9000, so I looked at /etc/cloudera-scm-agent/config.ini:

[General]
# Hostname of the CM server.
server_host=ec2-13-48-140-49.eu-north-1.compute.amazonaws.com
# server_host=master.sysdatadigital.it

# Port that the CM server is listening on.
server_port=7182

## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000

# IP Address that the CM agent should listen on.
# listening_ip=

# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
#   python -c 'import socket; \
#              print socket.getfqdn(), \
#                    socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=
...

I noticed that port 9000 is commented out, while port 7182 is used. What can I do to solve this problem? Thanks, M
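The telnet and nc checks in this post can also be scripted so they are easy to repeat against every node. A minimal sketch, assuming plain TCP reachability is all that needs testing; the function name port_open is hypothetical:

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False
```

For example, port_open("slave1.sysdatadigital.it", 9000) run from the master would show whether the port named in the heartbeat error is reachable. A False result only says that the connection failed; it does not say which process was expected to be listening.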

m4x1m1li4n · ‎11-06-2019

Hello, thanks again for following my case. I use Elastic IPs in AWS, so my IP addresses are fixed and do not change. I created a subnet in AWS where each node can communicate with the others, and I disabled the firewall (in my VPC).

Question: Was your cluster deployed using a cloud formation template?
Answer: No. I followed the Cloudera installation guide.

Question: Apart from that strange port can your hostname master.sysdatadigital.it resolve to the AWS IP?
Answer: Yes. I can prove it because when I turn on the cluster (AWS), from my laptop (local browser) I can just type http://master.sysdatadigital.it:7180/ to be redirected to the Cloudera Manager home page.

Thanks again, M

m4x1m1li4n · ‎11-05-2019

Hello, thanks for the reply. Yeah, it's strange and I will change it, but I still can't do anything because I can't connect to the hosts... Does anyone have an idea of what I could have done wrong in the network configuration? I also changed the config.ini file under the Cloudera folder and replaced the host name of the master on each node with master.sysdatadigital.it rather than the long AWS address. Thanks, M
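To confirm which master host name each agent is actually configured with after an edit like this, the file can be read programmatically. A minimal sketch, assuming the agent's config.ini keeps the [General] section and the server_host, server_port, and listening_port keys shown elsewhere in this thread; the fallback values 7182 and 9000 are assumptions taken from that listing, and agent_settings is a hypothetical helper name:

```python
import configparser


def agent_settings(config_text):
    """Parse agent config text and return the settings relevant to connectivity."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)  # lines starting with '#' are treated as comments
    general = parser["General"]
    return {
        "server_host": general.get("server_host"),
        # Fallbacks are assumed defaults, used only when the key is commented out.
        "server_port": general.getint("server_port", fallback=7182),
        "listening_port": general.getint("listening_port", fallback=9000),
    }
```

Running agent_settings(open("/etc/cloudera-scm-agent/config.ini").read()) on each node gives a quick way to compare the values across the cluster.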

m4x1m1li4n · ‎11-05-2019

Also, I checked some log files around my cluster when I try to perform operations. For example, when I try to restart a service from a single node, I get an error that tells me to check the file /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-slave4.sysdatadigital.it.log.out. When I open that file on my node, the errors are the following:

2019-11-05 16:05:48,449 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address slave3.sysdatadigital.it/13.53.62.160:4181
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:554)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:530)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:396)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:368)
    at java.lang.Thread.run(Thread.java:748)

2019-11-05 16:05:48,453 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE=[zookeeper-SERVER-ca49cdfef04a282cc441985b3ebaf2c9], HOSTS=[slave4.sysdatadigital.it], ROLE_TYPE=[SERVER], CATEGORY=[LOG_MESSAGE], EVENTCODE=[EV_LOG_EVENT], SERVICE=[zookeeper], SERVICE_TYPE=[ZOOKEEPER], LOG_LEVEL=[WARN], HOST_IDS=[688609d5-69de-4dc2-9b8c-b4360de93ec6], SEVERITY=[IMPORTANT]}, content=Non-optimial configuration, consider an odd number of servers., timestamp=1572969948380}

I do not know where I should look. Thanks, M
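The second warning ("consider an odd number of servers") refers to ZooKeeper's majority quorum: an ensemble stays available only while more than half of its servers are up, so an even-sized ensemble tolerates no more failures than the next smaller odd size. A short worked sketch of that arithmetic (the function names are illustrative only):

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still available."""
    return servers - quorum_size(servers)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 4 servers the quorum is 3 and one failure is tolerated, the same tolerance as with 3 servers. This warning is separate from the "Connection refused" error above it.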

m4x1m1li4n · ‎11-05-2019

Hello everyone, I can't complete the first installation of my cluster on AWS. When I try to install my instances on my nodes, Cloudera Manager gives the following error:

HTTP ERROR 502
Problem accessing /cmf/process/287/logs. Reason:
Connection refused (Connection refused)
Could not connect to host.

I know the error is very simple, but I am struggling to find the solution. My first thought was the /etc/hosts file, which is the following on each host in my cluster:

127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4

Moreover, I checked the hostname on each node with the following command:

python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"

The result for each host is the following:

Node Master:
master.sysdatadigital.it
13.48.140.49

Node Slave1:
slave1.sysdatadigital.it
13.48.181.38

etc.

Every node uses the following template for /etc/sysconfig/network:

NETWORKING=yes
NOZEROCONF=yes
HOSTNAME=master.sysdatadigital.it

The firewall is disabled:

● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

iptables is disabled on each node:

● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Each host can communicate with the others through ssh since my private key is correctly installed in ~/.ssh/id_rsa; in fact, when I run "ssh slave3" from any node in the cluster, I can connect to the other node.

As I understand it, this connection error is why no command works, and I cannot start, restart, or fix anything in the cluster.

Does anyone have an idea or clue about what I might have missed? Thanks a lot for your help. M
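The one-line hostname check in this post uses Python 2 print statements. A rough Python 3 equivalent, written as a small function so it can be reused; describe_host is a hypothetical name, and returning None for an unresolvable name is a choice made for this sketch:

```python
import socket


def describe_host(name=""):
    """Return (fqdn, ip) for the given name, or for the local host when name is empty."""
    fqdn = socket.getfqdn(name)
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError:
        # The name could not be resolved to an IPv4 address.
        ip = None
    return fqdn, ip


if __name__ == "__main__":
    print(*describe_host(), sep="\n")
```

The address printed is whatever the node's resolver returns for its own name, which is determined by /etc/hosts when the name is listed there.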

m4x1m1li4n · ‎11-05-2019

Hi, I have got a similar issue. My hosts file is the following for all hosts in my cluster:

127.0.0.1 localhost.localdomain localhost
13.48.140.49 master.sysdatadigital.it master
13.48.181.38 slave1.sysdatadigital.it slave1
13.48.185.39 slave2.sysdatadigital.it slave2
13.53.62.160 slave3.sysdatadigital.it slave3
13.48.18.0 slave4.sysdatadigital.it slave4

What exactly did you change? Many thanks, M

m4x1m1li4n · ‎10-31-2019

Hi Li, first, thank you so much for your answer! Much appreciated! Good spot! I registered a subdomain in my Amazon VPC and the /etc/hosts looks like this: Therefore, my hostname for the master is the following: However, when I re-run the following command:

sudo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera /opt/cloudera/cm-agent/bin/certmanager --location /opt/cloudera/CMCA setup --configure-services

I receive the following: I guess I should remove the generated key and run the command again. Do you know how I can do that? Many thanks, M

m4x1m1li4n · ‎10-30-2019

Hello everyone, I have a question about enabling TLS communication between the hosts in my cluster. The installation procedure recommends enabling TLS across the cluster, but when I try to run the following command:

sudo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera /opt/cloudera/cm-agent/bin/certmanager setup --configure-services

I receive the following errors. The first warning is:

could not generate CSR

When I check the log, I get two errors:

req failed for /var/lib/cloudera-scm-server/certmanager/CMCA/private/ca_key.pem. Exit code: 1
Output: problems making Certificate Request 139727014807440:error:0D07A097:asn1 encoding routines:ASN1_mbstring_ncopy:string too long:a_mbstr.c:158:maxsize=64

Does anyone have the same problem? I am struggling to figure out this issue, and after googling it I have not found much... If I skip this step, unfortunately, I will not be able to pass the Inspect Network Performance check. I already tried skipping it, but at the end of the installation the file "cert.py" is used to test the connection between the nodes, and it failed with the error "unable to reach the hosts". In any case, I am able to connect through the ssh command from the master to the other nodes, here is an example: Thanks, M
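The "string too long ... maxsize=64" message indicates that some field of the certificate request exceeded a 64-character limit. X.509 caps the Common Name at 64 characters, so one plausible cause, assumed here and not confirmed by the log, is a host name that is too long to be used as the Common Name. A minimal sketch for checking a candidate name; fits_common_name is a hypothetical helper:

```python
MAX_COMMON_NAME = 64  # X.509 upper bound for the commonName attribute


def fits_common_name(value):
    """Return True if the value is short enough to be used as a certificate Common Name."""
    return len(value) <= MAX_COMMON_NAME


if __name__ == "__main__":
    import socket

    name = socket.getfqdn()
    print(name, len(name), fits_common_name(name))
```

Other subject fields also have length limits, so a passing result here does not rule out a different field as the cause.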

Last Visited	‎12-08-2019 04:21 AM

Member Since	‎07-11-2016 07:29 AM
Posts	12

Cloudera Community

Re: Error requesting heartbeat of host java.net.Co...

Error requesting heartbeat of host java.net.Connec...

Re: Unable to complete the first installation of C...

Re: Unable to complete the first installation of C...

Re: Unable to complete the first installation of C...

Unable to complete the first installation of Cloud...

Re: SCM fails to start: Error getting predicates

Re: Could not generate CSR

Could not generate CSR