Created on 11-04-2018 07:48 PM - edited 11-04-2018 09:00 PM
I have three CentOS 7 VMs running on XenServer, and all three fail at the "Add Cluster - Installation" step with the error below.
The VMs:
master.mycluster.com
worker.mycluster.com
utility.mycluster.com
Cloudera Manager is running on the utility node.
The agent log below shows the failure: the agent cannot heartbeat to the Cloudera Manager server.
>>[04/Nov/2018 21:34:03 +0000] 10805 MainThread agent ERROR Heartbeating to utility.mycluster.com:7182 failed.
>>Traceback (most recent call last):
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1371, in _send_heartbeat
>>    response = self.requestor.request('heartbeat', heartbeat_data)
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
>>    return self.issue_request(call_request, message_name, request_datum)
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
>>    call_response = self.transceiver.transceive(call_request)
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
>>    result = self.read_framed_message()
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
>>    framed_message = response_reader.read_framed_message()
>>  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
>>    raise ConnectionClosedException("Reader read 0 bytes.")
>>ConnectionClosedException: Reader read 0 bytes.
I have been following the Cloudera installation documentation for CM/CDH 6.0.1.
/etc/hosts pasted below for each VM:
worker.mycluster.com:
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility
192.168.0.35 worker.mycluster.com worker
192.168.0.34 master.mycluster.com master
utility.mycluster.com:
# note that localhost isn't commented out. Postgres service would fail to start if it was
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility
192.168.0.35 worker.mycluster.com worker
192.168.0.34 master.mycluster.com master
master.mycluster.com:
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility
192.168.0.35 worker.mycluster.com worker
192.168.0.34 master.mycluster.com master
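To compare the three hosts files mechanically, here is a minimal Python sketch that parses hosts-file text into a name-to-IP map. The function name parse_hosts is hypothetical, and the "first entry wins" rule is an assumption about typical resolver behaviour, not something taken from the Cloudera documentation. It shows that the commented-out localhost lines contribute no entries at all.

```python
def parse_hosts(text):
    """Map every hostname/alias in hosts-file text to its IP address.

    Comments (anything after '#') and blank lines are ignored.
    Assumption: when a name appears more than once, the first entry wins.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an IP with no names is not a usable entry
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)
    return mapping

# Example use (reads the local file; run it on each VM and compare):
#   with open("/etc/hosts") as f:
#       print(parse_hosts(f.read()))
```

Running this on each VM and comparing the results should show identical cluster entries everywhere, with "localhost" present only on the utility node.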
I installed BIND on worker.mycluster.com only, to see whether it would fix the heartbeat issue for that node, but it did not; the error is the same. The output below shows that DNS is set up correctly on worker.mycluster.com:
>>> hostname -f
worker.mycluster.com
>>> host -v -t $(hostname)
Trying "worker.mycluster.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8235
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;worker.mycluster.com. IN A
;; ANSWER SECTION:
;worker.mycluster.com. 604800 IN A 192.168.0.35
;; AUTHORITY SECTION:
;mycluster.com. 604800 IN NS worker.mycluster.com.
>>> nslookup utility.mycluster.com #Forward
server: 127.0.0.1
address: 127.0.0.1#53
Name: utility.mycluster.com
Address: 192.168.0.36
>>> nslookup 192.168.0.36 #Reverse
server: 127.0.0.1
address: 127.0.0.1#53
36.0.168.192.in-addr.arpa name = utility.mycluster.com.
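The forward and reverse lookups above can also be scripted so the same check is easy to repeat on every node. This is a minimal sketch using Python's standard socket module; the function name check_forward_reverse is hypothetical, and the check goes through the system resolver (so /etc/hosts and DNS are both consulted, depending on the host's configuration) rather than querying BIND directly.

```python
import socket

def check_forward_reverse(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name to an IP, resolve that IP back, and compare.

    Returns (ip, reverse_name, matches). The resolver functions are
    parameters so they can be swapped out when testing without a network.
    """
    ip = forward(name)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    matches = reverse_name.rstrip(".").lower() == name.rstrip(".").lower()
    return ip, reverse_name, matches

# Example use on each VM:
#   for host in ("utility.mycluster.com", "worker.mycluster.com",
#                "master.mycluster.com"):
#       print(host, check_forward_reverse(host))
```

If matches is False for any node, the forward and reverse names disagree for that host as seen by the system resolver.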
Created 11-06-2018 04:20 AM
Hi,
It looks more like a networking problem.
Can you reach utility.mycluster.com:7182 from all the servers?
Have you tried disabling the firewall on the servers?
Regards.
Created on 11-11-2018 02:11 PM - edited 11-11-2018 02:11 PM
Hello,
All servers have firewalld service stopped and disabled.
I ran the following from worker.mycluster.com, and both connect fine:
telnet 192.168.0.36 7182
telnet utility.mycluster.com 7182
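The same reachability test can be done without telnet. This is a minimal Python sketch, with the hypothetical function name can_connect and an arbitrary 5-second timeout. Like telnet, it only confirms that the port accepts a TCP connection; it does not show that the heartbeat exchange succeeds, and the traceback above ("Reader read 0 bytes") indicates the connection is closed before any response is read.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False

# Example use from each node:
#   print(can_connect("192.168.0.36", 7182))
#   print(can_connect("utility.mycluster.com", 7182))
```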