Welcome to the Cloudera Community

Cluster installation failing (Heartbeat)

I have three CentOS 7 VMs running on XenServer, and all three fail at the "Add Cluster - Installation" step with the error below.

 

The VMs:

master.mycluster.com

worker.mycluster.com

utility.mycluster.com

 

Cloudera Manager is running on the utility node.

 

The agent log shows the heartbeat failing:

 

>>[04/Nov/2018 21:34:03 +0000] 10805 MainThread agent        ERROR    Heartbeating to utility.mycluster.com:7182 failed.
>>Traceback (most recent call last):
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1371, in _send_heartbeat
>> response = self.requestor.request('heartbeat', heartbeat_data)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
>> return self.issue_request(call_request, message_name, request_datum)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
>> call_response = self.transceiver.transceive(call_request)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
>> result = self.read_framed_message()
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
>> framed_message = response_reader.read_framed_message()
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
>> raise ConnectionClosedException("Reader read 0 bytes.")
>>ConnectionClosedException: Reader read 0 bytes. 
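
The traceback ends in "Reader read 0 bytes", which means the connection was closed before the agent got a response. A quick way to narrow this down is to check, from each node, whether the Cloudera Manager port accepts a TCP connection at all. Below is a minimal sketch of such a check. The function name `can_connect` is made up for illustration, and it only tests that the port opens; it does not speak the agent's Avro protocol or TLS, so a True result does not prove the heartbeat itself would succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Name resolution failures, refused connections, and timeouts all
    count as "cannot connect" and return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are subclasses of OSError.
        return False
```

Running `can_connect("utility.mycluster.com", 7182)` on master and worker would show whether the port is reachable from those nodes. If it returns False, a firewall or name resolution problem is a likely suspect; if it returns True, the connection is being dropped later in the exchange.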

 

I have been following the Cloudera installation documentation for CM/CDH 6.0.1.

 

/etc/hosts pasted below for each VM:

 

worker.mycluster.com:

#127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility 192.168.0.35 worker.mycluster.com worker 192.168.0.34 master.mycluster.com master

 

 

utility.mycluster.com:

# note that localhost isn't commented out. Postgres service would fail to start if it was
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility 192.168.0.35 worker.mycluster.com worker 192.168.0.34 master.mycluster.com master

 

master.mycluster.com:

#127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1       localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.36 utility.mycluster.com utility 192.168.0.35 worker.mycluster.com worker 192.168.0.34 master.mycluster.com master
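
In the files as pasted above, all three address entries share a single line. That may only be an artifact of pasting into the forum, but it is worth ruling out, because /etc/hosts expects one IP address per line followed by the names for that address. The sketch below flags any non-comment line that contains more than one IP address. The function names are hypothetical and the check is intentionally narrow; it does not validate hostnames or detect duplicate entries.

```python
import ipaddress


def is_ip(token):
    """Return True if token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def suspicious_hosts_lines(text):
    """Return (line_number, entry) pairs for entries holding more than one IP."""
    flagged = []
    for number, raw in enumerate(text.splitlines(), start=1):
        entry = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not entry:
            continue
        ips = [tok for tok in entry.split() if is_ip(tok)]
        if len(ips) > 1:
            flagged.append((number, entry))
    return flagged
```

To run it against the real file, pass in the file contents, for example `suspicious_hosts_lines(open("/etc/hosts").read())`. An empty list means every entry has at most one address.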

 

 

I installed BIND on worker.mycluster.com only, to see whether it would fix the heartbeat issue for that node, but it did not; the error is the same. The output below shows that DNS is set up correctly for worker.mycluster.com:

>>> hostname -f
worker.mycluster.com

>>> host -v -t A $(hostname)
Trying "worker.mycluster.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8235
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;worker.mycluster.com.            IN    A 

;; ANSWER SECTION:
;worker.mycluster.com.    604800  IN    A    192.168.0.35

;; AUTHORITY SECTION:
;mycluster.com.    604800  IN    NS    worker.mycluster.com.

>>> nslookup utility.mycluster.com #Forward
server:     127.0.0.1
address:  127.0.0.1#53

Name: utility.mycluster.com
Address: 192.168.0.36

>>> nslookup 192.168.0.36 #Reverse
server:     127.0.0.1
address:  127.0.0.1#53

36.0.168.192.in-addr.arpa name = utility.mycluster.com.
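
One caveat with the checks above: `host` and `nslookup` query the DNS server directly, while most programs resolve names through the system resolver, which normally consults /etc/hosts as well. The sketch below does a forward lookup followed by a reverse lookup through the system resolver and reports whether the name comes back unchanged. The function name `forward_reverse_match` and its injectable `forward`/`reverse` parameters are assumptions made for illustration; by default it uses Python's `socket.gethostbyname` and `socket.gethostbyaddr`.

```python
import socket


def forward_reverse_match(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IPv4 address, resolve that address back to a
    name, and report whether the two names agree.

    Returns (address, reverse_name, names_match).
    """
    address = forward(hostname)
    name, _aliases, _addresses = reverse(address)
    same = name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return address, name, same
```

Calling `forward_reverse_match("utility.mycluster.com")` on each node would show what the agent's host actually resolves, which can differ from the `nslookup` result if /etc/hosts and DNS disagree. A lookup failure raises an exception rather than returning False, so the failing step is visible.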

 
