Created on 07-09-2020 10:22 PM - edited 07-10-2020 02:30 AM
Hi,
I'm trying to install Cloudera Manager on CentOS 7 (on VirtualBox). However, I get stuck at the 'Add Cluster' step, which gives me the error 'Installation failed. Failed to receive heartbeat from agent.' for all hosts. I have tried several solutions posted in this community but still cannot solve it. Here are some details.
Details:
OS: CentOS7
Cloudera Manager version: 6.3.1
/etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.xx1 namenode.localdomain namenode
192.168.56.xx2 datanode1.localdomain datanode1
192.168.56.xx3 datanode2.localdomain datanode2
192.168.56.xx4 datanode3.localdomain datanode3
192.168.56.106 util.localdomain util
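One way to double-check a hosts file like this on every node is to parse it and confirm that each cluster FQDN is present and not mapped to a loopback address. This is a minimal Python sketch. The helper names `parse_hosts` and `check_hosts` are hypothetical, and the FQDN list in the usage comment is copied from the entries above.

```python
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # skip blank and comment-only lines
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def check_hosts(text: str, fqdns: Iterable[str]) -> List[str]:
    """Report expected FQDNs that are missing or point at a loopback address."""
    mapping = parse_hosts(text)
    problems = []
    for fqdn in fqdns:
        ip = mapping.get(fqdn)
        if ip is None:
            problems.append(f"{fqdn}: no entry")
        elif ip.startswith("127.") or ip == "::1":
            problems.append(f"{fqdn}: mapped to loopback address {ip}")
    return problems


# Usage (on each node):
# with open("/etc/hosts") as f:
#     print(check_hosts(f.read(), ["namenode.localdomain", "datanode1.localdomain",
#                                  "datanode2.localdomain", "datanode3.localdomain",
#                                  "util.localdomain"]))
```

An empty list means every listed FQDN has a non-loopback entry.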
error:
[09/Jul/2020 23:45:02 +0000] 10401 MainThread agent ERROR Heartbeating to util.localdomain:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1399, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
    framed_message = response_reader.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
    raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
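For context, "Reader read 0 bytes" comes from the Avro IPC transport that the agent uses for heartbeats. Per the Avro specification, a framed message is a series of buffers, each prefixed with a four-byte big-endian length, ending with a zero-length buffer. The sketch below (hypothetical names, not the avro library's own code) reads such a message from a `recv`-style callable and raises as soon as the stream ends before a length prefix arrives. That matches what the traceback indicates: the connection to util.localdomain:7182 was closed before any reply bytes came back.

```python
import io
import struct
from typing import Callable

Recv = Callable[[int], bytes]  # like socket.recv: up to n bytes, b"" at end of stream


class ConnectionClosedError(Exception):
    """The peer closed the stream before a complete frame arrived."""


def read_exact(recv: Recv, n: int) -> bytes:
    """Call recv until exactly n bytes are collected; raise if the stream ends first."""
    data = b""
    while len(data) < n:
        chunk = recv(n - len(data))
        if not chunk:  # end of stream: the other side closed the connection
            raise ConnectionClosedError(f"Reader read {len(data)} bytes.")
        data += chunk
    return data


def read_framed_message(recv: Recv) -> bytes:
    """Read one message made of length-prefixed buffers ending in a zero-length buffer."""
    message = b""
    while True:
        (length,) = struct.unpack(">I", read_exact(recv, 4))  # 4-byte big-endian length
        if length == 0:
            return message  # zero-length buffer terminates the message
        message += read_exact(recv, length)


def frame(payload: bytes) -> bytes:
    """Encode payload as a single buffer followed by the terminating empty buffer."""
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", 0)


def recv_from_bytes(data: bytes) -> Recv:
    """In-memory stand-in for socket.recv, for experimenting without a server."""
    return io.BytesIO(data).read
```

Feeding `read_framed_message` an empty stream, as a server that hangs up immediately would, raises `ConnectionClosedError("Reader read 0 bytes.")`.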
netstat, iptables, and /var/log/cloudera-scm-agent/cloudera-scm-agent.log output: (not preserved)
Thank you.
Created 07-10-2020 03:24 AM
The problem seems to be with port 7182 on the cluster, but I can see that it's OK on host util.localdomain. Can you validate it on these hosts too?
192.168.56.xx1 namenode.localdomain namenode
192.168.56.xx2 datanode1.localdomain datanode1
192.168.56.xx3 datanode2.localdomain datanode2
192.168.56.xx4 datanode3.localdomain datanode3
There could be a firewall running on one of those hosts. Please check that and let me know.
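A quick way to run that check from each of the four hosts without telnet is a plain TCP connect. This minimal Python sketch (the `port_open` and `check_ports` names are hypothetical) only shows whether a TCP handshake to the port succeeds; it says nothing about TLS or the Avro protocol running on top of it.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), unreachable, or name not resolved
        return False


def check_ports(hosts: Iterable[str], port: int = 7182) -> Dict[str, bool]:
    """Try port on each host and report which ones accepted the connection."""
    return {host: port_open(host, port) for host in hosts}


# Usage, run on namenode, datanode1, datanode2 and datanode3:
# print(check_ports(["util.localdomain"]))
```

If a firewall on the agent host or on util is dropping the traffic, the result for util.localdomain would be False.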
Created 07-10-2020 09:38 AM
Created 07-11-2020 04:08 AM
First, can you make sure the agents are running on all the hosts? I would also suggest truncating the old logs:
$ sudo service cloudera-scm-agent stop
$ sudo truncate --size 0 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
$ sudo service cloudera-scm-agent restart
$ sudo service cloudera-scm-agent status
Then retry and let me know.
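After the restart, it can help to confirm whether new heartbeat failures are still being written to the agent log. This minimal Python sketch matches lines shaped like the one quoted in the original post. The regular expression is an assumption based on that single line, not on a documented log format.

```python
import re
from typing import Iterable, List, Tuple

# Assumed pattern, based on the one log line quoted in this thread:
# [09/Jul/2020 23:45:02 +0000] 10401 MainThread agent ERROR Heartbeating to util.localdomain:7182 failed.
HEARTBEAT_FAILED = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*\bERROR Heartbeating to (?P<target>\S+) failed\."
)


def heartbeat_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, target) for each failed-heartbeat line."""
    failures = []
    for line in lines:
        match = HEARTBEAT_FAILED.search(line)
        if match:
            failures.append((match.group("ts"), match.group("target")))
    return failures


# Usage:
# with open("/var/log/cloudera-scm-agent/cloudera-scm-agent.log") as f:
#     print(heartbeat_failures(f))
```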
Created 07-12-2020 07:19 PM
Created 07-13-2020 02:22 AM
There are a couple of things I would like you to test; please share the results.
What version of OpenJDK are you running? Is it higher or lower than 1.8.0_181? If it is lower, you will need to upgrade your JDK.
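The version comparison can be done mechanically from the output of `java -version`, which prints to stderr. This is a minimal Python sketch; the 1.8.0_181 threshold is the one given above, and the parsing assumes the usual `version "1.8.0_NNN"` wording. The function names are hypothetical.

```python
import re
from typing import Tuple

MINIMUM = (1, 8, 0, 181)  # 1.8.0_181, the threshold mentioned above


def parse_java_version(output: str) -> Tuple[int, ...]:
    """Turn the quoted version in `java -version` output into a tuple of ints."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no version string found")
    # "1.8.0_181" -> (1, 8, 0, 181); any build-suffix digits are appended
    return tuple(int(part) for part in re.findall(r"\d+", match.group(1)))


def meets_minimum(output: str, minimum: Tuple[int, ...] = MINIMUM) -> bool:
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_java_version(output) >= minimum


# Usage:
# import subprocess
# out = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
# print(meets_minimum(out))
```

Comparing tuples of integers avoids the trap of string comparison, where "1.8.0_99" would sort after "1.8.0_181".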
Can you test the TLS/SSL negotiation? The output should report a verify return code of 0:
# openssl s_client -connect util.localdomain:7183
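The same verification can be attempted from Python's `ssl` module, which exposes the OpenSSL verify code on failure (`SSLCertVerificationError.verify_code`, Python 3.7+). Code 19 means the server's chain ends in a self-signed root that the client does not trust; passing that root CA in PEM form as `cafile` (the counterpart of `openssl s_client -CAfile`) should let an otherwise valid chain verify with code 0. This is a minimal sketch: the function name is hypothetical, and so is the CA file path in the usage comment.

```python
import socket
import ssl
from typing import Optional, Tuple

# Meanings of a few verify codes, as worded by OpenSSL 1.x `s_client`.
VERIFY_CODES = {
    0: "ok",
    18: "self signed certificate",
    19: "self signed certificate in certificate chain",
    20: "unable to get local issuer certificate",
    21: "unable to verify the first certificate",
}


def tls_verify(host: str, port: int, cafile: Optional[str] = None,
               timeout: float = 5.0) -> Tuple[int, str]:
    """Handshake with host:port and return (verify_code, message).

    Only certificate-verification failures are caught; anything else
    propagates as OSError (ssl.SSLError is a subclass of OSError).
    """
    context = ssl.create_default_context(cafile=cafile)
    context.check_hostname = False  # like s_client, check the chain but not the name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return 0, VERIFY_CODES[0]
        except ssl.SSLCertVerificationError as err:  # Python 3.7+
            return err.verify_code, err.verify_message


# Usage (the CA path is a hypothetical PEM export of the CM root CA):
# print(tls_verify("util.localdomain", 7183))
# print(tls_verify("util.localdomain", 7183, cafile="/tmp/cm-root-ca.pem"))
```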
SELinux and iptables must be disabled.
From datanode1, can you telnet to util on port 7182 successfully?
# telnet 192.168.56.106 7182
Can you check that the hostname is properly configured on util?
# hostname -f
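A scripted version of this check, as a minimal Python sketch: `socket.getfqdn()` approximates what `hostname -f` prints (the two can differ in edge cases), and the function names are hypothetical. It flags a short (unqualified) name or a name that resolves to a loopback address.

```python
import ipaddress
import socket
from typing import List


def fqdn_problems(fqdn: str, address: str) -> List[str]:
    """List issues with a host's FQDN and the IP address it resolves to."""
    problems = []
    if "." not in fqdn:
        problems.append(f"{fqdn!r} is not fully qualified")
    if ipaddress.ip_address(address).is_loopback:
        problems.append(f"{fqdn!r} resolves to loopback address {address}")
    return problems


def check_local_host() -> List[str]:
    """Apply fqdn_problems to this machine (socket.getfqdn() approximates `hostname -f`)."""
    fqdn = socket.getfqdn()
    return fqdn_problems(fqdn, socket.gethostbyname(fqdn))


# Usage, on util:
# print(check_local_host())  # an empty list means neither problem was found
```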
Please share the results.
Created 07-13-2020 06:43 PM
My jdk version is 1.8.0_181.
The TLS/SSL test returns code 19 (self signed certificate in certificate chain). Does this mean something is wrong with the TLS/SSL installation?
Telnet connects successfully and the hostname is correctly set.
Created 07-14-2020 12:13 PM
I don't know whether you accepted the answer unintentionally, but it seems you still have questions. Is your problem resolved? If not, please un-accept the answer and update the thread.
Created 07-14-2020 08:18 PM
Created 07-16-2020 05:03 PM
@Petch I see that you are getting this error:
[09/Jul/2020 23:45:02 +0000] 10401 MainThread agent ERROR Heartbeating to util.localdomain:7182 failed.
ConnectionClosedException: Reader read 0 bytes.
This error is usually reported when Cloudera Manager expects the agent to communicate over TLS, but the agent's config.ini has the 'use_tls' setting set to 0.
Check the setting and change it to use_tls=1.
Then restart the agent with the following command:
service cloudera-scm-agent restart
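If you prefer to script the change, note that a plain `configparser` round trip would drop the comments in config.ini. The minimal Python sketch below instead edits only the active `use_tls=` line. It assumes the agent's default layout, where `use_tls` sits under a `[Security]` section of `/etc/cloudera-scm-agent/config.ini`; check your own file before relying on it, and keep a backup. The function name is hypothetical.

```python
import re

SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
USE_TLS_RE = re.compile(r"^(?P<indent>\s*)use_tls\s*=")


def set_use_tls(config_text: str, value: int = 1) -> str:
    """Rewrite active use_tls lines under [Security]; comments and other lines are kept.

    If no active use_tls line exists under [Security], the text is returned unchanged.
    """
    out = []
    section = None
    for line in config_text.splitlines(keepends=True):
        header = SECTION_RE.match(line)
        if header:
            section = header.group("name").strip()
        else:
            setting = USE_TLS_RE.match(line)
            if section == "Security" and setting:
                ending = "\n" if line.endswith("\n") else ""
                line = f"{setting.group('indent')}use_tls={value}{ending}"
        out.append(line)
    return "".join(out)


# Usage (assumed default path; back the file up first):
# path = "/etc/cloudera-scm-agent/config.ini"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(set_use_tls(text))
```

Commented-out lines such as `# use_tls=0` are left alone because the pattern only matches lines that begin with `use_tls`.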