Hello experts,
I installed Cloudera Manager 6.1.0 on a CentOS 7.6 host.
I manually installed Cloudera Manager agents on several CentOS 7.6 hosts.
After starting the server and the agents, I can see connections on port 7182 with netstat.
When I try to add the hosts in Cloudera Manager, it detects them (CentOS, SSH access) but wants to install the agent again.
How can I get past this step?
Regards
Alain
Created 02-14-2019 06:45 AM
Hello Alain,
When you manually install the agent and it is configured correctly, it will heartbeat in to the server, and the server will add it to its host list. If Cloudera Manager is fully installed, you will see these hosts on the Hosts page.
When you add hosts to the cluster, or on a first install when the setup wizard prompts for hosts, a second tab is available. It lets you select from the hosts known to the server that are not already part of a cluster.
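"Configured correctly" mostly means the agent points at the right server. A minimal Python sketch for checking that, assuming the agent reads /etc/cloudera-scm-agent/config.ini and that its [General] section holds server_host and server_port; the helper name and the localhost:7182 fallback are assumptions for illustration:

```python
import configparser


def agent_target(config_text):
    """Return (server_host, server_port) from the text of an agent config.ini.

    Assumes a [General] section with server_host/server_port keys; if either
    key (or the section) is missing, falls back to localhost:7182 (assumed).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    host = parser.get("General", "server_host", fallback="localhost")
    port = parser.getint("General", "server_port", fallback=7182)
    return host, port
```

On an agent host you would pass it the file contents, e.g. agent_target(open("/etc/cloudera-scm-agent/config.ini").read()), and compare the result with the CM server's hostname.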
I hope this helps.
David
David Wilder, Community Manager
Created 02-15-2019 12:03 AM
Hello David,
Thanks for your reply; it helped me review the installation. I believe Cloudera Manager is fully installed via the packages (cloudera-manager-daemons, cloudera-manager-agent, cloudera-manager-server).
I thought the heartbeat between the manager and the agents was working, but netstat shows:
tcp 0 0 172.23.104.90:7182 172.23.104.91:56390 TIME_WAIT
The TIME_WAIT state makes me think the connection is not fully accepted, and I can't see the hosts on the Hosts page.
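A note on TIME_WAIT: it is the state a TCP endpoint enters after it actively closes a connection, so it describes a connection that has already finished, not one that was refused. In the line above the TIME_WAIT endpoint is the server side (local address :7182), i.e. the server closed first. To compare several snapshots more easily, a small Python sketch that tallies the states seen on one port in netstat -tn / netstat -nap style output; the helper name is hypothetical:

```python
from collections import Counter


def tcp_states_for_port(netstat_output, port):
    """Count TCP states of netstat lines whose local address ends in :port.

    Expected columns: proto recv-q send-q local foreign state [pid/program].
    """
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines (udp, unix sockets, ...).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.endswith(suffix):
            counts[state] += 1
    return counts
```

Lots of TIME_WAIT with no host appearing in CM points at what happens inside each heartbeat exchange, which is what the agent log shows.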
I continue to investigate.
Best regards
Alain
Created 02-15-2019 06:00 AM
Hello experts,
In cloudera-scm-agent.log I find:
[14/Feb/2019 22:22:47 +0000] 10626 MainThread agent ERROR Heartbeating to lnxsrv-cloudera6-m.mnh.fr:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1397, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
return self.issue_request(call_request, message_name, request_datum)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
result = self.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
framed_message = response_reader.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
In /opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py, I can see that the agent attempts an SSL/TLS connection.
So I decided to remove the certmanager directory and regenerate the Auto-TLS certificates:
sudo rm -rf /var/lib/cloudera-scm-server/certmanager
sudo JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64 /opt/cloudera/cm-agent/bin/certmanager setup --configure-services
I restarted all the agents and the server, and now the agent log shows:
[15/Feb/2019 14:22:17 +0000] 9577 MainThread agent ERROR Heartbeating to lnxsrv-cloudera6-m.mnh.fr:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1388, in _send_heartbeat
self.cfg.max_cert_depth)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
self.conn.connect()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
sock.connect((self.host, self.port))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 305, in connect
ret = self.connect_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 292, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate
Why is the agent trying SSLv3? The manager is configured with SSLv2Hello and TLSv1.2.
If anyone has an idea, I'd appreciate it.
Regards
Alain
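On the SSLv3 question: OpenSSL reports a received TLS alert with a prefix naming the protocol version in which that alert type was first defined ("sslv3 alert ..." for alerts from SSLv3, "tlsv1 alert ..." for those added in TLS 1.0). The message therefore does not by itself mean SSLv3 was negotiated. It reports a bad_certificate alert (code 42) sent by the server, which usually means the server did not accept the certificate the agent presented. A Python sketch that decodes such strings and, separately, checks which protocol version a handshake actually negotiates; both helper names are hypothetical and the alert table is only a subset:

```python
import re
import socket
import ssl

# Subset of TLS alert descriptions and their codes (RFC 5246, section 7.2).
ALERT_CODES = {
    "handshake failure": 40,
    "bad certificate": 42,
    "unsupported certificate": 43,
    "certificate expired": 45,
    "certificate unknown": 46,
    "unknown ca": 48,
    "access denied": 49,
    "protocol version": 70,
}

# OpenSSL reason strings look like "sslv3 alert bad certificate" or
# "tlsv1 alert unknown ca"; the prefix is where the alert was defined.
_ALERT_RE = re.compile(r"\b(sslv3|tlsv1) alert ([a-z ]+)")


def decode_openssl_alert(message):
    """Return (prefix, alert_name, alert_code), or None if no alert is found."""
    match = _ALERT_RE.search(message.lower())
    if match is None:
        return None
    prefix, alert = match.group(1), match.group(2).strip()
    return prefix, alert, ALERT_CODES.get(alert)


def negotiated_version(host, port=7182, cafile=None, timeout=5.0):
    """Open a TLS connection and return the negotiated protocol, e.g. 'TLSv1.2'.

    cafile should be the CA that signed the server certificate. The handshake
    fails if that certificate cannot be verified, and may fail if the server
    insists on a client certificate, which this probe does not send.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, decode_openssl_alert("SSLError: sslv3 alert bad certificate") gives ("sslv3", "bad certificate", 42).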
Created 02-15-2019 08:12 AM
Hi Alain,
I see you've used OpenJDK to run the certmanager setup. Could you try it with a supported version of Java? We have never tested with OpenJDK.
You can find our supported versions in the release notes under Java Requirements.
Thanks,
David
David Wilder, Community Manager
Created 02-15-2019 05:18 PM
CM/CDH 6.1 supports OpenJDK 1.8, so you are fine there.
Backing up a bit, looking at your first stack trace, we find that the failure occurs *after* the TLS handshake. See here:
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1397, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
Line 1397 runs after the connection to the server has been established, so according to the call stack the original issue is not TLS-related.
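The reasoning here (the innermost frame tells you whether the failure happened during the TLS handshake or after it) can be sketched as a small Python helper. The two frame markers come from the tracebacks in this thread; the function name and the returned labels are hypothetical:

```python
import re

# Matches traceback frame lines such as:
#   File "/opt/.../avro/ipc.py", line 417, in read_framed_message
_FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')


def classify_heartbeat_failure(traceback_text):
    """Guess the failure phase from the innermost frame of an agent traceback.

    Returns "tls-handshake", "after-handshake" or "unknown". Only the two
    innermost frames seen in this thread are recognised.
    """
    frames = _FRAME_RE.findall(traceback_text)
    if not frames:
        return "unknown"
    path, _line, function = frames[-1]
    if "M2Crypto/SSL/" in path and function == "connect_ssl":
        # Failed while performing the TLS handshake itself.
        return "tls-handshake"
    if path.endswith("avro/ipc.py") and function == "read_framed_message":
        # TLS was already up; the failure is in reading the Avro reply.
        return "after-handshake"
    return "unknown"
```

Applied to the two agent logs in this thread, the first trace classifies as "after-handshake" and the later "bad certificate" trace as "tls-handshake", i.e. the certificate regeneration changed the failure mode.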
Based on the last call, it appears the agent was waiting for the heartbeat response but 0 bytes were returned:
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
Based on this, it appears that the agent received a non-Avro response from the server (here, no bytes at all).
Among other things, this could be caused by:
(1)
The server not being Cloudera Manager. Check to make sure the server listening on port 7182 is actually CM. You can use:
netstat -nap | grep 7182 (on the CM host)
(2)
Cloudera Manager failed the processing of the heartbeat. Check the CM logs to see if there are any messages at the time that the agent is showing the exception.
/var/log/cloudera-scm-server/cloudera-scm-server.log
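For check (2), lining up the two logs by time is the tedious part. A Python sketch, assuming the agent timestamp format shown in this thread ([14/Feb/2019 22:22:47 +0000]) and assuming server log lines start with "YYYY-MM-DD HH:MM:SS" in UTC; the server format, the timezone and the helper names are assumptions, so adjust them to what your cloudera-scm-server.log actually contains:

```python
from datetime import datetime, timedelta, timezone

AGENT_TS_FORMAT = "%d/%b/%Y %H:%M:%S %z"   # e.g. 14/Feb/2019 22:22:47 +0000
SERVER_TS_FORMAT = "%Y-%m-%d %H:%M:%S"     # assumed server log line prefix


def agent_error_time(agent_line):
    """Parse the bracketed timestamp at the start of an agent log line."""
    stamp = agent_line[agent_line.index("[") + 1:agent_line.index("]")]
    return datetime.strptime(stamp, AGENT_TS_FORMAT)


def server_lines_near(server_log_lines, when, window_seconds=30,
                      server_tz=timezone.utc):
    """Yield server log lines within window_seconds of `when`.

    Lines without a parsable timestamp (e.g. Java stack-trace continuation
    lines) follow the decision made for the last timestamped line.
    """
    window = timedelta(seconds=window_seconds)
    in_window = False
    for line in server_log_lines:
        try:
            stamp = datetime.strptime(line[:19], SERVER_TS_FORMAT)
        except ValueError:
            if in_window:
                yield line
            continue
        in_window = abs(stamp.replace(tzinfo=server_tz) - when) <= window
        if in_window:
            yield line
```

Keeping continuation lines matters here, because the useful part of a failed heartbeat on the server side is likely to be a stack trace.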
Hopefully one of those gives some more clues.
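The "Reader read 0 bytes." diagnosis rests on how Avro IPC frames messages: a message is a series of buffers, each preceded by its length as a 4-byte big-endian integer, ending with a zero-length buffer. The agent raises that exception when the socket returns nothing where a frame was expected, i.e. the server closed the stream before (or instead of) sending a framed reply. A minimal Python reader under that framing, written from the Avro specification and not taken from the agent's own avro/ipc.py; the names are hypothetical:

```python
import io
import struct


class ConnectionClosed(Exception):
    """The peer closed the stream before a complete frame arrived."""


def _read_exact(stream, size):
    """Read exactly size bytes, or raise ConnectionClosed at end of stream."""
    data = b""
    while len(data) < size:
        chunk = stream.read(size - len(data))
        if not chunk:
            # Same condition the agent reports as "Reader read 0 bytes."
            raise ConnectionClosed("Reader read 0 bytes.")
        data += chunk
    return data


def read_framed_message(stream):
    """Read one message made of 4-byte big-endian length-prefixed buffers,
    terminated by a zero-length buffer. Accepts bytes or a binary stream."""
    if isinstance(stream, (bytes, bytearray)):
        stream = io.BytesIO(stream)
    buffers = []
    while True:
        (length,) = struct.unpack(">I", _read_exact(stream, 4))
        if length == 0:
            return b"".join(buffers)
        buffers.append(_read_exact(stream, length))


def write_framed_message(payload):
    """Frame payload as one buffer followed by the zero-length terminator."""
    body = struct.pack(">I", len(payload)) + payload if payload else b""
    return body + struct.pack(">I", 0)
```

An empty or truncated reply raises ConnectionClosed exactly like the agent's trace, which is why the server log is the place to look next.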
Created 02-18-2019 02:45 AM
Thanks, bgooley, for your answer.
To simplify the diagnosis, I started a single agent on the same server as Cloudera Manager.
netstat returns:
tcp 0 0 127.0.0.1:7182 127.0.0.1:41332 TIME_WAIT -
tcp 0 0 127.0.0.1:7182 127.0.0.1:41346 TIME_WAIT -
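TIME_WAIT lines like these do not say which process owns port 7182; check (1) needs the LISTEN line and its PID/Program name column, which on the CM host should be the Cloudera Manager server (a java process). A Python sketch that extracts that column from netstat -nap output; the helper is hypothetical, and when netstat is not run as root that column may show "-":

```python
def listeners_on_port(netstat_output, port):
    """Return the PID/Program column of TCP LISTEN sockets on the given port.

    Expects `netstat -nap` columns:
    proto recv-q send-q local foreign state pid/program
    """
    owners = []
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 7 and fields[0].startswith("tcp")
                and fields[5] == "LISTEN" and fields[3].endswith(suffix)):
            owners.append(fields[6])
    return owners
```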
and the agent log shows:
[18/Feb/2019 10:44:40 +0000] 120320 MainThread agent ERROR Heartbeating to localhost:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1388, in _send_heartbeat
self.cfg.max_cert_depth)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
self.conn.connect()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
sock.connect((self.host, self.port))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 305, in connect
ret = self.connect_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 292, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate
It's the last line I don't understand: why SSLv3?
Regards
Alain
Created 03-07-2019 02:34 AM
Hello Experts,
I finally found and resolved the problem.
I destroyed all the VMs and built new ones. I kept only the database VM, on which I dropped the scm database and re-created it (to empty it).
In our company, Internet access goes through a proxy; direct access is forbidden. So I rewrote the repo files in /etc/yum.repos.d to add proxy=http://<proxy_user>:<proxy_password>@<proxy_url>:8080/ after each repo definition, and I removed all proxy definitions from /etc/yum.conf.
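Adding the same proxy= line to every section by hand is error-prone across many repo files. Since .repo files are INI-style, a Python sketch using configparser can do it; the helper name and the proxy URL below are placeholders, and note that configparser drops comments when it rewrites a file, so keep a backup of the originals:

```python
import configparser
import io


def add_repo_proxy(repo_text, proxy_url):
    """Return repo file text with proxy=<proxy_url> set in every repo section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser.set(section, "proxy", proxy_url)
    out = io.StringIO()
    # yum expects key=value; avoid the default " = " spacing.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

yum also accepts proxy_username and proxy_password as separate options, which keeps the credentials out of the URL.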
The repos for cloudera-cm and cloudera-cdh point to a local repository that I created and filled with the right packages. I ran yum update to get the RPM-GPG-KEY (for epel-release).
The only drawback of this method is that parcels cannot be used: with the proxy defined in the wizard, the parcel download fails with a connection timeout.
Thanks for your help.
Alain