scm-agent connected but not recognized by scm-server

Explorer

Hello experts,

 

I installed Cloudera Manager 6.1.0 on a CentOS 7.6 host.

I manually installed the Cloudera Manager agents on several CentOS 7.6 hosts.

After starting the server and the agents, I can see the connections on port 7182 with netstat.

When I try to add hosts in Cloudera Manager, it detects the CentOS hosts and SSH, but wants to install the agent again.

How can I get past this step?

 

Regards

Alain


Community Manager

Hello Alain,

 

When you manually install the agent and it is configured correctly, it will heartbeat in to the server. The server will add it to its Hosts list. If Cloudera Manager is fully installed, you will see these hosts on the Hosts page.

 

When you add hosts to the cluster, or if you are on a first install in the setup wizard where it prompts for hosts, there will be a second tab available.  It will allow you to select from the hosts known by the server that are not already part of a cluster.

 

I hope this helps.

 

David



David Wilder, Community Manager



Explorer

Hello David,

 

Thanks for your reply; it helped me review the installation. I think Cloudera Manager is fully installed via the packages

(cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server).

I thought the heartbeat between manager and agent was good, but netstat shows:

tcp 0 0 172.23.104.90:7182 172.23.104.91:56390 TIME_WAIT

The TIME_WAIT state makes me think the connection is not fully accepted, and I can't see the hosts on the Hosts page.
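
A note on reading that output: TIME_WAIT does not by itself mean the connection was refused. A socket enters TIME_WAIT after its own side has closed a connection that had been fully established, so the line above suggests that the server side (port 7182) accepted the connection and then closed it first. A minimal Python sketch for tallying the states seen on a port from saved netstat output is shown here. The name `count_states` is illustrative only, the sample's second and third lines are made up, and the usual Linux column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state) is assumed.

```python
from collections import Counter

def count_states(netstat_text, port):
    """Count TCP states for sockets whose local port equals `port`."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            states[fields[5]] += 1
    return states

# First line is from this thread; the other two are hypothetical.
sample = """\
tcp 0 0 172.23.104.90:7182 172.23.104.91:56390 TIME_WAIT
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN
tcp 0 0 172.23.104.90:22 172.23.104.1:50000 ESTABLISHED
"""
print(dict(count_states(sample, 7182)))
```

Many TIME_WAIT entries and no ESTABLISHED ones on 7182 would be consistent with short-lived heartbeat connections that the server keeps closing.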

 

I will continue to investigate.

 

Best regards

Alain

Explorer

Hello experts,

 

In cloudera-scm-agent.log I find:

[14/Feb/2019 22:22:47 +0000] 10626 MainThread agent ERROR Heartbeating to lnxsrv-cloudera6-m.mnh.fr:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1397, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
return self.issue_request(call_request, message_name, request_datum)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
result = self.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
framed_message = response_reader.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.

 

In /opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py, I can see that the agent tries an SSL/TLS connection.

So I decided to remove the certmanager directory to regenerate the Auto-TLS configuration:

sudo rm -rf /var/lib/cloudera-scm-server/certmanager

sudo JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64 /opt/cloudera/cm-agent/bin/certmanager setup --configure-services

 

I restarted all agents and the server. Now the agent log shows:

[15/Feb/2019 14:22:17 +0000] 9577 MainThread agent ERROR Heartbeating to lnxsrv-cloudera6-m.mnh.fr:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1388, in _send_heartbeat
self.cfg.max_cert_depth)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
self.conn.connect()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
sock.connect((self.host, self.port))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 305, in connect
ret = self.connect_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 292, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate

 

Why did the agent try SSLv3? The manager is configured with SSLv2Hello and TLSv1.2.

If anyone has an idea, I would be glad to hear it.

 

Regards

Alain

 

Community Manager

Hi Alain,

 

I see you've used OpenJDK to run the certmanager setup. Could you try it with a supported version of Java, as we've never tested with OpenJDK?

You can find our supported versions in the release notes under Java Requirements.

 

Thanks,

David




Master Guru

@aalexand,

 

CM/CDH 6.1 supports the use of OpenJDK 1.8, so you are good there...

 

Backing up a bit, looking at your first stack trace, we find that the failure occurs *after* the TLS handshake.  See here:

Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1397, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)

 

Line 1397 comes after a connection to the server has been established, so the original issue is not TLS related according to the call stack.

Based on the last call, it appears the agent was waiting for the heartbeat response but 0 bytes were returned:

File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.

 

Based on this, it appears that the agent received a non-Avro response from the server.
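
To make the "Reader read 0 bytes" message concrete: Avro's IPC framing sends a message as a series of buffers, each prefixed by a four-byte big-endian length, ending with a zero-length buffer. The simplified Python 3 sketch below is not the agent's code (the agent ships the Python 2.7 avro library); `read_framed_message`, `frame` and `ConnectionClosed` here are stand-ins that mimic the behaviour suggested by the traceback, where an empty read at a point where data is expected is reported as a closed connection.

```python
import io
import struct

class ConnectionClosed(Exception):
    """Raised when the stream ends where more data was expected."""

def read_framed_message(reader):
    """Read one message made of length-prefixed buffers ending with an empty one."""
    parts = []
    while True:
        header = reader.read(4)
        if len(header) < 4:
            raise ConnectionClosed("Reader read %d bytes." % len(header))
        (length,) = struct.unpack(">I", header)
        if length == 0:              # zero-length buffer terminates the message
            return b"".join(parts)
        # A real socket may return fewer bytes per read; BytesIO does not.
        chunk = reader.read(length)
        if len(chunk) < length:
            raise ConnectionClosed("Reader read %d bytes." % len(chunk))
        parts.append(chunk)

def frame(payload):
    """Frame a payload as one buffer plus the terminating empty buffer."""
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", 0)

print(read_framed_message(io.BytesIO(frame(b"heartbeat"))))
try:
    read_framed_message(io.BytesIO(b""))   # peer closed without replying
except ConnectionClosed as exc:
    print("closed:", exc)
```

In other words, the exception says the stream ended before a complete reply frame arrived, which fits a server that closed the connection or answered with something that is not framed this way.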

Among some other things, this could be caused by:

(1) The server not being Cloudera Manager. Check to make sure the server listening on port 7182 is actually CM. On the CM host, you can use:

netstat -nap | grep 7182

(2) Cloudera Manager failed to process the heartbeat. Check the CM logs to see if there are any messages at the time that the agent is showing the exception:

/var/log/cloudera-scm-server/cloudera-scm-server.log
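
For check (1), the netstat output can also be filtered programmatically. The Python sketch below is only an illustration: `find_listeners` is a made-up helper, the sample lines and PIDs are hypothetical, and it assumes the Linux `netstat -nap` layout in which the seventh column is PID/Program name (shown as "-" when netstat lacks the privileges to see the owning process).

```python
def find_listeners(netstat_text, port):
    """Return (local_address, pid, program) for sockets listening on `port`."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        if fields[3].rsplit(":", 1)[-1] != str(port):
            continue
        pid, _, program = fields[6].partition("/")
        found.append((fields[3], pid, program))
    return found

# Hypothetical output; real PIDs and addresses will differ.
sample = """\
tcp        0      0 0.0.0.0:7182      0.0.0.0:*          LISTEN      2345/java
tcp        0      0 127.0.0.1:7182    127.0.0.1:41332    TIME_WAIT   -
tcp        0      0 0.0.0.0:22        0.0.0.0:*          LISTEN      987/sshd
"""
for address, pid, program in find_listeners(sample, 7182):
    print(address, pid, program)
```

Since the CM server runs as a Java process, a listener owned by something other than java would be worth a closer look; the reported PID can then be inspected with ps.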

 

 

Hopefully one of those gives some more clues.
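
To line up the two logs, it can help to pull the failure timestamps out of the agent log first and then look at the server log around those times. A small Python sketch follows, assuming the agent log line format quoted earlier in this thread; `agent_errors` is an illustrative name and the sample is abridged from that log.

```python
import re
from datetime import datetime

# Matches lines such as:
# [14/Feb/2019 22:22:47 +0000] 10626 MainThread agent ERROR <message>
ERROR_LINE = re.compile(
    r"^\[(?P<ts>\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2} [+-]\d{4})\] "
    r"\d+ \S+ \S+ ERROR (?P<msg>.*)$"
)

def agent_errors(log_text):
    """Yield (timestamp, message) for each ERROR line in agent log text."""
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            # %b expects English month abbreviations (the default C locale).
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y %H:%M:%S %z")
            yield ts, match.group("msg")

sample = """\
[14/Feb/2019 22:22:47 +0000] 10626 MainThread agent ERROR Heartbeating to lnxsrv-cloudera6-m.mnh.fr:7182 failed.
Traceback (most recent call last):
ConnectionClosedException: Reader read 0 bytes.
"""
for ts, msg in agent_errors(sample):
    print(ts.isoformat(), msg)
```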

Explorer

Thanks bgooley for your answer.

 

To simplify the diagnosis, I started a single agent on the same server as Cloudera Manager.

netstat returns:

tcp 0 0 127.0.0.1:7182 127.0.0.1:41332 TIME_WAIT -
tcp 0 0 127.0.0.1:7182 127.0.0.1:41346 TIME_WAIT -

 

and the log:

[18/Feb/2019 10:44:40 +0000] 120320 MainThread agent ERROR Heartbeating to localhost:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1388, in _send_heartbeat
self.cfg.max_cert_depth)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
self.conn.connect()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
sock.connect((self.host, self.port))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 305, in connect
ret = self.connect_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 292, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate

 

It is the last line that I don't understand: the use of sslv3.
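
On the wording of that line: "sslv3 alert bad certificate" is how OpenSSL labels the TLS bad_certificate alert (code 42). The "sslv3" prefix is historical naming in the error string and does not, on its own, show that SSLv3 was negotiated. An alert received during connect usually means the other side rejected the certificate it was shown. The Python sketch below only decodes such error strings; the function name and the table are illustrative, with alert codes taken from RFC 5246.

```python
import re

# A few handshake-related alert descriptions and their codes (RFC 5246, 7.2).
TLS_ALERTS = {
    "handshake failure": 40,
    "bad certificate": 42,
    "unsupported certificate": 43,
    "certificate revoked": 44,
    "certificate expired": 45,
    "certificate unknown": 46,
    "unknown ca": 48,
}

def decode_alert(error_text):
    """Return (alert_name, code) from an OpenSSL-style alert string, or None."""
    match = re.search(r"(?:sslv3|tlsv1) alert ([a-z ]+)", error_text.lower())
    if not match:
        return None
    name = match.group(1).strip()
    code = TLS_ALERTS.get(name)
    return (name, code) if code is not None else None

print(decode_alert("SSLError: sslv3 alert bad certificate"))
```

Seen this way, the error points at certificate trust between agent and server rather than at the protocol version.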

 

Regards

Alain

Explorer

Hello Experts,

 

I finally found and resolved the problem.

I destroyed all the VMs and built new ones. I kept only the database VM, on which I dropped the scm database and re-created it (to empty it).

In our company, we use a proxy to reach the Internet; direct access is forbidden. So I rewrote the repo files in /etc/yum.repos.d to put proxy=http://<proxy_user>:<proxy_password>@<proxy_url>:8080/ after each repo definition. I disabled all proxy definitions in /etc/yum.conf.
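
For anyone repeating this on many repo files, the per-repo proxy line can be added with a short script. This is a rough Python sketch, not what was actually run here: `add_proxy` is an illustrative helper, and the repo id, base URL and proxy address in the sample are hypothetical placeholders. Note that configparser drops comments when it rewrites a file.

```python
import configparser
import io

def add_proxy(repo_text, proxy_url):
    """Return repo-file text with a proxy= line set in every repo section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser[section]["proxy"] = proxy_url
    out = io.StringIO()
    # yum-style files conventionally use key=value without spaces
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

# Hypothetical repo definition for illustration only.
sample = """\
[cloudera-manager]
name=Cloudera Manager
baseurl=http://repo.example.internal/cm6/
gpgcheck=0
"""
print(add_proxy(sample, "http://proxy.example.internal:8080/"))
```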

The cloudera-cm and cloudera-cdh repos point to the local repo I created and filled with the right packages. I ran yum update to get the GPG key (for epel-release).

The only disadvantage of this method is that no parcels can be used: with the proxy defined in the wizard, the download failed with a connection timeout.

 

Thanks for your help.

Alain