cloudera agent not able to send heartbeats to cloudera manager

Contributor

I have upgraded Cloudera CDH 5.9 to CDH 5.14.1. Before the upgrade, the edge node was working fine. After upgrading the CDH cluster, I upgraded the Cloudera Manager agent, daemons, and JDK to the same version as the CDH cluster. But my edge node is failing and cannot talk to Cloudera Manager. I can see the error below in the log. Please help.

 

 

[18/Sep/2018 16:41:07 +0000] 45790 MainThread agent ERROR Heartbeating to myucbpaabdapp03:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/agent.py", line 1424, in _send_heartbeat
    self.max_cert_depth)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/https.py", line 138, in __init__
    self.conn.connect()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/httpslib.py", line 50, in connect
    self.sock.connect((self.host, self.port))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 185, in connect
    ret = self.connect_ssl()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
    return m2.ssl_connect(self.ssl)
SSLError: unexpected eof

 


Expert Contributor
It looks like a problem with your SSL/TLS configuration. Check that the truststore contains the correct certificates. When you upgraded the JDK, did you remember to add your certificates to the cacerts truststore?

Regards,
Jim

Contributor
I installed the agent and JDK from scratch on another edge node, created a self-signed certificate, and imported it into the truststore. However, the exact same error was reproduced by the agent on that node. I tried to verify further, and it shows that the SSL handshake failed.
[root@myucbpaabdapp25 yum.repos.d]# openssl s_client -connect myucbpaabdapp03:7182 -CAfile /opt/cloudera/security/jks/bda.truststore
CONNECTED(00000003)
depth=0 C = , ST = , L = , O = , OU = , CN = myucbpaabdapp03
verify error:num=18:self signed certificate
verify return:1
depth=0 C = , ST = , L = , O = , OU = , CN = myucbpaabdapp03
verify return:1
140215717300040:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:184:
---
Certificate chain
0 s:/C=/ST=/L=/O=/OU=/CN=myucbpaabdapp03
i:/C=/ST=/L=/O=/OU=/CN=

Master Guru

@xBigDatax

 

OpenSSL does not read JKS files, so if bda.truststore is a JKS file (not PEM), then -CAfile cannot use it for trust. You could do something like this:

 

 

# openssl s_client -connect myucbpaabdapp03:7182 -CAfile <(keytool -list -rfc -keystore /opt/cloudera/security/jks/bda.truststore < /dev/null) < /dev/null

 

Note that your certificate does not appear to be self-signed. Rather, it has a null issuer:

 

0 s:/C=/ST=/L=/O=/OU=/CN=myucbpaabdapp03
i:/C=/ST=/L=/O=/OU=/CN=

 

If this were self-signed, the subject and issuer would match, like this:

 

0 s:/C=/ST=/L=/O=/OU=/CN=myucbpaabdapp03
i:/C=/ST=/L=/O=/OU=/CN=myucbpaabdapp03
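
The subject/issuer comparison can also be scripted. The following is only a rough sketch: is_self_signed is a hypothetical helper name (not part of OpenSSL or Cloudera Manager), and it assumes the "0 s:" / "i:" chain layout shown in the s_client output in this thread.

```shell
# Hypothetical helper: reads "openssl s_client" output on stdin and succeeds
# only if the first certificate in the chain has identical subject and issuer.
is_self_signed() {
  awk '
    # Subject line of chain entry 0, e.g. " 0 s:/C=/.../CN=host"
    /^ *0 s:/ { sub(/^ *0 s:/, ""); subject = $0; have_s = 1; next }
    # First issuer line after that subject, e.g. "   i:/C=/.../CN=host"
    have_s && /^ *i:/ { sub(/^ *i:/, ""); issuer = $0; have_i = 1; exit }
    # Exit status 0 only when both lines were found and they match
    END { exit !(have_s && have_i && subject == issuer) }
  '
}

# Demo using the chain output posted earlier in this thread.
sample='Certificate chain
 0 s:/C=/ST=/L=/O=/OU=/CN=myucbpaabdapp03
   i:/C=/ST=/L=/O=/OU=/CN='

if printf '%s\n' "$sample" | is_self_signed; then
  echo "subject and issuer match (self-signed)"
else
  echo "subject and issuer differ"
fi
```

Against a live server, the output of "openssl s_client -connect myucbpaabdapp03:7182 < /dev/null" could be piped into the same function.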

 

Also note that the certificate does not use the fully-qualified domain name (FQDN), which will cause problems for TLS and Kerberos. In general, CM/CDH expects fully-qualified domain names, so this could cause you trouble later.

 

Lastly, assuming that this once worked as it was configured, we need to know more about your agent communication TLS config to be able to help:

 

(1)

We need to know what you have set for the following in Cloudera Manager (checked or unchecked):

In Cloudera Manager (Administration --> Settings)

- Use TLS Encryption for Agents
- Use TLS Authentication of Agents to Server

(2)

We need to know what you have configured in the config.ini regarding security on the host that cannot heartbeat:

# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"

 

(3)

 

Verify that all of the configuration items match a working node (one that can heartbeat)
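
As a rough sketch of step (3), the security-related settings from two copies of config.ini could be compared like this. The function names and the file paths in the usage note are made up for illustration; the filter mirrors the egrep pattern from step (2).

```shell
# Hypothetical helper: print the non-comment lines of a config.ini that
# mention cert, key, or tls, sorted so two hosts can be compared.
security_settings() {
  awk '!/^[ \t]*#/ && /cert|key|tls/' "$1" | sort
}

# Hypothetical helper: compare the security settings of two config.ini files.
# Returns 0 when they match, 1 (after printing both sets) when they differ.
compare_security_settings() {
  working=$(security_settings "$1")
  failing=$(security_settings "$2")
  if [ "$working" = "$failing" ]; then
    echo "security settings match"
  else
    echo "security settings differ"
    echo "--- $1"; echo "$working"
    echo "--- $2"; echo "$failing"
    return 1
  fi
}
```

For example, after copying the config.ini from a working node to /tmp/working-config.ini (an assumed path), run: compare_security_settings /tmp/working-config.ini /etc/cloudera-scm-agent/config.ini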

 

 

Contributor

I have unchecked "Use TLS Authentication of Agents to Server"

and restarted the agent on the edge node server. However, I am still not able to see the host in CM under the Hosts tab.

This time the cloudera-scm-agent log is much cleaner, but it still contains one error:

 

Monitor-HostMonitor throttling_logger ERROR    Could not find local file system for /var/run/cloudera-scm-agent/process

 

Complete log:

[20/Sep/2018 18:18:40 +0000] 123357 MainThread tmpfs INFO Successfully umounted tmpfs at /var/run/cloudera-scm-agent/process
[20/Sep/2018 18:18:40 +0000] 123357 MainThread tmpfs INFO Successfully mounted tmpfs at /var/run/cloudera-scm-agent/process
[20/Sep/2018 18:18:41 +0000] 123357 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 1)
[20/Sep/2018 18:18:41 +0000] 123357 MainThread agent INFO Supervisor version: 3.0, pid: 123382
[20/Sep/2018 18:18:41 +0000] 123357 MainThread agent INFO Successfully connected to supervisor
[20/Sep/2018 18:18:41 +0000] 123357 MainThread status_server INFO Using maximum impala profile bundle size of 1073741824 bytes.
[20/Sep/2018 18:18:41 +0000] 123357 MainThread status_server INFO Using maximum stacks log bundle size of 1073741824 bytes.
[20/Sep/2018 18:18:41 +0000] 123357 MainThread _cplogging INFO [20/Sep/2018:18:18:41] ENGINE Bus STARTING
[20/Sep/2018 18:18:41 +0000] 123357 MainThread _cplogging INFO [20/Sep/2018:18:18:41] ENGINE Started monitor thread '_TimeoutMonitor'.
[20/Sep/2018 18:18:42 +0000] 123357 MainThread _cplogging INFO [20/Sep/2018:18:18:42] ENGINE Serving on myucbpaabdapp25.cimbmy.cimbdomain.com:9000
[20/Sep/2018 18:18:42 +0000] 123357 MainThread _cplogging INFO [20/Sep/2018:18:18:42] ENGINE Bus STARTED
[20/Sep/2018 18:18:42 +0000] 123357 MainThread __init__ INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x3edb090>,)
[20/Sep/2018 18:18:42 +0000] 123357 MonitorDaemon-Scheduler __init__ INFO Monitor ready to report: ('HostMonitor',)
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO Setting default socket timeout to 45
[20/Sep/2018 18:18:42 +0000] 123357 Monitor-HostMonitor network_interfaces INFO NIC iface eth0 doesn't support ETHTOOL (95)
[20/Sep/2018 18:18:42 +0000] 123357 Monitor-HostMonitor throttling_logger ERROR Could not find local file system for /var/run/cloudera-scm-agent/process
[20/Sep/2018 18:18:42 +0000] 123357 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.10 min:0.10 mean:0.10 max:0.10 LIFE_MAX:0.10
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO CM server guid: e157e5cc-09e9-4196-bac0-d396d5c1a920
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO Using parcels directory from server provided value: /opt/cloudera/parcels
[20/Sep/2018 18:18:42 +0000] 123357 MainThread parcel INFO Agent does create users/groups and apply file permissions
[20/Sep/2018 18:18:42 +0000] 123357 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[20/Sep/2018 18:18:42 +0000] 123357 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO Flood daemon (re)start attempt
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO Triggering supervisord update.
[20/Sep/2018 18:18:42 +0000] 123357 MainThread downloader ERROR Failed rack peer update: [Errno 111] Connection refused
[20/Sep/2018 18:18:42 +0000] 123357 MainThread firehoses INFO Reporting interval updated: 5.0 -> 60
[20/Sep/2018 18:18:42 +0000] 123357 MainThread agent INFO Active parcel list updated; recalculating component info.
[20/Sep/2018 18:18:42 +0000] 123357 MainThread throttling_logger INFO Identified java component java8 with full version JAVA_HOME=/usr/java/default java version "1.8.0_171" Java(TM) SE Runtime Environment (build 1.8.0_171-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) for requested version .
[20/Sep/2018 18:19:42 +0000] 123357 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[20/Sep/2018 18:19:42 +0000] 123357 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[20/Sep/2018 18:19:42 +0000] 123357 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[20/Sep/2018 18:28:44 +0000] 123357 MainThread heartbeat_tracker INFO HB stats (seconds): num:42 LIFE_MIN:0.06 min:0.02 mean:0.06 max:0.09 LIFE_MAX:0.10

Master Guru

@xBigDatax,

 

Sadly, my previous reply didn't get published for some reason, so I'll repeat it in short.

The following is not fatal and is not a sign of a real problem:

 

[20/Sep/2018 18:18:42 +0000] 123357 Monitor-HostMonitor throttling_logger ERROR Could not find local file system for /var/run/cloudera-scm-agent/process

 

Notice that the "Monitor-HostMonitor" thread is what returns the error.  Your agent appears to be in good health and it should be heartbeating.

 

If /var/run/cloudera-scm-agent/process exists and shows up when running the "df" command, there is likely no big problem:

 

cm_processes     8133940    48444   8085496   1% /run/cloudera-scm-agent/process

 

I remember seeing that error years ago, and it was caused by a mismatch between a modern agent code base and an older /etc/cloudera-scm-agent/config.ini.

 

Check to make sure you see this in your agent's config.ini:

 

monitored_nodev_filesystem_types=nfs,nfs4,tmpfs

 

If it is not, then add it and restart the agent with "service cloudera-scm-agent restart"

 

The monitored_nodev_filesystem_types option lists the file system types that are considered 'nodev'.  If "tmpfs" is not listed there, the file system will be considered 'local'.  As we can see, cm_processes is 'tmpfs':

 

cm_processes on /run/cloudera-scm-agent/process type tmpfs (rw,relatime,mode=751)

 

The config.ini setting I mentioned should include tmpfs so that certain evaluation is skipped (and you won't get that ERROR message).
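
To check a config.ini for this setting without reading it by eye, something like the following could be used. This is a minimal sketch; nodev_has_tmpfs is a hypothetical helper name, and it only looks for an uncommented monitored_nodev_filesystem_types line whose comma-separated value contains tmpfs.

```shell
# Hypothetical helper: succeed only if the given config.ini has an uncommented
# monitored_nodev_filesystem_types entry whose value list includes tmpfs.
nodev_has_tmpfs() {
  awk -F= '
    /^[ \t]*#/ { next }                       # ignore commented-out lines
    $1 ~ /^[ \t]*monitored_nodev_filesystem_types[ \t]*$/ {
      n = split($2, types, ",")               # value is a comma-separated list
      for (i = 1; i <= n; i++) {
        gsub(/[ \t]/, "", types[i])           # tolerate spaces around entries
        if (types[i] == "tmpfs") found = 1
      }
    }
    END { exit !found }                       # status 0 only when tmpfs is listed
  ' "$1"
}

# Usage (path taken from this thread):
#   nodev_has_tmpfs /etc/cloudera-scm-agent/config.ini || echo "tmpfs is missing"
```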

 

 

Contributor

@bgooley

After applying the change below under the General section of config.ini, there is no error now. The only remaining problem is that the CDH version for the edge node shows as "None", which prevents me from applying any role to the edge node. Please suggest.

monitored_nodev_filesystem_types=nfs,nfs4,tmpfs

Master Guru

@xBigDatax,

 

I'm sorry, but I can't quite understand what you are saying/asking.

 

Can you clarify what you mean by:

"edge node cloudera version showing none which is hold me to apply any role on edge node"

 

Any agent should have 'tmpfs' listed in monitored_nodev_filesystem_types.

This should be in your config.ini unless you have a reason to remove it... I don't know of one.

monitored_nodev_filesystem_types=nfs,nfs4,tmpfs

Contributor

@bgooley 

 

Edge node integration with CM is fixed. I can now see that the edge node hosts are visible under "All Hosts" in Cloudera Manager.

When I click on the edge node host within Cloudera Manager, I can see heartbeats and everything is recorded properly, but the "CDH version" is "None".

Now I have to apply gateway roles on the edge nodes so that the Hadoop client/cluster configuration files can be deployed properly. When I try to apply a role, it fails with the error "Mismatched CDH versions". The issue is that the edge node host has a CDH version status of None, whereas the cluster is on CDH 5.14.3.

So the question is: how do I fix the CDH version issue on the edge node?

Contributor

Issue fixed. Thanks for your continuous support, @bgooley. Much appreciated!