Failed to add host

Explorer

I'm trying to add a host to a single-node cluster after configuring it with TLS/SSL and Kerberos, but I am unable to. What should I check?

 

 Installation failed. Failed to receive heartbeat from agent.

  • Ensure that the host's hostname is configured properly.
  • Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
  • Ensure that ports 9000 and 9001 are not in use on the host being added.
  • Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
  • If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.

 

>>SSLError: unexpected eof
>>[25/May/2018 11:43:31 +0000] 11441 MainThread agent ERROR Heartbeating to 192.168.0.11:7182 failed.
>>Traceback (most recent call last):
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/agent.py", line 1424, in _send_heartbeat
>> self.max_cert_depth)
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/https.py", line 138, in __init__
>> self.conn.connect()
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/httpslib.py", line 59, in connect
>> sock.connect((self.host, self.port))
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 195, in connect
>> ret = self.connect_ssl()
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 188, in connect_ssl
>> return m2.ssl_connect(self.ssl, self._timeout)

>>SSLError: unexpected eof
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO Stopping agent...
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO 1 processes are being managed; Supervisor will continue to run.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('phd-node1', 9000)) shut down
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Stopped thread '_TimeoutMonitor'.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('phd-node1', 9000)) already shut down
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE No thread running for None.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus EXITING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus EXITED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO Agent exiting; caught signal 15
>>[25/May/2018 11:43:32 +0000] 11441 Dummy-13 daemonize WARNING Stopping daemon.
END (0)
end of agent logs.
scm agent started
Installation script completed successfully.


Expert Contributor

Check the last of the points you listed: the heartbeat will almost certainly fail if you have not done that step manually (set use_tls=1 and restart the agent).

If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.

If you have already done this, make sure that the configured keystore and truststore files have been copied to the new host.
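The two checks in this reply can be sketched as a small Bash function. This is only an illustration: the name check_agent_tls is hypothetical (it is not a Cloudera tool), and it assumes the security files are referenced in config.ini by uncommented keys ending in _file, such as verify_cert_file. It takes the path to config.ini as its argument.

```shell
# Hypothetical helper, assuming bash; not part of Cloudera Manager.
# Reports whether use_tls=1 is set and whether every file referenced
# by an uncommented *_file key in the given config.ini exists.
check_agent_tls() {
  local ini="$1" rc=0 refs key path

  # use_tls=1 must appear on an uncommented line.
  if grep -Eq '^[[:space:]]*use_tls[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$ini"; then
    echo "OK: use_tls=1"
  else
    echo "MISSING: use_tls=1"
    rc=1
  fi

  # Normalize uncommented "key = value" lines for *_file keys to "key=value".
  # Keys with an empty value are skipped by this pattern.
  refs=$(sed -n -E 's/^[[:space:]]*([A-Za-z_]+_file)[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1=\2/p' "$ini")

  # A here-document keeps the loop in the current shell so rc is preserved.
  while IFS='=' read -r key path; do
    [ -n "$key" ] || continue
    if [ -f "$path" ]; then
      echo "OK: $key -> $path"
    else
      echo "MISSING: $key -> $path"
      rc=1
    fi
  done <<EOF
$refs
EOF

  return "$rc"
}
```

On the host being added it could be run as: check_agent_tls /etc/cloudera-scm-agent/config.ini. A non-zero exit status means at least one check failed.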

Explorer

Hi Gekas, thanks for the reply. I copied the config.ini from the head node to the compute node to make sure all the config items are the same.  I've verified use_tls=1.  Anything else that might cause the additional node to not heartbeat?

Expert Contributor
Have you checked that the truststore file has been copied to the host?

Explorer

I copied $JAVA_HOME/jre/lib/security/jssecacerts from the head node to the compute node.  What should I check for?  How do I check the trust store?

Explorer
I am facing the exact same problem. How did you resolve it?
I would appreciate it if you could help me with a resolution.
Thanks

Super Guru

@xBigDatax,

 

(1) First, please provide the information you used to conclude that you are seeing the exact same problem. Variations in the stack trace can have major implications for how we approach a problem like this.

(2) We need to know what you have set (checked or unchecked) for the following in Cloudera Manager (Administration --> Settings):

- Use TLS Encryption for Agents

- Use TLS Authentication of Agents to Server

(3) We need to know what you have configured for security in the config.ini on the host that cannot heartbeat:

# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"

NEXT STEPS:

With the above information, we can determine whether your agent and Cloudera Manager security settings align. Depending on what we find, we may need to take further steps.

If you are using the wizard to add a new host, you will need to manually copy over a config.ini from a "good" host and then create any security files that it references.

 

 

Explorer

1) The error below appears on both edge nodes:

[19/Sep/2018 19:01:07 +0000] 14599 MainThread agent ERROR Heartbeating to myucbpaabdapp03:7182 failed.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/agent.py", line 1424, in _send_heartbeat
self.max_cert_depth)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/https.py", line 138, in __init__
self.conn.connect()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/httpslib.py", line 50, in connect
self.sock.connect((self.host, self.port))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 185, in connect
ret = self.connect_ssl()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
return m2.ssl_connect(self.ssl)
SSLError: unexpected eof

 

2)

TLS Authentication of Agents to Server = Checked

Use TLS Encryption for Agents = Checked

3)

[root@myucbpaabdapp25 security]# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem

[root@myucbpaabdapp26 ~]# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem
[root@myucbpaabdapp26 ~]#

 

Explorer

Please see the self-signed certificate details below:

[root@myxxxxxxxxxxxx25 jks]# openssl s_client -connect myxxxxxxxxxxxx03:7182 -CAfile <(keytool -list -rfc -keystore /opt/cloudera/security/jks/cimbbda.truststore < /dev/null) < /dev/null
Enter keystore password:
***************** WARNING WARNING WARNING *****************
* The integrity of the information stored in your keystore *
* has NOT been verified! In order to verify its integrity, *
* you must provide your keystore password. *
***************** WARNING WARNING WARNING *****************

CONNECTED(00000003)
depth=0 C = , ST = , L = , O = , OU = , CN = myxxxxxxxxxxxx03
verify return:1
140385017821000:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:184:
---
Certificate chain
0 s:/C=/ST=/L=/O=/OU=/CN=myxxxxxxxxxxxx03
i:/C=/ST=/L=/O=/OU=/CN=myxxxxxxxxxxxx03
---
Server certificate

Super Guru

@xBigDatax,

 

If you have TLS Authentication of Agents to Server = Checked then you need to configure the following in the agent's config.ini:

 

client_key_file=

client_keypw_file=

client_cert_file=

 

With TLS Authentication enabled, Cloudera Manager requires that the agent authenticate by sending its certificate.

Cloudera Manager must then either trust the signer via CM's truststore (if the agent's certificate has extended key usage including "TLS Web Client Authentication"), or the agent's certificate itself must exist in the truststore.

 

If you have other agents that can heartbeat, they would have to have the described configuration in order to work, so you can mimic the other nodes.
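To compare a failing host against a working one, a sketch like the following reports which of the three client-authentication keys named above are set. It assumes Bash; the function name check_client_auth_keys is hypothetical, and it only checks that each key has a non-empty, uncommented value, not that the certificate itself is valid or trusted.

```shell
# Hypothetical helper, assuming bash; not part of Cloudera Manager.
# Reports which of the three client-authentication keys listed in this
# reply have a non-empty, uncommented value in the given config.ini.
check_client_auth_keys() {
  local ini="$1" rc=0 key
  for key in client_key_file client_keypw_file client_cert_file; do
    # Match "key = value" where value starts with a non-space character.
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$ini"; then
      echo "set: $key"
    else
      echo "not set: $key"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_client_auth_keys /etc/cloudera-scm-agent/config.ini on a host whose config.ini contains only use_tls and verify_cert_file, as in the output posted earlier in this thread, would report all three keys as not set.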

Super Guru

Oh, and if you wanted to just get this working with minimal effort, disable "TLS Authentication of Agents to Server" and restart Cloudera Manager with service cloudera-scm-server restart.

The existing agent configuration already supports TLS use and certificate verification on the agent side.

Explorer

I think "TLS Authentication of Agents to Server" was previously unchecked, and that is why the edge node was able to connect to Cloudera Manager.

I have unchecked it again and restarted cloudera-scm-agent. The agent log on the edge node now looks completely different and much cleaner, except for one error. If things are set correctly now, should the edge node host be visible in Cloudera Manager?

Error

 Monitor-HostMonitor throttling_logger ERROR    Could not find local file system for /var/run/cloudera-scm-agent/process

Detailed Error

[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib64/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin:/usr/kerberos/bin', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'ORACLE_HOME': '/usr/share/oracle/instantclient', 'CDH_HCAT_HOME': '/usr/lib/hive-hcatalog', 'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/process
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/flood
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor/include
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Supervisor version: 3.0, pid: 31071
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Connecting to previous supervisor: agent-31043-1537253117.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread status_server INFO Using maximum impala profile bundle size of 1073741824 bytes.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread status_server INFO Using maximum stacks log bundle size of 1073741824 bytes.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Bus STARTING
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Started monitor thread '_TimeoutMonitor'.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Serving on myucbpaabdapp25.cimbmy.cimbdomain.com:9000
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Bus STARTED
[20/Sep/2018 13:29:42 +0000] 77125 MainThread __init__ INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x3919050>,)
[20/Sep/2018 13:29:42 +0000] 77125 MonitorDaemon-Scheduler __init__ INFO Monitor ready to report: ('HostMonitor',)
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Setting default socket timeout to 45
[20/Sep/2018 13:29:42 +0000] 77125 Monitor-HostMonitor network_interfaces INFO NIC iface eth0 doesn't support ETHTOOL (95)
[20/Sep/2018 13:29:42 +0000] 77125 Monitor-HostMonitor throttling_logger ERROR Could not find local file system for /var/run/cloudera-scm-agent/process
[20/Sep/2018 13:29:42 +0000] 77125 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.10 min:0.10 mean:0.10 max:0.10 LIFE_MAX:0.10
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO CM server guid: e157e5cc-09e9-4196-bac0-d396d5c1a920
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Using parcels directory from server provided value: /opt/cloudera/parcels
[20/Sep/2018 13:29:42 +0000] 77125 MainThread parcel INFO Agent does create users/groups and apply file permissions
[20/Sep/2018 13:29:42 +0000] 77125 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[20/Sep/2018 13:29:42 +0000] 77125 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Flood daemon (re)start attempt
[20/Sep/2018 13:29:43 +0000] 77125 MainThread agent INFO Triggering supervisord update.
[20/Sep/2018 13:29:44 +0000] 77125 MainThread firehoses INFO Reporting interval updated: 5.0 -> 60
[20/Sep/2018 13:29:44 +0000] 77125 MainThread agent INFO Active parcel list updated; recalculating component info.
[20/Sep/2018 13:29:44 +0000] 77125 MainThread throttling_logger INFO Identified java component java8 with full version JAVA_HOME=/usr/java/default java version "1.8.0_171" Java(TM) SE Runtime Environment (build 1.8.0_171-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) for requested version .
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.

Super Guru

@xBigDatax,

 

That error doesn't indicate anything fatal. Check the Hosts tab of Cloudera Manager to view all hosts. If you see heartbeats within the last 15 seconds, all should be well with the agent communication.

That said, I haven't seen this in a while, but check your /etc/cloudera-scm-agent/config.ini and make sure you have this set:

monitored_nodev_filesystem_types=nfs,nfs4,tmpfs

In older config.ini files this wasn't set, which resulted in the "Could not find local file system for /var/run/cloudera-scm-agent/process" error. If you edit config.ini, make sure to restart the agent with "service cloudera-scm-agent restart".

Also run "df" to make sure the directory exists and is mounted.

For example:

cm_processes    13404764   142184  13262580   2% /var/run/cloudera-scm-agent/process
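Both checks from this reply can be combined in a rough Bash sketch. The function name check_process_mount is hypothetical, and instead of parsing df output it reads /proc/mounts, which is an assumption about the host being Linux. The optional second and third arguments (the directory and the mounts file) exist only so the function can be tried against copies of the files.

```shell
# Hypothetical helper, assuming bash on Linux; not part of Cloudera Manager.
# Checks that monitored_nodev_filesystem_types includes tmpfs in the given
# config.ini and that the agent's process directory is a mount point.
check_process_mount() {
  local ini="$1"
  local dir="${2:-/var/run/cloudera-scm-agent/process}"
  local mounts="${3:-/proc/mounts}"
  local rc=0

  # The setting must be on an uncommented line and mention tmpfs.
  if grep -Eq '^[[:space:]]*monitored_nodev_filesystem_types[[:space:]]*=.*tmpfs' "$ini"; then
    echo "OK: monitored_nodev_filesystem_types includes tmpfs"
  else
    echo "MISSING: monitored_nodev_filesystem_types with tmpfs"
    rc=1
  fi

  # /proc/mounts fields: device, mount point, filesystem type, options, ...
  # Caveat: where /var/run is a symlink to /run, the mount point may be
  # listed under /run instead, so pass that path as the second argument.
  if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
    echo "OK: $dir is mounted"
  else
    echo "MISSING: no mount at $dir"
    rc=1
  fi

  return "$rc"
}
```

A typical call would be check_process_mount /etc/cloudera-scm-agent/config.ini, followed by an agent restart if config.ini had to be edited.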

 

Explorer

I have updated the config.ini file and restarted the agent, but I still get the same error and no new ones:

 Could not find local file system for /var/run/cloudera-scm-agent/process

I checked with df -h and the directory is there; I can see it is mounted. Also, the Hosts tab of CM still does not show the edge nodes.

 

Super Guru

@xBigDatax,

 

Let's have a look at your config.ini and "mount -l"

 

# grep -v "^#" /etc/cloudera-scm-agent/config.ini | grep -v "^$"

# mount -l

 

 

Explorer

Here is the output. I'm wondering if we can connect over Webex?

[root@myumyhost25 cloudera-scm-agent]# grep -v "^#" /etc/cloudera-scm-agent/config.ini | grep -v "^$"
[General]
server_host=myumyhost03
server_port=7182
[Security]
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
[root@myumyhost25 cloudera-scm-agent]# mount -l
/dev/mapper/vg_myumyhost-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/xvdb1 on /boot type ext4 (rw)
/dev/mapper/vg_data-lv_data on /data type ext4 (rw)
/dev/mapper/vg_data-lv_home on /home type ext4 (rw)
/dev/mapper/vg_data-lv_opt on /opt type ext4 (rw)
/dev/mapper/vg_data-lv_var on /var type ext4 (rw)
/dev/mapper/vg_myumyhost-lv_var_crash on /var/crash type ext4 (rw)
/dev/mapper/vg_myumyhost-lv_var_log on /var/log type ext4 (rw)
/iso/OEL6.9/V860937-01.iso on /var/OSimage/OL6.9_x86_64 type iso9660 (ro,loop=/dev/loop0) [OL6.9 x86_64 Disc 1 20170324]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/blkio type cgroup (rw,blkio)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/cpuacct type cgroup (rw,cpuacct)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/cpu type cgroup (rw,cpu)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/memory type cgroup (rw,memory)
cm_processes on /var/run/cloudera-scm-agent/process type tmpfs (rw,mode=0751,rootcontext="unconfined_u:object_r:var_run_t:s0")
[root@myumyhost25 cloudera-scm-agent]#

Explorer

Tell me about your environment.  I'll try to help as best I can. 

 

Some of the issues I came across while setting up the cluster: I was using a wildcard cert instead of single certs for each node, and I had to set up a DNS server on the head node.