I'm trying to add a host to a single-node cluster after configuring it with TLS/SSL and Kerberos, but the new host fails to install. What should I check?
Installation failed. Failed to receive heartbeat from agent.
>>SSLError: unexpected eof
>>[25/May/2018 11:43:31 +0000] 11441 MainThread agent ERROR Heartbeating to 192.168.0.11:7182 failed.
>>Traceback (most recent call last):
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/agent.py", line 1424, in _send_heartbeat
>> self.max_cert_depth)
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/https.py", line 138, in __init__
>> self.conn.connect()
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/httpslib.py", line 59, in connect
>> sock.connect((self.host, self.port))
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 195, in connect
>> ret = self.connect_ssl()
>> File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 188, in connect_ssl
>> return m2.ssl_connect(self.ssl, self._timeout)
>>SSLError: unexpected eof
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO Stopping agent...
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO 1 processes are being managed; Supervisor will continue to run.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('phd-node1', 9000)) shut down
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Stopped thread '_TimeoutMonitor'.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('phd-node1', 9000)) already shut down
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE No thread running for None.
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus STOPPED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus EXITING
>>[25/May/2018 11:43:32 +0000] 11441 MainThread _cplogging INFO [25/May/2018:11:43:32] ENGINE Bus EXITED
>>[25/May/2018 11:43:32 +0000] 11441 MainThread agent INFO Agent exiting; caught signal 15
>>[25/May/2018 11:43:32 +0000] 11441 Dummy-13 daemonize WARNING Stopping daemon.
END (0)
end of agent logs.
scm agent started
Installation script completed successfully.
Created 05-28-2018 01:39 AM
You can check the last of the points you have listed; it is almost certain the agent will fail to heartbeat if you have not done that step manually (set use_tls=1 and restart the agent):
If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
If you did this already, then make sure that the configured keystore and truststore files have been copied to the new host.
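For reference, a minimal check-and-fix sequence on the host being added might look like this (paths are the Cloudera Manager defaults; adjust them if your layout differs):

grep -E '^use_tls' /etc/cloudera-scm-agent/config.ini                  # confirm the current setting
sed -i 's/^use_tls=0/use_tls=1/' /etc/cloudera-scm-agent/config.ini    # flip it to 1 if it was set to 0
service cloudera-scm-agent restart                                     # restart so the agent picks up the change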
Created 05-29-2018 09:30 AM
Hi Gekas, thanks for the reply. I copied the config.ini from the head node to the compute node to make sure all the config items are the same. I've verified use_tls=1. Anything else that might cause the additional node to not heartbeat?
Created 05-29-2018 10:40 AM
I copied $JAVA_HOME/jre/lib/security/jssecacerts from the head node to the compute node. What should I check for? How do I check the trust store?
Created 09-19-2018 10:41 AM
(1)
First, please provide the information you used to conclude that you are seeing the exact same problem. Variations in the stack trace can have major implications for how we approach a problem like this.
(2)
We need to know what you have set for the following in Cloudera Manager (checked or unchecked):
In Cloudera Manager (Administration --> Settings)
- Use TLS Encryption for Agents
- Use TLS Authentication of Agents to Server
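If it is easier, these values can also be read back through the Cloudera Manager API; this is only a sketch, assuming CM is reachable on plain HTTP port 7180 and you have an admin login (cm-server is a placeholder hostname):

curl -s -u admin:admin 'http://cm-server:7180/api/v13/cm/config?view=full' | grep -i -B2 -A2 tls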
(3)
We need to know what you have configured in the config.ini regarding security on the host that cannot heartbeat:
# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"
NEXT STEPS:
With the above information, we can determine if your agent and Cloudera Manager security settings align. Depending on what we find, we may need to take further steps.
If you are using the wizard to add a new host, you will need to manually copy over a config.ini from a "good" host and then create any security files that it references.
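As a rough sketch of that manual copy (good-host is a placeholder; the file paths must match whatever the working agent's config.ini actually references):

scp good-host:/etc/cloudera-scm-agent/config.ini /etc/cloudera-scm-agent/config.ini   # agent config from a heartbeating host
mkdir -p /opt/cloudera/security/x509                                                   # directory for the referenced security files
scp good-host:/opt/cloudera/security/x509/agents.pem /opt/cloudera/security/x509/      # e.g. the verify_cert_file
service cloudera-scm-agent restart                                                      # then click Retry in the wizard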
Created 09-19-2018 06:36 PM
1) The error below appears on both edge nodes:
[19/Sep/2018 19:01:07 +0000] 14599 MainThread agent ERROR Heartbeating to myucbpaabdapp03:7182 failed.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/agent.py", line 1424, in _send_heartbeat
self.max_cert_depth)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.14.3-py2.6.egg/cmf/https.py", line 138, in __init__
self.conn.connect()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/httpslib.py", line 50, in connect
self.sock.connect((self.host, self.port))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 185, in connect
ret = self.connect_ssl()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/M2Crypto-0.21.1-py2.6-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
return m2.ssl_connect(self.ssl)
SSLError: unexpected eof
2)
Use TLS Authentication of Agents to Server = Checked
Use TLS Encryption for Agents = Checked
3) [root@myucbpaabdapp25 security]# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem
[root@myucbpaabdapp26 ~]# egrep '(cert|key|tls)' /etc/cloudera-scm-agent/config.ini |grep -v "^#"
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem
[root@myucbpaabdapp26 ~]#
Created 09-19-2018 07:15 PM
Please look at the self-signed certificate details below:
[root@myxxxxxxxxxxxx25 jks]# openssl s_client -connect myxxxxxxxxxxxx03:7182 -CAfile <(keytool -list -rfc -keystore /opt/cloudera/security/jks/cimbbda.truststore < /dev/null) < /dev/null
Enter keystore password:
***************** WARNING WARNING WARNING *****************
* The integrity of the information stored in your keystore *
* has NOT been verified! In order to verify its integrity, *
* you must provide your keystore password. *
***************** WARNING WARNING WARNING *****************
CONNECTED(00000003)
depth=0 C = , ST = , L = , O = , OU = , CN = myxxxxxxxxxxxx03
verify return:1
140385017821000:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:184:
---
Certificate chain
0 s:/C=/ST=/L=/O=/OU=/CN=myxxxxxxxxxxxx03
i:/C=/ST=/L=/O=/OU=/CN=myxxxxxxxxxxxx03
---
Server certificate
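One way to double-check whether the certificate Cloudera Manager presents on port 7182 chains to anything in that truststore is to dump both to PEM and run openssl verify. This is only a sketch, using the truststore path from the command above (the keystore integrity warning can be ignored for a read-only listing):

openssl s_client -connect myxxxxxxxxxxxx03:7182 </dev/null 2>/dev/null | openssl x509 -out cm-server.pem   # certificate the server presents
keytool -list -rfc -keystore /opt/cloudera/security/jks/cimbbda.truststore </dev/null > truststore.pem     # truststore contents as PEM
openssl verify -CAfile truststore.pem cm-server.pem                                                        # should print cm-server.pem: OK if trusted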
Created 09-19-2018 09:56 PM
If you have TLS Authentication of Agents to Server = Checked then you need to configure the following in the agent's config.ini:
client_key_file=
client_keypw_file=
client_cert_file=
With TLS Authentication enabled, Cloudera Manager requires that the agent authenticate by sending its certificate.
Cloudera Manager then must trust the signer (if the agent's certificate has extended key usage including "TLS Web Client Authentication") via CM's truststore, OR the agent's certificate must exist in the truststore.
If you have other agents that can heartbeat, they would have to have the described configuration in order to work, so you can mimic the other nodes.
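As a sketch, the [Security] section of /etc/cloudera-scm-agent/config.ini on a host using agent TLS authentication would look something like this (the agent-host25.* file names are only examples; use the paths your working agents reference):

[Security]
use_tls=1
# certificate (or CA) the agent uses to verify Cloudera Manager
verify_cert_file=/opt/cloudera/security/x509/agents.pem
# certificate and private key the agent presents when "Use TLS Authentication of Agents to Server" is checked
client_cert_file=/opt/cloudera/security/x509/agent-host25.pem
client_key_file=/opt/cloudera/security/x509/agent-host25.key
# file containing the password that protects the private key above
client_keypw_file=/opt/cloudera/security/x509/agent-host25.pw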
Created 09-19-2018 09:58 PM
Oh, and if you wanted to just get this working with minimal effort, disable "TLS Authentication of Agents to Server" and restart Cloudera Manager with service cloudera-scm-server restart.
The existing agent configuration already supports TLS and certificate verification on the agent side.
Created 09-19-2018 10:40 PM
I think that previously "TLS Authentication of Agents to Server" was not checked, and that is why the edge node could connect to Cloudera Manager properly.
I have disabled/unchecked it again and restarted cloudera-scm-agent. The Cloudera agent log on the edge node has now changed altogether; it looks much cleaner, except for one error. If things are set correctly now, should the edge node host be visible in Cloudera Manager?
Error
Monitor-HostMonitor throttling_logger ERROR Could not find local file system for /var/run/cloudera-scm-agent/process
Detailed Error
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib64/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin:/usr/kerberos/bin', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'ORACLE_HOME': '/usr/share/oracle/instantclient', 'CDH_HCAT_HOME': '/usr/lib/hive-hcatalog', 'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/process
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/flood
[20/Sep/2018 13:29:41 +0000] 77125 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor/include
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Supervisor version: 3.0, pid: 31071
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Connecting to previous supervisor: agent-31043-1537253117.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread status_server INFO Using maximum impala profile bundle size of 1073741824 bytes.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread status_server INFO Using maximum stacks log bundle size of 1073741824 bytes.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Bus STARTING
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Started monitor thread '_TimeoutMonitor'.
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Serving on myucbpaabdapp25.cimbmy.cimbdomain.com:9000
[20/Sep/2018 13:29:42 +0000] 77125 MainThread _cplogging INFO [20/Sep/2018:13:29:42] ENGINE Bus STARTED
[20/Sep/2018 13:29:42 +0000] 77125 MainThread __init__ INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x3919050>,)
[20/Sep/2018 13:29:42 +0000] 77125 MonitorDaemon-Scheduler __init__ INFO Monitor ready to report: ('HostMonitor',)
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Setting default socket timeout to 45
[20/Sep/2018 13:29:42 +0000] 77125 Monitor-HostMonitor network_interfaces INFO NIC iface eth0 doesn't support ETHTOOL (95)
[20/Sep/2018 13:29:42 +0000] 77125 Monitor-HostMonitor throttling_logger ERROR Could not find local file system for /var/run/cloudera-scm-agent/process
[20/Sep/2018 13:29:42 +0000] 77125 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.10 min:0.10 mean:0.10 max:0.10 LIFE_MAX:0.10
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO CM server guid: e157e5cc-09e9-4196-bac0-d396d5c1a920
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Using parcels directory from server provided value: /opt/cloudera/parcels
[20/Sep/2018 13:29:42 +0000] 77125 MainThread parcel INFO Agent does create users/groups and apply file permissions
[20/Sep/2018 13:29:42 +0000] 77125 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[20/Sep/2018 13:29:42 +0000] 77125 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[20/Sep/2018 13:29:42 +0000] 77125 MainThread agent INFO Flood daemon (re)start attempt
[20/Sep/2018 13:29:43 +0000] 77125 MainThread agent INFO Triggering supervisord update.
[20/Sep/2018 13:29:44 +0000] 77125 MainThread firehoses INFO Reporting interval updated: 5.0 -> 60
[20/Sep/2018 13:29:44 +0000] 77125 MainThread agent INFO Active parcel list updated; recalculating component info.
[20/Sep/2018 13:29:44 +0000] 77125 MainThread throttling_logger INFO Identified java component java8 with full version JAVA_HOME=/usr/java/default java version "1.8.0_171" Java(TM) SE Runtime Environment (build 1.8.0_171-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) for requested version .
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[20/Sep/2018 13:30:42 +0000] 77125 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
Created 09-20-2018 12:27 AM
That error doesn't indicate anything fatal... Check the Hosts tab of Cloudera Manager to view all hosts. If you see heartbeats within the last 15 seconds, all should be well with the agent communication.
That said, I haven't seen this in a while, but check your /etc/cloudera-scm-agent/config.ini and make sure you have this set:
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
In older config.ini files this was not set, which resulted in the "Could not find local file system for /var/run/cloudera-scm-agent/process" error. If you edit config.ini, make sure to restart the agent with "service cloudera-scm-agent restart".
Also run "df" to make sure you see the dir mounted and that it exists.:
For example:
cm_processes 13404764 142184 13262580 2% /var/run/cloudera-scm-agent/process
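If you want to check just that one path, something like this shows whether the tmpfs mount a healthy agent normally creates (cm_processes) is present:

df -h /var/run/cloudera-scm-agent/process
mount -l | grep cloudera-scm-agent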
Created 09-20-2018 12:58 AM
I have updated the config.ini file and restarted the agent. However, I still see the same error and nothing new:
Could not find local file system for /var/run/cloudera-scm-agent/process
I checked using df -h and it's there; I can see it is mounted. Also, the Hosts tab of CM still does not show the edge nodes.
Created 09-20-2018 10:02 AM
Let's have a look at your config.ini and "mount -l"
# grep -v "^#" /etc/cloudera-scm-agent/config.ini | grep -v "^$"
# mount -l
Created on 09-20-2018 10:26 PM - edited 09-20-2018 11:43 PM
Here is the output. Wondering if we can connect over WebEx?
[root@myumyhost25 cloudera-scm-agent]# grep -v "^#" /etc/cloudera-scm-agent/config.ini | grep -v "^$"
[General]
server_host=myumyhost03
server_port=7182
[Security]
use_tls=1
verify_cert_file=/opt/cloudera/security/x509/agents.pem
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
[root@myumyhost25 cloudera-scm-agent]# mount -l
/dev/mapper/vg_myumyhost-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/xvdb1 on /boot type ext4 (rw)
/dev/mapper/vg_data-lv_data on /data type ext4 (rw)
/dev/mapper/vg_data-lv_home on /home type ext4 (rw)
/dev/mapper/vg_data-lv_opt on /opt type ext4 (rw)
/dev/mapper/vg_data-lv_var on /var type ext4 (rw)
/dev/mapper/vg_myumyhost-lv_var_crash on /var/crash type ext4 (rw)
/dev/mapper/vg_myumyhost-lv_var_log on /var/log type ext4 (rw)
/iso/OEL6.9/V860937-01.iso on /var/OSimage/OL6.9_x86_64 type iso9660 (ro,loop=/dev/loop0) [OL6.9 x86_64 Disc 1 20170324]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/blkio type cgroup (rw,blkio)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/cpuacct type cgroup (rw,cpuacct)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/cpu type cgroup (rw,cpu)
cm_cgroups on /var/run/cloudera-scm-agent/cgroups/memory type cgroup (rw,memory)
cm_processes on /var/run/cloudera-scm-agent/process type tmpfs (rw,mode=0751,rootcontext="unconfined_u:object_r:var_run_t:s0")
[root@myumyhost25 cloudera-scm-agent]#
Created 09-19-2018 11:07 AM
Tell me about your environment. I'll try to help as best I can.
Some of the issues I came across while setting up the cluster were using a wildcard cert instead of individual certs for each node, and I had to set up a DNS server on the head node.
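For the wildcard-cert issue, one quick sanity check is to compare the certificate's subject and Subject Alternative Names against the host's fully qualified name; a sketch, using the .pem path mentioned earlier in this thread:

hostname -f                                                                 # name the agent will report
openssl x509 -in /opt/cloudera/security/x509/agents.pem -noout -subject     # certificate subject (CN)
openssl x509 -in /opt/cloudera/security/x509/agents.pem -noout -text | grep -A1 'Subject Alternative Name'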