Created 12-27-2018 09:09 PM
Version: Cloudera Express 5.15.0
Java VM Name: Java HotSpot(TM) 64-Bit Server VM
Java VM Vendor: Oracle Corporation
Java Version: 1.7.0_67
System details:
Linux optim-rhel72-uppu.development.unicomglobal.software 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
It is a single host and I am using a self-signed certificate. I am only validating a POC with one of my products, so the installation is not yet licensed.
I followed the steps described at these links:
https://www.cloudera.com/documentation/enterprise/5-11-x/topics/how_to_configure_cm_tls.html
https://www.cloudera.com/documentation/enterprise/5-15-x/topics/sg_self_signed_tls.html
After enabling TLS, the Cloudera agent heartbeat fails with the following lines in cloudera-scm-agent.log:
[27/Dec/2018 20:58:28 +0000] 6869 MainThread agent ERROR Heartbeating to optim-rhel72-uppu.development.unicomglobal.software:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/agent.py", line 1424, in _send_heartbeat
    self.max_cert_depth)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/https.py", line 138, in __init__
    self.conn.connect()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/httpslib.py", line 59, in connect
    sock.connect((self.host, self.port))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 195, in connect
    ret = self.connect_ssl()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/SSL/Connection.py", line 188, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: unexpected eof
The cloudera-scm-server.log shows the following lines:
2018-12-27 20:58:13,025 WARN 1320793343@agentServer-16:org.mortbay.log: javax.net.ssl.SSLHandshakeException: null cert chain
2018-12-27 20:58:28,034 WARN 1320793343@agentServer-16:org.mortbay.log: javax.net.ssl.SSLHandshakeException: null cert chain
2018-12-27 20:58:43,447 WARN 1320793343@agentServer-16:org.mortbay.log: javax.net.ssl.SSLHandshakeException: null cert chain
2018-12-27 20:58:58,082 WARN 1320793343@agentServer-16:org.mortbay.log: javax.net.ssl.SSLHandshakeException: null cert chain
2018-12-27 20:59:13,140 WARN 1320793343@agentServer-16:org.mortbay.log: javax.net.ssl.SSLHandshakeException: null cert chain
I have tried multiple times, but none of the attempts worked.
I didn't see any error when running this command:
openssl s_client -showcerts -connect optim-rhel72-uppu.development.unicomglobal.software:7182
Any help would be highly appreciated.
Thanks,
Tulasi
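A note on why the openssl check above can look clean while the heartbeat still fails: `openssl s_client` without `-cert`/`-key` never presents a client certificate, whereas "null cert chain" on the server side generally means the server asked for one and received none. The following is a minimal sketch, not Cloudera tooling, of a handshake test that does present the agent certificate. It uses Python 3's standard `ssl` module; the function names are hypothetical, and the paths passed in would be whatever your own `config.ini` points at. A recent Python/OpenSSL build may also refuse the older TLS versions a Java 7 server offers, so treat a failure here as a hint rather than proof.

```python
import socket
import ssl


def build_client_context(ca_file=None, cert_file=None, key_file=None, key_password=None):
    """Build a TLS client context that verifies the server and, if a
    certificate is supplied, presents it for client authentication."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Load the client certificate and private key (hypothetical paths).
        context.load_cert_chain(cert_file, keyfile=key_file, password=key_password)
    return context


def try_handshake(host, port, context, timeout=10.0):
    """Attempt a TLS handshake and return the negotiated protocol version.
    Raises ssl.SSLError or OSError if the handshake fails."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `try_handshake(server_host, 7182, build_client_context(ca_file=..., cert_file=..., key_file=...))` once with and once without the client certificate should show whether the certificate is what the server is rejecting.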
Created 01-14-2019 01:19 AM
Created 01-14-2019 09:24 PM
Hi,
Here are my responses to your questions. Can you please tell me what I am doing wrong? If you need more details, I should be able to share them.
Thanks,
Tulasi
1.) Ensure that the certificates are in a standard x509 format for the agent.
Yes, it is standard x509; see my response to Bgooley.
2.) Ensure that the truststores/keystores on the CM host are in JCEKS format and not pkcs12.
As per the Cloudera documentation, it should be JCEKS. From the link
https://www.cloudera.com/documentation/enterprise/5-15-x/topics/how_to_configure_cm_tls.html
section "Generate TLS Certificate", point 3
3.) Make sure that the cloudera-scm user can read the Private Key, Certificates, Truststores, and Password Files.
Yes, see my response to gzigldrum
4.) Make sure that the certificate on the failing agent contains a proper CN and DNS Alt Name if Alt Names are in use.
Yes, I have verified this as well
5.) Are you using self-signed certificates or certificates signed by a CA?
I am using a self-signed certificate.
6.) If all else fails you can obtain a tcpdump of attempted communication with the server. The port that we normally heartbeat to is 7182. You can then review the conversation between the server and agent to attempt to identify at what point the error is returned and potentially what error is being observed at the protocol level. You can identify and restrict your tcpdump information by tcp.stream.
[root@optim-rhel72-uppu ~]# tcpdump -i any 'port 7182'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
21:20:03.562131 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [S], seq 3560294529, win 43690, options [mss 65495,sackOK,TS val 1632415805 ecr 0,nop,wscale 7], length 0
21:20:03.562225 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 1, win 342, options [nop,nop,TS val 1632415805 ecr 1632415805], length 0
21:20:03.562549 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [P.], seq 1:254, ack 1, win 342, options [nop,nop,TS val 1632415806 ecr 1632415805], length 253
21:20:03.587871 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 16390, win 1365, options [nop,nop,TS val 1632415831 ecr 1632415831], length 0
21:20:03.587919 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 21184, win 2388, options [nop,nop,TS val 1632415831 ecr 1632415831], length 0
21:20:03.619895 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [P.], seq 254:516, ack 21184, win 2388, options [nop,nop,TS val 1632415863 ecr 1632415831], length 262
21:20:03.628945 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [F.], seq 516, ack 21185, win 2388, options [nop,nop,TS val 1632415872 ecr 1632415864], length 0
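To make a capture like the one above easier to read, here is a small sketch that reduces each tcpdump text line to its timestamp, TCP flags, and payload length, so it is quicker to see who sent data and who closed the connection. The regular expression is an assumption based only on the line format shown in this post, the function name is hypothetical, and `agent-host` in the sample is a shortened placeholder for the real hostname.

```python
import re

# Matches the one-line-per-packet format shown above (an assumption;
# other tcpdump options produce different layouts).
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\],.*\blength (?P<length>\d+)$"
)

SAMPLE = [
    "21:20:03.562549 IP agent-host.44942 > agent-host.7182: Flags [P.], "
    "seq 1:254, ack 1, win 342, options [nop,nop,TS val 1632415806 ecr 1632415805], length 253",
    "21:20:03.628945 IP agent-host.44942 > agent-host.7182: Flags [F.], "
    "seq 516, ack 21185, win 2388, options [nop,nop,TS val 1632415872 ecr 1632415864], length 0",
]


def summarize(lines):
    """Return (time, flags, payload_length) for every line that parses."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match["time"], match["flags"], int(match["length"])))
    return rows


for row in summarize(SAMPLE):
    print(row)
```

For decoding the TLS records themselves (for example, whether a Certificate message was sent at all), writing the capture to a file with `tcpdump -w` and opening it in a protocol analyzer is likely more useful than the text output.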
Created 01-16-2019 03:18 AM
In addition to that, can you please show us the CM agent configuration with
# egrep -v '^[[:blank:]]*#|^$' /etc/cloudera-scm-agent/config.ini
Created 01-16-2019 08:33 AM
Here is the output:
[root@optim-rhel72-uppu ~]# egrep -v '^[[:blank:]]*#|^$' /etc/cloudera-scm-agent/config.ini
[General]
server_host=optim-rhel72-uppu.development.unicomglobal.software
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=INFO
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem
verify_cert_dir=/opt/cloudera/security/pki
client_key_file=/opt/cloudera/security/pki/agent.key
client_keypw_file=/etc/cloudera-scm-agent/agentkey.pw
client_cert_file=/opt/cloudera/security/pki/agent.pem
[Hadoop]
[Cloudera]
[JDBC]
[root@optim-rhel72-uppu ~]#
Created 01-16-2019 01:42 PM
Thank you for providing your config. It appears you have space characters at the beginning of your cert/key configs. Remove the space characters from the beginning of the following lines and then restart the agent:
verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem
verify_cert_dir=/opt/cloudera/security/pki
client_key_file=/opt/cloudera/security/pki/agent.key
client_keypw_file=/etc/cloudera-scm-agent/agentkey.pw
client_cert_file=/opt/cloudera/security/pki/agent.pem
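As a hedged illustration of why leading spaces matter: Python's INI parsers treat an indented line as a continuation of the previous option's value, so an indented `client_cert_file=...` line is never registered as an option of its own. The sketch below uses Python 3's `configparser` on made-up sample text; the assumption, not confirmed in this thread, is that the agent's Python 2.7 parser handles such lines the same way. The helper names are hypothetical.

```python
import configparser

# Made-up sample: the last two option lines start with a space.
SAMPLE = """\
[Security]
use_tls=1
max_cert_depth=9
 client_key_file=/opt/cloudera/security/pki/agent.key
 client_cert_file=/opt/cloudera/security/pki/agent.pem
"""


def indented_option_lines(text):
    """Return (line_number, line) for non-comment lines that start with whitespace."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line[0] in " \t":
            hits.append((number, line))
    return hits


def parsed_security_options(text):
    """Return the option names the parser actually sees in [Security]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return sorted(parser["Security"].keys())


print(indented_option_lines(SAMPLE))
print(parsed_security_options(SAMPLE))
```

In this sample the parser reports only `max_cert_depth` and `use_tls`; the two indented lines are folded into the value of `max_cert_depth`, which would leave the client certificate and key unset.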
Created 01-16-2019 01:46 PM
NOTE:
You should specify only verify_cert_dir OR verify_cert_file, not both.
Since you have a pem file, I would suggest using verify_cert_file and commenting out "verify_cert_dir=/opt/cloudera/security/pki".
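A quick way to check for this condition, sketched with Python 3's `configparser` (the function name and the sample text are hypothetical, and the check assumes the option lines have no leading spaces):

```python
import configparser


def has_verify_cert_conflict(config_text):
    """Return True if [Security] sets both verify_cert_file and verify_cert_dir."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("Security"):
        return False
    security = parser["Security"]
    return "verify_cert_file" in security and "verify_cert_dir" in security


BOTH = """\
[Security]
use_tls=1
verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem
verify_cert_dir=/opt/cloudera/security/pki
"""

print(has_verify_cert_conflict(BOTH))
```

Lines commented out with `#` are ignored by the parser, so commenting out `verify_cert_dir` as suggested makes the check return False.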
Created 01-17-2019 08:22 PM
Space characters at the beginning of the cert/key lines in the agent configuration file caused this problem. After I removed all of those spaces and restarted the agent, it worked.
I didn't expect that a space could cause this sort of problem without any message saying what was going wrong.
Thanks for helping to figure out this silly problem.
Created 01-18-2019 02:25 AM
I have followed the steps under "Configuring TLS/SSL for HDFS, YARN and MapReduce"
Service did not start successfully; not all of the required roles started: only 0/1 roles started. Reasons : Service has only 0 NodeManager roles running instead of minimum required 1
YARN is failing to start, and I see the error below in the log:
Can't open /run/cloudera-scm-agent/process/190-yarn-NODEMANAGER/container-executor.cfg: Permission denied
These are the permissions:
-rw-r----- 1 yarn hadoop 997 Jan 18 02:22 creds.localjceks
-rw------- 1 yarn hadoop 1746 Jan 18 02:22 yarn.keytab
-r-------- 1 root hadoop 156 Jan 18 02:22 container-executor.cfg
-rw------- 1 root root 3688 Jan 18 02:22 supervisor.conf
But after I change the permissions, a restart creates another folder with the same permissions. How can I resolve this problem?
Thanks,
Tulasi
Created 01-23-2019 01:48 PM
Could you start a new thread for your new issue so that we don't mix issues in the same thread? The space character issue is likely to help others, so it would be good to start a new thread for the permission denied issue. I think it is a known one, but it will be easier to discuss if we start fresh.
Created 01-23-2019 11:44 PM
Thanks Ben, will create a new thread.