Created on 07-07-2019 08:43 PM - edited 09-16-2022 07:29 AM
How do I enable further debugging on cloudera-scm-agents?
I'm working on deploying the cluster using self signed certificates but I'm running into the below issue and can't get past it:
[07/Jul/2019 23:35:05 +0000] 23766 MainThread agent ERROR Heartbeating to cm-r01nn01.mws.mds.xyz:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 309, in connect
    ret = self.connect_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: certificate verify failed
What I have in my certificates folder is the following:
[root@cm-r01en01 pki]# pwd
/opt/cloudera/security/pki
[root@cm-r01en01 pki]# ls -atlri
total 16
69943167 -rw-r--r--. 1 root root 2385 Apr 1 23:06 cm-r01en01.mws.mds.xyz.keystore.jks
69943152 -rw-r--r--. 1 root root 1453 Apr 1 23:07 cm-r01en01.mws.mds.xyz.pem
3870062 drwxr-xr-x. 5 root root 37 Apr 1 23:09 ..
69943169 lrwxrwxrwx. 1 root root 62 Apr 1 23:11 server.jks -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.keystore.jks
69943259 -rw-r--r--. 1 root root 1453 Jul 6 20:01 cm-r01nn01.mws.mds.xyz.pem
69943154 lrwxrwxrwx. 1 root root 53 Jul 6 20:02 rootca.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
67689060 lrwxrwxrwx. 1 root root 53 Jul 6 20:36 agent.pem -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.pem
69943151 drwxr-xr-x. 2 root root 4096 Jul 6 20:36 .
[root@cm-r01en01 pki]#
I'm not 100% sure if I have everything right though. My cloudera-scm-agent config for that one host:
[root@cm-r01en01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -v "#" | sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
verify_cert_dir=/opt/cloudera/security/pki/
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01en01 pki]#
cm-r01nn01 is the Name Node.
cm-r01en01 will be the gateway / entry point to the cluster. It will also run a few services.
This is CM 6.2. I'm looking to go through the certificate process in preparation for a more formal deployment later on with official certificates. Using self-signed certs for now for this POC.
In particular, what certificate has it tried to load and is looking for? How do I enable further debug logs to see all the calls it's making and files it's loading?
Cheers,
TK
Created 08-01-2019 05:12 PM
The exception is in the agent and indicates to us that the agent is not able to verify the certificate that was returned by Cloudera Manager during the TLS handshake.
To find out why, we should look at which host the agent tried to contact (server_host in config.ini) and which hostnames are listed in the Subject Alternative Name (SAN) of the server certificate.
You can use the following command to see what certificate is returned:
openssl s_client -connect $(grep "server_host" /etc/cloudera-scm-agent/config.ini | sed s/server_host=//):7182 </dev/null | openssl x509 -text -noout
Then, check that the agent's truststore contains a certificate that trusts the CM cert. To test, you can use:
openssl s_client -connect $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "server_host=" | sed s/server_host=//):7182 -CAfile $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "verify_cert_file=" |sed s/verify_cert_file=//) -verify_hostname $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "server_host=" | sed s/server_host=//)</dev/null
The above is probably not that elegant, but you should be able to run it as it is. It will grab your hostname and trust store file from the host's config.ini and then connect to your CM host to do a TLS handshake. "-verify_hostname" will tell openssl to also do hostname validation to mimic what the agent does.
The "Verify return code" line in the output of the above command should give us a better idea of why the handshake is failing.
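The one-liner above can also be unpacked into a small script that is easier to read and adapt. This is only a sketch: the helper names `cfg_get` and `check_cm_tls` are made up for illustration (they are not part of Cloudera Manager), and it assumes the plain `key=value` layout of config.ini shown earlier in the thread and an openssl build that supports `-verify_hostname`.

```shell
# Hypothetical helpers for illustration only.

# Print the value of the first uncommented key=value line in an ini-style file.
cfg_get() {
    local key="$1" file="$2"
    grep -v '^[[:space:]]*#' "$file" | sed -n "s/^${key}=//p" | head -n 1
}

# Do a TLS handshake against the host/port the agent is configured for,
# trusting only the agent's verify_cert_file, and show the verification result.
check_cm_tls() {
    local ini="${1:-/etc/cloudera-scm-agent/config.ini}"
    local host port cafile
    host="$(cfg_get server_host "$ini")"
    port="$(cfg_get server_port "$ini")"
    cafile="$(cfg_get verify_cert_file "$ini")"
    echo "Connecting to ${host}:${port}, trusting ${cafile}" >&2
    openssl s_client -connect "${host}:${port}" -CAfile "$cafile" \
        -verify_hostname "$host" </dev/null 2>&1 |
        grep -E 'Verify return code|Verification|verify error'
}
```

Running `check_cm_tls` on an agent host (optionally with a different config.ini path as the first argument) prints which host, port and trust file were picked up before the handshake, so it is clear which certificate file is actually being used.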
Created on 07-11-2019 08:47 PM - edited 07-11-2019 08:52 PM
The error messages I get from the agent when I attempt (3):
==> /var/log/cloudera-scm-agent/status-stderr.log <==
[11/Jul/2019:23:37:40] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1339, in start
    self.tick()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1408, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/status_server.py", line 1048, in wrap
    ssl.accept_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 258, in accept_ssl
    return m2.ssl_accept(self.ssl, self._timeout)
SSLError: unexpected eof

[11/Jul/2019 23:37:43 +0000] 8193 MainThread agent ERROR Heartbeating to cm-r01nn01.mws.mds.xyz:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 309, in connect
    ret = self.connect_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert certificate unknown
The pressing question I have above all others is how do I get the agent to print out enough logging to tell me WHICH certificate it attempted to load, so I know the context under which the above is thrown? Right now, given the above error, I can't really take action without knowing the exact file the exceptions are referring to, other than by deducing it from the changes I've made.
My setup as it is now:
[ cm-r01en01 ] ( Utility Server )
[root@cm-r01en01 pki]# ls -altri
total 32
3870062 drwxr-xr-x. 5 root root 37 Apr 1 23:09 ..
69943169 lrwxrwxrwx. 1 root root 62 Apr 1 23:11 server.jks -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.keystore.jks
69943167 -rw-r--r--. 1 cloudera-scm cloudera-scm 3449 Jul 8 03:32 cm-r01en01.mws.mds.xyz.keystore.jks
67257528 -rw-r--r--. 1 cloudera-scm cloudera-scm 2775 Jul 9 23:51 cm-r01en01.mws.mds.xyz.keystore.p12
67586231 -rw-r--r--. 1 cloudera-scm cloudera-scm 1720 Jul 9 23:52 cm-r01en01.mws.mds.xyz.cert.pem
71202518 -rw-r--r--. 1 cloudera-scm cloudera-scm 1863 Jul 9 23:53 cm-r01en01.mws.mds.xyz.key.pem
71305735 -r--r-----. 1 cloudera-scm cloudera-scm 23 Jul 9 23:53 client-agent.pw
67257529 lrwxrwxrwx. 1 root root 30 Jul 9 23:55 client-key.pem -> cm-r01en01.mws.mds.xyz.key.pem
71202519 lrwxrwxrwx. 1 root root 31 Jul 9 23:55 client-cert.pem -> cm-r01en01.mws.mds.xyz.cert.pem
71305736 -rw-r--r--. 1 cloudera-scm cloudera-scm 1432 Jul 10 20:27 cm-r01en01.mws.mds.xyz.pem
71305755 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
69943176 lrwxrwxrwx. 1 root root 53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
69943151 drwxr-xr-x. 2 root root 4096 Jul 11 23:01 .
[root@cm-r01en01 pki]# hostname
cm-r01en01.mws.mds.xyz
[root@cm-r01en01 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password: changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
cm-r01en01.mws.mds.xyz, Jul 11, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
[root@cm-r01en01 pki]#

[root@cm-r01en01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/client-agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01en01 pki]#
[ cm-r01en02 ] ( Utility Server )
[root@cm-r01en02 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password: changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
cm-r01en02.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
You have new mail in /var/spool/mail/root
[root@cm-r01en02 pki]# ls -altri
total 28
135616270 drwxr-xr-x. 5 root root 37 Jul 10 21:28 ..
335605256 -rw-r--r--. 1 cloudera-scm cloudera-scm 2386 Jul 10 21:29 cm-r01en02.mws.mds.xyz.keystore.jks
335605249 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 21:51 cm-r01en02.mws.mds.xyz.pem
335605265 lrwxrwxrwx. 1 root root 62 Jul 10 21:56 server.jks -> /opt/cloudera/security/pki/cm-r01en02.mws.mds.xyz.keystore.jks
335605275 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
335605382 lrwxrwxrwx. 1 root root 53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
335605420 -rw-r--r--. 1 cloudera-scm cloudera-scm 2775 Jul 11 23:09 cm-r01en02.mws.mds.xyz.keystore.p12
335605426 -rw-r--r--. 1 cloudera-scm cloudera-scm 1720 Jul 11 23:10 cm-r01en02.mws.mds.xyz.cert.pem
335605425 -rw-r--r--. 1 cloudera-scm cloudera-scm 1863 Jul 11 23:11 cm-r01en02.mws.mds.xyz.key.pem
335605429 lrwxrwxrwx. 1 root root 30 Jul 11 23:12 client-key.pem -> cm-r01en02.mws.mds.xyz.key.pem
335605430 lrwxrwxrwx. 1 root root 31 Jul 11 23:12 client-cert.pem -> cm-r01en02.mws.mds.xyz.cert.pem
335860926 drwxr-xr-x. 2 root root 4096 Jul 11 23:12 .

[root@cm-r01en02 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
You have new mail in /var/spool/mail/root
[root@cm-r01en02 pki]#
[ cm-r01nn01 ] (Name Node)
[root@cm-r01nn01 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password: changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
cm-c01.mws.mds.xyz, Jul 6, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Mar 31, 2019, trustedCertEntry,
[root@cm-r01nn01 pki]#
[root@cm-r01nn01 pki]#
[root@cm-r01nn01 pki]# cd /opt/cloudera/security/pki
[root@cm-r01nn01 pki]# ls -altri
total 32
201424378 drwxr-xr-x. 5 root root 37 Mar 31 23:42 ..
135962833 lrwxrwxrwx. 1 root root 62 Mar 31 23:45 server.jks -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.keystore.jks
135962832 -rw-r--r--. 1 cloudera-scm cloudera-scm 4723 Jul 6 00:00 cm-r01nn01.mws.mds.xyz.keystore.jks
135283830 -rw-r--r--. 1 root root 1435 Jul 6 00:10 cm-c01.mws.mds.xyz.pem
135044185 -rw-r--r--. 1 root root 2751 Jul 8 03:49 cm-r01nn01.mws.mds.xyz.keystore.p12
135044187 -rw-r--r--. 1 root root 1691 Jul 8 03:50 cm-r01nn01.mws.mds.xyz.cert.pem
135044188 -rw-r--r--. 1 root root 1859 Jul 8 03:50 cm-r01nn01.mws.mds.xyz.key.pem
135962830 -rw-r--r--. 1 root root 1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
135962831 lrwxrwxrwx. 1 root root 53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
135962829 drwxr-xr-x. 2 root root 4096 Jul 10 22:14 .

[root@cm-r01nn01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01nn01 pki]#
NOTE: I haven't reconfigured the /etc/cloudera-scm-agent/config.ini yet on this (NN) node since the two other nodes (EN) aren't working anyway.
Thx,
TK
Created 07-12-2019 11:18 AM
Hi @TCloud ,
The documentation contains information about the truststore in Step 4:
https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html#concept...
It does not include *what* to put in the truststore file, though, so yes, that is something that should be improved.
If you are using a Certificate Authority to sign your certificates, you can simply add the root CA certificate to the truststore.
If you are using self-signed certificates, each agent's certificate needs to be imported into CM's truststore so CM can validate it.
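As a rough illustration of moving self-signed certificates between the keystores and trust files discussed in this thread, the following sketch uses only standard `keytool` options. The function names, aliases and file names are placeholders chosen for this example, not anything Cloudera Manager prescribes; `keytool` will prompt for the keystore password.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Export the certificate stored under ALIAS in a JKS keystore as a PEM file.
# usage: export_pem KEYSTORE ALIAS OUT.pem
export_pem() {
    keytool -exportcert -rfc -keystore "$1" -alias "$2" -file "$3"
}

# Import a PEM certificate into a JKS truststore as a trusted entry.
# usage: import_trusted TRUSTSTORE ALIAS CERT.pem
import_trusted() {
    keytool -importcert -noprompt -keystore "$1" -alias "$2" -file "$3"
}

# Concatenate several PEM certificates into one trust file (for example the
# file an agent's verify_cert_file points at) and print how many it contains.
# usage: bundle_pems OUT.pem CERT1.pem [CERT2.pem ...]
bundle_pems() {
    local out="$1"
    shift
    cat "$@" > "$out"
    grep -c 'BEGIN CERTIFICATE' "$out"
}
```

With a CA-signed setup only the root CA certificate would need to go through `import_trusted`; with self-signed certificates the same step is repeated once per host certificate.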
Created 07-15-2019 09:59 PM
Thanks very much again for taking the time to explain here. I've got this part working as well.
In the meantime, I'm looking to enable high availability on the cluster. Have a few questions in this regard.
1) HAproxy is given as an example. I've used HAproxy + Keepalived for the CMS (7183) and a custom DNS entry, cm-c01.mws.mds.xyz, to point to the cluster VIP. Everything works, including the certs for the UI. The UI correctly displays the SSL certs for cm-c01 rather than those of the constituent hosts providing the backend.
2) I tried the same process with the Agent Avro port 7182 (if I'm calling it that correctly). I've set up a VIP, configured HAproxy and the proper SSL certs for srv-c01.mws.mds.xyz. This doesn't work. I've imported the right certs into the jssecacerts file as well.
HAproxy config:
frontend cmin
    bind cm-c01:443 ssl crt /etc/haproxy/certs/cm-c01.mws.mds.xyz-haproxy.pem no-sslv3
    default_backend cmback

backend cmback
    mode http
    balance roundrobin
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7183 ssl check verify none port 7183 inter 12000 rise 3 fall 3
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7183 ssl check verify none port 7183 inter 12000 rise 3 fall 3

frontend srvin
    log 127.0.0.1 local0 debug
    bind srv-c01:17182 ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
    default_backend srvback

backend srvback
    log /dev/log local0 debug
    mode http
    balance roundrobin
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3
Each PEM file has a private and public key.
How could I get this working with Self-Signed SSL certs? Do I need to revert the HAproxy config to a tcp pass-through? If so, how will I handle the case when it fails over and needs the cert file of the second host? Do I combine the public pem file certs?
Thx,
TK
Created 07-16-2019 11:30 AM
Created on 07-16-2019 08:18 PM - edited 07-16-2019 08:41 PM
How do I get SSL to work properly with CM and port 7182 through a VIP provided by a Load Balancer?
A visualization:
Created 07-17-2019 10:37 AM
Have you tried it, and did it fail?
If so, what was the problem?
You configure the agent with a hostname and a port that it will use to send heartbeats to that host and port.
If you have TLS enabled, then the same rules apply:
If the client (agent) is doing validation, then it must be able to trust the signer of the CM certificate and it must be able to validate that the hostname it connected to is included in the certificate (in Subject Alt Name or CN subject).
If you are doing agent authentication to CM, then CM must trust the signer of the certificate presented by the agent.
I don't know if TLS termination at the balancer will work unless the balancer can authenticate. I'd recommend against termination with heartbeats.
Created 07-17-2019 03:42 PM
Tried. Certificates appeared fine ( were recognized ). The issue appears to be between the Load Balancer VIP and HAproxy right now since I get this:
[17/Jul/2019 03:56:22 +0000] 20834 MainThread agent ERROR Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
    framed_message = response_reader.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
    raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
This is really telling me, and please keep me honest, that the traffic isn't being passed between the VIP and the backend servers defined in the HAproxy config (see below).
Currently trying to solve the above by incorporating what this page mentions, however, I need a newer HAproxy version since mine doesn't support SNI.
My latest HAproxy config is as follows to try and solve the above issue by setting up HAproxy in TLS bridging mode:
frontend srvin
    log 127.0.0.1 local0 debug
    bind srv-c01:17182 ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
    default_backend srvback

backend srvback
    log /dev/log local0 debug
    mode http
    balance roundrobin
    cookie srv-c01 insert indirect nocache
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn01.mws.mds.xyz sni req.hdr(host)
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn02.mws.mds.xyz sni req.hdr(host)
Current error is as follows but that's an HAproxy problem now that I'm following up on separately:
[ALERT] 197/040530 (7560) : parsing [/etc/haproxy/haproxy.cfg:69] : 'server cm-r01nn02.mws.mds.xyz' unknown keyword 'sni'.
Question I had above is, is there any other page other than the following that demonstrates the use of a VIP w/ HAproxy and TLS config when load balancing Cloudera services?
Thx,
TK
Created 07-17-2019 06:47 PM
Breakdown of what I'm getting with different configs:
1)
frontend srvin
    log 127.0.0.1 local0 debug
    bind srv-c01:17182 ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
    option tcplog
    default_backend srvback

backend srvback
    log /dev/log local0 debug
    mode tcp
    option tcplog
    balance roundrobin
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 check
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 check
Results in:
[17/Jul/2019 21:12:12 +0000] 25588 MainThread agent ERROR Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
    framed_message = response_reader.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
    raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
2)
frontend srvin
    log 127.0.0.1 local0 debug
    bind srv-c01:17182
    option tcplog
    default_backend srvback

backend srvback
    log /dev/log local0 debug
    mode tcp
    option tcplog
    balance roundrobin
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 check
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 check
Results in:
[17/Jul/2019 21:15:23 +0000] 25588 MainThread agent ERROR Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 313, in connect
    if not check(self.get_peer_cert(), self.addr[0]):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Checker.py", line 125, in __call__
    fieldName='subjectAltName')
WrongHost: Peer certificate subjectAltName does not match host, expected srv-c01.mws.mds.xyz, got DNS:cm-r01nn01.mws.mds.xyz
3)
frontend srvin
    log 127.0.0.1 local0 debug
    bind srv-c01:17182 ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
    default_backend srvback

backend srvback
    log /dev/log local0 debug
    mode http
    balance roundrobin
    cookie srv-c01 insert indirect nocache
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn01.mws.mds.xyz sni req.hdr(host)
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn02.mws.mds.xyz sni req.hdr(host)
Results in:
parsing [/etc/haproxy/haproxy.cfg:69] : 'server cm-r01nn02.mws.mds.xyz' unknown keyword 'sni'.
Config uses a pem file with certs from the two nodes + the VIP ( concatenated from srv-c01, cm-r01nn01 and cm-r01nn02 ). The certs appear to work ok. The traffic isn't passing through, however.
Config looks like this:
[root@cm-r01wn08 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -v "#" | sed -e "/^$/d"
[General]
server_host=srv-c01.mws.mds.xyz
server_port=17182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/cluster-vip.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01wn08 pki]#
I would like to keep it pointed to the VIP because this guarantees the config will be identical on all hosts and ready for any kind of failover.
Thx,
TK
Created 07-17-2019 10:09 PM
The configuration we want is the one that got us the following:
WrongHost: Peer certificate subjectAltName does not match host, expected srv-c01.mws.mds.xyz, got DNS:cm-r01nn01.mws.mds.xyz
This error means that the Cloudera Manager certificate only contains a SAN or CN subject value of cm-r01nn01.mws.mds.xyz. Since the agent is configured to connect to srv-c01.mws.mds.xyz, it attempts to validate that the certificate is valid for srv-c01.mws.mds.xyz.
This situation is addressed here:
In order to make sure that clients can connect to CM by using both srv-c01.mws.mds.xyz and cm-r01nn01.mws.mds.xyz, we need to create a self-signed certificate that contains both in Subject Alternative Name.
For a self-signed certificate, you could use:
keytool -keystore testkeystore.jks -storepass password -keypass password -alias cm-r01nn01.mws.mds.xyz -genkeypair -keysize 2048 -keyalg RSA -dname "CN=cm-r01nn01.mws.mds.xyz" -ext san=dns:cm-r01nn01.mws.mds.xyz,dns:srv-c01.mws.mds.xyz
If you do recreate the CM certificate like that, you will also need to replace the previous certificate with this one in any truststore you created, since a new key pair was created.
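Before and after regenerating the certificate, it can help to list the SAN entries a certificate actually carries and compare them with the hostname the agent connects to. A minimal sketch, assuming the usual layout of `openssl x509 -text` output (the SAN values on the line following the "Subject Alternative Name" header); the function names are invented for this example and the match is an exact string comparison that does not handle wildcard names.

```shell
# Hypothetical helpers for illustration only.

# Read `openssl x509 -noout -text` output on stdin and print one SAN DNS
# name per line.
san_dns_names() {
    grep -A1 'Subject Alternative Name' | tail -n 1 |
        tr ',' '\n' | sed -n 's/^[[:space:]]*DNS://p'
}

# Succeed if HOST appears among the SAN DNS names of a PEM certificate.
# usage: cert_covers_host CERT.pem HOST
cert_covers_host() {
    openssl x509 -in "$1" -noout -text | san_dns_names | grep -qxF "$2"
}
```

For example, `cert_covers_host /opt/cloudera/security/pki/agent.pem srv-c01.mws.mds.xyz && echo covered` would report whether the certificate in that file lists the VIP name.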
Although it might require a bit more doing, the above should address the error you get when using TLS pass-through in HAProxy. Next, we need to make sure that HAProxy routes requests to your primary CM host every time and only routes to the other host in the event of the primary host's failure. I believe this can be achieved by removing "balance roundrobin" but I'm not sure. I feel like it may make sense to use "backup" directives in the server configuration for nn02 but I'm not sure... seems our example doesn't feel it is necessary.
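One way to express "primary first, standby only on failure" in HAProxy is the `backup` keyword on the standby's server line, combined with `mode tcp` and no `ssl` on the `bind` line so the TLS session passes through to CM untouched. The sketch below only writes such a section to a scratch file for review; it has not been tested against this cluster, the hostnames and ports are the ones from the thread, and the function name and default output path are assumptions for illustration.

```shell
# Hypothetical helper: write a candidate TLS pass-through section to a file.
# usage: write_srv_passthrough [OUTFILE]
write_srv_passthrough() {
    local out="${1:-./haproxy-srv-passthrough.cfg}"
    cat > "$out" <<'EOF'
frontend srvin
    mode tcp
    option tcplog
    bind srv-c01:17182
    default_backend srvback

backend srvback
    mode tcp
    option tcplog
    server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7182 check
    server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7182 check backup
EOF
    echo "$out"
}
```

After merging a section like this into the real configuration, `haproxy -c -f /etc/haproxy/haproxy.cfg` checks the syntax without starting the proxy.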
Created 07-21-2019 10:01 PM
Received the subject error when I replaced the certs with the SAN one that contained 3 hosts:
keytool -genkeypair -alias cm-c01.mws.mds.xyz -keyalg RSA -keysize 2048 -dname "cn=cm-c01.mws.mds.xyz,OU=MDS,O=MDS,L=Los Angeles,ST=California,C=US" -keypass cm-c01.mws.mds.xyz -keystore cm-c01.mws.mds.xyz.keystore.jks -storepass cm-c01.mws.mds.xyz -validity 3650 -ext EKU=serverAuth,clientAuth,codeSigning,emailProtection,timeStamping,OCSPSigning -ext san=dns:cm-c01.mws.mds.xyz,dns:cm-r01nn01.mws.mds.xyz,dns:cm-r01nn02.mws.mds.xyz
Updated jssecacerts (in path above) with the new cert as well. Same issue.
I'm running the load balancer and HAproxy on cm-r01nn01/02. This is a problem since Cloudera opens up ports such as 7180, 7182 and 7183 on all available interfaces. So if I have a VIP running on the same host, Cloudera services try to bind to it as well. I can't really have HAproxy running on port 7183 on the LB VIP if Cloudera services are already bound to the same port.
Tried the Cloudera Manager Hostname Override but only got these errors:
12:25:32.076 AM WARN BasicScmProxy Exception while getting fetch configDefaults hash: none
java.io.IOException: HTTPS hostname wrong: should be <srv-c01.mws.mds.xyz>
    at sun.net.www.protocol.https.HttpsClient.checkURLSpoofing(HttpsClient.java:649)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:573)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:259)
    at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:277)
    at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:607)
    at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:696)
    at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:65)
    at com.cloudera.cmon.firehose.Main.main(Main.java:396)
I need to spend more time reading up on load balancers and CM / CDH.
Thx,
TK