SSLError: certificate verify failed

Explorer

How do I enable further debugging on cloudera-scm-agents?

 

I'm working on deploying the cluster using self-signed certificates, but I'm running into the issue below and can't get past it:

 

[07/Jul/2019 23:35:05 +0000] 23766 MainThread agent ERROR Heartbeating to cm-r01nn01.mws.mds.xyz:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
self.cfg.max_cert_depth)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
self.conn.connect()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
sock.connect((self.host, self.port))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 309, in connect
ret = self.connect_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
SSLError: certificate verify failed

What I have in my certificates folder is the following:

 

[root@cm-r01en01 pki]# pwd
/opt/cloudera/security/pki
[root@cm-r01en01 pki]# ls -atlri
total 16
69943167 -rw-r--r--. 1 root root 2385 Apr  1 23:06 cm-r01en01.mws.mds.xyz.keystore.jks
69943152 -rw-r--r--. 1 root root 1453 Apr  1 23:07 cm-r01en01.mws.mds.xyz.pem
 3870062 drwxr-xr-x. 5 root root   37 Apr  1 23:09 ..
69943169 lrwxrwxrwx. 1 root root   62 Apr  1 23:11 server.jks -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.keystore.jks
69943259 -rw-r--r--. 1 root root 1453 Jul  6 20:01 cm-r01nn01.mws.mds.xyz.pem
69943154 lrwxrwxrwx. 1 root root   53 Jul  6 20:02 rootca.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
67689060 lrwxrwxrwx. 1 root root   53 Jul  6 20:36 agent.pem -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.pem
69943151 drwxr-xr-x. 2 root root 4096 Jul  6 20:36 .
[root@cm-r01en01 pki]#

I'm not 100% sure if I have everything right though.  My cloudera-scm-agent config for that one host:

 

[root@cm-r01en01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -v "#" | sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
verify_cert_dir=/opt/cloudera/security/pki/
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01en01 pki]#

cm-r01nn01 is the Name Node.

cm-r01en01 will be the gateway / entry point to the cluster. It will also run a few services.

 

This is CM 6.2. I'm looking to go through the certificate process in preparation for a more formal deployment later on with official certificates. I'm using self-signed certs for now for this POC.

 

In particular, what certificate has it tried to load and is looking for?  How do I enable further debug logs to see all the calls it's making and files it's loading?

 

Cheers,
TK

1 ACCEPTED SOLUTION

Master Guru

@TCloud,

The exception is in the agent and indicates to us that the agent is not able to verify the certificate that was returned by Cloudera Manager during the TLS handshake.

To find out why, we should look at which host the agent tried to contact (server_host in config.ini) and which hostnames are listed in the Subject Alternative Name (SAN) of the server certificate.

You can use the following command to see what certificate is returned:

openssl s_client -connect "$(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep 'server_host=' | sed 's/server_host=//'):7182" </dev/null | openssl x509 -text -noout

Then, check that the agent's trust store contains the certificate needed to trust the CM certificate. To test, you can use:

openssl s_client -connect $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "server_host=" | sed s/server_host=//):7182 -CAfile $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "verify_cert_file=" |sed s/verify_cert_file=//) -verify_hostname $(grep -v '^#' /etc/cloudera-scm-agent/config.ini | grep "server_host=" | sed s/server_host=//)</dev/null

 

The above is probably not that elegant, but you should be able to run it as it is.  It will grab your hostname and trust store file from the host's config.ini and then connect to your CM host to do a TLS handshake.  "-verify_hostname" will tell openssl to also do hostname validation to mimic what the agent does.

 

The result code of the above command should give us a better idea of why the handshake is failing.
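
For anyone who prefers to script the same check, here is a minimal Python 3 sketch of it. It is not what the agent itself runs (the agent uses M2Crypto under Python 2.7), and the function names `read_agent_tls_settings` and `check_handshake` are made up for this example. It assumes config.ini has the [General] and [Security] sections shown in this thread, and it does not present a client certificate, so it only tests the agent-side validation of the CM certificate.

```python
import configparser
import socket
import ssl


def read_agent_tls_settings(path="/etc/cloudera-scm-agent/config.ini"):
    """Return (server_host, server_port, verify_cert_file) from config.ini."""
    # interpolation=None keeps any literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    host = parser.get("General", "server_host")
    port = parser.getint("General", "server_port", fallback=7182)
    ca_file = parser.get("Security", "verify_cert_file", fallback=None)
    return host, port, ca_file


def check_handshake(host, port, ca_file, timeout=10.0):
    """Attempt a verified TLS handshake; return (ok, detail)."""
    # With cafile set, only that file is trusted (system defaults are not loaded).
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname enables hostname validation, like -verify_hostname.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)
```

Calling `check_handshake(*read_agent_tls_settings())` on an agent host should return either `(True, <protocol version>)` or `(False, <reason>)`. Newer Python releases apply stricter default X.509 checks than older OpenSSL tooling, so a failure here is a hint to investigate rather than proof of what the agent sees.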

 


23 REPLIES

Explorer

The error messages I get from the agent when I attempt (3):

 

==> /var/log/cloudera-scm-agent/status-stderr.log <==
[11/Jul/2019:23:37:40] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1339, in start
self.tick()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1408, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/status_server.py", line 1048, in wrap
ssl.accept_ssl()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 258, in accept_ssl
return m2.ssl_accept(self.ssl, self._timeout)
SSLError: unexpected eof

 

[11/Jul/2019 23:37:43 +0000] 8193 MainThread agent        ERROR    Heartbeating to cm-r01nn01.mws.mds.xyz:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 309, in connect
    ret = self.connect_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert certificate unknown

 

The pressing question I have above all others is: how do I get the agent to print out enough logging to tell me WHICH certificate it attempted to load, so I know the context in which the above is thrown? Right now, given the above error, I can't really take action without knowing the exact file the exceptions are referring to, other than to deduce it from the changes I've made.
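
Independently of whatever the agent logs, one way to see exactly which files the [Security] settings point at is to resolve the symlinks and fingerprint every certificate in each PEM file. The following is only a sketch in Python 3; `pem_fingerprints` and `describe_cert_file` are names invented for this example, and it assumes the files are PEM-encoded certificates as in the listings in this thread.

```python
import hashlib
import os
import re
import ssl

# Matches each certificate block inside a (possibly concatenated) PEM file.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text):
    """Return the SHA-256 fingerprint of each certificate in a PEM bundle."""
    fingerprints = []
    for block in PEM_BLOCK.findall(pem_text):
        der = ssl.PEM_cert_to_DER_cert(block)  # base64-decode the block body
        fingerprints.append(hashlib.sha256(der).hexdigest().upper())
    return fingerprints


def describe_cert_file(path):
    """Return (resolved_path, fingerprints) for a possibly symlinked PEM file."""
    resolved = os.path.realpath(path)  # follow symlinks such as agent.pem
    with open(resolved) as handle:
        return resolved, pem_fingerprints(handle.read())
```

For example, `describe_cert_file("/opt/cloudera/security/pki/agent.pem")` returns the real file behind the symlink plus one fingerprint per certificate. These can be compared (after removing the colons) with the output of `openssl x509 -noout -fingerprint -sha256` run against the certificate that CM presents.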

 

 

 

My setup as it is now:

 

[ cm-r01en01 ]  ( Utility Server )

 

[root@cm-r01en01 pki]# ls -altri
total 32
 3870062 drwxr-xr-x. 5 root         root           37 Apr  1 23:09 ..
69943169 lrwxrwxrwx. 1 root         root           62 Apr  1 23:11 server.jks -> /opt/cloudera/security/pki/cm-r01en01.mws.mds.xyz.keystore.jks
69943167 -rw-r--r--. 1 cloudera-scm cloudera-scm 3449 Jul  8 03:32 cm-r01en01.mws.mds.xyz.keystore.jks
67257528 -rw-r--r--. 1 cloudera-scm cloudera-scm 2775 Jul  9 23:51 cm-r01en01.mws.mds.xyz.keystore.p12
67586231 -rw-r--r--. 1 cloudera-scm cloudera-scm 1720 Jul  9 23:52 cm-r01en01.mws.mds.xyz.cert.pem
71202518 -rw-r--r--. 1 cloudera-scm cloudera-scm 1863 Jul  9 23:53 cm-r01en01.mws.mds.xyz.key.pem
71305735 -r--r-----. 1 cloudera-scm cloudera-scm   23 Jul  9 23:53 client-agent.pw
67257529 lrwxrwxrwx. 1 root         root           30 Jul  9 23:55 client-key.pem -> cm-r01en01.mws.mds.xyz.key.pem
71202519 lrwxrwxrwx. 1 root         root           31 Jul  9 23:55 client-cert.pem -> cm-r01en01.mws.mds.xyz.cert.pem
71305736 -rw-r--r--. 1 cloudera-scm cloudera-scm 1432 Jul 10 20:27 cm-r01en01.mws.mds.xyz.pem
71305755 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
69943176 lrwxrwxrwx. 1 root         root           53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
69943151 drwxr-xr-x. 2 root         root         4096 Jul 11 23:01 .
[root@cm-r01en01 pki]# hostname
cm-r01en01.mws.mds.xyz
[root@cm-r01en01 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password:  changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
cm-r01en01.mws.mds.xyz, Jul 11, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
[root@cm-r01en01 pki]#
[root@cm-r01en01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/client-agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01en01 pki]#

[ cm-r01en02 ]  ( Utility Server )

 

[root@cm-r01en02 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password:  changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
cm-r01en02.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Jul 10, 2019, trustedCertEntry,
You have new mail in /var/spool/mail/root
[root@cm-r01en02 pki]# ls -altri
total 28
135616270 drwxr-xr-x. 5 root         root           37 Jul 10 21:28 ..
335605256 -rw-r--r--. 1 cloudera-scm cloudera-scm 2386 Jul 10 21:29 cm-r01en02.mws.mds.xyz.keystore.jks
335605249 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 21:51 cm-r01en02.mws.mds.xyz.pem
335605265 lrwxrwxrwx. 1 root         root           62 Jul 10 21:56 server.jks -> /opt/cloudera/security/pki/cm-r01en02.mws.mds.xyz.keystore.jks
335605275 -rw-r--r--. 1 cloudera-scm cloudera-scm 1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
335605382 lrwxrwxrwx. 1 root         root           53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
335605420 -rw-r--r--. 1 cloudera-scm cloudera-scm 2775 Jul 11 23:09 cm-r01en02.mws.mds.xyz.keystore.p12
335605426 -rw-r--r--. 1 cloudera-scm cloudera-scm 1720 Jul 11 23:10 cm-r01en02.mws.mds.xyz.cert.pem
335605425 -rw-r--r--. 1 cloudera-scm cloudera-scm 1863 Jul 11 23:11 cm-r01en02.mws.mds.xyz.key.pem
335605429 lrwxrwxrwx. 1 root         root           30 Jul 11 23:12 client-key.pem -> cm-r01en02.mws.mds.xyz.key.pem
335605430 lrwxrwxrwx. 1 root         root           31 Jul 11 23:12 client-cert.pem -> cm-r01en02.mws.mds.xyz.cert.pem
335860926 drwxr-xr-x. 2 root         root         4096 Jul 11 23:12 .
[root@cm-r01en02 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
You have new mail in /var/spool/mail/root
[root@cm-r01en02 pki]#

[ cm-r01nn01 ]  (Name Node)

 

[root@cm-r01nn01 pki]# keytool -list -keystore /etc/pki/ca-trust/extracted/java/jssecacerts|grep -Ei cm
Enter keystore password:  changeit
acraizfnmt-rcm, Mar 26, 2019, trustedCertEntry,
cm-r01nn02.mws.mds.xyz, Apr 14, 2019, trustedCertEntry,
cm-c01.mws.mds.xyz, Jul 6, 2019, trustedCertEntry,
cm-r01nn01.mws.mds.xyz, Mar 31, 2019, trustedCertEntry,
[root@cm-r01nn01 pki]#
[root@cm-r01nn01 pki]#
[root@cm-r01nn01 pki]# cd /opt/cloudera/security/pki
[root@cm-r01nn01 pki]# ls -altri
total 32
201424378 drwxr-xr-x. 5 root         root           37 Mar 31 23:42 ..
135962833 lrwxrwxrwx. 1 root         root           62 Mar 31 23:45 server.jks -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.keystore.jks
135962832 -rw-r--r--. 1 cloudera-scm cloudera-scm 4723 Jul  6 00:00 cm-r01nn01.mws.mds.xyz.keystore.jks
135283830 -rw-r--r--. 1 root         root         1435 Jul  6 00:10 cm-c01.mws.mds.xyz.pem
135044185 -rw-r--r--. 1 root         root         2751 Jul  8 03:49 cm-r01nn01.mws.mds.xyz.keystore.p12
135044187 -rw-r--r--. 1 root         root         1691 Jul  8 03:50 cm-r01nn01.mws.mds.xyz.cert.pem
135044188 -rw-r--r--. 1 root         root         1859 Jul  8 03:50 cm-r01nn01.mws.mds.xyz.key.pem
135962830 -rw-r--r--. 1 root         root         1453 Jul 10 22:14 cm-r01nn01.mws.mds.xyz.pem
135962831 lrwxrwxrwx. 1 root         root           53 Jul 10 22:14 agent.pem -> /opt/cloudera/security/pki/cm-r01nn01.mws.mds.xyz.pem
135962829 drwxr-xr-x. 2 root         root         4096 Jul 10 22:14 .
[root@cm-r01nn01 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -Eiv "#"|sed -e "/^$/d"
[General]
server_host=cm-r01nn01.mws.mds.xyz
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=logging.DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/agent.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01nn01 pki]#

NOTE: I haven't reconfigured the /etc/cloudera-scm-agent/config.ini yet on this (NN) node since the two other nodes (EN) aren't working anyway.  

 

Thx,
TK

Master Guru

Hi @TCloud ,

 

The documentation contains information about the truststore in Step 4:

https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html#concept...

 

It does not include *what* to put in the truststore file, though, so yes, that is something that should be improved.

 

If you are using a Certificate Authority to sign your certificates, you can simply add the root CA certificate to the truststore.

If you are using self-signed certificates, each agent's certificate needs to be imported so CM can validate the agent's certificate.

 

 

Explorer

Thanks very much again for taking the time to explain. I've got this part working as well.

 

In the meantime, I'm looking to enable high availability on the cluster.  Have a few questions in this regard.

 

1) HAproxy is given as an example. I've used HAproxy + Keepalived for the CMS (7183) and a custom DNS entry, cm-c01.mws.mds.xyz, to point to the cluster VIP. Everything works, including the certs for the UI. The UI correctly displays the SSL certs for cm-c01 rather than those of the constituent hosts providing the backend.

 

2) I tried the same process with the Agent Avro port 7182 (if I'm naming it correctly). I've set up a VIP and configured HAproxy with the proper SSL certs for srv-c01.mws.mds.xyz. This doesn't work. I've imported the right certs into the jssecacerts file as well.

 

HAproxy config:

 

frontend cmin
        bind    cm-c01:443 ssl crt /etc/haproxy/certs/cm-c01.mws.mds.xyz-haproxy.pem no-sslv3
        default_backend cmback

backend cmback
        mode http
        balance roundrobin

        server cm-r01nn01.mws.mds.xyz cm-r01nn01.mws.mds.xyz:7183        ssl check verify none port 7183 inter 12000 rise 3 fall 3
        server cm-r01nn02.mws.mds.xyz cm-r01nn02.mws.mds.xyz:7183        ssl check verify none port 7183 inter 12000 rise 3 fall 3

frontend srvin
        log                         127.0.0.1           local0          debug
        bind                        srv-c01:17182       ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
        default_backend             srvback


backend srvback
        log /dev/log local0 debug
        mode http
        balance roundrobin

        server      cm-r01nn01.mws.mds.xyz      cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3
        server      cm-r01nn02.mws.mds.xyz      cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3

Each PEM file has a private and public key.  

 

How could I get this working with Self-Signed SSL certs?   Do I need to revert the HAproxy config to a tcp pass-through?  If so, how will I handle the case when it fails over and needs the cert file of the second host?  Do I combine the public pem file certs?

 

Thx,
TK

Master Guru

@TCloud,

 

Can you clarify the problem you are seeing regarding port 7182?

Explorer

How do I get SSL to work properly with CM and port 7182 through a VIP provided by a Load Balancer?

 

A visualization:

 

https://ibb.co/hY0GsVY

Master Guru

@TCloud,

 

Have you tried it, and did it fail?

If so, what was the problem?

 

You configure the agent with a hostname and a port that it will use to send heartbeats to that host and port.

If you have TLS enabled, then the same rules apply:

 

If the client (agent) is doing validation, then it must be able to trust the signer of the CM certificate and it must be able to validate that the hostname it connected to is included in the certificate (in Subject Alt Name or CN subject).

 

If you are doing agent authentication to CM, then CM must trust the signer of the certificate presented by the agent.

 

I don't know if TLS termination at the balancer will work unless the balancer can authenticate.  I'd recommend against termination with heartbeats.
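
The hostname rule described above can be sketched roughly as follows. This is a simplified Python illustration of matching a hostname against the DNS names in a certificate, not the checker the agent actually uses (M2Crypto has its own), and `hostname_matches` is a name invented for this example. It handles exact matches and a single left-most `*.` wildcard only.

```python
def hostname_matches(hostname, san_dns_names):
    """Return True if hostname matches any DNS name taken from a certificate."""
    host = hostname.lower().rstrip(".")
    for name in san_dns_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        # A leading "*." wildcard covers exactly one left-most label.
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".mws.mds.xyz"
            if host.endswith(suffix):
                label = host[: -len(suffix)]
                if label and "." not in label:
                    return True
    return False
```

With the names from this thread, `hostname_matches("srv-c01.mws.mds.xyz", ["cm-r01nn01.mws.mds.xyz"])` is False, because the VIP name is not in the list, while adding `srv-c01.mws.mds.xyz` to the list makes it True.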

Explorer

Tried.  Certificates appeared fine ( were recognized ).  The issue appears to be between the Load Balancer VIP and HAproxy right now since I get this:

 

 

[17/Jul/2019 03:56:22 +0000] 20834 MainThread agent        ERROR    Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
    framed_message = response_reader.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
    raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.

 

 

This is really telling me, and please keep me honest,  that the traffic isn't being passed between the VIP and the backend servers defined in the HAproxy config (see below). 
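
As background on the message itself: the traceback shows the exception being raised when a read returns zero bytes, which at the socket level is what happens when the other side closes the connection without sending anything. The small Python sketch below reproduces only that socket behaviour on localhost. It says nothing about why HAproxy or CM closed the connection, and the function name is invented for this example.

```python
import socket
import threading


def read_from_closing_peer():
    """Connect to a local server that closes immediately; return what recv() yields."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def accept_and_close():
        conn, _ = server.accept()
        conn.close()  # close without sending a single byte

    worker = threading.Thread(target=accept_and_close)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        data = client.recv(4096)  # an empty result means the peer closed
    worker.join()
    server.close()
    return data
```

The function returns an empty bytes object, the same "0 bytes" condition the Avro reader reports. In the proxy case this is consistent with the connection being accepted and then dropped before any response was sent, for example because the backend did not understand what it received.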

 

Currently trying to solve the above by incorporating what this page mentions, however, I need a newer HAproxy version since mine doesn't support SNI. 

 

My latest HAproxy config is as follows to try and solve the above issue by setting up HAproxy in TLS bridging mode:

 

frontend srvin
        log                         127.0.0.1           local0          debug
        bind                        srv-c01:17182       ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
        default_backend             srvback


backend srvback
        log /dev/log local0 debug
        mode http
        balance roundrobin
        cookie srv-c01 insert indirect nocache

        server      cm-r01nn01.mws.mds.xyz      cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn01.mws.mds.xyz sni req.hdr(host)
        server      cm-r01nn02.mws.mds.xyz      cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn02.mws.mds.xyz sni req.hdr(host)

Current error is as follows but that's an HAproxy problem now that I'm following up on separately:

 

[ALERT] 197/040530 (7560) : parsing [/etc/haproxy/haproxy.cfg:69] : 'server cm-r01nn02.mws.mds.xyz' unknown keyword 'sni'.

 

Question I had above is, is there any other page other than the following that demonstrates the use of a VIP w/ HAproxy and TLS config when load balancing Cloudera services?

 

Thx,
TK

Explorer

Breakdown of what I'm getting with different configs:

 

1) 

 

frontend srvin
        log                         127.0.0.1           local0          debug
        bind                        srv-c01:17182       ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
        option tcplog
        default_backend             srvback


backend srvback
        log /dev/log local0 debug
        mode tcp
        option tcplog
        balance roundrobin

        server      cm-r01nn01.mws.mds.xyz      cm-r01nn01.mws.mds.xyz:7182 check
        server      cm-r01nn02.mws.mds.xyz      cm-r01nn02.mws.mds.xyz:7182 check

Results in:

 

[17/Jul/2019 21:12:12 +0000] 25588 MainThread agent        ERROR    Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
    framed_message = response_reader.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
    raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.

2) 

 

frontend srvin
        log                         127.0.0.1           local0          debug
        bind                        srv-c01:17182
        option tcplog
        default_backend             srvback


backend srvback
        log /dev/log local0 debug
        mode tcp
        option tcplog
        balance roundrobin

        server      cm-r01nn01.mws.mds.xyz      cm-r01nn01.mws.mds.xyz:7182 check
        server      cm-r01nn02.mws.mds.xyz      cm-r01nn02.mws.mds.xyz:7182 check

Results in:

 

[17/Jul/2019 21:15:23 +0000] 25588 MainThread agent        ERROR    Heartbeating to srv-c01.mws.mds.xyz:17182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1387, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 69, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 313, in connect
    if not check(self.get_peer_cert(), self.addr[0]):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Checker.py", line 125, in __call__
    fieldName='subjectAltName')
WrongHost: Peer certificate subjectAltName does not match host, expected srv-c01.mws.mds.xyz, got DNS:cm-r01nn01.mws.mds.xyz

3)

frontend srvin
        log                         127.0.0.1           local0          debug
        bind                        srv-c01:17182       ssl crt /etc/haproxy/certs/srv-c01.mws.mds.xyz-haproxy.pem no-sslv3
        default_backend             srvback


backend srvback
        log /dev/log local0 debug
        mode http
        balance roundrobin
        cookie srv-c01 insert indirect nocache

        server      cm-r01nn01.mws.mds.xyz      cm-r01nn01.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn01.mws.mds.xyz sni req.hdr(host)
        server      cm-r01nn02.mws.mds.xyz      cm-r01nn02.mws.mds.xyz:7182 ssl check verify none port 7182 inter 12000 rise 3 fall 3 cookie cm-r01nn02.mws.mds.xyz sni req.hdr(host)

Results in:

parsing [/etc/haproxy/haproxy.cfg:69] : 'server cm-r01nn02.mws.mds.xyz' unknown keyword 'sni'.

 

Config uses a pem file with certs from the two nodes + the VIP ( concatenated from srv-c01, cm-r01nn01 and cm-r01nn02 ).  The certs appear to work ok.  The traffic isn't passing through, however.  

 

Config looks like this:

 

[root@cm-r01wn08 pki]# cat /etc/cloudera-scm-agent/config.ini|grep -v "#" | sed -e "/^$/d"
[General]
server_host=srv-c01.mws.mds.xyz
server_port=17182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=DEBUG
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
verify_cert_file=/opt/cloudera/security/pki/cluster-vip.pem
client_key_file=/opt/cloudera/security/pki/client-key.pem
client_keypw_file=/opt/cloudera/security/pki/agent.pw
client_cert_file=/opt/cloudera/security/pki/client-cert.pem
[Hadoop]
[Cloudera]
[JDBC]
[Cgroup_Paths]
[root@cm-r01wn08 pki]#

I would like to keep it pointed to the VIP because this guarantees the config will be identical on all hosts and ready for any kind of failover.  

 

Thx,
TK

Master Guru

@TCloud,

 

The configuration we want is the one that got us the following:

 

WrongHost: Peer certificate subjectAltName does not match host, expected srv-c01.mws.mds.xyz, got DNS:cm-r01nn01.mws.mds.xyz

This error means that the Cloudera Manager certificate only contains a SAN or CN subject value of cm-r01nn01.mws.mds.xyz. Since the agent is configured to connect to srv-c01.mws.mds.xyz, it attempts to validate that the certificate is valid for srv-c01.mws.mds.xyz.

 

This situation is addressed here:

https://www.cloudera.com/documentation/enterprise/latest/topics/admin_cm_ha_tls.html#cloudera-manage...

 

In order to make sure that clients can connect to CM by using both srv-c01.mws.mds.xyz and cm-r01nn01.mws.mds.xyz, we need to create a self-signed certificate that contains both in Subject Alternative Name.

 

For a self-signed certificate, you could use:

 

keytool -keystore testkeystore.jks -storepass password -keypass password -alias cm-r01nn01.mws.mds.xyz -genkeypair -keysize 2048 -keyalg RSA -dname "CN=cm-r01nn01.mws.mds.xyz" -ext san=dns:cm-r01nn01.mws.mds.xyz,dns:srv-c01.mws.mds.xyz

 

If you do recreate the CM certificate like that, you will also need to replace the previous certificate with the new one in every trust store you created, since a new key pair was generated.
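
To confirm that a regenerated certificate really lists every name the agents will use, something along these lines may help. It is a Python 3 sketch with invented function names (`fetch_peer_cert`, `san_dns_names`, `missing_names`). It assumes the certificate dictionary has the shape returned by `ssl.SSLSocket.getpeercert()`, and that the CA file passed in already trusts the new certificate, since Python only returns the parsed certificate after a successful chain verification.

```python
import socket
import ssl


def fetch_peer_cert(host, port, ca_file, timeout=10.0):
    """Return the verified server certificate as a dict (chain check only)."""
    context = ssl.create_default_context(cafile=ca_file)
    context.check_hostname = False  # chain is still verified; name check is skipped
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw) as tls:
            return tls.getpeercert()


def san_dns_names(peer_cert):
    """Extract the DNS entries from a getpeercert()-style dict."""
    entries = peer_cert.get("subjectAltName", ())
    return [value for kind, value in entries if kind == "DNS"]


def missing_names(peer_cert, required):
    """Return the required hostnames that the certificate's SAN does not list."""
    present = {name.lower() for name in san_dns_names(peer_cert)}
    return [name for name in required if name.lower() not in present]
```

For the setup in this thread, `missing_names(cert, ["cm-r01nn01.mws.mds.xyz", "srv-c01.mws.mds.xyz"])` should return an empty list once the new certificate is in place. Any name it returns is one that would still trigger a hostname mismatch.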

 

Although it might require a bit more work, the above should address the error you get when using TLS pass-through in HAProxy. Next, we need to make sure that HAProxy routes requests to your primary CM host every time and only routes to the other host if the primary host fails. I believe this can be achieved by removing "balance roundrobin", but I'm not sure. It may also make sense to use the "backup" directive in the server configuration for nn02, but again I'm not sure; our example does not seem to consider it necessary.

Explorer

Received the subject error when I replaced the certs with the SAN one that contained 3 hosts:

 

keytool -genkeypair -alias cm-c01.mws.mds.xyz -keyalg RSA -keysize 2048 -dname "cn=cm-c01.mws.mds.xyz,OU=MDS,O=MDS,L=Los Angeles,ST=California,C=US" -keypass cm-c01.mws.mds.xyz -keystore cm-c01.mws.mds.xyz.keystore.jks -storepass cm-c01.mws.mds.xyz -validity 3650 -ext EKU=serverAuth,clientAuth,codeSigning,emailProtection,timeStamping,OCSPSigning -ext san=dns:cm-c01.mws.mds.xyz,dns:cm-r01nn01.mws.mds.xyz,dns:cm-r01nn02.mws.mds.xyz

Updated jssecacerts (in the path above) with the new cert as well. Same issue.

 

I'm running the load balancer and HAproxy on cm-r01nn01/02. This is a problem, since Cloudera opens ports such as 7180, 7182 and 7183 on all available interfaces, so if I have a VIP running on the same host, Cloudera services bind to it as well. I can't really have HAproxy listening on port 7183 on the LB VIP if Cloudera services are already bound to the same port.
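
The port conflict described here can be reproduced in isolation. The Python sketch below (function name invented for this example) binds a listening socket on all interfaces and then tries to bind the same port on one specific address, which on a typical Linux host fails with "address already in use". It uses a random free port and the loopback address as a stand-in for the VIP.

```python
import errno
import socket


def wildcard_blocks_specific_bind():
    """Bind a port on all interfaces, then try the same port on one address."""
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    wildcard.bind(("0.0.0.0", 0))  # all interfaces, OS-chosen free port
    wildcard.listen(1)
    port = wildcard.getsockname()[1]

    specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        specific.bind(("127.0.0.1", port))  # stand-in for binding on the VIP
        return False  # the second bind unexpectedly succeeded
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        specific.close()
        wildcard.close()
```

This is why the earlier configs in this thread bind HAproxy to a different port (17182) on the VIP, or why the proxy is often placed on hosts that do not run CM itself.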

 

Tried the Cloudera Manager Hostname Override but only got these errors:

 

 

12:25:32.076 AM	WARN	BasicScmProxy	
Exception while getting fetch configDefaults hash: none
java.io.IOException: HTTPS hostname wrong:  should be <srv-c01.mws.mds.xyz>
	at sun.net.www.protocol.https.HttpsClient.checkURLSpoofing(HttpsClient.java:649)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:573)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:259)
	at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:277)
	at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:607)
	at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:696)
	at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:65)
	at com.cloudera.cmon.firehose.Main.main(Main.java:396)

 

I need to spend more time reading up on load balancers and CM / CDH.

 

Thx,
TK