Created 12-19-2016 12:52 AM
Hello
I have a working CDH 5.9 test cluster. I want to configure Level 1 TLS encryption for Cloudera Manager using a self-signed certificate.
I used the official guide, then reverted to a pre-change snapshot and tried again using this guide: https://united.softserveinc.com/blogs/tls-encryption-cloudera-manager, but got the same result.
The Cloudera Manager UI redirects me to https on port 7183 as expected. I can also see that the servers are sending heartbeats.
The problem is that the Cloudera Management Service components don't seem to connect. They appear down in Cloudera Manager, and if I try to start them I get an error.
The culprit seems to be the Activity Monitor.
When I try to start activity monitor I get this error:
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: No trusted certificate found
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1884)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:276)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:270)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1341)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
at
But I did import the certificate into the truststore, set the proper permissions and pointed the configuration to the truststore file, so I can't figure out what's wrong.
If I look at the agent log I can also see "connection refused" errors:
[19/Dec/2016 10:35:34 +0000] 5656 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[19/Dec/2016 10:35:34 +0000] 5656 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-HOSTMONITOR-fbe9bdb7c0b8d1671e18298752512c5a
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.9.0-py2.6.egg/cmf/monitor/firehose.py", line 116, in _send
self._port)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
self.conn.connect()
File "/usr/lib64/python2.6/httplib.py", line 742, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
raise error, msg
error: [Errno 111] Connection refused
Yet the CM UI shows that the last heartbeat from the servers arrived 6-10 seconds ago.
If I run an openssl connection check I get: Verify return code: 18 (self signed certificate).
I followed the instructions carefully several times, but it just doesn't work.
What am I doing wrong? Are self-signed certificates really supported?
Thank you
Guy
Created 12-19-2016 10:34 AM
Did you import the root certificate into the default system truststore?
Created 12-19-2016 10:58 AM
I believe I did. This is the part where you run:
keytool -import -alias cms -file /tmp/selfsigned.cer -keystore $JAVA_HOME/jre/lib/security/jssecacerts -storepass changeit
Isn't it?
I did that on all the nodes in the cluster, and it reported that the certificate was successfully added to the keystore.
Created on 12-19-2016 11:08 AM - edited 12-19-2016 11:12 AM
The first thing that caught my eye is the "-alias" flag. I would rather use the actual hostname for that.
Also, could you run this command as root:
find /* -iname "cacerts"
and paste the output here?
Created 12-19-2016 01:21 PM
Hello
I changed the alias to the name of the server where the SCM server is running and I still have the same problem. The Cloudera Management Service does not start (or is not communicating) and the log shows "certificate_unknown" messages.
Here is the result of the find command on the scm server node:
/etc/pki/java/cacerts
/etc/pki/ca-trust/extracted/java/cacerts
/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hue/build/env/lib/python2.6/site-packages/boto-2.42.0-py2.6.egg/boto/cacerts
/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/boto-2.38.0-py2.6.egg/boto/cacerts
/usr/java/jdk1.8.0_111/jre/lib/security/cacerts
/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts
/usr/java/jdk1.6.0_31/jre/lib/security/cacerts
I was using the one under java 1.7.0_67
Thanks
Guy
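With three JDKs on the host, it matters which one a given process actually runs under. When the javax.net.ssl.trustStore system property is not set, a JDK 8 or earlier looks for jssecacerts in its own jre/lib/security directory and falls back to cacerts only if jssecacerts is absent. The sketch below illustrates that lookup order on a throwaway directory. The function name default_truststore and the demo paths are made up for illustration, and whether a particular Cloudera service sets an explicit truststore instead is not covered here.

```shell
# Print the truststore a JDK 8-or-earlier JRE would pick by default
# (i.e. when -Djavax.net.ssl.trustStore is NOT set):
# jssecacerts if it exists, otherwise cacerts.
default_truststore() {
    sec_dir="$1/jre/lib/security"
    if [ -f "$sec_dir/jssecacerts" ]; then
        echo "$sec_dir/jssecacerts"
    elif [ -f "$sec_dir/cacerts" ]; then
        echo "$sec_dir/cacerts"
    else
        echo "no default truststore under $sec_dir" >&2
        return 1
    fi
}

# Demo on a throwaway tree (hypothetical layout, not a real JDK).
demo_home=$(mktemp -d)
mkdir -p "$demo_home/jre/lib/security"
touch "$demo_home/jre/lib/security/cacerts"
before=$(default_truststore "$demo_home")

# Once jssecacerts exists, cacerts is no longer consulted.
touch "$demo_home/jre/lib/security/jssecacerts"
after=$(default_truststore "$demo_home")

echo "before: $before"
echo "after:  $after"
```

On a real node, calling the function with each JDK path from the find output (for example /usr/java/jdk1.7.0_67-cloudera) shows which file an import has to target for that JDK.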
Created on 12-19-2016 01:31 PM - edited 12-19-2016 01:32 PM
Please update this one as well:
/etc/pki/java/cacerts
Then restart the Cloudera Manager server:
service cloudera-scm-server restart
And paste the whole log here.
Also, you can verify the certificate itself with:
openssl verify certnew.cer
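A note on reading the result for a self-signed certificate: openssl verify has no issuer to chain to unless the certificate itself is supplied as the trust anchor, so on its own it is expected to report error 18, the same code seen in the connection check earlier in the thread. The sketch below shows the difference on a throwaway certificate. The CN cm-host.example.com and the file names are placeholders, not values from this cluster.

```shell
# Create a throwaway self-signed certificate (placeholder CN).
workdir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=cm-host.example.com" \
    -keyout "$workdir/key.pem" -out "$workdir/cert.pem" 2>/dev/null

# No trust anchor given: verification is expected to fail with error 18.
untrusted_out=$(openssl verify "$workdir/cert.pem" 2>&1) || true

# The certificate named as its own trust anchor: verification succeeds.
trusted_out=$(openssl verify -CAfile "$workdir/cert.pem" "$workdir/cert.pem" 2>&1)
trusted_rc=$?

echo "$untrusted_out"
echo "$trusted_out"
```

So error 18 by itself does not mean the certificate is broken. It only means the verifying side was not told to trust it.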
Created 12-20-2016 03:05 AM
Hello
I noticed that when creating the first keystore I did not change the CN to the appropriate value, so the keystore on the first host was inconsistent with those on the other nodes.
I tried the whole process again with the appropriate CN (I had snapshots from before the change) and this time it worked!
Just to be sure it wasn't an accident, I will do the whole thing again.
Thank you very much for your help
Guy
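For anyone hitting the same problem, a quick way to catch a wrong CN before distributing certificates is to print the subject and compare it with the host's name. This is a minimal sketch on a throwaway certificate. The hostname cm-host.example.com is a placeholder, and for a certificate already inside a Java keystore, keytool -list -v shows the same information in its "Owner" line.

```shell
# Placeholder hostname; on a real node this would come from: hostname -f
expected_cn="cm-host.example.com"

# Throwaway self-signed certificate carrying that CN.
workdir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$expected_cn" \
    -keyout "$workdir/key.pem" -out "$workdir/cert.pem" 2>/dev/null

# Print the subject in RFC 2253 form so the CN appears as CN=<value>.
subject=$(openssl x509 -in "$workdir/cert.pem" -noout -subject -nameopt RFC2253)
echo "$subject"

# Simple substring comparison; good enough for a sanity check.
case "$subject" in
    *"CN=$expected_cn"*) cn_status="match" ;;
    *)                   cn_status="MISMATCH" ;;
esac
echo "CN check: $cn_status"
```

Running the same check against the certificate exported on each node would have flagged the keystore whose CN was left at the wrong value.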