
TLS configuration broke everything


Hi All,

 

I followed the instructions at https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html to configure my cluster with TLS.

Now pretty much everything is broken and nobody can communicate with anybody.

 

I am using my own root certificate to sign the other certificates.

The only thing I did differently from what is described on the page: I do not have intermediate certificates. So wherever it said to append an intermediate certificate to some file, I appended the root certificate instead. Is that right, or should I have skipped the step and appended nothing?
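One way to check what actually ended up in a PEM file, and whether it chains to the root, is sketched below. The helper names count_certs and verify_against_root are made up for illustration, and the paths in the comments are the ones from this thread. Note that openssl verify only checks the first certificate in the file it is given, so extra certificates appended after the host certificate are not themselves verified.

```shell
# Count the certificates stored in a PEM file (host cert plus anything appended).
count_certs() {
    grep -c 'BEGIN CERTIFICATE' "$1"
}

# Check that the first certificate in a PEM file chains up to the given root CA.
# $1 = root CA certificate, $2 = certificate to verify.
# openssl prints "<file>: OK" when the chain can be built.
verify_against_root() {
    openssl verify -CAfile "$1" "$2"
}

# Example invocations (not executed here):
#   count_certs /opt/cloudera/security/pki/agent.cert.pem
#   verify_against_root /opt/cloudera/security/pki/rootca.cert.pem \
#                       /opt/cloudera/security/pki/agent.cert.pem
```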

 

There are so many steps that it is very easy to make a mistake somewhere. Is there a good way to test pairwise TLS communication between various Hadoop nodes to figure out where the problem is?

 

Why is the symbolic link agent.jks created? It does not seem to be used anywhere in the configuration, either in config.ini or in the web GUI.

 

Before I enabled TLS, everything worked fine. I also previously managed to configure TLS with self-signed certificates as far as those allow (but not to the very end of the procedure).

 

Here are some errors in the log messages:

=========

[28/Feb/2017 15:22:09 +0000] 30180 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.06 min:0.05 mean:0.06 max:0.07 LIFE_MAX:0.30
[28/Feb/2017 15:23:54 +0000] 30180 MainThread agent        ERROR    Heartbeating to md01.rcc.local:7182 failed.
Traceback (most recent call last):
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/agent.py", line 1363, in _send_heartbeat
   response = self.requestor.request('heartbeat', dict(request=heartbeat))
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
   return self.issue_request(call_request, message_name, request_datum)
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
   call_response = self.transceiver.transceive(call_request)
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 483, in transceive
   result = self.read_framed_message()
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 487, in read_framed_message
   response = self.conn.getresponse()
 File "/usr/lib64/python2.7/httplib.py", line 1051, in getresponse
   response.begin()
 File "/usr/lib64/python2.7/httplib.py", line 415, in begin
   version, status, reason = self._read_status()
 File "/usr/lib64/python2.7/httplib.py", line 379, in _read_status
   raise BadStatusLine(line)
BadStatusLine: ''
[28/Feb/2017 15:24:09 +0000] 30180 MainThread agent        ERROR    Heartbeating to md01.rcc.local:7182 failed.
Traceback (most recent call last):
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/agent.py", line 1355, in _send_heartbeat
   self.max_cert_depth)
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/https.py", line 132, in __init__
   self.conn.connect()
 File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/M2Crypto-0.24.0-py2.7-linux-x86_64.egg/M2Crypto/httpslib.py", line 74, in connect
   raise error
error: [Errno 111] Connection refused
[28/Feb/2017 15:24:12 +0000] 30180 MainThread agent        INFO     Stopping agent...
[28/Feb/2017 15:24:12 +0000] 30180 MainThread agent        INFO     No extant cgroups; unmounting any cgroup roots

=========

Any suggestions?

Thank you,

Igor

 


Re: TLS configuration broke everything

I wonder whether the permissions and ownership of the various security-related files are correct. Here is what I have on the CM host, which runs both a server and an agent:

=================================

[root@md01 ~]# ls -ltr /opt/cloudera/security/pki/
total 64
-rw-r--r-- 1 cloudera-scm cloudera-scm 1195 Feb 24 20:27 md01.rcc.local-server.csr
-rw-r--r-- 1 cloudera-scm cloudera-scm 2175 Feb 24 20:44 rootca.cert.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm 8232 Feb 27 14:07 md01.rcc.local-server.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm 4264 Feb 27 14:31 md01.rcc.local-server.jks
-rw-r--r-- 1 cloudera-scm cloudera-scm 1241 Feb 27 20:17 md01.rcc.local-agent.csr
-rw-r--r-- 1 cloudera-scm cloudera-scm 8394 Feb 27 20:55 md01.rcc.local-agent.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm 4293 Feb 27 20:56 md01.rcc.local-agent.jks
lrwxrwxrwx 1 cloudera-scm cloudera-scm   51 Feb 27 20:57 agent.cert.pem -> /opt/cloudera/security/pki/md01.rcc.local-agent.pem
lrwxrwxrwx 1 cloudera-scm cloudera-scm   51 Feb 27 20:57 agent.jks -> /opt/cloudera/security/pki/md01.rcc.local-agent.jks
-rw-r--r-- 1 cloudera-scm cloudera-scm 5008 Feb 28 14:42 md01.rcc.local-agent.p12
-rw-r--r-- 1 cloudera-scm cloudera-scm 1991 Feb 28 14:43 md01.rcc.local-agent.key
lrwxrwxrwx 1 cloudera-scm cloudera-scm   51 Feb 28 14:44 agent.key -> /opt/cloudera/security/pki/md01.rcc.local-agent.key
=================================

and here is what I have on a data node with an agent only:

=================================

[root@md01 ~]# ssh md02 ls -ltr /opt/cloudera/security/pki/  
total 28
-rw-r--r-- 1 cloudera-scm cloudera-scm 2175 Feb 27 20:26 rootca.cert.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm 1241 Feb 27 21:43 md02.rcc.local-agent.csr
-rw-r--r-- 1 cloudera-scm cloudera-scm 8394 Feb 28 11:17 md02.rcc.local-agent.pem
lrwxrwxrwx 1 cloudera-scm cloudera-scm   51 Feb 28 11:17 agent.cert.pem -> /opt/cloudera/security/pki/md02.rcc.local-agent.pem
lrwxrwxrwx 1 cloudera-scm cloudera-scm   51 Feb 28 11:17 agent.jks -> /opt/cloudera/security/pki/md02.rcc.local-agent.jks
-rw-r--r-- 1 cloudera-scm cloudera-scm 4293 Feb 28 11:17 md02.rcc.local-agent.jks

=================================

 

Other configuration files:

=================================

[root@md01 ~]# ls -l /etc/cloudera-scm-agent/
total 28
-r--r----- 1 root root   14 Feb 28 14:46 agentkey.pw
-rw-r--r-- 1 root root 9125 Feb 28 14:51 config.ini
-rw-r--r-- 1 root root 9019 Feb 28 14:23 config.ini~
[root@md01 ~]# ls -l /etc/cloudera-scm-server/
total 28
-rw------- 1 cloudera-scm cloudera-scm  433 Feb 10 11:45 db.properties
-rw------- 1 cloudera-scm cloudera-scm  714 Jan 20 13:04 db.properties.~1~
-rw------- 1 cloudera-scm cloudera-scm  424 Feb 10 11:33 db.properties.~2~
-rw------- 1 cloudera-scm cloudera-scm  424 Feb 10 11:40 db.properties.~3~
-rw------- 1 cloudera-scm cloudera-scm  433 Feb 10 11:42 db.properties.~4~
-rw------- 1 cloudera-scm cloudera-scm  714 Jan 20 13:04 db.properties.rpmnew
-rw-r--r-- 1 root         root         2082 Jan 20 13:04 log4j.properties

 

 

[root@md01 ~]# ls -l  $JAVA_HOME/jre/lib/security/jssecacerts
-rw-r--r-- 1 root root 115080 Feb 27 13:56 /usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts

=================================

 

 

Re: TLS configuration broke everything

In /var/log/cloudera-scm-server/cloudera-scm-server.log I see a lot of communication messages of the form:

 

2017-02-28 16:25:18,565 INFO 1945274533@scm-web-61:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-SERVICEMONITOR-6b2c2df755e0afa7e53366dd3b5e840e' from 172.25.180.171
2017-02-28 16:25:18,771 INFO 1945274533@scm-web-61:com.cloudera.server.web.cmf.AuthenticationFailureEventListener: Authentication failure for user: '__cloudera_internal_user__mgmt-ACTIVITYMONITOR-6b2c2df755e0afa7e53366dd3b5e840e' from 172.25.180.171

Re: TLS configuration broke everything

There are also many errors of the form:

========

2017-02-28 12:13:10,314 WARN 1805290099@scm-web-2:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
2017-02-28 12:13:10,661 WARN 1805290099@scm-web-2:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
2017-02-28 12:13:10,692 WARN 1805290099@scm-web-2:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
2017-02-28 12:13:10,746 WARN 1805290099@scm-web-2:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
========

Are those really errors or just warnings that I am using my own certificates?

 

Re: TLS configuration broke everything

For each pair of hosts, what's the simplest way (probably outside of Hadoop) to test if SSL communication between them is working given their keystore and truststore?

 

For example, 

https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_log_support_case.html#concept...

mentions 

openssl s_client -connect host.fqdn.name:port

Is that the right tool to use? What port should be used? How does it know where to find the keystore, truststore, password, etc.?

Can one run it from CM server to agent and from agent to server?

 

 

Shouldn't the truststore on the CM server contain the certificates of all the agents? From the instructions, it looks like it contains only the root certificate. Is that enough to accept communication from clients?

 

If I send a diagnostic bundle to Cloudera, would that provide enough information for them to figure out what is wrong with TLS?

 

Re: TLS configuration broke everything

From each host that runs an agent, I can run

 

openssl s_client -connect md01.rcc.local:7182

where md01 is CM server

and it returns a lot of information. In particular, closer to the end, it says:

 

SSL handshake has read 15850 bytes and written 206 bytes

So does it mean that the communication from agents to CM is working fine? Or should I look closer at the details of the message?

 

On the other hand, executing the same command from CM to agent hosts results in:

socket: Connection refused
connect:errno=111

Does it mean that there is something wrong either with the truststore on the agent or server certificate on CM? Or am I using the wrong port?

 

I can understand how the truststore is found on the receiving end (although how is JAVA_HOME found?), but I do not understand how this command can find the server key or the truststore password. Or does it not use those?
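For what it is worth, openssl s_client does not read Java keystores or JAVA_HOME at all; it only uses PEM files passed on the command line (or OpenSSL's own default CA locations). A completed handshake with bytes read and written also does not by itself mean the certificate was accepted; the "Verify return code" line in the output is the part to look at. "Connection refused" is a TCP-level failure that happens before any TLS is attempted, and it usually just means nothing is listening on that port (as far as I know, 7182 is the port the CM server listens on for agents, not a port the agents listen on). A minimal sketch, with the hypothetical helper names probe_tls and verify_result:

```shell
# Open a TLS connection to host:port, trusting only the CA in the given PEM
# file, and print everything s_client reports.
# $1 = host, $2 = port, $3 = CA certificate (PEM).
probe_tls() {
    # </dev/null makes s_client exit after the handshake instead of waiting for input.
    openssl s_client -connect "$1:$2" -CAfile "$3" </dev/null 2>&1
}

# Pull the verification verdict out of s_client output read on stdin.
verify_result() {
    grep -m1 'Verify return code:' | sed 's/^ *Verify return code: *//'
}

# Example (not executed here); "0 (ok)" means the chain verified:
#   probe_tls md01.rcc.local 7182 /opt/cloudera/security/pki/rootca.cert.pem | verify_result
#
# To also present a client certificate, s_client accepts PEM files via
# -cert and -key, and a key passphrase source via -pass (e.g. -pass file:<path>).
```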

Re: TLS configuration broke everything

Somehow I managed to get all the Hadoop services communicating (I think the problem was that agent.key existed only on the CM node).

 

However, after enabling all TLS as described on the webpage, most of the Cloudera Management Services (Event Server, Host Monitor, etc.) are down.

 

I do have ssl.client.truststore.location and ssl.client.truststore.password configured.

Any ideas what can be wrong?

 

I have restarted the CM server, the agents, the whole Hadoop cluster, and the Cloudera Management Services several times, and it did not help.

Re: TLS configuration broke everything

Should /opt/cloudera/security be owned by cloudera-scm:cloudera-scm? Any non-default permissions?

What password should be in /etc/cloudera-scm-agent/agentkey.pw? The one for the truststore in JAVA_HOME/jre/lib/security/jssecacerts, or the one for the keystore in /opt/cloudera/security/pki/*.jks?
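One way to see which file agentkey.pw is paired with is to list the active TLS settings in the agent's config.ini. This is a sketch that assumes the setting names in the pattern below match the ones in your config.ini; the helper name tls_settings is made up for illustration. In the documented setup, as I understand it, client_keypw_file points at agentkey.pw and holds the passphrase protecting the agent's private key (client_key_file), not a keystore or truststore password, but that is worth confirming against your own file.

```shell
# Print the active (uncommented) TLS-related settings from an agent config file.
# $1 = path to config.ini
tls_settings() {
    grep -E '^[[:space:]]*(use_tls|verify_cert_file|verify_cert_dir|client_key_file|client_keypw_file|client_cert_file)[[:space:]]*=' "$1"
}

# Example (not executed here):
#   tls_settings /etc/cloudera-scm-agent/config.ini
```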

 

Re: TLS configuration broke everything

There are some log messages about write.lock:

=======================

 

2017-03-01 10:29:05,796 ERROR com.cloudera.cmf.eventcatcher.server.EventCatcherService: Error starting EventServer
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/cloudera-scm-eventserver/v3/write.lock
       at org.apache.lucene.store.Lock.obtain(Lock.java:84)
       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1112)
       at com.cloudera.cmf.eventcatcher.server.SingleIndexManager.makeIndexWriter(SingleIndexManager.java:139)
       at com.cloudera.cmf.eventcatcher.server.SingleIndexManager.<init>(SingleIndexManager.java:112)
       at com.cloudera.cmf.eventcatcher.server.EventCatcherService.<init>(EventCatcherService.java:282)
       at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:148)
2017-03-01 10:29:05,882 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2017-03-01 10:29:05,895 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
==============

Some process is holding this lock. Maybe I should kill it and restart CM and the CM services?

==============

[root@md01 cloudera-scm-server]# fuser /var/lib/cloudera-scm-eventserver/v3/write.lock        
/var/lib/cloudera-scm-eventserver/v3/write.lock: 39285
[root@md01 cloudera-scm-server]# ps -ef | grep 39285
root      1512 28051  0 10:37 pts/1    00:00:00 grep --color=auto 39285
clouder+ 39285 42144 12 Feb28 ?        02:49:02 /usr/java/jdk1.8.0_121/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt-EVENTSERVER-md01.rcc.local.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms1073741824 -Xmx1073741824 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/mgmt_mgmt-EVENTSERVER-6b2c2df755e0afa7e53366dd3b5e840e_pid39285.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -cp /run/cloudera-scm-agent/process/395-cloudera-mgmt-EVENTSERVER:/opt/mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*: com.cloudera.cmf.eventcatcher.server.EventCatcherService

==============

Re: TLS configuration broke everything

Expert Contributor

certificate_unknown

 

This error is fairly generic, but it usually comes down to one of a few things: the certificate presented by the server is not trusted because the CA chain cannot be established, the root certificate is not available, or the presented certificate has expired.

 

Which truststore Cloudera Manager uses depends on exactly what you have configured inside Cloudera Manager. In addition, depending on the level of TLS you have selected, you also need to configure the agent on each host to present a valid certificate and to use a valid trusted CA chain. If you are seeing this error, you should review the contents of your truststores, the contents of your certificates, and the configuration applied to Cloudera Manager as well as its agents.
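A rough sketch of how such a review could be done from the command line, using keytool for Java truststores and openssl for PEM certificates. The helper names are made up for illustration, and paths and passwords are whatever applies on your hosts.

```shell
# Keep only the trusted CA entries from `keytool -list` output read on stdin.
trusted_entries() {
    grep 'trustedCertEntry' || true
}

# List the trusted CA entries of a Java truststore.
# $1 = truststore path, $2 = truststore password
# (a password given on the command line is visible in the process list).
show_truststore() {
    keytool -list -keystore "$1" -storepass "$2" | trusted_entries
}

# Print subject, issuer and validity dates of a PEM certificate.
# Only the first certificate in the file is shown.
show_cert() {
    openssl x509 -in "$1" -noout -subject -issuer -dates
}

# Example invocations (not executed here):
#   show_truststore "$JAVA_HOME/jre/lib/security/jssecacerts" '<password>'
#   show_cert /opt/cloudera/security/pki/agent.cert.pem
```

The issuer printed by show_cert for a host certificate should correspond to a CA that appears among the trusted entries of the truststore used by the other side of the connection.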

Customer Operations Engineer | Security SME | Cloudera, Inc.