Cloudera Employee
Posts: 129
Registered: 01-15-2015

Re: After enabling TLS cloudera agent heartbeat failing

Please note that keystores/truststores need to be in standard JCEKS format and the documentation does not state otherwise. PKCS12 is only used for exporting and converting certificate plus private key to PEM format for CM agent config. Can you please point to the sentence suggesting PKCS12 format so that we can correct it?
Explorer
Posts: 20
Registered: 12-27-2018

Re: After enabling TLS cloudera agent heartbeat failing

Hi,

Here are my responses to your questions. Can you please tell me what I am doing wrong? If you need more details, I can share them.

Thanks,

Tulasi

 

1.) Ensure that the certificates are in a standard x509 format for the agent.
Yes, they are standard X.509; see my response to bgooley.

2.) Ensure that the truststores/keystores on the CM host are in JCEKS format and not pkcs12.
As per the Cloudera documentation, it should be JCEKS. See point 3 of the section "Generate TLS Certificate" at
https://www.cloudera.com/documentation/enterprise/5-15-x/topics/how_to_configure_cm_tls.html

3.) Make sure that the cloudera-scm user can read the Private Key, Certificates, Truststores, and Password Files.
Yes, see my response to gzigldrum

4.) Make sure that the certificate on the failing agent contains a proper CN and DNS Alt Name if Alt Names are in use.
Yes, I have verified this as well

5.) Are you using self-signed certificates or certificates signed by a CA?
I am using a self-signed certificate.

6.) If all else fails you can obtain a tcpdump of attempted communication with the server. The port that we normally heartbeat to is 7182. You can then review the conversation between the server and agent to attempt to identify at what point the error is returned and potentially what error is being observed at the protocol level. You can identify and restrict your tcpdump information by tcp.stream.

[root@optim-rhel72-uppu ~]# tcpdump -i any 'port 7182'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
21:20:03.562131 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [S], seq 3560294529, win 43690, options [mss 65495,sackOK,TS val 1632415805 ecr 0,nop,wscale 7], length 0
21:20:03.562225 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 1, win 342, options [nop,nop,TS val 1632415805 ecr 1632415805], length 0
21:20:03.562549 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [P.], seq 1:254, ack 1, win 342, options [nop,nop,TS val 1632415806 ecr 1632415805], length 253
21:20:03.587871 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 16390, win 1365, options [nop,nop,TS val 1632415831 ecr 1632415831], length 0
21:20:03.587919 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [.], ack 21184, win 2388, options [nop,nop,TS val 1632415831 ecr 1632415831], length 0
21:20:03.619895 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [P.], seq 254:516, ack 21184, win 2388, options [nop,nop,TS val 1632415863 ecr 1632415831], length 262
21:20:03.628945 IP optim-rhel72-uppu.development.unicomglobal.software.44942 > optim-rhel72-uppu.development.unicomglobal.software.7182: Flags [F.], seq 516, ack 21185, win 2388, options [nop,nop,TS val 1632415872 ecr 1632415864], length 0
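
For point 3 above, here is a rough way to check the permission bits on the key, certificate and password files. This is only a sketch: `mode_allows_read` is a hypothetical helper name, and it looks only at the classic owner/group/other read bits, so it ignores ACLs, root privileges and whether the parent directories can be traversed.

```python
import os
import stat


def mode_allows_read(path, uid, gids):
    """Return True if the mode bits of `path` grant read access to uid/gids.

    Follows the usual POSIX order: the owner class is checked first,
    then the group class, then "other".
    Simplification: ACLs, root privileges and directory traversal
    permissions are not considered.
    """
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


# Example (requires the cloudera-scm account to exist on the host):
#   import pwd
#   user = pwd.getpwnam("cloudera-scm")
#   gids = set(os.getgrouplist(user.pw_name, user.pw_gid))
#   mode_allows_read("/opt/cloudera/security/pki/agent.key", user.pw_uid, gids)
```

Running the commented example as root for each path listed in the [Security] section of config.ini should show quickly whether any file is unreadable for the agent user.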

Cloudera Employee
Posts: 129
Registered: 01-15-2015

Re: After enabling TLS cloudera agent heartbeat failing

In addition to that, can you please show us the CM agent configuration with

# egrep -v '^[[:blank:]]*#|^$' /etc/cloudera-scm-agent/config.ini
Explorer
Posts: 20
Registered: 12-27-2018

Re: After enabling TLS cloudera agent heartbeat failing

Here is the output:

 

[root@optim-rhel72-uppu ~]# egrep -v '^[[:blank:]]*#|^$' /etc/cloudera-scm-agent/config.ini
[General]
server_host=optim-rhel72-uppu.development.unicomglobal.software
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=INFO
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
[Security]
use_tls=1
max_cert_depth=9
 verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem
 verify_cert_dir=/opt/cloudera/security/pki
 client_key_file=/opt/cloudera/security/pki/agent.key
 client_keypw_file=/etc/cloudera-scm-agent/agentkey.pw
 client_cert_file=/opt/cloudera/security/pki/agent.pem
[Hadoop]
[Cloudera]
[JDBC]
[root@optim-rhel72-uppu ~]#

 

Posts: 1,002
Topics: 1
Kudos: 249
Solutions: 126
Registered: 04-22-2014

Re: After enabling TLS cloudera agent heartbeat failing

@Tulasi,

Thank you for providing your config. It appears you have space characters at the beginning of your cert/key config lines. Remove the space characters from the beginning of the following lines and then restart the agent:


 verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem
 verify_cert_dir=/opt/cloudera/security/pki
 client_key_file=/opt/cloudera/security/pki/agent.key
 client_keypw_file=/etc/cloudera-scm-agent/agentkey.pw
 client_cert_file=/opt/cloudera/security/pki/agent.pem
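
A possible explanation for why the leading spaces matter, assuming config.ini is read with Python's standard ConfigParser (an assumption; this thread does not confirm which parser the agent uses): that parser treats a line starting with whitespace as a continuation of the previous option's value, not as a new option. A minimal sketch with Python 3's `configparser` and a shortened copy of the [Security] section:

```python
import configparser

# Shortened copy of the [Security] section from this thread. The leading
# space on the last two lines is the problem being demonstrated.
SAMPLE = (
    "[Security]\n"
    "use_tls=1\n"
    "max_cert_depth=9\n"
    " verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem\n"
    " client_key_file=/opt/cloudera/security/pki/agent.key\n"
)

parser = configparser.RawConfigParser()
parser.read_string(SAMPLE)

# The indented lines are not registered as options of their own ...
print(parser.has_option("Security", "verify_cert_file"))

# ... they are folded into the value of the preceding option instead.
print(repr(parser.get("Security", "max_cert_depth")))
```

Under that assumption, the agent would never see verify_cert_file or the client key settings, and max_cert_depth would carry a multi-line value, all without a parse error being raised.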

Posts: 1,002
Topics: 1
Kudos: 249
Solutions: 126
Registered: 04-22-2014

Re: After enabling TLS cloudera agent heartbeat failing

NOTE:

You should specify only verify_cert_dir OR verify_cert_file, not both.

Since you have a PEM file, I would suggest using verify_cert_file and commenting out "verify_cert_dir=/opt/cloudera/security/pki".
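
As a quick sanity check for this, here is a small sketch that reports whether both options are set in the [Security] section. `conflicting_verify_options` is a hypothetical helper name, and the check uses Python 3's `configparser`, which may not match the agent's own parsing exactly.

```python
import configparser


def conflicting_verify_options(ini_text):
    """Return True if both verify_cert_file and verify_cert_dir are set."""
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    return (parser.has_option("Security", "verify_cert_file")
            and parser.has_option("Security", "verify_cert_dir"))


# Both options set, as in the config posted in this thread
# (shown here without the leading spaces).
BOTH = (
    "[Security]\n"
    "verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem\n"
    "verify_cert_dir=/opt/cloudera/security/pki\n"
)
# Same config with verify_cert_dir commented out.
FILE_ONLY = BOTH.replace("verify_cert_dir=", "#verify_cert_dir=")

print(conflicting_verify_options(BOTH))
print(conflicting_verify_options(FILE_ONLY))
```

Note that the option lines must start in the first column for this check to see them at all.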

Explorer
Posts: 20
Registered: 12-27-2018

Re: After enabling TLS cloudera agent heartbeat failing

@bgooley

The space characters at the beginning of the cert/key lines in the agent configuration file caused this problem. After I removed all of those spaces and restarted the agent, it worked.

I didn't expect a space could cause this sort of problem without any indication of what was going wrong.

Thanks for helping to figure out this silly problem.

Explorer
Posts: 20
Registered: 12-27-2018

Re: After enabling TLS cloudera agent heartbeat failing

I have followed the steps under "Configuring TLS/SSL for HDFS, YARN and MapReduce".

Service did not start successfully; not all of the required roles started: only 0/1 roles started. Reasons : Service has only 0 NodeManager roles running instead of minimum required 1

YARN is failing to start, and I see the error below in the log:

Can't open /run/cloudera-scm-agent/process/190-yarn-NODEMANAGER/container-executor.cfg: Permission denied

These are the permissions:

-rw-r----- 1 yarn hadoop   997 Jan 18 02:22 creds.localjceks
-rw------- 1 yarn hadoop  1746 Jan 18 02:22 yarn.keytab
-r-------- 1 root hadoop   156 Jan 18 02:22 container-executor.cfg
-rw------- 1 root root    3688 Jan 18 02:22 supervisor.conf

 

But after I grant permission, a restart creates another folder with the same permissions. How can I resolve this problem?

Thanks,

Tulasi

 

Posts: 1,002
Topics: 1
Kudos: 249
Solutions: 126
Registered: 04-22-2014

Re: After enabling TLS cloudera agent heartbeat failing

I opened a Jira internally at Cloudera to ask that leading non-word characters in config.ini be trimmed.

Regards,

Ben
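
To illustrate the idea only (this is not the change requested in the Jira, whose details are not given here), a sketch of trimming leading whitespace from every line before parsing might look like the following. `parse_trimmed` is a hypothetical name, and Python 3's `configparser` stands in for whatever parser the agent actually uses.

```python
import configparser


def parse_trimmed(ini_text):
    """Parse INI text after stripping leading whitespace from each line.

    Trade-off: this also disables legitimate multi-line (continuation)
    values, because those rely on indentation.
    """
    cleaned = "\n".join(line.lstrip() for line in ini_text.splitlines())
    parser = configparser.RawConfigParser()
    parser.read_string(cleaned)
    return parser


# An indented option line like the ones posted in this thread.
RAW = (
    "[Security]\n"
    "max_cert_depth=9\n"
    " verify_cert_file=/opt/cloudera/security/pki/optim-rhel72-uppu.pem\n"
)
print(parse_trimmed(RAW).get("Security", "verify_cert_file"))
```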

Posts: 1,002
Topics: 1
Kudos: 249
Solutions: 126
Registered: 04-22-2014

Re: After enabling TLS cloudera agent heartbeat failing

@Tulasi,

Could you start a new thread for your new issue so that we don't mix issues in the same thread? The space character issue is likely to help others, so it would be good to start a new thread for the permission denied issue. I think it is a known one, but it will be easier to discuss if we start fresh.
