Created 10-19-2017 06:32 PM
Hello All,
I have a 10-node HDP 2.6 (2.6.1.0) cluster. I installed Hortonworks SmartSense with the HST Server on the master and HST Agents on all nodes (master and slave1-9). It's a fresh HDP install on RHEL 7. The SmartSense installation went well, but afterwards the HST Agents were running on only 3 slaves (slave1, slave2 and slave3). Even after restarting all HST Agents, they are still live only on those same 3 slaves. Can anyone please help me troubleshoot this issue?
Logs captured from /var/log/hst/hst-server.log during the entire restart operation:
INFO [main] CertificateManager:70 - Initialization of root certificate
INFO [main] CertificateManager:72 - Certificate exists:true
INFO [main] Configuration:562 - Reading password from existing file
WARN [main] ConfigChangeListener:155 - Creating a patch
INFO [main] ConfigChangeListener:236 - Patch created : /var/lib/smartsense/hst-server/updates/upload/config-update.tgz
INFO [main] SupportToolServer:572 - Bundle Purge Scheduler enabled at :Thu Oct 19 13:30:21 EDT 2017. Bundle Purge job will run every 24 hrs.
INFO [main] Server:266 - jetty-7.6.7.v20120910
INFO [main] ContextHandler:744 - started o.e.j.s.ServletContextHandler{/,file:/usr/hdp/share/hst/hst-server/web/}
INFO [main] AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:9000
INFO [main] Server:266 - jetty-7.6.7.v20120910
INFO [main] ContextHandler:744 - started o.e.j.s.ServletContextHandler{/,null}
INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:9440
INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:9441
WARN [qtp943219925-85] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-86] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [qtp943219925-85] SupportToolResource:142 - Unregistering agent slave3
INFO [qtp943219925-86] SupportToolResource:142 - Unregistering agent slave2
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [qtp943219925-85] CertificateManager:189 - Signing of agent certificate
INFO [qtp943219925-85] CertificateManager:190 - Verifying passphrase
INFO [qtp943219925-85] Configuration:562 - Reading password from existing file
INFO [qtp943219925-85] CertificateManager:214 - Revoking of slave3 certificate.
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [qtp943219925-85] CertificateManager:265 - Command openssl ca -config /var/lib/smartsense/hst-server/keys/ca.config -keyfile /var/lib/smartsense/hst-server/keys/ca.key -revoke /var/lib/smartsense/hst-server/keys/slave3.crt -batch -passin pass:**** -cert /var/lib/smartsense/hst-server/keys/ca.crt was finished with exit code: 0 - the operation was completely successfully.
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [qtp943219925-85] CertificateManager:265 - Command openssl ca -config /var/lib/smartsense/hst-server/keys/ca.config -in /var/lib/smartsense/hst-server/keys/slave3.csr -out /var/lib/smartsense/hst-server/keys/slave3.crt -batch -md sha256 -passin pass:**** -keyfile /var/lib/smartsense/hst-server/keys/ca.key -cert /var/lib/smartsense/hst-server/keys/ca.crt was finished with exit code: 0 - the operation was completely successfully.
INFO [qtp943219925-89] SupportToolResource:142 - Unregistering agent slave1
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-85] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [qtp943219925-90] SupportToolResource:115 - Registering agent, id=slave3, version=1.4.0.2.5.0.3-7
WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-86] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
INFO [Thread-3] FileWatcher:232 - Watcher configuration has been changed. re-initializing watcher.
INFO [Thread-3] ConfigChangeListener:131 - listner configuration has been changed. re-initializing listner.
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
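In TLS, `unknown_ca` is the alert a peer sends when it cannot match the certificate it received to a trusted CA, so these lines suggest that agents connecting on ports 9440/9441 are rejecting the server's certificate. A long log like this is easier to read when reduced to a summary. The following is a minimal Python 3 sketch, not part of SmartSense; the helper name `summarize_hst_log` is hypothetical, and it assumes one log entry per line in the format shown above. It counts the `unknown_ca` alerts and lists which agents unregistered and re-registered.

```python
import re

# Patterns for the SupportToolResource lines seen in hst-server.log, e.g.
#   ... SupportToolResource:115 - Registering agent, id=slave3, version=1.4.0.2.5.0.3-7
#   ... SupportToolResource:142 - Unregistering agent slave3
REGISTER_RE = re.compile(r"Registering agent, id=([^,\s]+)")
UNREGISTER_RE = re.compile(r"Unregistering agent (\S+)")
UNKNOWN_CA = "Received fatal alert: unknown_ca"


def summarize_hst_log(lines):
    """Summarize agent (un)registrations and unknown_ca TLS alerts.

    Hypothetical helper: assumes one log entry per line, formatted as above.
    """
    summary = {"registered": [], "unregistered": [], "unknown_ca_alerts": 0}
    for line in lines:
        if UNKNOWN_CA in line:
            summary["unknown_ca_alerts"] += 1
        # The two patterns are case sensitive, so "Unregistering agent ..."
        # never matches REGISTER_RE and vice versa.
        match = UNREGISTER_RE.search(line)
        if match:
            summary["unregistered"].append(match.group(1))
        match = REGISTER_RE.search(line)
        if match:
            summary["registered"].append(match.group(1))
    return summary
```

On the HST Server host it could be called as `summarize_hst_log(open("/var/log/hst/hst-server.log"))`. For the excerpt above, it would report slave1, slave2 and slave3 as unregistered and only slave3 as registered again.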
Created 10-23-2017 03:06 PM
Can you please give me some direction or input, @Artem Ervits @Jay SenSharma @Shu @Aditya Sirna @Abdelkrim Hadjidj @Dinesh Chitlangia? I really need your help.
Created 10-23-2017 03:29 PM
Please check the output of the following command:
rpm -qa python-devel
If the installed version of this package is 2.7.5-58, downgrade it to 2.7.5-48.
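If you need to run this check on many nodes, the version comparison can be scripted. Below is a minimal Python 3 sketch (it assumes Python 3.7+ is available on the host, and the function names are hypothetical). It parses a line of `rpm -qa` output and flags only the exact 2.7.5-58 release named above; it does not try to decide whether other releases are affected.

```python
import re
import subprocess

# Release identified above as problematic (2.7.5-58); 2.7.5-48 is the
# suggested downgrade target.
AFFECTED = (2, 7, 5, 58)


def parse_rpm_version(package):
    """Return (major, minor, patch, release) from a string such as
    'python-devel-2.7.5-58.el7.x86_64' (one line of `rpm -qa` output)."""
    match = re.search(r"-(\d+)\.(\d+)\.(\d+)-(\d+)", package)
    if match is None:
        raise ValueError("unrecognized package string: %r" % package)
    return tuple(int(part) for part in match.groups())


def is_affected(package):
    """True only for the exact release named in this thread."""
    return parse_rpm_version(package) == AFFECTED


def installed_packages(name="python-devel"):
    """Run `rpm -qa <name>` and return one entry per installed package."""
    result = subprocess.run(["rpm", "-qa", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.split()
```

For example, `[p for p in installed_packages() if is_affected(p)]` returns a non-empty list on a host that needs the downgrade.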
As an alternative, please refer to: https://access.redhat.com/articles/2039753#controlling-certificate-verification-7
Or try the following, as suggested in https://community.hortonworks.com/questions/120861/ambari-agent-ssl-certificate-verify-failed-certif...
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
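The `sed` one-liner rewrites the `verify` setting in /etc/python/cert-verification.cfg. If you prefer to see the current value first and keep a backup, the same substitution can be done from Python. This is a minimal Python 3 sketch with hypothetical helper names; it assumes the file has an `[https]` section containing `verify=...`, as described in the Red Hat article above, and it performs exactly the same text replacement as the `sed` command.

```python
import configparser
import shutil
from pathlib import Path

CFG_PATH = Path("/etc/python/cert-verification.cfg")


def read_verify_setting(text):
    """Return the [https] verify value, or None if it is not present."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("https", "verify", fallback=None)


def disable_verification(text):
    """Same substitution as the sed one-liner: platform_default -> disable."""
    return text.replace("verify=platform_default", "verify=disable")


def patch_file(path=CFG_PATH):
    """Copy the file to <name>.bak, then rewrite it.

    Returns True if the file was changed, False if there was nothing to replace.
    """
    original = path.read_text()
    patched = disable_verification(original)
    if patched == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched)
    return True
```

Running `read_verify_setting(CFG_PATH.read_text())` before and after `patch_file()` (as root) should show the value change from `platform_default` to `disable`. The `.bak` copy makes it easy to revert.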
Created 10-23-2017 03:39 PM
If the above does not solve the issue, please try the following:
1. Check hostname resolution on the problematic hosts:
# cat /etc/hosts
# hostname -
2. Delete or rename the "/usr/hdp/share/hst/hst-agent/keys" directory, then re-register the HST agent as follows:
# cd /usr/hdp/share/hst/hst-agent/
# mv keys keys_OLD
3. From the Ambari UI:
Ambari --> Hosts (Tab) --> Click on the problematic hostname --> SmartSense HST Agent --> Register
4. Verify that the keys were generated correctly:
# ls /usr/hdp/share/hst/hst-agent/
5. Now start the SmartSense HST Agent from Ambari.
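Steps 1 and 4 can also be checked from a script when you have several problematic hosts. Below is a minimal Python 3 sketch with hypothetical helper names. It compares the /etc/hosts entry for a hostname with what the resolver actually returns, and checks that the agent's keys directory exists and is non-empty after re-registration. It does not assume any particular key file names, because those are not given in this thread.

```python
import socket
from pathlib import Path

AGENT_KEYS = Path("/usr/hdp/share/hst/hst-agent/keys")


def hosts_entries(text):
    """Map each hostname/alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # first entry wins, as in /etc/hosts
    return mapping


def check_resolution(fqdn, hosts_text):
    """Compare the /etc/hosts entry for fqdn with the resolver's answer."""
    expected = hosts_entries(hosts_text).get(fqdn)
    try:
        resolved = socket.gethostbyname(fqdn)
    except OSError:  # socket.gaierror is a subclass of OSError
        resolved = None
    return {
        "fqdn": fqdn,
        "etc_hosts": expected,
        "resolved": resolved,
        "consistent": expected is not None and expected == resolved,
    }


def keys_present(keys_dir=AGENT_KEYS):
    """True if the agent's keys directory exists and contains at least one entry."""
    return keys_dir.is_dir() and any(keys_dir.iterdir())
```

On each agent host, `check_resolution(socket.getfqdn(), Path("/etc/hosts").read_text())` flags a name that is missing from /etc/hosts or resolves to a different address. `keys_present()` should return True once step 3 has regenerated the keys.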