SmartSense HST Agents Fail To Come Up

Hello All,

I have a 10-node HDP 2.6.1.0 cluster on which I installed Hortonworks SmartSense, with the HST Server on the master and HST Agents on all nodes (master and slave1-9). It's a fresh HDP install on RHEL 7. The SmartSense installation went fine, but afterwards HST Agents were running on only 3 slaves (slave1, slave2 and slave3). Even after restarting all HST Agents, they are still live only on those same 3 slaves. Can anyone help me troubleshoot this?

Log from /var/log/hst/hst-server.log, captured over the entire restart operation:

  INFO [main] CertificateManager:70 - Initialization of root certificate
  INFO [main] CertificateManager:72 - Certificate exists:true
  INFO [main] Configuration:562 - Reading password from existing file
  WARN [main] ConfigChangeListener:155 - Creating a patch
  INFO [main] ConfigChangeListener:236 - Patch created : /var/lib/smartsense/hst-server/updates/upload/config-update.tgz
  INFO [main] SupportToolServer:572 - Bundle Purge Scheduler enabled at :Thu Oct 19 13:30:21 EDT 2017. Bundle Purge job will run every 24 hrs.
  INFO [main] Server:266 - jetty-7.6.7.v20120910
  INFO [main] ContextHandler:744 - started o.e.j.s.ServletContextHandler{/,file:/usr/hdp/share/hst/hst-server/web/}
  INFO [main] AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:9000
  INFO [main] Server:266 - jetty-7.6.7.v20120910
  INFO [main] ContextHandler:744 - started o.e.j.s.ServletContextHandler{/,null}
  INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
  INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:9440
  INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
  INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:9441
  WARN [qtp943219925-85] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-86] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [qtp943219925-85] SupportToolResource:142 - Unregistering agent slave3
  INFO [qtp943219925-86] SupportToolResource:142 - Unregistering agent slave2
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [qtp943219925-85] CertificateManager:189 - Signing of agent certificate
  INFO [qtp943219925-85] CertificateManager:190 - Verifying passphrase
  INFO [qtp943219925-85] Configuration:562 - Reading password from existing file
  INFO [qtp943219925-85] CertificateManager:214 - Revoking of slave3 certificate.
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [qtp943219925-85] CertificateManager:265 - Command openssl ca -config /var/lib/smartsense/hst-server/keys/ca.config -keyfile /var/lib/smartsense/hst-server/keys/ca.key -revoke /var/lib/smartsense/hst-server/keys/slave3.crt -batch -passin pass:**** -cert /var/lib/smartsense/hst-server/keys/ca.crt was finished with exit code: 0 - the operation was completely successfully.
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [qtp943219925-85] CertificateManager:265 - Command openssl ca -config /var/lib/smartsense/hst-server/keys/ca.config -in /var/lib/smartsense/hst-server/keys/slave3.csr -out /var/lib/smartsense/hst-server/keys/slave3.crt -batch -md sha256 -passin pass:**** -keyfile /var/lib/smartsense/hst-server/keys/ca.key -cert /var/lib/smartsense/hst-server/keys/ca.crt was finished with exit code: 0 - the operation was completely successfully.
  INFO [qtp943219925-89] SupportToolResource:142 - Unregistering agent slave1
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-85] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [qtp943219925-90] SupportToolResource:115 - Registering agent, id=slave3, version=1.4.0.2.5.0.3-7
  WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-90] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-86] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  INFO [Thread-3] FileWatcher:232 - Watcher configuration has been changed. re-initializing watcher.
  INFO [Thread-3] ConfigChangeListener:131 - listner configuration has been changed. re-initializing listner.
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-87] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
  WARN [qtp943219925-88] nio:651 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
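A long hst-server.log like this is easier to read once it is summarized: which agents the server actually registered, which it dropped, and how many handshakes were refused. In this log, "Received fatal alert: unknown_ca" is logged by the server when the connecting peer (the agent) aborts the TLS handshake, which typically means the agent does not trust the CA that signed the server's certificate. A minimal, read-only sketch, assuming the default log path above; `summarize_hst_log` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# summarize_hst_log (hypothetical helper): read-only summary of an HST server log.
# Usage: summarize_hst_log [logfile]; defaults to the path used in this post.
summarize_hst_log() {
  local log="${1:-/var/log/hst/hst-server.log}"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }

  # Handshakes aborted by the peer (agent) with an unknown_ca alert.
  echo "unknown_ca alerts: $(grep -c 'Received fatal alert: unknown_ca' "$log")"

  # Agent ids the server accepted ("Registering agent, id=<host>, ...").
  echo "registered:"
  grep -o 'Registering agent, id=[^,]*' "$log" | sed 's/.*id=//' | sort -u

  # Agent ids the server dropped ("Unregistering agent <host>").
  echo "unregistered:"
  grep -o 'Unregistering agent [^ ]*' "$log" | awk '{print $3}' | sort -u
}
```

After sourcing the function on the HST server, run `summarize_hst_log` (or pass another log file as the first argument).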
Accepted solution

@Bharath A

Please check the output of the following command:

  rpm -qa python-devel


If the package version is 2.7.5-58, downgrade it to 2.7.5-48.
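To run this check the same way on every node, the installed version-release can be compared against the 2.7.5-58 build mentioned above. A minimal sketch; `is_affected_python` is a hypothetical helper name, it only queries rpm and changes nothing, and a version string can be passed in explicitly (useful for testing). The downgrade itself is left to your usual yum procedure.

```bash
# is_affected_python (hypothetical helper): reports whether python-devel is the
# 2.7.5-58 build. Pass a version-release string as $1 to skip the rpm query.
is_affected_python() {
  local vr="${1:-$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' python-devel 2>/dev/null)}"
  case "$vr" in
    2.7.5-58|2.7.5-58.*)
      echo "affected: python-devel $vr (downgrade to 2.7.5-48 suggested)"
      return 0 ;;
    *)
      echo "not the 2.7.5-58 build: ${vr:-unknown}"
      return 1 ;;
  esac
}
```

Called with no argument, `is_affected_python` queries the local rpm database, so it can be run on each HST agent host in turn.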

Alternatively, refer to: https://access.redhat.com/articles/2039753#controlling-certificate-verification-7

Or try the following, as suggested in https://community.hortonworks.com/questions/120861/ambari-agent-ssl-certificate-verify-failed-certif...

  sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
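The same edit can be applied with a backup copy and a read-back of the resulting setting. A minimal sketch; `disable_py_cert_verify` is a hypothetical helper name and the file path is the one from the command above. Keep in mind that this changes the host-wide default for Python's HTTPS certificate verification, not just the HST agent's, so treat it as a workaround and restart the HST agents afterwards.

```bash
# disable_py_cert_verify (hypothetical helper): applies the sed edit above,
# keeping the original as <file>.bak, then prints the resulting verify= line(s).
disable_py_cert_verify() {
  local cfg="${1:-/etc/python/cert-verification.cfg}"
  [ -f "$cfg" ] || { echo "not found: $cfg" >&2; return 1; }
  # GNU sed: -i.bak edits in place and writes a backup copy alongside.
  sed -i.bak 's/verify=platform_default/verify=disable/' "$cfg"
  grep '^verify=' "$cfg"
}
```

To undo the change, move the `.bak` copy back over the original file.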

3 replies

@Artem Ervits @Jay SenSharma @Shu @Aditya Sirna @Abdelkrim Hadjidj @Dinesh Chitlangia, could you please give me some direction or share your input? I really need your help.

@Bharath A

If the above does not solve the issue, please try the following:

1. Check hostname resolution on the problematic hosts:

# cat /etc/hosts
# hostname -

2. Delete or rename the "/usr/hdp/share/hst/hst-agent/keys" directory, then re-register the HST agent as follows:

# cd /usr/hdp/share/hst/hst-agent/
# mv keys keys_OLD


3. From the Ambari UI:

Ambari --> Hosts (Tab) --> Click on the problematic hostname --> SmartSense HST Agent --> Register


4. Verify that the keys were generated:

# ls /usr/hdp/share/hst/hst-agent/


5. Now start the SmartSense HST Agent from Ambari.
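The on-host parts of steps 1, 2 and 4 can be wrapped in small shell functions so they are repeated identically on each problem host. A minimal sketch, assuming the agent directory used in this reply (overridable through a hypothetical `HST_AGENT_DIR` variable); the function names are hypothetical, `hostname -f` (print the fully qualified name) is an assumption for step 1, and the Ambari steps (3 and 5) stay in the UI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the on-host steps; HST_AGENT_DIR is an assumed override.
HST_AGENT_DIR="${HST_AGENT_DIR:-/usr/hdp/share/hst/hst-agent}"

# Step 1: show the FQDN and what the resolver maps it to.
check_host_resolution() {
  local name="${1:-$(hostname -f)}"
  echo "FQDN: $name"
  # getent consults /etc/hosts and DNS according to /etc/nsswitch.conf.
  if ! getent hosts "$name"; then
    echo "WARNING: $name does not resolve" >&2
    return 1
  fi
}

# Step 2: move the agent keys aside (keys -> keys_OLD) instead of deleting them.
backup_agent_keys() {
  local keys="$HST_AGENT_DIR/keys"
  [ -d "$keys" ] || { echo "no keys directory at $keys" >&2; return 1; }
  # Refuse if an old backup exists: mv would otherwise nest keys inside it.
  [ -e "${keys}_OLD" ] && { echo "${keys}_OLD already exists" >&2; return 1; }
  mv "$keys" "${keys}_OLD" && echo "moved $keys -> ${keys}_OLD"
}

# Step 4: after re-registering from Ambari, confirm new key files exist.
verify_agent_keys() {
  local keys="$HST_AGENT_DIR/keys"
  if [ -d "$keys" ] && [ -n "$(ls -A "$keys")" ]; then
    ls -l "$keys"
  else
    echo "no key files in $keys yet" >&2
    return 1
  fi
}
```

A typical order on a problem host is `check_host_resolution`, then `backup_agent_keys`, then Register from Ambari, then `verify_agent_keys` before starting the agent.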
