Created 04-24-2017 01:20 PM
Hello Ambari Gurus,
I am installing the HDP 2.6 stack and I am facing the issue below.
The Ambari agents do not register with the Ambari server. The agent log shows the following error:
IOError: Request to https://ls5387v7.wdf.sap.corp:8441/agent/v1/register/ls5387v8.wdf.sap.corp failed due to EOF occurred in violation of protocol (_ssl.c:661)
ERROR 2017-04-21 23:19:51,277 Controller.py:227 - Error:Request to https://ls5387v7.wdf.sap.corp:8441/agent/v1/register/ls5387v8.wdf.sap.corp failed due to EOF occurred in violation of protocol (_ssl.c:661)
WARNING 2017-04-21 23:19:51,277 Controller.py:228 - Sleeping for 25 seconds and then trying again
The registration log for the host shows the following:
==========================
Creating target directory...
==========================
Command start time 2017-04-21 23:16:17
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:17
==========================
Copying ambari sudo script...
==========================
Command start time 2017-04-21 23:16:17
scp /var/lib/ambari-server/ambari-sudo.sh
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:17
==========================
Copying common functions script...
==========================
Command start time 2017-04-21 23:16:17
scp /usr/lib/python2.6/site-packages/ambari_commons
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:18
==========================
Copying create-python-wrap script...
==========================
Command start time 2017-04-21 23:16:18
scp /var/lib/ambari-server/create-python-wrap.sh
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:18
==========================
Copying OS type check script...
==========================
Command start time 2017-04-21 23:16:18
scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:18
==========================
Running create-python-wrap script...
==========================
Command start time 2017-04-21 23:16:18
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:19
==========================
Running OS type check...
==========================
Command start time 2017-04-21 23:16:19
Cluster primary/cluster OS family is suse12 and local/current OS family is suse12
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:19
==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2017-04-21 23:16:19
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:19
==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2017-04-21 23:16:19
scp /etc/zypp/repos.d/ambari.repo
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:20
==========================
Moving file to repo dir...
==========================
Command start time 2017-04-21 23:16:20
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:20
==========================
Changing permissions for ambari.repo...
==========================
Command start time 2017-04-21 23:16:20
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:20
==========================
Copying setup script file...
==========================
Command start time 2017-04-21 23:16:20
scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:21
==========================
Running setup agent script...
==========================
Command start time 2017-04-21 23:16:21
("ERROR 2017-04-21 23:16:43,609 Controller.py:227 - Error:Request to https://ls5387v7.wdf.sap.corp:8441/agent/v1/register/ls5387v8.wdf.sap.corp failed due to EOF occurred in violation of protocol (_ssl.c:661)
WARNING 2017-04-21 23:16:43,610 Controller.py:228 - Sleeping for 13 seconds and then trying again
INFO 2017-04-21 23:16:54,945 main.py:286 - Agent not going to die gracefully, going to execute kill -9
INFO 2017-04-21 23:16:54,974 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,841 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-04-21 23:16:55,843 DataCleaner.py:120 - Data cleanup started
INFO 2017-04-21 23:16:55,843 DataCleaner.py:122 - Data cleanup finished
INFO 2017-04-21 23:16:56,003 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-04-21 23:16:56,011 main.py:436 - Connecting to Ambari server at https://ls5387v7.wdf.sap.corp:8440 (10.21.24.138)
INFO 2017-04-21 23:16:56,011 NetUtil.py:67 - Connecting to https://ls5387v7.wdf.sap.corp:8440/ca
INFO 2017-04-21 23:16:56,148 main.py:446 - Connected to Ambari server ls5387v7.wdf.sap.corp
INFO 2017-04-21 23:16:56,150 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-04-21 23:16:56,150 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-04-21 23:16:56,151 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x7f3865d85cd0>; currently running: False
INFO 2017-04-21 23:16:58,171 hostname.py:98 - Read public hostname 'ls5387v8.wdf.sap.corp' using socket.getfqdn()
INFO 2017-04-21 23:16:58,224 Hardware.py:174 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup
INFO 2017-04-21 23:16:58,295 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.
", None)
("ERROR 2017-04-21 23:16:43,609 Controller.py:227 - Error:Request to https://ls5387v7.wdf.sap.corp:8441/agent/v1/register/ls5387v8.wdf.sap.corp failed due to EOF occurred in violation of protocol (_ssl.c:661)
WARNING 2017-04-21 23:16:43,610 Controller.py:228 - Sleeping for 13 seconds and then trying again
INFO 2017-04-21 23:16:54,945 main.py:286 - Agent not going to die gracefully, going to execute kill -9
INFO 2017-04-21 23:16:54,974 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,838 main.py:145 - loglevel=logging.INFO
INFO 2017-04-21 23:16:55,841 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-04-21 23:16:55,843 DataCleaner.py:120 - Data cleanup started
INFO 2017-04-21 23:16:55,843 DataCleaner.py:122 - Data cleanup finished
INFO 2017-04-21 23:16:56,003 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-04-21 23:16:56,011 main.py:436 - Connecting to Ambari server at https://ls5387v7.wdf.sap.corp:8440 (10.21.24.138)
INFO 2017-04-21 23:16:56,011 NetUtil.py:67 - Connecting to https://ls5387v7.wdf.sap.corp:8440/ca
INFO 2017-04-21 23:16:56,148 main.py:446 - Connected to Ambari server ls5387v7.wdf.sap.corp
INFO 2017-04-21 23:16:56,150 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-04-21 23:16:56,150 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-04-21 23:16:56,151 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x7f3865d85cd0>; currently running: False
INFO 2017-04-21 23:16:58,171 hostname.py:98 - Read public hostname 'ls5387v8.wdf.sap.corp' using socket.getfqdn()
INFO 2017-04-21 23:16:58,224 Hardware.py:174 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup
INFO 2017-04-21 23:16:58,295 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.
", None)
Connection to ls5387v8.wdf.sap.corp closed.
SSH command execution finished
host=ls5387v8.wdf.sap.corp, exitcode=0
Command end time 2017-04-21 23:16:58
Registering with the server...
Registration with the server failed.
Created 04-27-2017 05:14 PM
We overcame the problem by adding the following option to the [security] section of ambari-agent.ini on every host in the cluster:
[security]
force_https_protocol=PROTOCOL_TLSv1_2
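For anyone who has to apply this to many hosts, here is a minimal sketch of how the edit could be scripted. It assumes Python 3 and works on the text of an agent ini file. The function name `force_tls12` and the sample hostname are hypothetical; only the section name and the option come from this thread.

```python
import configparser
import io


def force_tls12(ini_text):
    """Return ini_text with [security] force_https_protocol set to TLSv1.2.

    Hypothetical helper: only the section and option names are taken
    from the fix described in this thread.
    """
    # RawConfigParser avoids '%' interpolation surprises in existing values.
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("security"):
        parser.add_section("security")
    parser.set("security", "force_https_protocol", "PROTOCOL_TLSv1_2")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up sample content standing in for a real ambari-agent.ini.
sample = "[server]\nhostname=ambari.example.com\nurl_port=8440\nsecured_url_port=8441\n"
updated = force_tls12(sample)
print(updated)
```

Note that `configparser` drops comments when it writes the file back, so keep a backup of the original ini before overwriting it.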
Created 04-24-2017 01:20 PM
Please note the openssl version is as follows:
openssl version
OpenSSL 1.0.1i-fips 6 Aug 2014
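The `_ssl.c:661` in the error comes from Python's own `ssl` module, so it may also be worth checking which OpenSSL build the Python interpreter itself reports, since that is not necessarily the same as what the `openssl` command prints. A small sketch using only standard `ssl` module attributes (the function name `tls_capabilities` is made up for illustration):

```python
import ssl


def tls_capabilities():
    """Report the OpenSSL build and TLS 1.2 constant visible to this Python."""
    return {
        # Version string of the OpenSSL library this interpreter is linked to.
        "openssl_version": ssl.OPENSSL_VERSION,
        # True when the ssl module exposes the TLS 1.2 protocol constant.
        "has_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


print(tls_capabilities())
```

Run it with the same interpreter that starts the agent; a different Python on the host could report a different OpenSSL build.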
Created 04-24-2017 01:20 PM
This seems similar to the issue
https://issues.apache.org/jira/browse/AMBARI-17991
but I am using Ambari 2.5.0.3.
Created 04-24-2017 01:21 PM
Please note that this is Ambari 2.5.0.3.
INFO 2017-04-22 14:19:43,218 NetUtil.py:67 - Connecting to https://ls5387v7.wdf.sap.corp:8440/connection_info
INFO 2017-04-22 14:19:43,322 security.py:93 - SSL Connect being called.. connecting to the server
ERROR 2017-04-22 14:19:43,329 Controller.py:226 - Unable to connect to: https://ls5387v7.wdf.sap.corp:8440/connection_info
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 175, in registerWithServer
    ret = self.sendRequest(self.registerUrl, data)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 545, in sendRequest
    raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://ls5387v7.wdf.sap.corp:8440/connection_info failed due to EOF occurred in violation of protocol (_ssl.c:661)
ERROR 2017-04-22 14:19:43,330 Controller.py:227 - Error:Request to https://ls5387v7.wdf.sap.corp:8440/connection_info failed due to EOF occurred in violation of protocol (_ssl.c:661)
WARNING 2017-04-22 14:19:43,330 Controller.py:228 - Sleeping for 22 seconds and then trying again
Created 04-24-2017 07:47 PM
Are you running the Ambari server with two-way SSL? By default, Ambari communicates with agents over one-way SSL on port 8440, but in your case the agent is trying to communicate on port 8441, which is the two-way SSL port.
If you are fine with one-way SSL, you can check that "security.server.two_way_ssl = false" is set in Ambari, which disables two-way SSL.
Alternatively, you can run the OpenSSL commands below and see what response the server gives back.
1. Oneway SSL: openssl s_client -connect apappu5.hdp.com:8440
2. 2way SSL: openssl s_client -connect apappu5.hdp.com:8441
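The same check can be narrowed to a single protocol version. The sketch below assumes Python 3.7 or newer on any machine that can reach the server; it is not what the agent itself does. The helper names `make_context` and `probe` are hypothetical. Certificate verification is switched off on purpose, because the goal is only to see whether a handshake completes at a given TLS version.

```python
import socket
import ssl


def make_context(version):
    """Build a client context pinned to exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is disabled because this is a handshake probe only.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, port, version, timeout=5.0):
    """Return the negotiated protocol name, or a 'failed: ...' string."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with make_context(version).wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    except (ssl.SSLError, OSError) as exc:
        return "failed: {0}".format(exc)


# Example calls (replace AMBARIHOST with the real server name):
# probe("AMBARIHOST", 8440, ssl.TLSVersion.TLSv1_2)
# probe("AMBARIHOST", 8441, ssl.TLSVersion.TLSv1_2)
```

If the probe succeeds with TLS 1.2 but fails with an older version, that would point to a protocol mismatch between agent and server rather than a port or certificate problem.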
Created 04-24-2017 08:37 PM
How do I check whether the Ambari server is running two-way SSL?
Secondly, do you really think this is the problem? When I look in the security guide:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/bk_security.pdf
The section below explains how to set up two-way SSL, which is not enabled by default, and in my case I don't have the parameter set.
2.5.4. Optional: Set Up Two-Way SSL Between Ambari Server and Ambari Agents
Created 04-24-2017 08:59 PM
Check whether your Ambari server is configured with security.server.two_way_ssl=true in the ambari.properties file.
Does your agent's ambari-agent.ini look like the example below, or is it different in any way?
[server]
secured_url_port = 8441
hostname = AMBARIHOST
url_port = 8440
Also, did you try running "openssl s_client -connect AMBARIHOST:8440"?
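The ambari.properties check can be scripted as well. This is a rough sketch that works on the text of the file. The function name `two_way_ssl_enabled` is hypothetical, and treating a missing key as "disabled" is an assumption based on the statement in this thread that two-way SSL is not enabled by default.

```python
def two_way_ssl_enabled(properties_text):
    """Return True only if security.server.two_way_ssl is explicitly 'true'."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "security.server.two_way_ssl":
            return value.strip().lower() == "true"
    # Assumption: an absent key means two-way SSL is off.
    return False


print(two_way_ssl_enabled("security.server.two_way_ssl=true\n"))
print(two_way_ssl_enabled("# no SSL settings here\nserver.os_type=suse12\n"))
```

Pass in the contents of the server's ambari.properties, for example by reading the file with `open(path).read()`.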
Created 04-24-2017 11:35 PM
I have checked that the Ambari server is not configured for 2 way ssl.
[server]
hostname=ls5387v7.XXX.XXX.corp
url_port=8440
secured_url_port=8441
connect_retry_delay=10
max_reconnect_retry_delay=30
Created 04-24-2017 11:35 PM
Ambari server is not configured for 2 way ssl.
[server]
hostname=ls5387v7.XXX.XXX.corp
url_port=8440
secured_url_port=8441
connect_retry_delay=10
max_reconnect_retry_delay=30
ls5387v8:~ # openssl s_client -connect ls5387v7.XXX.XXX.corp:8440
CONNECTED(00000003)
depth=0 C = AU, ST = Some-State, O = Internet Widgits Pty Ltd
verify error:num=18:self signed certificate
verify return:1
depth=0 C = AU, ST = Some-State, O = Internet Widgits Pty Ltd
verify return:1
---
Certificate chain
 0 s:/C=AU/ST=Some-State/O=Internet Widgits Pty Ltd
   i:/C=AU/ST=Some-State/O=Internet Widgits Pty Ltd
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFpTCCA42gAwIBAgIBATANBgkqhkiG9w0BAQsFADBFMQswCQYDVQQGEwJBVTET
MBEGA1UECAwKU29tZS1TdGF0ZTEhMB8GA1UECgwYSW50ZXJuZXQgV2lkZ2l0cyBQ
dHkgTHRkMB4XDTE3MDQyMTE2MTgwOVoXDTE4MDQyMTE2MTgwOVowRTELMAkGA1UE
BhMCQVUxEzARBgNVBAgMClNvbWUtU3RhdGUxITAfBgNVBAoMGEludGVybmV0IFdp
ZGdpdHMgUHR5IEx0ZDCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBANWQ
xlofKWsaR+FtclgHw2Z8fwFNESPdc2Q6l5OTXAkrA4E8gbYBeMySIS4wZIqCrvnt
OmfKZxwGYD/D8YzzGTCBMjY93F/hO9UK5kQGMJp+G4261u9jG+8FfoVF8zFaYr53
+g7YR+l+CfR4to0ZqjYugjWPU02UUabpw3uMpM8HvCYnkyfhhl0qurleC7bll44g
RptALAPwb4FLwmABhygbLAZV4gKHn0ONPhPON6zV2VA9iudUOZl4wi+jQGjjb5TX
SiBqE3Kd9W0ND7t61pER+sla9ASH5OVWZEMVIjnQNIDJ5PHudpA34MiItoR/JaaP
kicUCtoGx8OoCxNMofSB5kLFXH+fcuk7zZlQeeeLFn1qMzDWGBNrKfQKzCJchE6P
OhBArBPk6hZOFLzeqNbYiyD/w7bnXdg7qUwkE+hyu6c0UmdMdqCsmoME/0dAVJOD
poqcuq5DyyQmLluFwRKZ0zlUEkPvK9Ey4l5E18gc+JvcfTlSrNoHYJ/hqRQYMU8B
VRMupECYm6pvqT1CZEHM996gGbrWXjLsgtdGPX1VM0uRwtlGePpvMY6W/HtQoket
XWywiJsaDQWucIxxAh/0JbIiXm5v+bUlj7fYnSOk2i9HI/x/oZh+3zQY6VjLSucd
s2eJH8u4bLazbY3rYB6wCkevtdiZ+IiDqxCOSOxZAgMBAAGjgZ8wgZwwHQYDVR0O
BBYEFK9z9r1rnK9uDkiZD6jWnTCHxWPdMG0GA1UdIwRmMGSAFK9z9r1rnK9uDkiZ
D6jWnTCHxWPdoUmkRzBFMQswCQYDVQQGEwJBVTETMBEGA1UECAwKU29tZS1TdGF0
ZTEhMB8GA1UECgwYSW50ZXJuZXQgV2lkZ2l0cyBQdHkgTHRkggEBMAwGA1UdEwQF
MAMBAf8wDQYJKoZIhvcNAQELBQADggIBAMZgMZPsqgRWU8nWGMbQl6kPrjo758Yw
QMDD+O1B0pD57BZqcDEAHAmP0v1Am6DcGyRvWzwhBzRoT8VeNJKdyROQGhMXPWbC
/E5kvBX6VxaetII9VgyOIUjizC/HKdS24PVu8sK6y7h0CNmmtUJj4P25SaOY7g2y
A1CIW8Jny2XJj4O8re3YiCfZn2TKzXHZJgWBiV5lVgczeuxBffLDsUU2txHxANlo
RahS+3H6KwDFxfGXiuolu+lKdydXVy4jCqM97vNJGZ+tbB6RhoyuhCXd8lpW8xp7
BY3GmrMbIS/vFNoK+iVHpcxt6AfIJqUZ8KW97SqfZTXymIYzGva8/7XY0tNYIh1i
Hr3hC+3GoFSpfDSjLIu2i6+3vUIaykAdO01zJ9ccYYoLY7G6rHz4ErjWTu9Nh51u
olE4QgDlW19lMgTIOZk6a/jPYq6zc4iAppTqMXdvHUB3W96ceDoeMq+0P2J3UrI6
11OJUrNBvxEQgrYWgH83au1v1u8rYxo+IA0jQsBVaMeOQTShOSttuGsNv/zhjSf6
0wLK09qmayuZddZhJTEHwEpJ4OdQVNnvzO/e9QYnzxqa3XU/rrZ9xihNlU+1YZt5
0vSTjuhD4ylFpR9JhmX1VB/DTbDS0trfdH1VPhAMKYr/v4GkGTtn8eHe3vwmBH9k
79jAP0ApRatK
-----END CERTIFICATE-----
subject=/C=AU/ST=Some-State/O=Internet Widgits Pty Ltd
issuer=/C=AU/ST=Some-State/O=Internet Widgits Pty Ltd
---
No client certificate CA names sent
---
SSL handshake has read 2257 bytes and written 455 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol : TLSv1.2
    Cipher : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 58FE6DA17EFA5278E0381D826F3E7E7E3F6558A6D4683964ACFDF4B4C63AD632
    Session-ID-ctx:
    Master-Key: C0EEC8877A651977C8F5B6FCC78B4FD977DDA0A7BF06203DE433D04EC4B45A1788F8802B7F47AF58C210C321DD9BD225
    Key-Arg : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1493069217
    Timeout : 300 (sec)
    Verify return code: 18 (self signed certificate)
---
Created 04-25-2017 12:14 AM
That means the server is listening on port 8440 and the agent should be able to communicate without any issues. Please check whether there are any errors in the Ambari server logs.