Hi,
Newbie here. One of the nodes suddenly lost its heartbeat. I tried restarting ambari-agent and ambari-server, but the error still persists. Here is the ambari-agent log.
WARNING 2020-02-11 15:24:08,318 base_alert.py:138 - [Alert][ranger_admin_password_check] Unable to execute alert. argument of type 'NoneType' is not iterable
INFO 2020-02-11 15:24:14,721 security.py:141 - Encountered communication error. Details: SSLError('The read operation timed out',)
ERROR 2020-02-11 15:24:14,721 Controller.py:226 - Unable to connect to: https://xxx1:8441/agent/v1/register/xxx2.com
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/ambari_agent/Controller.py", line 175, in registerWithServer
ret = self.sendRequest(self.registerUrl, data)
File "/usr/lib/ambari-agent/lib/ambari_agent/Controller.py", line 549, in sendRequest
raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://xxx1.com:8441/agent/v1/register/xxx2.com failed due to Error occured during connecting to the server: ('The read operation timed out',)
ERROR 2020-02-11 15:24:14,721 Controller.py:227 - Error:Request to https://xxx1.com:8441/agent/v1/register/xxx2.com failed due to Error occurred during connecting to the server: ('The read operation timed out',)
Note: I am able to telnet manually to ports 8440 and 8441, and all ports are listening.
Thanks in advance.
Created on 02-12-2020 04:26 PM - edited 02-12-2020 04:28 PM
Thank you for the useful information that you've provided.
After testing the jumbo frame connectivity, I found out that there was an issue with one of the network interfaces on the servers. We removed the defective module and the lost heartbeat has been resolved. Thank you for your assistance, guys!
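For anyone who wants to repeat a jumbo frame connectivity test, here is a minimal sketch of one common approach: ping with fragmentation prohibited and a payload sized to fill the MTU. The helper names are hypothetical, the flags are those of the Linux iputils ping, and this is not necessarily the exact test that was run in this thread.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def jumbo_ping_command(host, mtu=9000, count=3):
    """Build a Linux iputils ping command with fragmentation prohibited (-M do)."""
    payload = ping_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host]
```

Calling jumbo_ping_command("xxx1.com") builds "ping -M do -c 3 -s 8972 xxx1.com". If that ping fails while the same command with a payload of 1472 (for a 1500-byte MTU) succeeds, something on the path is probably not passing jumbo frames.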
Created 02-11-2020 03:14 PM
Hi @TR7_BRYLE ,
What is your Ambari version? You may want to check this knowledge article:
https://my.cloudera.com/knowledge/ERROR-quot-Request-to-https-AMBARI-SERVER-8441-agent-v1?id=273271
In case you cannot access the above, here are some details:
Cause:
This issue occurs when the Ethernet card or the switch does not support jumbo frames, but jumbo frames (MTUSIZE=9000) are set in the network configuration.
To verify whether jumbo frames are enabled, check the content of the network interface configuration by running the following:
cat /etc/sysconfig/network-scripts/ifcfg-eth#
Jumbo frames are enabled if the MTUSIZE=9000 line appears in the output, as in the following:
TYPE=Ethernet
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
MTUSIZE=9000
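The check above can also be scripted across many nodes. The following is only a sketch with hypothetical helper names: it scans ifcfg-style text for an MTU-related key and reports whether the value exceeds the standard 1500-byte Ethernet MTU. It looks for the MTUSIZE key shown here and, as an assumption, also for the MTU key that ifcfg files commonly use.

```python
def read_mtu_settings(ifcfg_text):
    """Return MTU-related KEY=VALUE pairs found in ifcfg-style text."""
    settings = {}
    for line in ifcfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.upper() in ("MTU", "MTUSIZE"):
            settings[key] = value.strip().strip('"')
    return settings


def jumbo_frames_configured(ifcfg_text, threshold=1500):
    """True if any MTU-related value is larger than the threshold."""
    values = read_mtu_settings(ifcfg_text).values()
    return any(v.isdigit() and int(v) > threshold for v in values)
```

To use it, read the interface file into a string, for example with open("/etc/sysconfig/network-scripts/ifcfg-eth0").read(), and pass that string to jumbo_frames_configured.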
Instructions:
To resolve this issue, do the following for each node with the issue:
1. From /etc/sysconfig/network-scripts/ifcfg-eth#, remove the following:
MTUSIZE=9000
2. Restart the network:
/etc/init.d/network restart
3. Restart the ambari-agent:
ambari-agent restart
Thanks and hope this helps!
Li Wang, Technical Solution Manager
Created 02-11-2020 07:04 PM
Created 02-11-2020 05:40 PM
The error is actually due to a timeout, not a port access problem:
SSLError('The read operation timed out',)
The above error indicates that a later step in the communication, such as reading a response, is timing out. So we first have to check why the HTTPS request is timing out.
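To make the distinction between "port reachable" and "response times out" concrete, here is a rough sketch, written for Python 3, that reports which phase fails first: the plain TCP connect (what telnet tests) or the TLS handshake that follows it. The function name and result strings are hypothetical, and certificate verification is switched off because only reachability is being tested.

```python
import socket
import ssl


def probe_phases(host, port, timeout=5.0):
    """Return "connect_failed", "tls_timeout", "tls_failed", or "ok"."""
    try:
        # Phase 1: plain TCP connect, equivalent to a successful telnet.
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect_failed"
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # Phase 2: TLS handshake, which needs replies from the server.
        tls = context.wrap_socket(raw, server_hostname=host)
    except socket.timeout:
        raw.close()
        return "tls_timeout"
    except (ssl.SSLError, OSError):
        raw.close()
        return "tls_failed"
    tls.close()
    return "ok"
```

A result of "tls_timeout" with a working telnet would match the symptom in this thread: the TCP connection is established, but the data that should follow never arrives within the timeout.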
We can use a simple Python script like the following to simulate what the agent actually does. The Ambari agent is a Python utility that connects to the Ambari server, registers itself, and sends heartbeat messages to the server.
So we can run the following script from the agent host to see whether it is able to connect or whether it also times out. It uses 'httplib' to test access and HTTPS communication.
# cat /tmp/SSL/ssl_test.py
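# Python 2 script: httplib is the Python 2 name of the HTTP client module.
import httplib
import ssl

if __name__ == "__main__":
    # Replace the host below with your Ambari server host.
    # Certificate verification is skipped; we only test whether a response arrives.
    ca_connection = httplib.HTTPSConnection('kerlatest1.example.com:8440', timeout=5, context=ssl._create_unverified_context())
    ca_connection.request("GET", '/connection_info')
    response = ca_connection.getresponse()
    print response.status
    data = response.read()
    print str(data)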
Run it as follows:
# export PYTHONPATH=/usr/lib/ambari-agent/lib:/usr/lib/ambari-agent/lib/ambari_agent:$PYTHONPATH
# python /tmp/SSL/ssl_test.py
If the above works, it returns status 200 and a result like the following:
# python /tmp/SSL/ssl_test.py
200
{"security.server.two_way_ssl":"false"}
If you notice any HTTPS communication or certificate-related error, you might want to refer to the article below. Depending on your Ambari version, please check whether the following is defined in the "[security]" section of your ambari-agent.ini file:
[security]
force_https_protocol=PROTOCOL_TLSv1_2
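If you prefer to check this setting with a script instead of by eye, a minimal sketch for Python 3 could look like the following. The function name is hypothetical, and interpolation is disabled so that any '%' characters elsewhere in the file do not cause parse errors.

```python
import configparser


def read_force_https_protocol(ini_text):
    """Return the [security] force_https_protocol value, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("security", "force_https_protocol", fallback=None)
```

Pass it the text of your ambari-agent.ini (commonly found under /etc/ambari-agent/conf/, though that path is an assumption for your install). A return value of None means the option is not set in the "[security]" section.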
- If you still face any issue, can you please share a fresh "ambari-agent.log" taken right after restarting the agent?
Reference Article:
Java/Python Updates and Ambari Agent TLS Settings
https://community.cloudera.com/t5/Community-Articles/Java-Python-Updates-and-Ambari-Agent-TLS-Settin...
Created 02-11-2020 07:08 PM
Created 02-11-2020 07:16 PM
Hi @jsensharma .
One more thing: I have already declared this in my ambari.ini file.
[security]
force_https_protocol=PROTOCOL_TLSv1_2
Thanks.
Created 02-11-2020 11:39 PM
As requested earlier:
- If you still face any issue, can you please share a fresh "ambari-agent.log" taken right after restarting the agent?