Getting "Heartbeat lost" with an exception in Ambari 2.4.2

I am getting the trace below in ambari-agent.log:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 165, in registerWithServer
    ret = self.sendRequest(self.registerUrl, data)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 496, in sendRequest
    raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://lntpmn01.snapdot.com:8441/agent/v1/register/lntpdn03.snapdot.com failed due to Error occured during connecting to the server: ('The read operation timed out',)
ERROR 2017-07-19 16:10:19,383 Controller.py:213 - Error:Request to https://lntpmn01.snapdot.com:8441/agent/v1/register/lntpdn03.snapdot.com failed due to Error occured during connecting to the server: ('The read operation timed out',)

I have tried increasing the timeout in the security.py script to 180 seconds, but still no luck.

Ambari is SSL-enabled. There is no firewall on any of the nodes, and the nodes can ping each other.

# telnet <ambari-server> 8441
successful !!!
# openssl s_client -connect <ambari-server>:8441
successful !!!

Please help me out.

1 ACCEPTED SOLUTION

Super Mentor

@nshelke

There was an issue reported for Ambari 2.4 (fixed in 2.4.0). You are on 2.4.2, but your stack trace looks similar: https://issues.apache.org/jira/browse/AMBARI-17991

Can you check whether the workaround mentioned in that JIRA works for you? Edit the "/usr/lib/python2.6/site-packages/ambari_agent/security.py" file and increase the timeout to a larger value, such as 360 seconds:

def create_connection(self):
    if self.sock:
      self.sock.close()
    logger.info("SSL Connect being called.. connecting to the server")
    # The second argument is the timeout in seconds (raised here to 360).
    sock = socket.create_connection((self.host, self.port), 360)

Set it to 360 or more and see whether the request still times out.
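To illustrate what that timeout argument controls, here is a minimal, standalone sketch. It is not Ambari code, and it is written for Python 3 rather than the agent's Python 2.6. The helper names `start_silent_server` and `probe` are made up for this example. It assumes the situation in the trace: the TCP connection succeeds (as telnet showed), but the server does not answer in time. The timeout passed to `socket.create_connection` stays on the socket after connecting, so it also limits how long a later read will wait.

```python
import socket
import threading
import time


def start_silent_server():
    """Listen on a free local port, accept connections, but never reply.

    Hypothetical stand-in for a server that is reachable but slow to answer.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keep accepted sockets open so the client sees silence, not EOF

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe(host, port, timeout):
    """Connect with the given timeout, then wait for data that never arrives."""
    start = time.time()
    # The timeout applies to the connect and remains set on the socket,
    # so the recv() below is limited by the same value.
    sock = socket.create_connection((host, port), timeout)
    connected_after = time.time() - start
    try:
        sock.recv(1)
        outcome = "data received"
    except socket.timeout:
        outcome = "read timed out"
    finally:
        sock.close()
    return outcome, connected_after, time.time() - start


if __name__ == "__main__":
    server, port = start_silent_server()
    outcome, connect_s, total_s = probe("127.0.0.1", port, 0.5)
    print(outcome, round(connect_s, 3), round(total_s, 3))
    server.close()
```

If the connect is fast but the read times out, as in this sketch, a larger timeout only helps when the server eventually responds. If it never does, the cause is more likely on the server side.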

Is it happening with all the Ambari agents?

Have you tried running the Ambari agent in debug mode to extract more details?

2 REPLIES

@Jay SenSharma

I even tried setting it to 300, but no luck.

I will run the Ambari agent in debug mode and check the stack trace.