Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Getting Heartbeat lost with Exception in Ambari-2.4.2

I am getting the trace below in ambari-agent.log:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 165, in registerWithServer
    ret = self.sendRequest(self.registerUrl, data)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 496, in sendRequest
    raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://lntpmn01.snapdot.com:8441/agent/v1/register/lntpdn03.snapdot.com failed due to Error occured during connecting to the server: ('The read operation timed out',)
ERROR 2017-07-19 16:10:19,383 Controller.py:213 - Error:Request to https://lntpmn01.snapdot.com:8441/agent/v1/register/lntpdn03.snapdot.com failed due to Error occured during connecting to the server: ('The read operation timed out',)

I have tried increasing the timeout in the security.py script to 180 seconds, but still no luck.

Ambari is SSL-enabled. There is no firewall on any of the nodes, and the nodes can all ping each other.

# telnet <ambari-server> 8441
successful !!!
# openssl s_client -connect <ambari-server>:8441
successful !!!
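
The telnet and openssl checks above show that the TCP connection and a TLS handshake can succeed, while the agent fails later on a read. As a rough way to time those two phases separately from an agent host, here is a minimal sketch in Python 3 (the agent itself runs Python 2.6, so this is a standalone probe, not agent code). The function name `probe` and its return shape are illustrative assumptions, and certificate verification is disabled because only timing is being measured.

```python
import socket
import ssl
import time


def probe(host, port, timeout=10.0):
    """Time the TCP connect and TLS handshake phases separately.

    Returns a list of (phase, seconds, error) tuples and stops at the
    first phase that fails; error is None when the phase succeeded.
    """
    results = []

    start = time.time()
    try:
        sock = socket.create_connection((host, port), timeout)
    except OSError as exc:
        results.append(("tcp_connect", time.time() - start, str(exc)))
        return results
    results.append(("tcp_connect", time.time() - start, None))

    # Hypothetical probe settings: no certificate checks, timing only.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    start = time.time()
    try:
        tls = context.wrap_socket(sock, server_hostname=host)
    except OSError as exc:  # includes ssl.SSLError and socket timeouts
        results.append(("tls_handshake", time.time() - start, str(exc)))
        sock.close()
        return results
    results.append(("tls_handshake", time.time() - start, None))
    tls.close()
    return results
```

Calling `probe("<ambari-server>", 8441)` from the affected agent host and from a healthy one would give a per-phase comparison. If both phases finish quickly, the delay is more likely in the server's response to the registration request than in the network path.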

Please help me out.

1 ACCEPTED SOLUTION

Master Mentor

@nshelke

An issue was reported for Ambari 2.4. Although it was fixed in 2.4.0, your stack trace looks similar: https://issues.apache.org/jira/browse/AMBARI-17991

Can you check whether the workaround described in that JIRA works for you? Edit the "/usr/lib/python2.6/site-packages/ambari_agent/security.py" file and increase the timeout to a larger value, such as 360 seconds:

def create_connection(self):
    if self.sock:
        self.sock.close()
    logger.info("SSL Connect being called.. connecting to the server")
    # The second argument is the socket timeout in seconds.
    sock = socket.create_connection((self.host, self.port), 360)

Try 360 or more and see whether it still times out.
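
For context on why this value matters: the timeout passed to `socket.create_connection` is set on the returned socket, so it also bounds later blocking reads, not only the initial connect. A small standalone Python 3 sketch of that behaviour follows. The helper name `read_or_timeout` is hypothetical and is not part of Ambari.

```python
import socket
import time


def read_or_timeout(host, port, timeout):
    """Connect, then wait for data and report how the read ended."""
    # The timeout given here is stored on the socket and applies to recv().
    sock = socket.create_connection((host, port), timeout)
    try:
        start = time.time()
        try:
            data = sock.recv(1024)
            return ("data", data)
        except socket.timeout:
            # The peer accepted the connection but sent nothing in time.
            return ("timed out", time.time() - start)
    finally:
        sock.close()
```

Against a server that accepts connections but does not answer, this returns `"timed out"` after roughly `timeout` seconds. If a much larger value still ends in a timeout, the server is probably not responding at all, so it is worth checking the server side as well.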

Is this happening with all the Ambari agents?

Have you tried running the Ambari agent in debug mode to extract more details?
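
To compare agents, one option is to pull the failed-request lines out of each agent's log and look at when and how often they occur. The sketch below is in Python 3 and assumes the ERROR line format quoted in the question; the pattern and the function name `failed_requests` are illustrative and may need adjusting for other agent versions.

```python
import re

# Pattern modelled on the ERROR line quoted in the question; other agent
# versions may format the line differently.
FAILED_REQUEST = re.compile(
    r"^ERROR (?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"Controller\.py:\d+ - Error:Request to (?P<url>\S+) "
    r"failed due to (?P<reason>.*)$"
)


def failed_requests(lines):
    """Yield (timestamp, url, reason) for each failed-request ERROR line."""
    for line in lines:
        match = FAILED_REQUEST.match(line.rstrip("\n"))
        if match:
            yield match.group("when"), match.group("url"), match.group("reason")
```

Passing an open log file to `failed_requests` on each host (the log location depends on the installation) gives a quick list of timestamps and reasons to compare across agents.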


2 REPLIES

@Jay SenSharma

I also tried setting it to 300, but no luck.

I will run ambari-agent in debug mode and check the stack trace.