Ambari-agent heartbeat lost

Hello,


My Ambari agent has lost its heartbeat. I tried restarting the agent and the server, stopping and starting them, etc., but nothing worked.

I'm really desperate and don't know what to do. Any help would be greatly appreciated. I have included the ambari-agent logs below.


ERROR 2019-08-02 10:51:04,424 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://<My server id was here>/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: <urlopen error [Errno 111] Connection refused>\n']

INFO 2019-08-02 10:51:04,509 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://<My server id was here>:10000/;transportMode=binary;auth=noSasl' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 60}

Result CRITICAL: ['Connection failed on host <My server id was here>:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 211, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 79, in check_thrift_port_sasl\n timeout=check_command_timeout)\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://<My server id was here>:10000/;transportMode=binary;auth=noSasl\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. 
Error: Could not open client transport with JDBC Uri: jdbc:hive2://<My server id was here>:10000/;transportMode=binary;auth=noSasl: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://<My server id was here>:10000/;transportMode=binary;auth=noSasl: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']

INFO 2019-08-02 10:51:38,110 Controller.py:304 - Heartbeat (response id = 760) with server is running...

INFO 2019-08-02 10:51:38,111 Controller.py:311 - Building heartbeat message

INFO 2019-08-02 10:51:38,113 Heartbeat.py:90 - Adding host info/state to heartbeat message.

INFO 2019-08-02 10:51:38,197 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.

INFO 2019-08-02 10:51:38,197 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.

INFO 2019-08-02 10:51:38,455 Hardware.py:176 - Some mount points were ignored: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /grid0, /run/user/1000

INFO 2019-08-02 10:51:38,459 Controller.py:320 - Sending Heartbeat (id = 760)

INFO 2019-08-02 10:51:38,464 Controller.py:332 - Heartbeat response received (id = 761)

INFO 2019-08-02 10:51:38,464 Controller.py:341 - Heartbeat interval is 10 seconds

INFO 2019-08-02 10:51:38,464 Controller.py:377 - Updating configurations from heartbeat

INFO 2019-08-02 10:51:38,465 Controller.py:386 - Adding cancel/execution commands

INFO 2019-08-02 10:51:38,465 Controller.py:403 - Adding recovery commands

INFO 2019-08-02 10:51:38,465 Controller.py:471 - Waiting 9.9 for next heartbeat

INFO 2019-08-02 10:51:48,365 Controller.py:478 - Wait for next heartbeat over
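
In the excerpt above, the agent logs "Sending Heartbeat (id = 760)" followed by "Heartbeat response received (id = 761)", while the CRITICAL entries are "Connection refused" alerts for WebHCat and HiveServer2 on port 10000. To check a longer log for heartbeats that never got a response, a minimal sketch along these lines may help. It assumes the two message formats shown above; the function name `unanswered_heartbeats` is hypothetical and is not part of Ambari.

```python
import re

# Patterns copied from the Controller.py messages in the log excerpt above.
SENT = re.compile(r"Sending Heartbeat \(id = (\d+)\)")
RECEIVED = re.compile(r"Heartbeat response received \(id = (\d+)\)")


def unanswered_heartbeats(log_lines):
    """Return the ids of heartbeats that were sent but not followed by a
    response line before the next heartbeat (or the end of the log).

    Hypothetical helper: it relies only on the order of the log lines,
    not on any relationship between request and response ids.
    """
    unanswered = []
    pending = None  # id of the last heartbeat still waiting for a response
    for line in log_lines:
        sent = SENT.search(line)
        if sent:
            if pending is not None:
                # A new heartbeat went out before the previous one was answered.
                unanswered.append(pending)
            pending = int(sent.group(1))
        elif RECEIVED.search(line):
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered


if __name__ == "__main__":
    sample = [
        "INFO 2019-08-02 10:51:38,459 Controller.py:320 - Sending Heartbeat (id = 760)",
        "INFO 2019-08-02 10:51:38,464 Controller.py:332 - Heartbeat response received (id = 761)",
    ]
    print(unanswered_heartbeats(sample))
```

To run it against a real log, pass the open file object as `log_lines`. An empty list would suggest the agent side is sending heartbeats and receiving responses, in which case the refused Hive connections are a separate problem worth checking on their own.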


1 ACCEPTED SOLUTION

Master Mentor

@Matas Mockus

This is a duplicate posting; I have already responded in the initial thread:

http://community.hortonworks.com/answers/249938/view.html

Did you check the response? Please either merge or delete this post, as it will be difficult to follow the two threads.
