Ambari Heart Beat lost

hadoop3.jpg

Hello,

I have created a cluster with 3 nodes, and while restarting all services I am facing a heartbeat-lost issue on one of the 3 nodes. When I restart ambari-agent the node returns to a normal state, but within a few minutes it loses its heartbeat again. This happens frequently, always on the same server. Could anyone help me out with this?

ambari-agent.log :

ERROR 2018-05-02 01:54:31,353 script_alert.py:119 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on hadoop3.server.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 198, in execute\n timeout=int(check_command_timeout) )\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call\n tries=tries, try_sleep=try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 287, in _call\n raise ExecuteTimeoutException(err_msg)\nExecuteTimeoutException: Execution of \'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c \'export PATH=\'"\'"\'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/sbin/:/usr/hdp/current/hive-metastore/bin\'"\'"\' ; export HIVE_CONF_DIR=\'"\'"\'/usr/hdp/current/hive-metastore/conf/conf.server\'"\'"\' ; hive --hiveconf hive.metastore.uris=thrift://hadoop3.server.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 
--hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'"\'"\'show databases;\'"\'"\'\'\' was killed due timeout after 60 seconds\n)']

ambari-server.log :

02 May 2018 02:04:47,761 INFO [ambari-hearbeat-monitor] AmbariManagementControllerImpl:2331 - AmbariManagementControllerImpl.createHostAction: created ExecutionCommand for host hadoop3.server.com, role METRICS_COLLECTOR, roleCommand START, and command ID 1-0, with cluster-env tags version1

02 May 2018 02:05:01,389 INFO [ambari-client-thread-269] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:03,391 INFO [ambari-client-thread-334] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:03,485 INFO [ambari-client-thread-390] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:04,378 INFO [ambari-client-thread-269] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:05,412 INFO [ambari-client-thread-356] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:08,142 INFO [ambari-client-thread-334] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:11,394 INFO [ambari-client-thread-390] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:12,121 INFO [ambari-client-thread-356] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:12,599 INFO [pool-1-thread-1] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com

02 May 2018 02:05:12,612 INFO [pool-1-thread-1] MetricsReportPropertyProvider:154 - METRICS_COLLECTOR is not live. Skip populating resources with metrics, next message will be logged after 1000 attempts.
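
To gauge how often the server reports the host as not live, the messages above could be tallied with something like the sketch below. It assumes the log lines follow exactly the format shown in this excerpt; `count_not_live` and the `sample` list are illustrative names only and are not part of Ambari.

```python
import re
from collections import Counter

# Matches the "not live" messages in the ambari-server.log excerpt above.
# Assumption: timestamp, level, thread and class fields are laid out as shown there.
NOT_LIVE = re.compile(
    r"^\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}\s+INFO\s+\[[^\]]+\]\s+"
    r"AbstractProviderModule:\d+ - "
    r"Metrics Collector Host or host component not live : (?P<host>\S+)"
)


def count_not_live(lines):
    """Return a Counter mapping host name to its number of 'not live' messages."""
    counts = Counter()
    for line in lines:
        match = NOT_LIVE.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


# Three lines copied from the excerpt; the middle one is a different message type
# and is not counted.
sample = [
    "02 May 2018 02:05:01,389 INFO [ambari-client-thread-269] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com",
    "02 May 2018 02:05:12,612 INFO [pool-1-thread-1] MetricsReportPropertyProvider:154 - METRICS_COLLECTOR is not live. Skip populating resources with metrics, next message will be logged after 1000 attempts.",
    "02 May 2018 02:05:12,599 INFO [pool-1-thread-1] AbstractProviderModule:424 - Metrics Collector Host or host component not live : hadoop3.server.com",
]

if __name__ == "__main__":
    for host, n in count_not_live(sample).items():
        print(host, n)
```

To run it against a real log, the `sample` list could be replaced with the lines read from the server log file.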
