Ambari Agent doesn't start , Heartbeat lost and No logs logged

New Contributor

I am unable to start the Ambari agent, and Ambari reports "heartbeat lost" for all the services on this server. Because the host is the primary NameNode, I cannot see the status of the services running on it. When I run ambari-agent start or ambari-agent restart, the agent starts and then stops almost immediately. Grepping the process list for ambari still shows the processes below, but the agent is not actually running. How can I start the Ambari agent?

root 2970771 1 0 Nov08 ? 00:00:00 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start
root 2970779 2970771 0 Nov08 ? 00:21:24 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/ambari_agent/main.py start

Symptoms:

Python version in use: 2.6

The agent log contains nothing other than the following, and it has since stopped logging altogether:

ValueError: Unknown format code 'd' for object of type 'float'
INFO 2017-11-10 15:45:48,904 DataCleaner.py:120 - Data cleanup started
INFO 2017-11-10 15:45:48,908 DataCleaner.py:122 - Data cleanup finished
WARNING 2017-11-10 15:46:42,230 base_alert.py:140 - [Alert][ams_metrics_monitor_process] Unable to execute alert. Unable to find 'AMBARI_METRICS/package/alerts/alert_ambari_metrics_monitor.py' as an absolute path or part of /var/lib/ambari-agent/cache/stacks or /var/lib/ambari-agent/cache/host_scripts
WARNING 2017-11-10 15:47:42,220 base_alert.py:140 - [Alert][ams_metrics_monitor_process] Unable to execute alert. Unable to find 'AMBARI_METRICS/package/alerts/alert_ambari_metrics_monitor.py' as an absolute path or part of /var/lib/ambari-agent/cache/stacks or /var/lib/ambari-agent/cache/host_scripts
ERROR 2017-11-10 15:47:42,428 scheduler.py:520 - Job "452de60e-d34c-41d8-9748-bcff4784ebe2 (trigger: interval[0:02:00], next run at: 2017-11-10 15:49:42.210824)" raised an exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
    return lambda: alert_def.collect()
  File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
    data['text'] = res_base_text.format(*res[1])
ValueError: Unknown format code 'd' for object of type 'float'
File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
    return lambda: alert_def.collect()
  File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
    data['text'] = res_base_text.format(*res[1])
ValueError: Unknown format code 'd' for object of type 'float'
WARNING 2017-11-11 11:52:42,221 base_alert.py:140 - [Alert][ams_metrics_monitor_process] Unable to execute alert. Unable to find 'AMBARI_METRICS/package/alerts/alert_ambari_metrics_monitor.py' as an absolute path or part of /var/lib/ambari-agent/cache/stacks or /var/lib/ambari-agent/cache/host_scripts
WARNING 2017-11-11 11:53:42,220 base_alert.py:140 - [Alert][ams_metrics_monitor_process] Unable to execute alert. Unable to find 'AMBARI_METRICS/package/alerts/alert_ambari_metrics_monitor.py' as an absolute path or part of /var/lib/ambari-agent/cache/stacks or /var/lib/ambari-agent/cache/host_scripts
ERROR 2017-11-11 11:53:42,416 scheduler.py:520 - Job "452de60e-d34c-41d8-9748-bcff4784ebe2 (trigger: interval[0:02:00], next run at: 2017-11-11 11:55:42.210824)" raised an exception
Traceback (most recent call last):


@Jay Kumar SenSharma, do you have any idea what is causing this?

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

Super Mentor

@Raj ji

The following error suggests that an alert definition was changed recently and that a float value in it is not correct:

  File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
    data['text'] = res_base_text.format(*res[1])
ValueError: Unknown format code 'd' for object of type 'float'
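To illustrate what this ValueError means, here is a minimal sketch that reproduces it outside of Ambari. The template string and the values are hypothetical, not taken from any real alert definition, and the snippet is written for Python 3 even though the agent in this thread runs Python 2.6. It only shows that a `d` (integer) format code rejects a float argument.

```python
# Hypothetical alert text template; a real one would come from the alert definition.
template = "Connection time: {0:d} ms"


def render(template, values):
    """Format an alert message, returning the error text instead of raising."""
    try:
        return template.format(*values)
    except ValueError as exc:
        return "format failed: %s" % exc


# An integer satisfies the 'd' format code.
print(render(template, [42]))

# A float does not, so str.format() raises the same ValueError seen in the log.
print(render(template, [42.7]))
```

If an alert's text template uses `{0:d}` while the alert script returns a float, either side could plausibly be adjusted (for example, a `{0:.0f}` code accepts floats), but which side changed in this cluster is not known from the log alone.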

So, which alert definition did you change recently? If you made any changes, can you please revert them? (Was it the "Ambari Metrics Monitor Process" alert definition?)

Which version of Ambari are you using?

Can you please share the ambari-server.log as well?

Also, can you please locate the "alert_ambari_metrics_monitor.py" file under the "/var/lib/ambari-agent" directory on a working agent host, and check whether it exists at the same path on the non-working ambari-agent host?

# cd "/var/lib/ambari-agent"
# find . -name "alert_ambari_metrics_monitor.py"
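The same lookup can be scripted when several hosts need to be compared. The sketch below is only an illustration: `find_file` is a hypothetical helper, not part of Ambari, and it prints paths relative to the search root so that output from two hosts can be compared directly.

```python
import os


def find_file(root, filename):
    """Return sorted paths (relative to root) of every file named filename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            full_path = os.path.join(dirpath, filename)
            matches.append(os.path.relpath(full_path, root))
    return sorted(matches)


if __name__ == "__main__":
    # Directory and file name are the ones mentioned in this thread.
    for path in find_file("/var/lib/ambari-agent", "alert_ambari_metrics_monitor.py"):
        print(path)
```

Running it on the working and non-working hosts and comparing the two outputs shows whether the script is missing or sits at a different path.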

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

New Contributor

Yes. We changed the Ambari Metrics configuration because we were unable to start Ambari Metrics a few days ago. We changed the port settings as follows:

timeline.metrics.service.webapp.address --> 0.0.0.0:7188, and hbase.zookeeper.property.clientPort --> 2181 (from 61181). It is a distributed environment.

Ambari version: 2.1.0

The hostname has been replaced with <affected host> in the logs below:

11 Nov 2017 04:18:30,378  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:154 - Heartbeat lost from host <affected host>
11 Nov 2017 04:18:30,379  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_MONITOR on <affected host>
11 Nov 2017 04:18:30,379  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component FLUME_HANDLER on <affected host>
11 Nov 2017 04:18:30,379  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_REST_SERVER on <affected host>
11 Nov 2017 04:18:30,379  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_MASTER on <affected host>
11 Nov 2017 04:18:30,379  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZKFC on <affected host>
11 Nov 2017 04:18:30,380  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NAMENODE on <affected host>
11 Nov 2017 04:18:30,380  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on <affected host>
11 Nov 2017 04:18:53,910  INFO [AlertNoticeDispatchService] AlertNoticeDispatchService:279 - There are 5 pending alert notices about to be dispatched...
11 Nov 2017 04:18:54,107  INFO [alert-dispatch-32] EmailDispatcher:88 - Sending email: XXXXXXXXXXXXXXXXXXXXXXX

11 Nov 2017 12:40:53,970 ERROR [qtp-client-6767] MetricsPropertyProvider:185 - Error getting timeline metrics. Can not connect to collector, socket error.
11 Nov 2017 12:41:03,981 ERROR [qtp-client-6767] MetricsPropertyProvider:185 - Error getting timeline metrics. Can not connect to collector, socket error.

ERROR [qtp-client-3412] MetricsReportPropertyProvider:223 - Error getting timeline metrics. Can not connect to collector, socket error.

The path of alert_ambari_metrics_monitor.py is the same on the working ambari-agent host and on the non-working ambari-agent host.
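The "Can not connect to collector, socket error" lines above suggest checking whether the collector port accepts TCP connections at all. A minimal sketch follows, assuming Python 3; `can_connect` is a hypothetical helper, the port 7188 is the one quoted in this post, and the host name is a placeholder that must be replaced with the actual collector host.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    collector_host = "127.0.0.1"  # placeholder: replace with the collector host
    print(can_connect(collector_host, 7188))
```

A False result only shows that nothing accepted a connection on that address and port. It does not say whether the collector is down, listening on a different port, or blocked by a firewall.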

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

New Contributor

@Jay Kumar SenSharma, sir, do you need any more information to proceed on this? Can you please help me?

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

New Contributor

I understand that fuser is taking a long time to respond to the Ambari agent. Is there any fix other than restarting the server?

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

Super Mentor

@Raj ji


By default and by design, the Ambari agent relies exclusively on "fuser" to check whether the ambari-agent port is occupied by another process. Sometimes "fuser", which is an OS command, hangs indefinitely.

If you observe that the ambari-agent does not start because the port check ('fuser 8670/tcp') hangs on the host, then the solution is to clear these processes and recover by rebooting the affected nodes.
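One way to confirm that the port check itself is what hangs is to run it under a timeout. The sketch below assumes Python 3.5 or later and that `fuser` is on the PATH; `run_with_timeout` is a hypothetical helper and is not how the agent performs its check.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (status, output).

    status is 'ok' if the command finished, 'timeout' if it exceeded the
    limit, or 'missing' if the executable could not be found.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except FileNotFoundError:
        return ("missing", "")
    return ("ok", result.stdout.decode(errors="replace"))


if __name__ == "__main__":
    # 8670 is the agent port mentioned in this thread.
    status, output = run_with_timeout(["fuser", "8670/tcp"], 10)
    print(status)
    print(output)
```

A "timeout" status is consistent with the hang described above. Note that `subprocess.run` kills the child and waits for it after a timeout, so if the child is stuck in a state where it cannot be killed, the call itself may still block.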

Re: Ambari Agent doesn't start , Heartbeat lost and No logs logged

New Contributor

@Jay Kumar SenSharma

Do we need to clear these processes and reboot the host without stopping the services on it?

Or do we need to stop the services on that host manually before rebooting?

Please advise.