Ambari Metrics Service Check Failed

Contributor
2016-12-08 19:30:35,118 - Using hadoop conf dir: /usr/isdp/current/hadoop-client/conf
2016-12-08 19:30:35,120 - checked_call['hostid'] {}
2016-12-08 19:30:35,184 - checked_call returned (0, 'a8c0366f')
2016-12-08 19:30:35,184 - Ambari Metrics service check was started.
2016-12-08 19:30:35,188 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196635000,
      "starttime": 1481196635000,
      "metrics": {
        "1481196635000": 0.380131946063,
        "1481196636000": 1481196635000
      }
    }
  ]
}
2016-12-08 19:30:35,188 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:30:50,232 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:30:50,234 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196650000,
      "starttime": 1481196650000,
      "metrics": {
        "1481196650000": 0.380131946063,
        "1481196651000": 1481196650000
      }
    }
  ]
}
2016-12-08 19:30:50,234 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:05,250 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:31:05,251 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196665000,
      "starttime": 1481196665000,
      "metrics": {
        "1481196665000": 0.380131946063,
        "1481196666000": 1481196665000
      }
    }
  ]
}
2016-12-08 19:31:05,251 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:20,266 - Connection failed. Next retry in 15 seconds.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 184, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 102, in service_check
    raise Fail("Metrics were not saved. Service check has failed. "
resource_management.core.exceptions.Fail: Metrics were not saved. Service check has failed. 
Connection failed.
1 ACCEPTED SOLUTION

Contributor

Thank you all. I have fixed the bug in my program: I customized my stack but did not update the stack_advisor.py that corresponds to it.


11 REPLIES

Contributor

My Ambari version is 2.4.1, but I cannot find what is going wrong.

avatar

What is the output of the following command? Does the package show the same version on all hosts?

rpm -qa | grep ambari-metrics

avatar

@Zhao Chaofeng

As per the code: https://github.com/apache/ambari/blob/release-2.4.1/ambari-server/src/main/resources/common-services...

Ambari retries several times (AMS_CONNECT_TRIES=30) before showing that error. Can you please check that there is no connectivity issue and that you can access the endpoint it is trying to reach:

Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/

The service check script runs from one of the healthy nodes in the cluster, which is not always the Ambari server. From the host where service_check last ran, can you open the URL mentioned above? "bigdata002.istuary.com:6188" should be accessible from there.
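If a browser is not available on that host, a plain TCP connection attempt answers the same question. The following is a minimal sketch, assuming Python 3 is available there; `can_connect` is a hypothetical helper name, and only the host and port come from the log above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `can_connect("bigdata002.istuary.com", 6188)` on the host that ran the service check should return True if the collector is listening and reachable. A False result only says the connection could not be opened; it does not distinguish a stopped collector from a firewall or DNS problem.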

Contributor

Yes, you are right, but I deployed Ambari on a single host, so I cannot see why there would be anything wrong with the network.

avatar

@Zhao Chaofeng

Is the following port open on that host? Do you see any errors in the AMS logs?

netstat -tnlpa | grep 6188


Contributor

Port 6188 is OK. The AMS logs have some errors, like this in /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out:

2016-12-08 19:50:34,611 [INFO] controller.py:56 - Running Controller thread: Thread-1
2016-12-08 19:50:34,611 [INFO] emitter.py:55 - Running Emitter thread: Thread-2
2016-12-08 19:50:34,611 [INFO] emitter.py:75 - Nothing to emit, resume waiting.
2016-12-08 19:51:34,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:34,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:39,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:39,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:44,615 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:44,615 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:49,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:49,616 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:54,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:54,617 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:59,618 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:59,618 [WARNING] emitter.py:90 - Retrying after 5 ...

The other logs are OK.

Contributor

ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM

The error above appears in ambari-metrics-collector.log.

avatar

@Zhao Chaofeng

It looks like your AMS is going down. Can you check whether this happens frequently? We have sometimes seen that if it has insufficient memory, it is killed and then restarted.
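To see how often the collector is being terminated, one option is to pull every SIGTERM line out of ambari-metrics-collector.log and look at how many there are and when they occurred. This is only a rough sketch: the helper names are hypothetical, and the marker string is the one quoted in the post above.

```python
SIGTERM_MARKER = "RECEIVED SIGNAL 15: SIGTERM"


def find_sigterm_lines(lines):
    """Return the log lines that record the collector receiving SIGTERM."""
    return [line.rstrip("\n") for line in lines if SIGTERM_MARKER in line]


def scan_log(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_sigterm_lines(handle)
```

Running `scan_log` against the collector log and printing the result shows each shutdown event. Many entries close together would suggest the collector is being killed repeatedly rather than stopped once by hand.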

Contributor

I found the reason: 'timeline.metrics.service.webapp.address' in the ams-site.xml file is not being modified. Why?
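One way to confirm what the collector is actually configured with is to read the property from ams-site.xml and compare its port with the one the service check posts to (6188 in the log above). A minimal sketch, assuming the file uses the usual Hadoop configuration/property XML layout; the helper names and the sample XML value are illustrative and not taken from this cluster.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "timeline.metrics.service.webapp.address"

# Illustrative content only; the real value on a given cluster may differ.
SAMPLE_XML = """<configuration>
  <property>
    <name>timeline.metrics.service.webapp.address</name>
    <value>0.0.0.0:6188</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def port_of(address):
    """Extract the port from a host:port string."""
    return int(address.rsplit(":", 1)[1])


address = read_property(SAMPLE_XML, PROPERTY_NAME)
print(address, port_of(address))
```

To check a real host, read the text of the ams-site.xml in the collector's configuration directory and pass it to `read_property` in place of `SAMPLE_XML`. If the port differs from the one in the service check log, the two sides are not using the same address.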