Ambari Metrics Service Check Failed

Contributor
2016-12-08 19:30:35,118 - Using hadoop conf dir: /usr/isdp/current/hadoop-client/conf
2016-12-08 19:30:35,120 - checked_call['hostid'] {}
2016-12-08 19:30:35,184 - checked_call returned (0, 'a8c0366f')
2016-12-08 19:30:35,184 - Ambari Metrics service check was started.
2016-12-08 19:30:35,188 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196635000,
      "starttime": 1481196635000,
      "metrics": {
        "1481196635000": 0.380131946063,
        "1481196636000": 1481196635000
      }
    }
  ]
}
2016-12-08 19:30:35,188 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:30:50,232 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:30:50,234 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196650000,
      "starttime": 1481196650000,
      "metrics": {
        "1481196650000": 0.380131946063,
        "1481196651000": 1481196650000
      }
    }
  ]
}
2016-12-08 19:30:50,234 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:05,250 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:31:05,251 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "bigdata002.istuary.com",
      "timestamp": 1481196665000,
      "starttime": 1481196665000,
      "metrics": {
        "1481196665000": 0.380131946063,
        "1481196666000": 1481196665000
      }
    }
  ]
}
2016-12-08 19:31:05,251 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:20,266 - Connection failed. Next retry in 15 seconds.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 184, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 102, in service_check
    raise Fail("Metrics were not saved. Service check has failed. "
resource_management.core.exceptions.Fail: Metrics were not saved. Service check has failed. 
Connection failed.
1 ACCEPTED SOLUTION

Contributor

Thank you all. I have fixed the bug; it was in my own code. I had customized my stack but had not updated the stack_advisor.py that corresponds to it.


Contributor

My Ambari version is 2.4.1, but I cannot find what is going wrong.

What is the output of the following command? Does the package show the same version on all the hosts?

rpm -qa | grep ambari-metrics

@Zhao Chaofeng

As per the code: https://github.com/apache/ambari/blob/release-2.4.1/ambari-server/src/main/resources/common-services...

Ambari retries a number of times (AMS_CONNECT_TRIES=30) before showing that error. Can you please check that there is no connectivity issue and that you can access this endpoint:

Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/

The service check script runs from one of the healthy nodes in the cluster, which is not always the Ambari server. From the host where service_check last ran, can you open the URL mentioned above? "bigdata002.istuary.com:6188" should be accessible from there.
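To reproduce what the service check is attempting, something like the following could be run from the host where the check last ran. This is only a minimal sketch, not the actual service_check.py code: the function names, the `http://` scheme, the sample value 0.5, and the "HTTP 200 means saved" rule are assumptions for illustration. The payload shape is copied from the "Generated metrics" output in the log above.

```python
import json
import time
import urllib.error
import urllib.request


def build_smoke_metric(hostname, now_ms):
    """Build a payload shaped like the one printed in the service check log."""
    return {
        "metrics": [
            {
                "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
                "appid": "amssmoketestfake",
                "hostname": hostname,
                "timestamp": now_ms,
                "starttime": now_ms,
                "metrics": {
                    str(now_ms): 0.5,            # arbitrary sample value (assumption)
                    str(now_ms + 1000): now_ms,  # mirrors the second point in the log
                },
            }
        ]
    }


def post_with_retries(url, payload, tries=3, delay_s=15, timeout_s=10):
    """POST the payload as JSON; return True on the first HTTP 200, else False."""
    body = json.dumps(payload).encode("utf-8")
    for attempt in range(1, tries + 1):
        request = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError) as exc:
            # A refused or timed-out connection lands here, like the
            # "Connection failed" lines in the service check log.
            print("attempt %d failed: %s" % (attempt, exc))
        if attempt < tries:
            time.sleep(delay_s)
    return False
```

For example, `post_with_retries("http://bigdata002.istuary.com:6188/ws/v1/timeline/metrics/", build_smoke_metric("bigdata002.istuary.com", int(time.time() * 1000)))` targets the same host, port, and path as the log. If it returns False with "Connection refused", the problem is on the collector side rather than in the check script.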

Contributor

Yes, you are right, but I deployed Ambari on a single host, so I cannot see why there would be a network problem.

@Zhao Chaofeng

Is the following port open on that host? Do you see any errors in the AMS logs?

netstat -tnlpa | grep 6188
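netstat shows what is listening locally, but it can also help to test the exact host name that the service check connects to, since a port can be bound to a different interface than the one the name resolves to. A minimal sketch of such a probe (the function name is hypothetical):

```python
import socket


def is_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111 on Linux), timeouts,
        # and name resolution failures.
        return False
```

Comparing `is_port_open("bigdata002.istuary.com", 6188)` with `is_port_open("127.0.0.1", 6188)` on the same machine would show whether the collector is reachable under the name the check uses.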


Contributor

Port 6188 is OK. The AMS logs have some errors, like this in /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out:

2016-12-08 19:50:34,611 [INFO] controller.py:56 - Running Controller thread: Thread-1
2016-12-08 19:50:34,611 [INFO] emitter.py:55 - Running Emitter thread: Thread-2
2016-12-08 19:50:34,611 [INFO] emitter.py:75 - Nothing to emit, resume waiting.
2016-12-08 19:51:34,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:34,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:39,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:39,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:44,615 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:44,615 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:49,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:49,616 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:54,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:54,617 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:59,618 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:59,618 [WARNING] emitter.py:90 - Retrying after 5 ...
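To see how long the monitor has been refused, the log can be summarized with a short script. This is a rough sketch that assumes the line format shown in the excerpt above; the function name and the returned dictionary are made up for illustration.

```python
import re
from datetime import datetime

# Matches the "Connection refused" warnings as they appear in the excerpt.
REFUSED = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \[WARNING\] \S+ - "
    r"Error sending metrics to server\. \[Errno 111\] Connection refused"
)


def summarize_refusals(log_text):
    """Count refused-connection warnings and report the first and last time."""
    stamps = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for match in REFUSED.finditer(log_text)
    ]
    if not stamps:
        return {"count": 0, "first": None, "last": None}
    return {"count": len(stamps), "first": min(stamps), "last": max(stamps)}
```

Because it searches the whole text instead of splitting on newlines, it also works when the log has been pasted as one long line. A steady stream of refusals with no successful sends in between suggests the collector is not accepting connections at the address the monitor is configured to use.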

The others are OK.

Contributor

ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM

The error above is from ambari-metrics-collector.log.

@Zhao Chaofeng

It looks like your AMS collector is going down. Can you check whether this happens frequently? We have sometimes seen it killed and restarted when it has insufficient memory.

Contributor

I found the cause: 'timeline.metrics.service.webapp.address' in the ams-site.xml file is not being modified. Why?
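For anyone checking the same thing, the value can be read directly from ams-site.xml on the collector host and compared with the host and port the service check connects to. A minimal sketch, assuming the usual Hadoop-style `<configuration><property><name>/<value>` layout and a `host:port` value; the helper names and the sample value below are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a named property from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def split_host_port(address):
    """Split 'host:port' into (host, port), assuming the value has that form."""
    host, _, port = address.rpartition(":")
    return host, int(port)


# Sample content for illustration only; the value here is an assumption.
SAMPLE = """
<configuration>
  <property>
    <name>timeline.metrics.service.webapp.address</name>
    <value>0.0.0.0:6188</value>
  </property>
</configuration>
"""

address = read_property(SAMPLE, "timeline.metrics.service.webapp.address")
print(split_host_port(address))
```

On a real cluster the XML text would come from the ams-site.xml file in the collector's configuration directory instead of the `SAMPLE` string. If the host or port there differs from what the service check log shows, that mismatch would explain the refused connections.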

Expert Contributor
@Zhao Chaofeng

Can you share the Ambari Metrics Collector log and configuration?

/var/log/ambari-metrics-collector/ambari-metrics-collector.log

/etc/conf/ambari-metrics-collector/conf?

Contributor

Thank you all. I have fixed the bug; it was in my own code. I had customized my stack but had not updated the stack_advisor.py that corresponds to it.
