Created 12-08-2016 12:25 PM
2016-12-08 19:30:35,118 - Using hadoop conf dir: /usr/isdp/current/hadoop-client/conf
2016-12-08 19:30:35,120 - checked_call['hostid'] {}
2016-12-08 19:30:35,184 - checked_call returned (0, 'a8c0366f')
2016-12-08 19:30:35,184 - Ambari Metrics service check was started.
2016-12-08 19:30:35,188 - Generated metrics: {"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "bigdata002.istuary.com", "timestamp": 1481196635000, "starttime": 1481196635000, "metrics": {"1481196635000": 0.380131946063, "1481196636000": 1481196635000}}]}
2016-12-08 19:30:35,188 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:30:50,232 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:30:50,234 - Generated metrics: {"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "bigdata002.istuary.com", "timestamp": 1481196650000, "starttime": 1481196650000, "metrics": {"1481196650000": 0.380131946063, "1481196651000": 1481196650000}}]}
2016-12-08 19:30:50,234 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:05,250 - Connection failed. Next retry in 15 seconds.
2016-12-08 19:31:05,251 - Generated metrics: {"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "bigdata002.istuary.com", "timestamp": 1481196665000, "starttime": 1481196665000, "metrics": {"1481196665000": 0.380131946063, "1481196666000": 1481196665000}}]}
2016-12-08 19:31:05,251 - Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
2016-12-08 19:31:20,266 - Connection failed. Next retry in 15 seconds.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 184, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 102, in service_check
    raise Fail("Metrics were not saved. Service check has failed. "
resource_management.core.exceptions.Fail: Metrics were not saved. Service check has failed. Connection failed.
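The service check simply builds a fake metric payload and POSTs it to the collector, so the failure can be reproduced by hand. A rough sketch (the host/port are taken from the log above; the payload field values mimic the "Generated metrics" entries, and the curl line is left commented out since it targets the failing endpoint):

```shell
# Build a smoke-test payload like the one the service check generates.
TS=$(($(date +%s) * 1000))
PAYLOAD=$(cat <<EOF
{"metrics":[{"metricname":"AMBARI_METRICS.SmokeTest.FakeMetric","appid":"amssmoketestfake","hostname":"$(hostname -f)","timestamp":${TS},"starttime":${TS},"metrics":{"${TS}":0.38}}]}
EOF
)
echo "$PAYLOAD"
# POST it to the collector; a timeout or non-2xx code confirms the connectivity issue:
# curl -s -o /dev/null -w '%{http_code}\n' -m 10 -X POST \
#   -H 'Content-Type: application/json' -d "$PAYLOAD" \
#   http://bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
```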
Created 12-10-2016 03:00 AM
Thank you all. I have fixed the bug. I had customized my stack but did not update the stack_advisor.py that corresponds to that stack.
Created 12-08-2016 12:26 PM
My Ambari version is 2.4.1, but I cannot find what is going wrong.
Created 12-08-2016 12:29 PM
What is the output of the following command? Does the package show the same version on all hosts?
rpm -qa | grep ambari-metrics
Created 12-08-2016 12:45 PM
As per the code: https://github.com/apache/ambari/blob/release-2.4.1/ambari-server/src/main/resources/common-services...
Ambari will retry a number of times (AMS_CONNECT_TRIES=30) before showing that error. So can you please check that there is no connectivity issue and that you are able to access it:
Connecting (POST) to bigdata002.istuary.com:6188/ws/v1/timeline/metrics/
The service check script runs from one of the healthy nodes in the cluster (which is not always the Ambari server). So from the host where the service check last ran, can you open the mentioned URL? "bigdata002.istuary.com:6188" should be accessible.
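A quick way to probe reachability from that host is bash's built-in /dev/tcp redirection (a sketch; `check_port` is our own helper, and the host/port below are taken from the log, so adjust them for your cluster):

```shell
# Return success only if a TCP connection to host:port succeeds within 3 seconds.
check_port() {
  timeout 3 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

if check_port bigdata002.istuary.com 6188; then
  echo "collector reachable"
else
  echo "collector NOT reachable"
fi
```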
Created 12-08-2016 01:04 PM
Yes, you are right, but I deployed Ambari on a single host, so I cannot see why there would be anything wrong with the network.
Created 12-08-2016 01:13 PM
Is the following port open on that host? Or do you see any errors in the AMS logs?
netstat -tnlpa | grep 6188
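On newer systems netstat may be missing; an equivalent check with ss from iproute2 works too (a sketch; `port_listening` is our own helper name):

```shell
# Return success if something is listening on the given local TCP port.
port_listening() {
  ss -tnl 2>/dev/null | awk '{print $4}' | grep -q ":$1\$"
}

port_listening 6188 && echo "collector port 6188 is listening" \
                    || echo "nothing listening on 6188"
```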
Created 12-08-2016 01:17 PM
Port 6188 is OK, but the AMS logs contain errors like this in /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out:
2016-12-08 19:50:34,611 [INFO] controller.py:56 - Running Controller thread: Thread-1
2016-12-08 19:50:34,611 [INFO] emitter.py:55 - Running Emitter thread: Thread-2
2016-12-08 19:50:34,611 [INFO] emitter.py:75 - Nothing to emit, resume waiting.
2016-12-08 19:51:34,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:34,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:39,614 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:39,614 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:51:44,615 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:51:44,615 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:49,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:49,616 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:54,616 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:54,617 [WARNING] emitter.py:90 - Retrying after 5 ...
2016-12-08 19:52:59,618 [WARNING] emitter.py:84 - Error sending metrics to server. [Errno 111] Connection refused
2016-12-08 19:52:59,618 [WARNING] emitter.py:90 - Retrying after 5 ...
The other logs are OK.
Created 12-08-2016 01:18 PM
And in ambari-metrics-collector.log there is the following error:
ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
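A SIGTERM in the collector log means something external stopped the process (an operator, Ambari itself, or the kernel OOM killer). A quick log-scan sketch (`scan_log` is our own helper, and the log path below is the usual AMS default, so adjust it for your install):

```shell
# Print lines from the given log that hint at forced shutdowns or memory trouble.
scan_log() {
  if [ -f "$1" ]; then
    grep -E 'SIGTERM|OutOfMemory' "$1" | tail -n 20
  else
    echo "log not found at $1"
  fi
}

scan_log /var/log/ambari-metrics-collector/ambari-metrics-collector.log
```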
Created 12-08-2016 02:03 PM
It looks like your AMS collector is going down. Can you check whether this happens frequently? We have sometimes seen that when it has insufficient memory, it is killed and restarted.
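One way to gauge how often the collector is dying is to count the SIGTERM entries in its log (a sketch; `count_sigterms` is our own helper and the default log path may differ on your install):

```shell
# Count how many times the collector logged a SIGTERM: a rough
# "is it being killed frequently?" signal.
count_sigterms() {
  if [ -f "$1" ]; then
    grep -c 'RECEIVED SIGNAL 15: SIGTERM' "$1"
  else
    echo 0
  fi
}

count_sigterms /var/log/ambari-metrics-collector/ambari-metrics-collector.log
```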
Created 12-09-2016 01:14 AM
I found the reason: 'timeline.metrics.service.webapp.address' in the ams-site.xml file was not modified. Why is that?
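Since the service check POSTs to exactly the host:port in that property, a stale value breaks it. A sketch for pulling the effective value out of ams-site.xml (`get_ams_property` is our own naive XML scrape, and the conf path below is the usual collector default, so adjust it for your install):

```shell
# Print the <value> that follows the given <name> in a Hadoop-style XML config.
get_ams_property() {
  grep -A1 "<name>$2</name>" "$1" 2>/dev/null \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

get_ams_property /etc/ambari-metrics-collector/conf/ams-site.xml \
  timeline.metrics.service.webapp.address
```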