Cannot start Ambari-metrics-collector

Contributor

Hi,

I am having difficulties getting the ambari-metrics-collector to start. I have HBase running in distributed mode.

I have attached the ambari-metrics-collector.log (ambari-metrics-collectorlog.txt).

I already tried the suggestions from this thread: https://community.hortonworks.com/questions/15818/ambari-metrics-collector-now-starting.html as well as the workaround for issue 6 here https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues

Any tips would be much appreciated.

1 ACCEPTED SOLUTION

Super Collaborator

@Angel Kafazov Were you able to verify that the AMS keytabs work? Most of the config changes performed above were not needed, for example the changes to the ZooKeeper and znode settings. For distributed mode, the only config changes needed are these:

https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_reference_guide/content/_configur...

When you enable security through Ambari, the keytabs and principals are generated by Ambari and applied to the AMS configs.

Before looking into ambari-metrics-collector.log or ambari-metrics-monitor.out, make sure the ams-hbase daemon is up and running. If it is not, the connection timeouts in those logs are expected and tell us nothing. Based on the HBase logs posted, the HBase daemon tried to log in and failed, so we need to figure out why. Note: if the collector was moved to another host, the old keytabs are invalid because the hostname changed, and they have to be regenerated.

Example of keytab commands:

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP1/HDP-1.2.0/bk_installing_manually_book/...
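
For a quick check of whether a keytab works, something like the sketch below may help. It only wraps the standard MIT Kerberos `klist -kt` and `kinit -kt` commands. The function names are made up for illustration, and the keytab path and principal have to come from your own AMS configs.

```python
import shlex
import subprocess


def keytab_check_commands(keytab_path, principal):
    """Return the two commands commonly used to sanity-check a keytab."""
    return [
        # List the principals and timestamps stored in the keytab.
        ["klist", "-kt", keytab_path],
        # Try to obtain a ticket using only the keytab (no password prompt).
        ["kinit", "-kt", keytab_path, principal],
    ]


def run_keytab_check(keytab_path, principal):
    """Run both commands on the local host; return True if both succeed."""
    for cmd in keytab_check_commands(keytab_path, principal):
        print("$ " + " ".join(shlex.quote(part) for part in cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
        if result.returncode != 0:
            return False
    return True
```

Run it on the host where ams-hbase runs, as the user that owns the keytab, for example `run_keytab_check("/path/to/ams-hbase.keytab", "amshbase/m2.domain@domain")`. The path here is a placeholder; the principal is the one that appears in the posted log. If `kinit` fails, the keytab most likely has to be regenerated for the current hostname.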


24 REPLIES

Contributor

I found a wrong hostname in the rootdir setting. After fixing it, I am now getting:

2016-07-08 12:44:39,320 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.domain/172.16.164.131:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-08 12:44:39,321 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to m2.domain/172.16.164.131:2181, initiating session
2016-07-08 12:44:39,328 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server m2.domain/172.16.164.131:2181, sessionid = 0x255ca408b8d0063, negotiated timeout = 40000
2016-07-08 12:44:50,376 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:07,243 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:16,166 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:32,517 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:54,803 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:46:10,720 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:46:37,467 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:47:01,600 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:47:01,600 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=142264 ms ago, cancelled=false, msg=
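
The warning above repeats with the same two names each time. As far as I can tell, these are the client principal and the server principal it is trying to reach. A rough sketch for summarizing such a log is below; the regex is written against the line format in this excerpt only, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the AbstractRpcClient warning format seen in the log excerpt above.
FAILED_CONN = re.compile(
    r"Couldn't setup connection for (?P<client>\S+) to (?P<server>\S+)"
)


def summarize_rpc_failures(log_text):
    """Count failed RPC connection attempts per (client, server) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_CONN.search(line)
        if match:
            counts[(match.group("client"), match.group("server"))] += 1
    return counts
```

If every failure involves the same pair, as it does here, those two principals and their keytabs are the first thing to check.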

Contributor

Also, moving ambari-metrics-collector to another host fails in the wizard with the following error:

stderr: 
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 165, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 92, in service_check
    raise Fail("Metrics were not saved. Service check has failed. "
resource_management.core.exceptions.Fail: Metrics were not saved. Service check has failed. 
Connection failed.
 stdout:
2016-07-08 15:41:07,832 - Ambari Metrics service check was started.
2016-07-08 15:41:07,844 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "w1.domain",
      "timestamp": 1467992467000,
      "starttime": 1467992467000,
      "metrics": {
        "1467992467000": 0.113469705131,
        "1467992468000": 1467992467000
      }
    }
  ]
}
2016-07-08 15:41:07,844 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:17,856 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:17,857 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:27,867 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:27,867 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:37,878 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:37,878 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:47,891 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:47,892 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:57,904 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:57,905 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:07,919 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:07,919 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:17,929 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:17,930 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:27,941 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:27,942 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:37,956 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:37,956 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
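
To reproduce what the service check does without going through the wizard, a minimal sketch along these lines could be used. It is not the actual service_check.py code. The payload shape, port 6188, and path are copied from the output above, while the `http://` scheme, the `Content-Type` header, and the function names are my assumptions.

```python
import json
import urllib.request


def build_smoke_metric(hostname, now_ms, value):
    """Build a payload shaped like the one printed in the service check output."""
    return {
        "metrics": [
            {
                "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
                "appid": "amssmoketestfake",
                "hostname": hostname,
                "timestamp": now_ms,
                "starttime": now_ms,
                "metrics": {
                    str(now_ms): value,
                    str(now_ms + 1000): now_ms,
                },
            }
        ]
    }


def post_metrics(collector_host, payload, port=6188, timeout=10):
    """POST the payload to the collector; raises URLError if it is unreachable."""
    # Assumption: plain HTTP, since the log shows no scheme.
    url = "http://%s:%d/ws/v1/timeline/metrics/" % (collector_host, port)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

If `post_metrics("w3.domain", ...)` fails with a connection error, the collector is not listening on that host and port at all, which again points at the collector and ams-hbase startup rather than at the service check.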

Contributor

Hi @swagle,

Thank you very much for the support. After several retries I managed to delete the service and install it again on another host. This time it worked without me doing much differently from before. I just had to set zookeeper.znode.parent to the HBase value. I really don't know why it worked this time.