Created 07-06-2016 03:33 PM
Hi,
I am having difficulties getting the ambari-metrics-collector to start. I have HBase running in distributed mode.
I have attached the ambari-metrics-collector.log (ambari-metrics-collectorlog.txt).
I have already tried the suggestions from this thread: https://community.hortonworks.com/questions/15818/ambari-metrics-collector-now-starting.html, as well as the workaround for issue 6 here: https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues
Any tips would be much appreciated.
Created 07-08-2016 12:06 PM
Created 07-08-2016 12:51 PM
I found that the rootdir hostname was wrong. After correcting it, I am getting:
2016-07-08 12:44:39,320 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.domain/172.16.164.131:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-08 12:44:39,321 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to m2.domain/172.16.164.131:2181, initiating session
2016-07-08 12:44:39,328 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server m2.domain/172.16.164.131:2181, sessionid = 0x255ca408b8d0063, negotiated timeout = 40000
2016-07-08 12:44:50,376 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:07,243 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:16,166 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:32,517 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:45:54,803 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:46:10,720 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:46:37,467 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:47:01,600 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain
2016-07-08 12:47:01,600 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=142264 ms ago, cancelled=false, msg=
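To see at a glance which principal pair keeps failing, the WARN lines in a log like this can be tallied with a few lines of Python. This is only a rough sketch: the function name `failed_pairs` is made up for illustration, and the regular expression assumes the message wording stays exactly as it appears in the excerpt.

```python
import re
from collections import Counter

# Matches the "Couldn't setup connection" WARN lines as they appear in the excerpt.
PATTERN = re.compile(r"Couldn't setup connection for (\S+) to (\S+)")

def failed_pairs(log_text):
    """Count (client principal, server principal) pairs that failed to connect."""
    return Counter(PATTERN.findall(log_text))

sample = (
    "2016-07-08 12:44:50,376 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: "
    "Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain\n"
    "2016-07-08 12:45:07,243 WARN org.apache.hadoop.hbase.ipc.AbstractRpcClient: "
    "Couldn't setup connection for amshbase/m2.domain@domain to amshbasemaster/m1.domain@domain\n"
)

for (client, server), count in failed_pairs(sample).items():
    print(f"{client} -> {server}: {count} failures")
```

If every failure involves the same client and server principals, as here, that points at the credentials for that one pair rather than at a general network problem.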
Created 07-08-2016 03:42 PM
Also moving ambari-metrics-collector to another host fails in the wizard with the following error:
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 165, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 92, in service_check
    raise Fail("Metrics were not saved. Service check has failed. "
resource_management.core.exceptions.Fail: Metrics were not saved. Service check has failed. Connection failed.
stdout:
2016-07-08 15:41:07,832 - Ambari Metrics service check was started.
2016-07-08 15:41:07,844 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "w1.domain",
      "timestamp": 1467992467000,
      "starttime": 1467992467000,
      "metrics": {
        "1467992467000": 0.113469705131,
        "1467992468000": 1467992467000
      }
    }
  ]
}
2016-07-08 15:41:07,844 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:17,856 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:17,857 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:27,867 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:27,867 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:37,878 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:37,878 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:47,891 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:47,892 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:41:57,904 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:41:57,905 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:07,919 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:07,919 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:17,929 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:17,930 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:27,941 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:27,942 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
2016-07-08 15:42:37,956 - Connection failed. Next retry in 10 seconds.
2016-07-08 15:42:37,956 - Connecting (POST) to w3.domain:6188/ws/v1/timeline/metrics/
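For anyone who wants to reproduce the failing POST outside the wizard, here is a rough Python 3 sketch that builds a payload shaped like the generated metric above and sends it to the collector. It is not the actual service_check.py logic: the function names, the fixed sample value 0.5, and the plain `http://` scheme are my assumptions. Only the host, port, path and field names come from the output.

```python
import json
import urllib.error
import urllib.request

def build_smoke_metric(hostname, now_ms):
    """Build a payload shaped like the 'Generated metrics' entry in the output."""
    return {
        "metrics": [{
            "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
            "appid": "amssmoketestfake",
            "hostname": hostname,
            "timestamp": now_ms,
            "starttime": now_ms,
            "metrics": {str(now_ms): 0.5},  # arbitrary sample value (assumption)
        }]
    }

def post_metric(collector_url, payload, timeout=10):
    """POST the payload; return the HTTP status code, or None if no connection."""
    request = urllib.request.Request(
        collector_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the collector answered, but with an error status
    except (urllib.error.URLError, OSError) as exc:
        print(f"Connection failed: {exc}")
        return None

# Same timestamp as in the output above, just to show the payload shape.
print(json.dumps(build_smoke_metric("w1.domain", 1467992467000), indent=2))
```

Calling `post_metric("http://w3.domain:6188/ws/v1/timeline/metrics/", payload)` from the host that runs the check would then show whether the collector port is reachable at all. A `None` result matches the "Connection failed" lines above and means the collector process is not listening, which is expected while ams-hbase is down.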
Created 07-09-2016 06:28 PM
@Angel Kafazov Were you able to verify that the AMS keytabs work? Most of the config changes performed above were not needed, for example the changes to the zookeeper and znode settings. For distributed mode, the only config changes needed are these:
When you enable security through Ambari, the keytabs and principals are generated by Ambari and applied to the AMS configs.
Before looking into ambari-metrics-collector.log or ambari-metrics-monitor.out, make sure the ams-hbase daemon is up and running fine. If it is not, the connection timeouts are of no help, since they are expected. Based on the hbase logs posted, the HBase daemon tried to log in and failed, so we need to figure out why it failed. Note: If the collector was moved, the older keytabs would have become invalid because the hostname changed, and they would have to be regenerated.
Example of keytab commands:
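As a rough illustration of the kind of check meant here (not necessarily the exact commands from the original post), a keytab is commonly verified by listing its entries with `klist -kt` and then trying to obtain a ticket with `kinit -kt`. The Python sketch below only assembles and optionally runs those two commands. The keytab path is hypothetical, the principal is simply the client principal seen in the log earlier in the thread, and the function names are made up.

```python
import subprocess

def keytab_check_commands(keytab_path, principal):
    """Return the two commands commonly used to inspect and test a keytab."""
    return [
        ["klist", "-kt", keytab_path],             # list principals stored in the keytab
        ["kinit", "-kt", keytab_path, principal],  # try to obtain a ticket with it
    ]

def run_checks(keytab_path, principal):
    """Run each command and report whether it exited with status 0."""
    results = {}
    for command in keytab_check_commands(keytab_path, principal):
        completed = subprocess.run(command, capture_output=True, text=True)
        results[" ".join(command)] = completed.returncode == 0
    return results

# Hypothetical keytab path; substitute the path and principal from your AMS configs.
for command in keytab_check_commands(
    "/etc/security/keytabs/ams-hbase.master.keytab", "amshbase/m2.domain@domain"
):
    print(" ".join(command))
```

If `kinit` fails, or `klist` shows a principal whose host part does not match the current hostname, the keytab most likely needs to be regenerated, as noted above.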
Created 07-11-2016 02:40 PM
Hi @swagle,
Thank you very much for the support. After several retries I managed to delete the service and install it again on another host. It worked without my doing anything much different from before; I just had to set zookeeper.znode.parent to the HBase value. I really don't know why it worked this time.
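For anyone hitting the same thing, a quick way to check whether zookeeper.znode.parent agrees between two site files is to read the property from each and compare. This is just a sketch, assuming the usual Hadoop-style `<configuration><property><name>/<value>` layout. The helper name and the sample znode values below are made up for illustration.

```python
import xml.etree.ElementTree as ET

def read_property(site_xml_text, name):
    """Return the value of a property from Hadoop-style *-site.xml text, or None."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Illustrative contents only; the znode values here are hypothetical.
hbase_site = """<configuration>
  <property><name>zookeeper.znode.parent</name><value>/hbase-secure</value></property>
</configuration>"""
ams_hbase_site = """<configuration>
  <property><name>zookeeper.znode.parent</name><value>/ams-hbase-secure</value></property>
</configuration>"""

hbase_value = read_property(hbase_site, "zookeeper.znode.parent")
ams_value = read_property(ams_hbase_site, "zookeeper.znode.parent")
print("match" if hbase_value == ams_value else f"mismatch: {hbase_value} vs {ams_value}")
```

In practice you would read the text from the actual site files on the host (or copy the values from the Ambari config screens) instead of the inline strings.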