Created 07-06-2016 03:33 PM
Hi,
I am having difficulties getting the ambari-metrics-collector to start. I have HBase running in distributed mode.
I have attached the ambari-metrics-collector.log.
I already tried the suggestions from this thread: https://community.hortonworks.com/questions/15818/ambari-metrics-collector-now-starting.html as well as the workaround for issue 6 here https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues
Any tips would be much appreciated.
Created 07-09-2016 06:28 PM
@Angel Kafazov Were you able to verify that the AMS keytabs work? Most of the config changes made above were not needed, for example the changes to the zookeeper and znode settings. For distributed mode, the only config changes needed are these:
When you enable security through Ambari, the keytabs and principals are generated by Ambari and applied to the AMS configs.
Before looking into ambari-metrics-collector.log or ambari-metrics-monitor.out, make sure the ams-hbase daemon is up and running. If it is not, the connection timeouts are expected and tell us nothing. Based on the HBase logs posted, the HBase daemon tried to log in and failed, so we need to figure out why it failed. Note: if the collector was moved, the older keytabs become invalid because the hostname changed, and they have to be regenerated.
Example of keytab commands:
Created 07-06-2016 04:08 PM
Which version of Ambari are you using?
Created 07-06-2016 05:11 PM
Hi Orivier,
It is version 2.1.2.1.
Created 07-06-2016 05:22 PM
2016-07-06 14:59:14,466 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=m2.domain:2181 sessionTimeout=120000 watcher=hconnection-0x7bc9e6ab0x0, quorum=m2.domain:2181, baseZNode=/hbase-secure
It looks like AMS tried to connect to the HBase cluster's znode.
AMS should use /ams-hbase-secure as its base znode.
Can you check your configuration?
Created 07-06-2016 05:56 PM
Hi, I changed it, but now I am getting:
2016-07-06 17:55:20,187 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /ams-hbase-secure is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
Created 07-06-2016 06:20 PM
Hi Angel,
From your error, it looks like AMS is talking to the cluster ZooKeeper (port 2181). AMS in version 2.1.2.1 uses its own ZooKeeper in all modes of operation (port 61181).
Can you share your hbase-site.xml from /etc/ams-hbase/conf? That will help us figure out the issue.
Thanks!
Created 07-09-2016 10:29 AM
Hi, I changed the port to 61181, but it is not able to connect. I see no service running on port 61181. The following messages are in the log:
2016-07-06 18:37:42,385 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.tmaut.tlabsdata.com/172.16.164.131:61181. Will not attempt to authenticate using SASL (unknown error)
2016-07-06 18:37:42,386 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.tmaut.tlabsdata.com/172.16.164.131:61181. Will not attempt to authenticate using SASL (unknown error)
2016-07-06 18:37:42,386 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-07-06 18:37:42,387 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
I have also attached the hbase-site.xml.
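A "Connection refused" on 61181 usually means nothing is listening there. One way to confirm that from the collector host is a plain TCP probe. This is only a rough sketch: `port_is_open` is a made-up helper name, and the host and ports in the commented example are the ones quoted in this thread.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Example call (host from the log above; 61181 is the AMS ZooKeeper port
# and 2181 the cluster ZooKeeper port mentioned in this thread):
# for port in (61181, 2181):
#     print(port, port_is_open("m2.tmaut.tlabsdata.com", port))
```

If 61181 is closed while 2181 is open, the AMS HBase daemon (and its ZooKeeper) is most likely not running, which points back to the ams-hbase logs rather than the collector log.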
Created 07-06-2016 06:21 PM
Please revert the znode setting to the default if the cluster is not kerberized:
/ams-hbase-unsecure
Also, make sure the quorum value in ams-hbase-site is:
hbase.zookeeper.quorum
{{zookeeper_quorum_hosts}}
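To see what the AMS HBase instance is actually configured with, a short script can print the relevant settings from the rendered config file. This is a sketch, assuming the file is /etc/ams-hbase/conf/hbase-site.xml (the path mentioned in this thread) and follows the usual Hadoop `<configuration>/<property>/<name>/<value>` layout; `read_props` is a hypothetical helper name, not part of Ambari.

```python
import xml.etree.ElementTree as ET

# Properties discussed in this thread.
KEYS = (
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "zookeeper.znode.parent",
)


def read_props(path, keys=KEYS):
    """Return {name: value} for the requested properties in a Hadoop-style XML file."""
    found = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in keys:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Example call (path is an assumption based on this thread):
# for name, value in read_props("/etc/ams-hbase/conf/hbase-site.xml").items():
#     print(f"{name} = {value}")
```

Comparing the printed znode parent and client port with what the collector log reports (baseZNode and the port in connectString) shows quickly whether the collector is pointed at the cluster ZooKeeper or at the AMS one.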
Created 07-06-2016 06:41 PM
Hi swagle,
The cluster is kerberized. hbase.zookeeper.quorum looks OK.
Created 07-06-2016 09:37 PM
See the attached doc; it should help.