Cannot start Ambari-metrics-collector

Contributor

Hi,

I am having difficulties getting the ambari-metrics-collector to start. I have HBase running in distributed mode.

I have attached the ambari-metrics-collector.log (ambari-metrics-collectorlog.txt).

I already tried the suggestions from this thread: https://community.hortonworks.com/questions/15818/ambari-metrics-collector-now-starting.html as well as the workaround for issue 6 here https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues

Any tips will be very appreciated.

1 ACCEPTED SOLUTION

Super Collaborator

@Angel Kafazov Were you able to verify that the AMS keytabs work? Most of the config changes made above were not needed, for example the changes to the ZooKeeper and znode settings. For distributed mode, the only config changes needed are these:

https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_reference_guide/content/_configur...

When you enable security through Ambari, the keytabs and principals are generated by Ambari and applied to the AMS configs.

Before looking into ambari-metrics-collector.log or ambari-metrics-monitor.out, make sure the ams-hbase daemon is up and running. If it is not, the connection timeouts are expected and tell you nothing useful. Based on the HBase logs posted, the HBase daemon tried to log in and failed, so we need to figure out why it failed. Note: if the collector was moved to another host, the older keytabs become invalid because the hostname changed, and they have to be regenerated.
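To make the keytab check concrete, here is a minimal Python sketch. It assumes the MIT Kerberos `klist` and `kinit` tools are on the PATH. The keytab path and principal below are hypothetical placeholders; substitute the values Ambari wrote into your AMS configs. It also checks whether the host part of a `service/host@REALM` principal still matches this machine's FQDN, which is the situation described in the note about moving the collector.

```python
import socket
import subprocess
from typing import List, Optional

# Hypothetical values for illustration only; use the keytab and principal
# that Ambari configured for the AMS HBase master on your cluster.
KEYTAB = "/etc/security/keytabs/ams-hbase.master.keytab"
PRINCIPAL = "amshbase/m2.domain@EXAMPLE.COM"


def principal_host(principal: str) -> Optional[str]:
    """Return the host part of a 'service/host@REALM' principal, or None."""
    name = principal.split("@", 1)[0]
    if "/" not in name:
        return None
    return name.split("/", 1)[1]


def host_matches(principal: str, fqdn: str) -> bool:
    """True if the principal's host component matches the given FQDN."""
    host = principal_host(principal)
    return host is not None and host.lower() == fqdn.lower()


def kinit_cmd(keytab: str, principal: str) -> List[str]:
    """Build an MIT Kerberos command that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def check_keytab(keytab: str = KEYTAB, principal: str = PRINCIPAL) -> bool:
    """List the keytab entries, then try to log in with it; True on success."""
    fqdn = socket.getfqdn()
    if not host_matches(principal, fqdn):
        print(f"{principal} does not match host {fqdn}; keytab may need regenerating")
    try:
        # 'klist -kt' prints the principals and timestamps stored in the keytab.
        subprocess.run(["klist", "-kt", keytab], check=False)
        return subprocess.run(kinit_cmd(keytab, principal), check=False).returncode == 0
    except FileNotFoundError:
        # Kerberos client tools are not installed on this host.
        return False
```

Run `check_keytab()` as the AMS service user on the collector host. Keep in mind that a successful `kinit` replaces that user's current ticket cache.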

Example of keytab commands:

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP1/HDP-1.2.0/bk_installing_manually_book/...



Which version of Ambari are you using?

Contributor

Hi Orivier,

It is version 2.1.2.1.

Master Collaborator

2016-07-06 14:59:14,466 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=m2.domain:2181 sessionTimeout=120000 watcher=hconnection- 0x7bc9e6ab0x0, quorum=m2.domain:2181, baseZNode=/hbase-secure

It looks like AMS tried to connect to the HBase cluster's znode.

AMS should use /ams-hbase-secure as its base znode.

Can you check your configuration?
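The diagnosis above can be reproduced mechanically: pull `connectString` and `baseZNode` out of the ZooKeeper client log line and compare them with what AMS is expected to use. This is a minimal sketch; the expected port 61181 and the `/ams-hbase` znode prefix are taken from the replies in this thread (AMS 2.1.2.1), and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Pulls key=value pairs out of ZooKeeper's "Initiating client connection" line.
_FIELD = re.compile(r"\b(connectString|baseZNode)=([^\s,]+)")

AMS_ZK_PORT = "61181"            # AMS's own ZooKeeper port, per this thread
AMS_ZNODE_PREFIX = "/ams-hbase"  # /ams-hbase-secure or /ams-hbase-unsecure


def zk_settings_from_log(line: str) -> Dict[str, str]:
    """Return the connectString and baseZNode values found in a log line."""
    return dict(_FIELD.findall(line))


def find_problems(settings: Dict[str, str]) -> List[str]:
    """List reasons the settings point at the cluster's ZooKeeper, not AMS's."""
    problems = []
    conn = settings.get("connectString", "")
    ports = {hp.rsplit(":", 1)[1] for hp in conn.split(",") if ":" in hp}
    if AMS_ZK_PORT not in ports:
        problems.append(f"connectString {conn!r} does not use port {AMS_ZK_PORT}")
    znode = settings.get("baseZNode", "")
    if not znode.startswith(AMS_ZNODE_PREFIX):
        problems.append(f"baseZNode {znode!r} is not an AMS znode")
    return problems
```

Applied to the log line quoted above, `find_problems(zk_settings_from_log(line))` reports both the wrong port (2181) and the wrong znode (/hbase-secure).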

Contributor

Hi, I changed it, but now I am getting this error:

2016-07-06 17:55:20,187 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /ams-hbase-secure is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

Super Collaborator

Hi Angel,

From your error, it looks like AMS is talking to the cluster ZooKeeper (port 2181). AMS in version 2.1.2.1 uses its own ZooKeeper in all modes of operation (port 61181).

Can you share your hbase-site.xml from /etc/ams-hbase/conf? That will help us figure out the issue.
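To pull out the relevant values from /etc/ams-hbase/conf/hbase-site.xml, a small parser for the standard Hadoop `<configuration><property><name>...</name><value>...</value></property>` layout is enough. This is a minimal sketch using only the Python standard library. `hbase.zookeeper.property.clientPort` is the standard HBase property for the ZooKeeper client port. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Properties that decide which ZooKeeper and znode the AMS HBase client uses.
KEYS = (
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "zookeeper.znode.parent",
)


def props_from_root(root: ET.Element) -> Dict[str, str]:
    """Map each <property><name> to its <value> in a Hadoop-style config."""
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def parse_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse config XML given as a string."""
    return props_from_root(ET.fromstring(xml_text))


def load_hadoop_conf(path: str = "/etc/ams-hbase/conf/hbase-site.xml") -> Dict[str, str]:
    """Parse config XML from a file on disk."""
    return props_from_root(ET.parse(path).getroot())


def summarize(props: Dict[str, str]) -> Dict[str, str]:
    """Pick out the ZooKeeper-related settings, marking missing ones."""
    return {key: props.get(key, "<unset>") for key in KEYS}
```

On the collector host, `print(summarize(load_hadoop_conf()))` shows the three settings in one line, which is quicker to compare against the expected AMS values than the full file.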

Thanks!

Contributor


Hi, I changed the port to 61181, but it is not able to connect. I see no service running on port 61181. I get the following messages in the log:

2016-07-06 18:37:42,385 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.tmaut.tlabsdata.com/172.16.164.131:61181. Will not attempt to authenticate using SASL (unknown error)
2016-07-06 18:37:42,386 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m2.tmaut.tlabsdata.com/172.16.164.131:61181. Will not attempt to authenticate using SASL (unknown error)
2016-07-06 18:37:42,386 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-07-06 18:37:42,387 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

I also attached the hbase-site.xml.
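"Connection refused" means nothing is accepting TCP connections on that host and port. Because AMS runs its own ZooKeeper on 61181 (per the reply above), a refused connection there suggests the AMS HBase side has not come up. A quick way to confirm this from the collector host, as a sketch (the host name in the comment is a placeholder):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds (something is listening)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening), timeouts,
        # and name-resolution failures.
        return False


# Example (placeholder host): port_open("m2.domain", 61181)
```

If this returns False for 61181 while the cluster ZooKeeper on 2181 answers, the client settings are probably fine and the next place to look is why the AMS HBase process itself did not start.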

Super Collaborator

Please revert the znode setting to the default if the cluster is not kerberized:

/ams-hbase-unsecure

Also, make sure the quorum value in ams-hbase-site is:

hbase.zookeeper.quorum

{{zookeeper_quorum_hosts}}
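Both settings can be checked against the ams-hbase-site values as they appear in Ambari, where the quorum is still the `{{zookeeper_quorum_hosts}}` template before Ambari renders it into host names. This is a minimal sketch. The expected znodes (/ams-hbase-secure when kerberized, /ams-hbase-unsecure otherwise) come from this thread, and the function names are hypothetical.

```python
from typing import Dict, List

QUORUM_TEMPLATE = "{{zookeeper_quorum_hosts}}"  # value suggested in this thread


def expected_znode(kerberized: bool) -> str:
    """Base znode AMS should use, depending on whether Kerberos is enabled."""
    return "/ams-hbase-secure" if kerberized else "/ams-hbase-unsecure"


def check_ams_hbase_site(props: Dict[str, str], kerberized: bool) -> List[str]:
    """Compare ams-hbase-site values (as shown in Ambari) with the expected ones."""
    problems = []
    want = expected_znode(kerberized)
    got = props.get("zookeeper.znode.parent")
    if got != want:
        problems.append(f"zookeeper.znode.parent is {got!r}, expected {want!r}")
    quorum = props.get("hbase.zookeeper.quorum")
    if quorum != QUORUM_TEMPLATE:
        problems.append(f"hbase.zookeeper.quorum is {quorum!r}, expected {QUORUM_TEMPLATE!r}")
    return problems
```

An empty list means both values match what is recommended here. Anything else names the setting to revisit in Ambari.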

Contributor

Hi swagle,

The cluster is kerberized, and hbase.zookeeper.quorum looks OK.

Master Mentor

@Angel Kafazov

See the attached doc; it should help.