Can't start Ambari Metrics Collector


We can't start the Ambari Metrics Collector. What could be the problem?

From the logs we got the following:


2018-03-11 15:50:20,249 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection-0x4bdeaabb-0x16215bf27850000, quorum=master02.sys673.com:61181, baseZNode=/ams-hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:611)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:592)
        at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:565)
        at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1195)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1136)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:298)
        at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:151)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

# grep error ambari-metrics-collector.log
2018-03-11 14:59:04,182 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:04,185 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:04,458 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:04,459 WARN org.apache.zookeeper.ClientCnxn: Session 0x162158dcd9a0001 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:05,286 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:05,286 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:05,812 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:05,813 WARN org.apache.zookeeper.ClientCnxn: Session 0x162158dcd9a0001 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:06,388 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:06,388 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:07,197 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:07,197 WARN org.apache.zookeeper.ClientCnxn: Session 0x162158dcd9a0001 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:07,489 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:07,490 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2018-03-11 14:59:08,591 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys673.com/130.14.52.8:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-11 14:59:08,592 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

It looks like the AMS collector is not able to start properly, and hence ZooKeeper (the quorum on port 61181 in the logs) reports ConnectionLoss while looking up the /ams-hbase-unsecure/meta-region-server znode.
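
The "Connection refused" messages in the collector log suggest that nothing is listening on the AMS ZooKeeper port at all. A minimal Python sketch to confirm that from the collector host, assuming a plain TCP connect is a good enough first check (it does not speak the ZooKeeper protocol, so it cannot tell a healthy ZooKeeper from any other listener); the function name `port_is_open` is hypothetical:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A refused connection (the "Connection refused" seen in the collector log),
    a timeout, or a DNS failure all return False. This only shows that
    *something* is listening; it does not verify that it is a healthy ZooKeeper.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror are all OSError subclasses
        return False
```

For example, `port_is_open("master02.sys673.com", 61181)` (host and port taken from the log above) returning False while the collector is supposedly starting would be consistent with the embedded ZooKeeper never coming up.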

The collector sometimes fails this way when it is not tuned properly, especially when its heap settings are not sized for the number of nodes in the cluster. Can you please refer to the following doc to check whether the heap settings are appropriate for your cluster:

- https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning

- Sometimes cleaning up the contents of the "hbase.zookeeper.property.dataDir" and "${hbase.tmp.dir}/phoenix-spool" directories also helps, since both hold only temporary ZooKeeper and Phoenix spool data.
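
A minimal Python sketch of that cleanup, assuming the collector has been stopped first (for example from Ambari) and that you pass in the actual directory paths resolved from the AMS configuration on the collector host; the helper name `empty_directory` and its `dry_run` flag are hypothetical. It deletes only the contents of a directory, leaving the directory itself (and its ownership) in place:

```python
import shutil
from pathlib import Path
from typing import List


def empty_directory(path: str, dry_run: bool = True) -> List[str]:
    """Remove everything inside `path` but keep `path` itself.

    Returns the entries that were removed (or, with dry_run=True, that would be).
    Assumption: the AMS collector is stopped while this runs.
    """
    root = Path(path)
    if not root.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    removed = []
    for entry in sorted(root.iterdir()):
        removed.append(str(entry))
        if dry_run:
            continue  # only report what would be deleted
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # delete a subdirectory tree
        else:
            entry.unlink()  # regular file, or the symlink itself (not its target)
    return removed
```

Calling it once with the default `dry_run=True` lets you review the list before running it again with `dry_run=False`, once for the ZooKeeper data directory and once for the phoenix-spool directory.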


@Jay, you mentioned "Increased heap memory for AMS collector and hbase." Please let me know which variables we need to increase. Are they set from the Ambari GUI?

Michael-Bronson


@Jay, thank you so much. The new values from "Suggested Memory settings" solved the problem, and the Metrics Collector is now up. Thank you for the time you put into this case; we really appreciate it.

Michael-Bronson


ambari-metrics-collector.log
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-03-12 08:54:14,696 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys3423.com/43.25.76.98:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-12 08:54:14,697 WARN org.apache.zookeeper.ClientCnxn: Session 0x1621966a8020001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-03-12 08:54:14,797 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys3423.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
2018-03-12 08:54:15,859 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys3423.com/43.25.76.98:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-12 08:54:15,860 WARN org.apache.zookeeper.ClientCnxn: Session 0x1621966a8020001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-03-12 08:54:17,363 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys3423.com/43.25.76.98:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-12 08:54:17,364 WARN org.apache.zookeeper.ClientCnxn: Session 0x1621966a8020001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-03-12 08:54:19,037 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys3423.com/43.25.76.98:61181. Will not attempt to authenticate using SASL (unknown error)
2018-03-12 08:54:19,037 WARN org.apache.zookeeper.ClientCnxn: Session 0x1621966a8020001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
Michael-Bronson

Master Mentor

@Michael Bronson

The latest error indicates that the AMS collector is still going down, but only after running for some time. This means it requires further tuning.

2018-03-12 08:54:19,037 WARN org.apache.zookeeper.ClientCnxn: Session 0x1621966a8020001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)


So can you please restart the AMS collector and then check the heap usage after some time to see if it is reaching the maximum limit?

# $JAVA_HOME/bin/jmap -heap $PID_AMS
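
A lighter-weight way to watch heap usage over time (a swapped-in technique, not the `jmap` check suggested above) is to sample `jstat -gcutil <pid>`, which prints per-generation utilization as percentages, and watch the old-generation column `O`. A minimal Python sketch, assuming a HotSpot JDK 8 `jstat` on the PATH, a locale that prints decimal points rather than commas, that `pid` is the collector's process ID, and that 90% is a reasonable warning threshold (an arbitrary choice); the function names are hypothetical:

```python
import subprocess
from typing import Dict


def parse_gcutil(output: str) -> Dict[str, float]:
    """Parse `jstat -gcutil` output (header line + sample line) into {column: value}."""
    rows = [line.split() for line in output.strip().splitlines()]
    header, values = rows[0], rows[-1]
    # jstat prints "-" for columns that do not apply to the running JVM; skip those
    return {name: float(value) for name, value in zip(header, values) if value != "-"}


def old_gen_near_limit(pid: int, threshold: float = 90.0) -> bool:
    """Run `jstat -gcutil <pid>` once; True if old-generation usage >= threshold percent."""
    out = subprocess.run(
        ["jstat", "-gcutil", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gcutil(out)["O"] >= threshold
```

If `old_gen_near_limit(pid)` keeps returning True in the hours after a restart while the `FGC` (full GC count) column keeps climbing, that would suggest the collector heap is still undersized for the cluster.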


@Jay, after we fine-tuned the parameters yesterday, everything is OK and the Metrics Collector is up. Despite this, do you still want us to restart the AMS collector anyway?

Michael-Bronson

Master Mentor

@Michael Bronson

No. If AMS is running fine for now, then we do not need to restart it. We can keep monitoring it for some time to make sure everything keeps running fine.