Created on 09-23-2017 01:10 AM - edited 09-16-2022 05:17 AM
Hello,
My CM version is 5.12.0.
There are 3 ZooKeeper server instances, on machines 10.0.0.8, 10.0.0.13, and 10.0.0.14. The Service Monitor runs on 10.0.0.8.
1. A Canary error appeared for ZooKeeper on the CM dashboard. I solved it by raising the ZooKeeper maximum connections to 1000.
2. The Canary error came back after a while. I then found that the Service Monitor was using all 1000 ZooKeeper connections.
3. I disabled the ZooKeeper Canary test, so the error no longer shows on the CM dashboard.
4. When I run the HBase shell on 10.0.0.8, it reports an error:
[root@inspire-dev-3 bin]# hbase shell
17/09/23 08:03:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
17/09/23 08:03:59 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
17/09/23 08:03:59 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x3202c09d0x0, quorum=inspire-dev-3:2181,inspire-dev-6:2181,inspire-dev-5:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:919)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:657)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
5. I checked the ZooKeeper connections with "echo cons | nc 127.0.0.1 2181"; it shows 1008 connections.
6. I checked who established the connections to port 2181 with "netstat -npt | grep 2181"; it shows that the Service Monitor process holds all of them.
7. I restarted the Service Monitor, and the HBase shell worked again.
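For anyone who wants to repeat the check from steps 5 and 6 without nc, here is a minimal Python sketch that sends the same "cons" command and counts connections per client address. The function names are my own, and the parsing assumes each connection line looks like " /10.0.0.8:53412[1](queued=0,...)", which is how ZooKeeper 3.4 prints it as far as I know; adjust the pattern if your output differs.

```python
import re
import socket
from collections import Counter

# Assumed layout of one `cons` line: " /<client-ip>:<client-port>[<ops>](...)".
# Verify against your own `echo cons | nc ...` output before relying on it.
_CONS_LINE = re.compile(r"\s*/(.+):(\d+)\[")


def fetch_cons(host="127.0.0.1", port=2181, timeout=5.0):
    """Send the `cons` four-letter command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"cons")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_by_client(cons_text):
    """Return a Counter mapping client address -> number of open connections."""
    counts = Counter()
    for line in cons_text.splitlines():
        match = _CONS_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Example use on a ZooKeeper host (not executed here):
#   for address, n in count_by_client(fetch_cons()).most_common(5):
#       print(address, n)
```

This only tells you which host holds the connections; you still need netstat (step 6) to map them to a process on that host.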
This post says that it is a Service Monitor bug that was fixed in 5.3.0, but my version is 5.12.0 and the bug still seems to exist.
So, what should I do now? At the moment I have to restart the Service Monitor from time to time.
Created 09-25-2017 07:56 AM
Created on 09-25-2017 06:30 PM - edited 09-26-2017 01:49 AM
There are many log entries like these in the SM log:
2017-09-26 01:18:50,715 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x40b71007 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:18:50,733 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x178b2a6f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,735 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4a38698e connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,754 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x60599b95 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,771 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6f656e8a connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,792 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x22d3d22c connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,775 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3b64eef7 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,796 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x333cc0f1 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
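In the excerpt above, two new hconnection identifiers appear about once a minute. To see whether that rate holds across the whole log, a rough Python sketch like the following can tally distinct identifiers per minute. The function name is hypothetical and the pattern assumes exactly the log line layout shown above; whether those connections are ever closed is something the log excerpt alone does not show.

```python
import re
from collections import defaultdict

# Matches the "connecting to ZooKeeper ensemble" INFO lines shown above and
# captures the timestamp truncated to the minute plus the hconnection id.
_NEW_CONN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} INFO "
    r"\S*RecoverableZooKeeper: Process identifier=(hconnection-0x[0-9a-fA-F]+) "
    r"connecting to ZooKeeper ensemble="
)


def new_connections_per_minute(lines):
    """Return {minute: count of distinct hconnection ids first seen logging then}."""
    per_minute = defaultdict(set)
    for line in lines:
        # findall copes with several log entries run together on one line.
        for minute, ident in _NEW_CONN.findall(line):
            per_minute[minute].add(ident)
    return {minute: len(ids) for minute, ids in sorted(per_minute.items())}
```

Pass it an open file handle for the Service Monitor log (or any iterable of lines). A steady count per minute combined with a growing total in the "cons" output would point toward connections being opened and not released, but that is an inference, not something this script proves.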
And there are some warnings:
2017-09-26 00:57:50,306 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xf9c174f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 00:57:50,325 WARN com.cloudera.cmon.firehose.polling.CdhTask: (14 skipped) Exception in doWork for task: hbase_HBASE_SERVICE_STATE_TASK
java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:412)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:405)
    at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:283)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:116)
Created 09-26-2017 08:36 PM
Any ideas?