Created 11-24-2015 06:51 PM
Had a disk-full issue. After freeing some space in /var and then trying to restart the Metrics Collector from Ambari, I got this error:
----------error ---------------
2015-11-24 11:35:54,281 [INFO] controller.py:110 - Adding event to cache, : {u'metrics': [], u'collect_every': u'15'}
2015-11-24 11:35:54,281 [INFO] main.py:65 - Starting Server RPC Thread: /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
2015-11-24 11:35:54,281 [INFO] controller.py:57 - Running Controller thread: Thread-1
2015-11-24 11:35:54,282 [INFO] emitter.py:45 - Running Emitter thread: Thread-2
2015-11-24 11:35:54,282 [INFO] emitter.py:65 - Nothing to emit, resume waiting.
2015-11-24 11:36:54,283 [INFO] emitter.py:91 - server: http://xxxxxxx.com:6188/ws/v1/timeline/metrics
2015-11-24 11:37:44,334 [WARNING] emitter.py:74 - Error sending metrics to server. timed out
2015-11-24 11:37:44,334 [WARNING] emitter.py:80 - Retrying after 5 ...
Created 01-26-2016 05:47 PM
I just had this issue, and this is how I solved it.
I added this to ams-hbase-site: hbase.zookeeper.property.tickTime = 6000, and then restarted AMS.
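In the Ambari UI this setting typically lives under Ambari Metrics > Configs > Advanced ams-hbase-site. As a rough sanity check after the restart (the path below assumes a default AMS embedded-HBase layout and may differ on your install):

# Assumed default config location for the AMS embedded HBase; adjust if needed.
grep -A1 'hbase.zookeeper.property.tickTime' /etc/ams-hbase/conf/hbase-site.xml
# The output should include: <value>6000</value>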
Created 11-24-2015 07:09 PM
netstat -anp | grep 6188
What's the output of the above command? Can you post more data from the log?
Created 11-24-2015 07:23 PM
Nothing is returned from netstat -anp | grep 6188 when running it on the namenode. On the Ambari server node (which is also our edge node), the output of the command is:
[ambari@xxxxx ~]$ netstat -anp | grep 6188
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
unix  3      [ ]     STREAM     CONNECTED     215461888    -    /var/lib/sss/pipes/private/sbus-dp_centene.com.39654
unix  3      [ ]     STREAM     CONNECTED     215461883    -    /var/lib/sss/pipes/private/sbus-dp_centene.com.39654
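Side note: the two unix STREAM lines above look like false positives; grep appears to have matched "6188" inside the socket inode numbers rather than a TCP port. A tighter check for a listener on the collector port would be something like:

netstat -tlnp | grep ':6188'   # TCP listeners only; empty output means nothing is listening on 6188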
Thank you, Neeraj, for your quick response.
Created 11-24-2015 07:27 PM
Try restarting the AMS service and run tail -f on the AMS logs to check the exact messages while it's crashing.
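For example, with the default AMS log locations (adjust the paths if your install differs):

# Watch the collector, embedded HBase master, and monitor logs during the restart
tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log \
        /var/log/ambari-metrics-collector/hbase-ams-master-*.log \
        /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out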
Created 11-24-2015 07:32 PM
To Neeraj's point, make sure the Metrics Collector process is up and listening on port 6188. Once the Metrics Collector is up and running, the Metrics Monitors should reconnect and start sending metrics.
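A quick way to verify that on the collector host (default port 6188 assumed; the URL is the one shown in the emitter log above):

netstat -tlnp | grep ':6188'                                         # is anything listening on the collector port?
curl -sv http://localhost:6188/ws/v1/timeline/metrics 2>&1 | tail -5  # any HTTP answer at all beats "Connection refused"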
Created 11-24-2015 08:02 PM
When trying to restart AMS from the Ambari UI, I checked the AMS log file ambari-metrics-monitor.out on the Ambari server host:
-----------------------------------------
2015-11-24 13:47:30,987 [WARNING] emitter.py:80 - Retrying after 5 ...
2015-11-24 13:48:35,989 [INFO] emitter.py:91 - server: http://xxxx06t.xxxx.com:6188/ws/v1/timeline/metrics
2015-11-24 13:48:35,990 [WARNING] emitter.py:74 - Error sending metrics to server. <urlopen error [Errno 111] Connection refused>

On another node, which is the primary namenode and also one of the ZooKeeper nodes, the AMS log file shows:
------------------------------------------------------------------------------------------------------------
2015-11-24 13:43:50,822 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2015-11-24 13:43:50,822 WARN org.apache.zookeeper.ClientCnxn: Session 0x1513af962d30005 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-11-24 13:43:51,217 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2015-11-24 13:43:51,217 WARN org.apache.zookeeper.ClientCnxn: Session 0x1513af962d30005 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

The hbase-ams-master-xxx06t.xxxx.com.log has the same error:
--------------------------------------------------------------------------------------------
2015-11-24 13:30:10,202 WARN [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-11-24 13:30:10,206 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
Created 11-24-2015 08:09 PM
@Mike Li Is ZK up?
Created 11-24-2015 08:13 PM
Using the ruok command to check, all 3 ZooKeeper processes are up and running:
./check_zookeeper.ksh
imok
imok
imok
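The wrapper script above is site-specific; the direct equivalent uses ZooKeeper's ruok four-letter command. Note that the logs above show the AMS embedded HBase trying to reach its own ZooKeeper on localhost:61181, not the cluster quorum on 2181, so that port is worth probing as well (the hostnames below are placeholders):

# Cluster ZooKeeper quorum (default client port 2181); each healthy server replies "imok"
for zk in zk1.example.com zk2.example.com zk3.example.com; do echo ruok | nc "$zk" 2181; done

# The AMS embedded HBase's own ZooKeeper, per the logs above (localhost:61181)
echo ruok | nc localhost 61181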
Created 11-24-2015 08:18 PM
Created 11-24-2015 08:38 PM
@Mike Li The embedded HBase instance is down.