Created 11-24-2015 06:51 PM
Had a disk-full issue. After freeing some space in /var and then trying to restart the Metrics Collector from Ambari, I got this error:
----------error ---------------
2015-11-24 11:35:54,281 [INFO] controller.py:110 - Adding event to cache, : {u'metrics': [], u'collect_every': u'15'}
2015-11-24 11:35:54,281 [INFO] main.py:65 - Starting Server RPC Thread: /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
2015-11-24 11:35:54,281 [INFO] controller.py:57 - Running Controller thread: Thread-1
2015-11-24 11:35:54,282 [INFO] emitter.py:45 - Running Emitter thread: Thread-2
2015-11-24 11:35:54,282 [INFO] emitter.py:65 - Nothing to emit, resume waiting.
2015-11-24 11:36:54,283 [INFO] emitter.py:91 - server: http://xxxxxxx.com:6188/ws/v1/timeline/metrics
2015-11-24 11:37:44,334 [WARNING] emitter.py:74 - Error sending metrics to server. timed out
2015-11-24 11:37:44,334 [WARNING] emitter.py:80 - Retrying after 5 ...
Created 11-24-2015 07:09 PM
netstat -anp | grep 6188
What's the output of the above command? Can you post more data from the log?
Created 11-24-2015 07:23 PM
Nothing is returned from netstat -anp | grep 6188 when running on the namenode. On the Ambari server node (which is also our edge node), the output of the command is:
[ambari@xxxxx~]$ netstat -anp |grep 6188
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
unix 3 [ ] STREAM CONNECTED 215461888 - /var/lib/sss/pipes/private/sbus-dp_centene.com.39654
unix 3 [ ] STREAM CONNECTED 215461883 - /var/lib/sss/pipes/private/sbus-dp_centene.com.39654
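Note that the matches above are Unix domain sockets whose inode numbers merely contain "6188", not a TCP listener. A narrower check, assuming netstat (or ss) is available on the collector host, would be:
netstat -tlnp | grep 6188
# or, equivalently, with ss
ss -lntp | grep 6188
If the collector were up, you would expect a TCP socket in LISTEN state on port 6188 here.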
Thank you, Neeraj, for your quick response.
Created 11-24-2015 07:27 PM
Try to restart the AMS service and run tail -f on the AMS logs to check the exact messages while it's crashing.
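For reference, a minimal sketch of tailing the AMS logs, assuming the default log locations (/var/log/ambari-metrics-collector and /var/log/ambari-metrics-monitor; adjust the paths and file names if your install differs):
# Metrics Collector and its embedded HBase master
tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log
tail -f /var/log/ambari-metrics-collector/hbase-ams-master-*.log
# Metrics Monitor on each host
tail -f /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out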
Created 11-24-2015 07:32 PM
Per Neeraj's point, make sure the Metrics Collector process is up and listening on port 6188. Once the Metrics Collector is up and running, the Metrics Monitors should reconnect and start sending metrics.
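One way to confirm the collector is actually serving on port 6188 is to hit the timeline URL the emitter logs report (shown in the log above); the host below is a placeholder:
curl -i http://<collector-host>:6188/ws/v1/timeline/metrics
A "Connection refused" from curl matches the emitter error and means the collector process is not listening yet; any HTTP response means the port is open.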
Created 11-24-2015 08:02 PM
When trying to restart AMS from the Ambari UI, I checked the AMS log file ambari-metrics-monitor.out on the Ambari server host node:
-----------------------------------------
2015-11-24 13:47:30,987 [WARNING] emitter.py:80 - Retrying after 5 ...
2015-11-24 13:48:35,989 [INFO] emitter.py:91 - server: http://xxxx06t.xxxx.com:6188/ws/v1/timeline/metrics
2015-11-24 13:48:35,990 [WARNING] emitter.py:74 - Error sending metrics to server. <urlopen error [Errno 111] Connection refused>
On the other node, which is the primary namenode and also one of the ZooKeeper nodes, the AMS log file shows:
------------------------------------------------------------------------------------------------------------
2015-11-24 13:43:50,822 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2015-11-24 13:43:50,822 WARN org.apache.zookeeper.ClientCnxn: Session 0x1513af962d30005 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-11-24 13:43:51,217 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2015-11-24 13:43:51,217 WARN org.apache.zookeeper.ClientCnxn: Session 0x1513af962d30005 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
The hbase-ams-master-xxx06t.xxxx.com.log has the same error message:
--------------------------------------------------------------------------------------------
2015-11-24 13:30:10,202 WARN [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-11-24 13:30:10,206 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
Created 11-24-2015 08:09 PM
@Mike Li Is ZK up?
Created 11-24-2015 08:13 PM
Using the ruok command to check, all 3 ZooKeeper processes are up and running:
./check_zookeeper.ksh
imok
imok
imok
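For a quick manual probe, the same ruok check can be sent with nc. Port 2181 below is assumed to be the cluster ZooKeeper client port, and 61181 is the embedded AMS ZooKeeper port seen in the logs above:
# cluster ZooKeeper
echo ruok | nc <zookeeper-host> 2181
# embedded AMS ZooKeeper on the collector host
echo ruok | nc localhost 61181
A healthy server replies "imok"; no reply on 61181 would point at the embedded AMS HBase/ZooKeeper being down even though the cluster quorum is fine.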
Created 11-24-2015 08:38 PM
@Mike Li The embedded HBase instance is down.
Created 11-24-2015 08:46 PM
I discussed this with Paul. Try the following; a combined command sketch follows the steps:
Stop the Metrics Collector process using Ambari and make sure all AMS-related processes are also stopped:
ps aux | grep ams
If any are still alive:
kill -15 <pid>
Then restart the Metrics Collector.
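A combined sketch of those steps (the grep pattern is just an example; verify the PIDs before killing anything):
# after stopping the Metrics Collector in Ambari, list anything AMS-related that survived
ps aux | grep [a]ms
# send SIGTERM to each leftover process, using the PID column from the output above
kill -15 <pid>
# re-check until nothing AMS-related remains, then start the Metrics Collector from Ambari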
Created on 11-25-2015 06:28 PM - edited 08-19-2019 05:45 AM
Following Neeraj's suggestions, I stopped AMS first, made sure no AMS processes were running, then restarted AMS. The Metrics Monitors (agents) have now all started, but the Metrics Collector is still stuck at 35% ... Saw the following output:
Created 11-25-2015 06:33 PM
@Mike Li Let's give it some more time and see what happens.
Created 03-10-2016 06:10 PM
Hi Neeraj, I'm facing this error: Metrics Collector - ZooKeeper Server Process: Connection failed: [Errno 111] Connection refused to dcvdevhadnn.eu.scor.local:61181
Created 12-08-2015 04:52 AM
Have you resolved this?
If your cluster is Kerberized, AMS is in distributed mode, and the AMS Collector cannot start, then try setting the AMS RegionServer principal and keytab to the AMS HBase master principal and keytab. That's a bug we uncovered recently in Ambari 2.1.2.
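For illustration, the workaround amounts to making the RegionServer security settings in ams-hbase-site mirror the master's; the property names below are the standard HBase ones, and the values are placeholders to be taken from your own configuration:
hbase.regionserver.kerberos.principal = <same value as hbase.master.kerberos.principal>
hbase.regionserver.keytab.file = <same value as hbase.master.keytab.file>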
For other possible issues, check this.
Created 01-26-2016 05:47 PM
I just had this issue and this is how it was solved.
I added this to ams-hbase-site: hbase.zookeeper.property.tickTime = 6000, and then restarted AMS.
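To spell out where that goes: the property is usually added in the Ambari UI under Ambari Metrics > Configs, as a custom property in the ams-hbase-site section, followed by an AMS restart. The resulting entry in ams-hbase-site.xml would look roughly like this (the 6000 ms value is the one from the post above):
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>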
Created 03-10-2016 06:07 PM
How do I restart the AMS service?
Created 08-19-2016 10:25 PM
Where and how can I add this parameter? I am having the same issue.
Thanks