I am trying to start Ambari Metrics through the Ambari Web UI. However, the Metrics Collector does not start, even after multiple tries. I looked into the log files, and the problem seems to be with the ZooKeeper client connection. The connection is established and a session is initiated, but after the Phoenix metrics system has started, no further data can be read from the session. The client then assumes the socket connection is closed, and after several retries the process is aborted.
Kindly provide a working solution, or let me know if more information is required.
I have switched to HDP 2.4, and there you just need to start the Metrics Monitors from Ambari to make everything work. There's probably a problem with HDP 2.5. Please let me know if you have a working solution for HDP 2.5.
Is it an embedded or a distributed Metrics Collector?
Did you try cleaning up the ZooKeeper state and restarting? In embedded mode, an improper shutdown can sometimes corrupt that state.
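If you are unsure which mode you are running, the mode is stored in the ams-site property timeline.metrics.service.operation.mode ("embedded" or "distributed"). A minimal sketch that reads it from the collector's config file, assuming the file lives at /etc/ambari-metrics-collector/conf/ams-site.xml (adjust the path if your install differs); the function name is just for illustration:

```bash
#!/usr/bin/env bash
# Print the AMS operation mode ("embedded" or "distributed") from ams-site.xml.
# Assumption: default config path below; pass another path as $1 if needed.
ams_operation_mode() {
  local site_xml="${1:-/etc/ambari-metrics-collector/conf/ams-site.xml}"
  # Grab the property's <name> line plus the next two lines,
  # then extract whatever sits between <value> and </value>.
  grep -A 2 '<name>timeline.metrics.service.operation.mode</name>' "$site_xml" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}

# Example: ams_operation_mode
```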
Please find the value of "hbase.tmp.dir" in the AMS configs (default: /var/lib/ambari-metrics-collector/hbase-tmp/), then try the following:
- Remove the directory or move it aside: rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/ OR mv /var/lib/ambari-metrics-collector/hbase-tmp /Backup_Dir
- Also try removing the AMS ZooKeeper data by backing up and then removing the contents of 'hbase.tmp.dir'/zookeeper, and remove any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.
- Then try restarting AMS.
- If the issue still persists, please share the complete stack trace of the error.
Reference: "Cleaning up Ambari Metrics System Data"
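The backup-then-remove steps above can be sketched as a small script. This is only an illustration, assuming the collector has already been stopped from Ambari, that the first argument is your "hbase.tmp.dir" value, and that the backup location (second argument) is any directory you choose; clean_ams_tmp is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Back up, then clear, the embedded AMS ZooKeeper data and Phoenix spool files.
# Stop the Metrics Collector from Ambari before running this.
set -euo pipefail

clean_ams_tmp() {
  local tmp_dir="$1" backup_dir="$2" stamp sub
  stamp="$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup_dir"
  for sub in zookeeper phoenix-spool; do
    if [ -d "$tmp_dir/$sub" ]; then
      # Keep a timestamped copy first so the step can be undone.
      cp -a "$tmp_dir/$sub" "$backup_dir/$sub.$stamp"
      # Remove only the contents; the directory itself stays in place.
      find "$tmp_dir/$sub" -mindepth 1 -delete
    fi
  done
}

# Example (collector stopped):
# clean_ams_tmp /var/lib/ambari-metrics-collector/hbase-tmp /Backup_Dir/ams-hbase-tmp
```

Copying before deleting costs some disk space but lets you put the old ZooKeeper state back if the restart makes things worse.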
Mine is an embedded Metrics Collector. I have located the folders, but I am unable to remove the files inside them: every attempt fails with 'invalid argument'. Please suggest a workaround, or tell me if there's a problem with the way I did it. I am adding a screenshot of the commands I issued inside the hbase.tmp.dir.
OK, never mind. The "lsattr" command lists a file's attributes, and some attributes (such as immutable) can block file deletion. When such an attribute is set, though, the output looks different, so that's not the reason.
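For anyone who wants to rule this out quickly on a large tree: lsattr prints a flag field followed by the path, and an "i" (immutable) or "a" (append-only) in that field prevents deletion even for root (clear it with chattr -i or chattr -a). A minimal sketch, assuming a filesystem where lsattr works (e.g. ext4); the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Keep only lsattr entries whose flag field contains i (immutable) or a (append-only).
# Header lines like "/some/dir:" and blank lines produced by -R are skipped.
filter_locked() {
  awk '$1 ~ /^[-A-Za-z]+$/ && $1 ~ /[ia]/ && NF >= 2 { sub(/^[^ ]+ +/, ""); print }'
}

# Recursively list locked files under a directory.
find_locked_files() {
  lsattr -R "$1" 2>/dev/null | filter_locked
}

# Example: find_locked_files /var/lib/ambari-metrics-collector/hbase-tmp
```

An empty result means no immutable or append-only files were found, which matches the conclusion above.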
Instead of deleting the whole folder, delete only the contents of /hbase-tmp/zookeeper/zookeeper_0/version-2/* and restart the Metrics Collector.
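The version-2 directory is where ZooKeeper keeps its snapshot and transaction-log files. A minimal sketch of clearing just its contents, assuming the argument is the full path "<hbase.tmp.dir>/zookeeper/zookeeper_0/version-2" and the collector is stopped; clear_zk_version_dir is a hypothetical name:

```bash
#!/usr/bin/env bash
# Remove everything inside the embedded ZooKeeper version-2 directory,
# leaving the directory itself in place.
clear_zk_version_dir() {
  local dir="$1"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -mindepth 1 -delete
}

# Example:
# clear_zk_version_dir /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2
```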
@Edgar Daeds, I followed the instructions given above. The Metrics Collector showed "Started" as its status after the restart. However, after refreshing the page, the status went back to "Stopped". Do you think this is some sort of issue in HDP 2.5? I started the Metrics Collector in HDP 2.4 and it seems to work fine there.
I am having the same issue. When I restart the Metrics Collector, it sometimes goes down after a while. Deleting the contents of hbase-tmp/.../version-2/* helps. I am using HDP 2.5.
Could you please share the collector log?
The log file: ambari-metrics-collector.zip
When you say "after a while", do you mean that it comes back up normally after you restart it? I tried deleting the contents of version-2/* multiple times (the directory was empty anyway).
Thanks. By "after a while" I meant that the Metrics Collector goes down 30 seconds to 1 minute after it starts.
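To pin down exactly when the collector dies, one option is to poll its HTTP port after starting it from Ambari. A rough sketch, assuming the collector listens on port 6188 (the default for timeline.metrics.service.webapp.address) and that `ss` from iproute2 is installed; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Succeeds if some socket is listening on the given TCP port.
port_is_listening() {
  # Column 4 of `ss -ltn` is "Local Address:Port"; skip the header row.
  ss -ltn 2>/dev/null | awk 'NR > 1 { print $4 }' | grep -q ":$1\$"
}

# Print a timestamped status line every 5 seconds for the given duration.
watch_collector_port() {
  local port="${1:-6188}" seconds="${2:-90}" elapsed=0
  while [ "$elapsed" -lt "$seconds" ]; do
    if port_is_listening "$port"; then
      echo "$(date +%T) port $port: listening"
    else
      echo "$(date +%T) port $port: NOT listening"
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
}

# Example: start the collector from Ambari, then run: watch_collector_port 6188 120
```

The timestamp of the first "NOT listening" line can then be matched against the collector log to find the error that precedes the shutdown.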
OK then, if you can't remove the whole folder, did you try moving it to another place, e.g. /tmp?
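Moving the folder aside keeps a copy in case you need it later. A minimal sketch, assuming the collector is stopped; the timestamp suffix avoids clobbering an earlier copy, and move_aside is a hypothetical name:

```bash
#!/usr/bin/env bash
# Move a directory under DEST_ROOT (default /tmp) with a timestamp suffix
# and print the new location.
move_aside() {
  local src="${1%/}" dest_root="${2:-/tmp}" dest
  dest="$dest_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$dest_root"
  mv "$src" "$dest" && printf '%s\n' "$dest"
}

# Example (collector stopped):
# move_aside /var/lib/ambari-metrics-collector/hbase-tmp /tmp
```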
What operating system are you using?
Also try moving or deleting the "hbase.rootdir" location (Ambari Metrics -> Config -> Advanced ams-hbase-site).
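In embedded mode hbase.rootdir is a file:// URI (typically something like file:///var/lib/ambari-metrics-collector/hbase), so it has to be turned into a local path before moving it. A sketch under those assumptions; it refuses non-local URIs such as hdfs:// (distributed mode), and the function names are hypothetical. Note that this directory holds the stored metrics, so moving it away discards the existing metrics history:

```bash
#!/usr/bin/env bash
# Convert an hbase.rootdir value to a local path; only file:// URIs and
# plain absolute paths are handled.
rootdir_to_path() {
  case "$1" in
    file://*) printf '%s\n' "${1#file://}" ;;
    /*) printf '%s\n' "$1" ;;
    *) echo "unsupported hbase.rootdir (not a local path): $1" >&2; return 1 ;;
  esac
}

# Move the rootdir under BACKUP_ROOT with a timestamp suffix; print the new path.
move_hbase_rootdir() {
  local path dest
  path="$(rootdir_to_path "$1")" || return 1
  dest="$2/$(basename "$path").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$2"
  mv "$path" "$dest" && printf '%s\n' "$dest"
}

# Example (collector stopped):
# move_hbase_rootdir file:///var/lib/ambari-metrics-collector/hbase /tmp
```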