Created 06-06-2017 03:49 AM
Metrics Collector Process
It shows nothing when I type 'netstat -a | grep 6188'.
Restarting ambari-server or rebooting doesn't work either.
ambari-server log:
MetricsRequestHelper:114 - Error getting timeline metrics : Connection refused
By the way, although my OS time is correct, the timestamps in my logs are wrong.
Created 06-06-2017 07:21 AM
I see some errors in the ambari-metrics-collector logs, like:
Unable to connect to HBase store using Phoenix. org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG
Can you check the link below for these errors?
https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
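For reference, for an embedded-mode collector the fix in that article boils down to stopping AMS, moving the collector's HBase data directories aside so Phoenix can recreate its schema, and then starting AMS again. The paths below are the embedded-mode defaults; please verify hbase.rootdir and hbase.tmp.dir in ams-hbase-site for your cluster before removing anything:
mv /var/lib/ambari-metrics-collector/hbase /tmp/ams-hbase.backup
mv /var/lib/ambari-metrics-collector/hbase-tmp /tmp/ams-hbase-tmp.backup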
Created 06-06-2017 03:52 AM
When you try to start the AMS collector from the Ambari UI, do you see any errors in the AMS logs?
If the following command returns no output, the AMS collector is not running, so we first need to check the AMS log for any errors or exceptions. Can you please share the errors/exceptions observed in ambari-metrics-collector.log?
netstat -tanlp |grep 6188
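If netstat is not available on that host, an equivalent check (assuming the ss or lsof utilities are installed) would be:
ss -tanlp | grep 6188
lsof -i :6188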
Created 06-06-2017 04:05 AM
ambari-metrics-collector.log:
Exception in thread Thread-947:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 1082, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 45, in process_event
    self.process_host_collection_event(event)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 79, in process_host_collection_event
    metrics.update(self.host_info.get_disk_io_counters())
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/host_info.py", line 265, in get_disk_io_counters
    io_counters = psutil.disk_io_counters()
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1726, in disk_io_counters
    raise RuntimeError("couldn't find any physical disk")
RuntimeError: couldn't find any physical disk
Thank you very much.
Created 06-06-2017 04:31 AM
The AMS monitor scripts use the standard Python "psutil" module to read the disk I/O counters. As mentioned in:
File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1726, in disk_io_counters
    raise RuntimeError("couldn't find any physical disk")
the error Python is raising is "couldn't find any physical disk".
So can you please check if there are any disk issues? Do you see any issues with the following commands?
# df -h
# du
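Since psutil builds disk_io_counters() on Linux from /proc/diskstats and /proc/partitions, it is also worth checking that the host (or container) actually exposes block devices there, for example:
cat /proc/partitions
cat /proc/diskstats
python -c "import psutil; print(psutil.disk_io_counters())"
(The last command assumes a system-wide psutil install, which may differ from the copy bundled with AMS, but it exercises the same call.)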
Created 06-06-2017 05:57 AM
Sorry, I sent you the ambari-metrics-collector.out before.
# df -h
Filesystem Size Used Avail Use% Mounted on
tank/containers/xdata-0 1.6T 12G 1.6T 1% /
none 492K 4.0K 488K 1% /dev
udev 126G 0 126G 0% /dev/tty
/dev/md0 92G 22G 66G 25% /dev/lxd
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 26G 1.1M 26G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 126G 0 126G 0% /run/shm
none 100M 0 100M 0% /run/user
# du
3750 .
By the way, another cluster of mine shows that message too, but it works fine.
Created 06-06-2017 03:55 AM
Can you check whether your Ambari Metrics Collector process is running or not? On the host where the Metrics Collector is installed, execute the following command:
# netstat -tulpn | grep 6188
If the above command returns no output, please check the Ambari Metrics Collector logs in the /var/log/ambari-metrics-collector/ directory. Also restart the Ambari Metrics Collector and try again.
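The restart can be done from the Ambari UI, or directly on the collector host; assuming the default HDP control script, service user, and config path, something like:
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf start'
tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log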
Created 06-06-2017 05:54 AM
Created 06-06-2017 06:02 AM
Can you attach your Metrics Collector logs?
Created 06-06-2017 06:10 AM
Created 06-06-2017 05:17 PM
The ambari-metrics-collector log is showing only the message below:
ambari-metrics-collector.log:
2017-06-06 05:56:48,415 WARN org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource: Unable to connect to HBase store using Phoenix. org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:436)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:939
So, as asked earlier: did you follow all the steps described in article [1]? If yes and you are still facing the issue, please attach the hbase-ams-master.log file from "/var/log/ambari-metrics-collector/hbase-ams-master-<hostname -f>.log" and also share the "/etc/ambari-metrics-monitor/conf/metric_monitor.ini" file from any host where ambari-metrics-monitor is running.
Also, can you try to telnet to the ambari-metrics-collector from any host, using the command: telnet c2m.xdata.com 6188
[1] https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
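Besides telnet, you could also hit the collector's timeline metrics endpoint directly; assuming the hostname and port above, for example:
curl -v "http://c2m.xdata.com:6188/ws/v1/timeline/metrics"
Even an error response confirms that the port is open and the web service is up; a connection refused or timeout points back at the collector not running or at a firewall.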
Created 11-09-2017 01:14 PM
Hi,
Today I deployed a Hortonworks cluster in the AWS cloud, and I am facing the same issue while starting the AMS collector. Please find the relevant output and logs below.
------------------------------------
netstat -tulpn | grep 6188
No result. So AMS is down.
-----------------------------------
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 992M 56K 992M 1% /dev
tmpfs 1001M 12K 1001M 1% /dev/shm
/dev/xvda1 30G 4.5G 25G 16% /
-----------------------------------
ambari-metrics-collector.log
2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181, initiating session
2017-11-09 11:52:33,844 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2017-11-09 11:52:35,033 INFO org.apache.helix.manager.zk.ZkClient: Closing zkclient: State:CONNECTING sessionid:0x0 local:null remoteserver:null lastZxid:0 xid:1 sent:24 recv:0 queuedpkts:0 pendingresp:0 queuedevents:0
2017-11-09 11:52:35,033 INFO org.I0Itec.zkclient.ZkEventThread: Terminate ZkClient event thread.
2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181, initiating session
2017-11-09 11:52:36,039 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2017-11-09 11:52:36,039 INFO org.apache.helix.manager.zk.ZkClient: Closed zkclient
2017-11-09 11:52:36,039 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
2017-11-09 11:52:36,040 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
    at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:60)
    at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:69)
    at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:96)
    at org.apache.helix.manager.zk.ZKHelixAdmin.<init>(ZKHelixAdmin.java:92)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:124)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115)
    ... 7 more
2017-11-09 11:52:36,041 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x0
2017-11-09 11:52:36,046 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
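(The key failure above is the ZkTimeoutException from the HA controller: the collector cannot reach its ZooKeeper quorum on port 2181 within the 30-second timeout. A quick reachability check from the collector host, using the quorum hosts named in the log, would be something like:
nc -zv ip-172-31-19-191.ap-south-1.compute.internal 2181
nc -zv ip-172-31-21-67.ap-south-1.compute.internal 2181
echo ruok | nc ip-172-31-19-191.ap-south-1.compute.internal 2181
The last command should return "imok" if that ZooKeeper server is up and healthy.)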
------------------------------------
Security Group
I have also added port 6188 as a custom TCP inbound rule to my security group.
------------------------------------
Thanks in advance.
Venkateswara Reddy B