Created 06-06-2017 03:49 AM
Metrics Collector Process
it shows nothing when i type 'netstat -a|grep 6188'
restart ambari-server or reboot dosen't work either
ambari-server log:
MetricsRequestHelper:114 - Error getting timeline metrics : Connection refused
by the way, although my OS time is right but my log time is wrong
Created 06-06-2017 07:21 AM
I see some errors in ambari-metrics-collector logs like,
Unable to connect to HBase store using Phoenix. org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG
Can you check the below link for these errors.
https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
Created 06-06-2017 05:17 PM
The ambari-metrics-collector log showing the below message only:
ambari-metrics-collector.log :
2017-06-06 05:56:48,415 WARN org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource: Unable to connect to HBase store using Phoenix. org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:436) at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:939
So, as asked earlier; did you followed all the steps defined in article [1]. If yes and still facing issue then please attach the hbase-ams-master.log file from location of "/var/log/ambari-metrics-collector/hbase-ams-master-<hostname -f>.log" and also share the '/etc/ambari-metrics-monitor/conf/metric_monitor.ini" file from any host where ambari-metrics-monitor is running.
Also, Can you try to telnet to ambari-metrics-collector from any host using cmd : telnet c2m.xdata.com 6188
[1] https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
Created 11-09-2017 01:14 PM
Hi,
Today i deployed Hortonworks cluster in AWS cloud. Even i am facing same issue while starting AMS collector. Please find the log files for the same.
------------------------------------
netstat -tulpn | grep 6188
No result. So AMS is down.
-----------------------------------
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 992M 56K 992M 1% /dev
tmpfs 1001M 12K 1001M 1%
/dev/shm /dev/xvda1 30G 4.5G 25G 16% /
-----------------------------------
ambari-metrics-collector.log
2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181. Will not attempt to authenticate using SASL (unknown error) 2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181, initiating session 2017-11-09 11:52:33,844 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect 2017-11-09 11:52:35,033 INFO org.apache.helix.manager.zk.ZkClient: Closing zkclient: State:CONNECTING sessionid:0x0 local:null remoteserver:null lastZxid:0 xid:1 sent:24 recv:0 queuedpkts:0 pendingresp:0 queuedevents:0 2017-11-09 11:52:35,033 INFO org.I0Itec.zkclient.ZkEventThread: Terminate ZkClient event thread. 2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181. Will not attempt to authenticate using SASL (unknown error) 2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181, initiating session 2017-11-09 11:52:36,039 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed 2017-11-09 11:52:36,039 INFO org.apache.helix.manager.zk.ZkClient: Closed zkclient 2017-11-09 11:52:36,039 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000 2017-11-09 11:52:36,040 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118) at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147) Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000 at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130) at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:60) at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:69) at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:96) at org.apache.helix.manager.zk.ZKHelixAdmin.<init>(ZKHelixAdmin.java:92) at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:124) at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115) ... 7 more 2017-11-09 11:52:36,041 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x0 2017-11-09 11:52:36,046 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118) at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147) Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
------------------------------------
Security Group
Added 6188 as custom TCP inbound to my security group also.
------------------------------------
Thanks in advance.
Venkateswara Reddy B