AMS:Connection failed: [Errno 111] Connection refused to 0.0.0.0:6188

Explorer

Metrics Collector Process

It shows nothing when I type 'netstat -a|grep 6188'.

Restarting ambari-server or rebooting doesn't work either.

ambari-server log:

MetricsRequestHelper:114 - Error getting timeline metrics : Connection refused

By the way, although my OS time is right, the time in my log is wrong.

1 ACCEPTED SOLUTION

@white wartih

I see errors like the following in the ambari-metrics-collector logs:

Unable to connect to HBase store using Phoenix.
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG

Can you check the link below for these errors?

https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html

11 REPLIES

Super Mentor

@white wartih

When you try to start the AMS collector from the Ambari UI, do you see any errors in the AMS logs?

If the following command returns nothing, the AMS collector is not running, so we first need to check the AMS log for errors or exceptions. Can you please share the errors/exceptions observed in ambari-metrics-collector.log?

netstat -tanlp |grep 6188
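
The same check can be scripted from any host. This is a minimal sketch, assuming the collector listens on TCP port 6188 as in the error message above; `port_is_open` and the `localhost` target are illustrative names, not part of Ambari.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (Errno 111), timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # 6188 is the collector port from the error message in this thread.
    print("collector port open:", port_is_open("localhost", 6188))
```

A False result only says that nothing accepted the connection; the collector log is still needed to find out why.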

.

Explorer

@Jay SenSharma

ambari-metrics-collector.log:

Exception in thread Thread-947:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 1082, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 45, in process_event
    self.process_host_collection_event(event)
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 79, in process_host_collection_event
    metrics.update(self.host_info.get_disk_io_counters())
  File "/usr/lib/python2.6/site-packages/resource_monitoring/core/host_info.py", line 265, in get_disk_io_counters
    io_counters = psutil.disk_io_counters()
  File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1726, in disk_io_counters
    raise RuntimeError("couldn't find any physical disk")
RuntimeError: couldn't find any physical disk

thank you very much

Super Mentor

@white wartih

The AMS scripts use the Python "psutil" module to read the disk_io_counters, as shown in:

https://github.com/apache/ambari/blob/release-2.5.0/ambari-metrics/ambari-metrics-host-monitoring/sr...

File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 1726, in disk_io_counters raise RuntimeError("couldn't find any physical disk") 

The error that Python reports is "couldn't find any physical disk".

So can you please check if there are any disk issues? Do you see any issues with the following commands?

# df -h
# du
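
It may also help to list the block devices the kernel actually reports. This is a rough sketch, assuming a Linux host and assuming that psutil derives its disk counters from /proc/diskstats (the psutil copy bundled with AMS may do this differently); `diskstats_devices` is a hypothetical helper, not an AMS or psutil function.

```python
def diskstats_devices(text):
    """Return device names from /proc/diskstats-style text.

    Each line starts with: major number, minor number, device name.
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.append(fields[2])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/diskstats") as f:
            devices = diskstats_devices(f.read())
    except OSError:
        devices = []  # file missing or unreadable, e.g. not a Linux host
    # An empty list would be consistent with "couldn't find any physical disk".
    print(devices)
```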

.

Explorer

@Jay SenSharma

Sorry, I sent you the ambari-metrics-collector.out earlier.

ambari-metrics-collector.txt

# df -h

Filesystem               Size  Used Avail Use% Mounted on
tank/containers/xdata-0  1.6T   12G  1.6T   1% /
none                     492K  4.0K  488K   1% /dev
udev                     126G     0  126G   0% /dev/tty
/dev/md0                  92G   22G   66G  25% /dev/lxd
none                     4.0K     0  4.0K   0% /sys/fs/cgroup
none                      26G  1.1M   26G   1% /run
none                     5.0M     0  5.0M   0% /run/lock
none                     126G     0  126G   0% /run/shm
none                     100M     0  100M   0% /run/user

# du

3750 .

By the way, my other cluster shows that message too, but it works fine.

@white wartih

Can you check whether your Ambari Metrics collector process is running? On the host where the Metrics collector is installed, run the following command:

# netstat -tulpn | grep 6188

If the command returns nothing, please check the Ambari Metrics collector logs in the /var/log/ambari-metrics-collector/ directory. Also try restarting the Ambari Metrics collector and check again.

Explorer

@nshelke

Thanks a lot!

Please check the message above.

@white wartih

Can you attach your Metrics Collector logs?


Explorer

@white wartih

The ambari-metrics-collector log shows only the message below:

ambari-metrics-collector.log :

2017-06-06 05:56:48,415 WARN org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource: Unable to connect to HBase store using Phoenix.
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=SYSTEM.CATALOG
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:436)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:939

So, as asked earlier, did you follow all the steps in article [1]? If yes and you are still facing the issue, please attach the hbase-ams-master log from "/var/log/ambari-metrics-collector/hbase-ams-master-<hostname -f>.log" and also share the "/etc/ambari-metrics-monitor/conf/metric_monitor.ini" file from any host where ambari-metrics-monitor is running.

Also, can you try to telnet to the ambari-metrics-collector from any host using the command: telnet c2m.xdata.com 6188

[1] https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
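
When the collector log is long, a quick scan for the exceptions quoted in this thread can narrow things down. This is a minimal sketch: the signature list covers only the two exceptions seen here, the descriptions are my own summaries, and `find_known_errors` is an illustrative name.

```python
# Exception names quoted in this thread, with a short summary of each.
KNOWN_SIGNATURES = {
    "TableNotFoundException": "Phoenix cannot find SYSTEM.CATALOG in the AMS HBase store",
    "ZkTimeoutException": "collector could not connect to ZooKeeper within the timeout",
}


def find_known_errors(log_text):
    """Return {signature: first matching line} for signatures found in log_text."""
    hits = {}
    for line in log_text.splitlines():
        for signature in KNOWN_SIGNATURES:
            # Keep only the first line that mentions each signature.
            if signature in line and signature not in hits:
                hits[signature] = line.strip()
    return hits
```

To use it, read ambari-metrics-collector.log (typically under /var/log/ambari-metrics-collector/, adjust if your install differs) into a string and pass it to `find_known_errors`.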

New Contributor

Hi,

Today I deployed a Hortonworks cluster in the AWS cloud. I am facing the same issue while starting the AMS collector. Please find the log files below.

------------------------------------

netstat -tulpn | grep 6188

No result. So AMS is down.

-----------------------------------

df -h

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        992M   56K  992M   1% /dev
tmpfs          1001M   12K 1001M   1% /dev/shm
/dev/xvda1       30G  4.5G   25G  16% /

-----------------------------------

ambari-metrics-collector.log

2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-09 11:52:33,843 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-19-191.ap-south-1.compute.internal/172.31.19.191:2181, initiating session
2017-11-09 11:52:33,844 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2017-11-09 11:52:35,033 INFO org.apache.helix.manager.zk.ZkClient: Closing zkclient: State:CONNECTING sessionid:0x0 local:null remoteserver:null lastZxid:0 xid:1 sent:24 recv:0 queuedpkts:0 pendingresp:0 queuedevents:0
2017-11-09 11:52:35,033 INFO org.I0Itec.zkclient.ZkEventThread: Terminate ZkClient event thread.
2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181. Will not attempt to authenticate using SASL (unknown error)
2017-11-09 11:52:35,936 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-172-31-21-67.ap-south-1.compute.internal/172.31.21.67:2181, initiating session
2017-11-09 11:52:36,039 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2017-11-09 11:52:36,039 INFO org.apache.helix.manager.zk.ZkClient: Closed zkclient
2017-11-09 11:52:36,039 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
2017-11-09 11:52:36,040 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
        at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
        at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:60)
        at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:69)
        at org.apache.helix.manager.zk.ZkClient.<init>(ZkClient.java:96)
        at org.apache.helix.manager.zk.ZKHelixAdmin.<init>(ZKHelixAdmin.java:92)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:124)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115)
        ... 7 more
2017-11-09 11:52:36,041 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x0
2017-11-09 11:52:36,046 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000



	

------------------------------------

Security Group

I also added port 6188 as a custom TCP inbound rule to my security group.

------------------------------------

Thanks in advance.

Venkateswara Reddy B
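
For the ZkTimeoutException in the log above, the socket to port 2181 is established and then closed by the server, so it may be worth asking ZooKeeper directly whether it is up. This is a minimal sketch using ZooKeeper's "ruok" four-letter command; `zookeeper_ruok` is an illustrative name, and the host to pass in is whichever ZooKeeper server appears in your log.

```python
import socket


def zookeeper_ruok(host, port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command; return True on an 'imok' reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:
        # Connection refused, timed out, reset, or host not resolvable.
        return False
    return reply == b"imok"
```

An "imok" reply means the server process is running, but not necessarily that it has joined a quorum. On newer ZooKeeper releases (3.5 and later) four-letter commands must be whitelisted via 4lw.commands.whitelist, so a False result there is not conclusive.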