Metrics System unable to initialize HA controller

Explorer

I cleaned up all AMS data from my cluster, including the AMS znode in ZooKeeper (ams-hbase-secure), but not /ambari-metrics-cluster. Whenever I rmr /ambari-metrics-cluster, it is created again, and I have no idea what creates it.

Then I reinstalled AMS, and the Metrics Collector failed to start.

I got this error:

2019-01-09 15:04:20,257 INFO org.apache.helix.manager.zk.ZKHelixAdmin: Cluster ambari-metrics-cluster already exists
2019-01-09 15:04:40,287 WARN org.apache.helix.manager.zk.ZKHelixAdmin: Root directory exists.Cleaning the root directory:/ambari-metrics-cluster
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:791)
        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:786)
        at org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:497)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
        at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:104)
        at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:351)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
        ... 13 more
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
        at org.apache.helix.manager.zk.ZkClient.delete(ZkClient.java:347)
        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:791)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115)
        ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /ambari-metrics-cluster/CONTROLLER
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
        at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:104)
        at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:351)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
        ... 13 more
2019-01-09 15:04:40,305 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2019-01-09 15:04:40,305 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2019-01-09 15:04:40,305 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2019-01-09 15:04:40,306 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2019-01-09 15:04:40,306 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:118)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.I0Itec.zkclient.exception.ZkException: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /ambari-metrics-cluster/CONTROLLER
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
        at org.apache.helix.manager.zk.ZkClient.delete(ZkClient.java:347)
        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:791)
        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:786)
        at org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:497)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:156)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:115)
        ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /ambari-metrics-cluster/CONTROLLER
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
        at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:104)
        at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:351)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
        ... 13 more
2019-01-09 15:04:40,307 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
2019-01-09 15:04:40,314 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at test-da-shanghai-03/192.168.2.187
************************************************************/
2019-01-09 15:04:40,330 WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
2 REPLIES

Explorer

Problem solved.
I ran "ps -ef | grep ambari-metri" in a shell and found many leftover processes from the previous installation, all owned by the user "ams".

After killing these processes, I ran rmr on the /ambari-metrics-cluster znode, and this time it was not created again, haha.
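
For anyone repeating this, the process check above can be sketched as a small shell helper. This is only an illustration, not part of AMS: the function name stale_ams_pids is made up here, and it assumes the usual "ps -ef" layout where the first column is the owner and the second is the PID.

```shell
# Hypothetical helper (not part of AMS): print the PIDs of processes owned by
# a given user (default: ams) whose command line mentions "ambari-metri".
# Expects `ps -ef`-style lines on stdin: column 1 = owner, column 2 = PID.
stale_ams_pids() {
  owner="${1:-ams}"
  # The [a] bracket keeps this awk process's own command line from matching.
  awk -v owner="$owner" '$1 == owner && /[a]mbari-metri/ { print $2 }'
}
```

Running "ps -ef | stale_ams_pids ams" prints the candidate PIDs, which are worth reviewing before killing anything. Once those processes are gone, rmr on /ambari-metrics-cluster should stick, as described above.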

Expert Contributor

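Please stop the collector, clean up the /ambari-metrics-cluster znode as well, and then start the collector again. Alternatively, you can set timeline.metrics.service.distributed.collector.mode.disabled = true in custom ams-site, which disables the distributed collector mode so that the HA controller is not initialized.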