Unable to start Ambari Metrics Collector

New Contributor

Hi,

I just downloaded the new sandbox (2.6.0) and tried to start the Ambari Metrics Collector, but it fails to start.

I tried the suggestions in this thread without success: https://community.hortonworks.com/questions/76636/cannot-start-ambari-metrics-collector-on-hdp-25.ht...

I see the following messages in my logs:

2017-04-24 20:56:43,471 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Reset ClusterStatusMonitor
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,resourceName=METRIC_AGGREGATORS
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,instanceName=sandbox.hortonworks.com_12001
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,instanceName=sandbox.hortonworks.com_12001,resourceName=METRIC_AGGREGATORS
2017-04-24 20:56:43,473 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster
2017-04-24 20:56:43,473 INFO org.apache.helix.manager.zk.CallbackHandler: 116 END:INVOKE /ambari-metrics-cluster/CONTROLLER listener:org.apache.helix.manager.zk.DistributedLeaderElection Took: 3ms
2017-04-24 20:56:43,473 INFO org.apache.helix.manager.zk.ZkClient: Closing zkclient: State:CONNECTED Timeout:30000 sessionid:0x15ba1ad0262002d local:/172.17.0.2:39234 remoteserver:sandbox.hortonworks.com/172.17.0.2:2181 lastZxid:2335 xid:283 sent:283 recv:287 queuedpkts:0 pendingresp:0 queuedevents:0
2017-04-24 20:56:43,473 WARN org.apache.helix.manager.zk.CallbackHandler: Skip processing callbacks for listener: org.apache.helix.controller.GenericHelixController@2227a6c1, path: /ambari-metrics-cluster/LIVEINSTANCES, expected types: [INIT] but was CALLBACK
2017-04-24 20:56:43,476 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.NullPointerException
	at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
	at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-04-24 20:56:43,477 INFO org.apache.zookeeper.ZooKeeper: Session: 0x15ba1ad0262002d closed
2017-04-24 20:56:43,477 INFO org.apache.helix.manager.zk.ZkClient: Closed zkclient
2017-04-24 20:56:43,478 INFO org.apache.helix.manager.zk.ZKHelixManager: Cluster manager: sandbox.hortonworks.com disconnected
2017-04-24 20:56:43,478 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x15ba1ad0262002d
2017-04-24 20:56:43,489 INFO org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
2017-04-24 20:56:43,489 INFO org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.IllegalStateException: ZkClient already closed!
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:987)
	at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:208)
	at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:672)
	at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildNames(ZkBaseDataAccessor.java:442)
	at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:400)
	at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:301)
	at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:347)
	at org.apache.helix.task.TaskDriver.getWorkflows(TaskDriver.java:800)
	at org.apache.helix.monitoring.mbeans.ClusterStatusMonitor.refreshWorkflowsStatus(ClusterStatusMonitor.java:403)
	at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
	at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,494 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,494 INFO org.apache.helix.controller.GenericHelixController: END ClusterEventProcessor thread

Any suggestions?

Regards,

Philippe

Re: Unable to start Ambari Metrics Collector

Expert Contributor

@Philippe Kernevez Is the ZooKeeper service on your cluster up and running?
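
One quick way to answer this question is to send ZooKeeper its four-letter `ruok` command and see whether it replies `imok`. The sketch below is only an illustration: the function name `zk_is_ok` is hypothetical, the port 2181 is taken from the log above, and it assumes four-letter commands are enabled on the server (newer ZooKeeper releases require them to be whitelisted).

```python
import socket


def zk_is_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Return True if a ZooKeeper server at host:port answers 'ruok' with 'imok'.

    Hypothetical helper for illustration; any connection problem counts as "not ok".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes the socket.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
    return reply == b"imok"
```

On the sandbox this could be called as `zk_is_ok("sandbox.hortonworks.com")`. A `True` result only shows that the server process is responding; it does not rule out problems with the Helix data stored under `/ambari-metrics-cluster`.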

Re: Unable to start Ambari Metrics Collector

New Contributor

Exact same problem here. Yes, ZooKeeper is running fine. Regards, Janos

Re: Unable to start Ambari Metrics Collector

Expert Contributor

@Janos Geller

Can you attach the following?

  • /var/log/ambari-metrics-collector/ambari-metrics-collector.log
  • /etc/ambari-metrics-collector/conf/ams-site.xml
  • /etc/ams-hbase/conf/hbase-site.xml

Re: Unable to start Ambari Metrics Collector

New Contributor

@Aravindan Vijayan

Dear Aravindan,

I had to remove a few lines from the beginning of the log file; otherwise it was too big to upload.

Thanks in advance for your help. Janos
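
For anyone else hitting an upload size limit, trimming the log to its most recent lines can be scripted. This is a minimal sketch, not part of any Ambari tooling: the function name `tail_log` and the default of 5000 lines are arbitrary choices for illustration.

```python
from collections import deque
from pathlib import Path


def tail_log(src: str, dst: str, max_lines: int = 5000) -> int:
    """Copy the last max_lines lines of src into dst and return how many were written."""
    with open(src, "r", encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the newest max_lines lines while streaming the file.
        last = deque(fh, maxlen=max_lines)
    Path(dst).write_text("".join(last), encoding="utf-8")
    return len(last)
```

For example, `tail_log("/var/log/ambari-metrics-collector/ambari-metrics-collector.log", "collector-tail.log")` would keep the end of the collector log, which is usually where the startup failure appears.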

Re: Unable to start Ambari Metrics Collector

Expert Contributor

@Janos Geller

Please try changing the following ams-site config.

  • Config key - timeline.metrics.service.webapp.address
  • Current Value - 0.0.0.0::host_group_1%:6188
  • Recommended Value - 0.0.0.0:6188

Start or restart the Metrics Collector after this change.
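
To check whether a given ams-site.xml still carries the unresolved placeholder, the value can be read and tested against a plain `host:port` pattern. The sketch below assumes the file uses the usual Hadoop-style `<configuration><property><name>/<value>` layout; the helper names `read_property` and `looks_like_host_port` are hypothetical, and the script only reads the value (the change itself is made in the Ambari config as described above).

```python
import re
import xml.etree.ElementTree as ET

KEY = "timeline.metrics.service.webapp.address"
# Hostname or IPv4 address, a single colon, then a numeric port.
HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def read_property(xml_text: str, key: str):
    """Return the value of a Hadoop-style <property>, or None if the key is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            return prop.findtext("value", default="").strip()
    return None


def looks_like_host_port(value: str) -> bool:
    """True for values such as 0.0.0.0:6188; False for 0.0.0.0::host_group_1%:6188."""
    if not HOST_PORT.match(value):
        return False
    return 0 < int(value.rsplit(":", 1)[1]) <= 65535
```

Reading `/etc/ambari-metrics-collector/conf/ams-site.xml` and passing its text to `read_property(text, KEY)` gives the current value; if `looks_like_host_port` rejects it, the placeholder was probably never substituted.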

Re: Unable to start Ambari Metrics Collector

Explorer

@avijayan 

I tried the following in the Ambari Metrics config:

  • Config key - timeline.metrics.service.webapp.address
  • Default value - 0.0.0.0::host_group_2%:6188
  • Changed to - <Metrics_Collector_Hostname>:6188

This worked, and the Metrics Collector came back online. Thanks for the clue!

Re: Unable to start Ambari Metrics Collector

New Contributor

@Aravindan Vijayan

Unfortunately this didn't help 😞

Re: Unable to start Ambari Metrics Collector

Expert Contributor

Can you attach the latest log?

Re: Unable to start Ambari Metrics Collector

New Contributor

Actually, your suggestion to edit ams-site.xml did work (for some reason I had to restart the whole sandbox). I do get a Grafana error when starting Metrics, but after a while it goes away and Metrics seems to work fine.

Thanks for your help on this issue.