Unable to start Ambari Metrics Collector

New Contributor

Hi,

I just downloaded the new sandbox (2.6.0) and tried to start the Ambari Metrics Collector, but it fails to start.

I tried the suggestions from this thread without success: https://community.hortonworks.com/questions/76636/cannot-start-ambari-metrics-collector-on-hdp-25.ht...

I see these messages in my logs:

2017-04-24 20:56:43,471 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Reset ClusterStatusMonitor
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,resourceName=METRIC_AGGREGATORS
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,instanceName=sandbox.hortonworks.com_12001
2017-04-24 20:56:43,472 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster,instanceName=sandbox.hortonworks.com_12001,resourceName=METRIC_AGGREGATORS
2017-04-24 20:56:43,473 INFO org.apache.helix.monitoring.mbeans.ClusterStatusMonitor: Unregistering ClusterStatus: cluster=ambari-metrics-cluster
2017-04-24 20:56:43,473 INFO org.apache.helix.manager.zk.CallbackHandler: 116 END:INVOKE /ambari-metrics-cluster/CONTROLLER listener:org.apache.helix.manager.zk.DistributedLeaderElection Took: 3ms
2017-04-24 20:56:43,473 INFO org.apache.helix.manager.zk.ZkClient: Closing zkclient: State:CONNECTED Timeout:30000 sessionid:0x15ba1ad0262002d local:/172.17.0.2:39234 remoteserver:sandbox.hortonworks.com/172.17.0.2:2181 lastZxid:2335 xid:283 sent:283 recv:287 queuedpkts:0 pendingresp:0 queuedevents:0
2017-04-24 20:56:43,473 WARN org.apache.helix.manager.zk.CallbackHandler: Skip processing callbacks for listener: org.apache.helix.controller.GenericHelixController@2227a6c1, path: /ambari-metrics-cluster/LIVEINSTANCES, expected types: [INIT] but was CALLBACK
2017-04-24 20:56:43,476 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.NullPointerException
	at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
	at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-04-24 20:56:43,477 INFO org.apache.zookeeper.ZooKeeper: Session: 0x15ba1ad0262002d closed
2017-04-24 20:56:43,477 INFO org.apache.helix.manager.zk.ZkClient: Closed zkclient
2017-04-24 20:56:43,478 INFO org.apache.helix.manager.zk.ZKHelixManager: Cluster manager: sandbox.hortonworks.com disconnected
2017-04-24 20:56:43,478 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x15ba1ad0262002d
2017-04-24 20:56:43,489 INFO org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
2017-04-24 20:56:43,489 INFO org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.IllegalStateException: ZkClient already closed!
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:987)
	at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:208)
	at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:672)
	at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildNames(ZkBaseDataAccessor.java:442)
	at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:400)
	at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:301)
	at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:347)
	at org.apache.helix.task.TaskDriver.getWorkflows(TaskDriver.java:800)
	at org.apache.helix.monitoring.mbeans.ClusterStatusMonitor.refreshWorkflowsStatus(ClusterStatusMonitor.java:403)
	at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
	at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,489 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,494 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
2017-04-24 20:56:43,494 INFO org.apache.helix.controller.GenericHelixController: END ClusterEventProcessor thread

Any suggestions?

Regards,

Philippe

13 REPLIES

Expert Contributor

@Philippe Kernevez Is the Zookeeper Service on your cluster up and running?
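One quick way to answer this is ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. The following is only a minimal sketch: the function name `zookeeper_is_ok` is made up for illustration, the port 2181 is taken from the log above, and on newer ZooKeeper releases the four-letter commands may need to be whitelisted before this works.

```python
import socket


def zookeeper_is_ok(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Send 'ruok' to a ZooKeeper server and report whether it answers 'imok'.

    Hypothetical helper for illustration; it is not part of Ambari or ZooKeeper.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # ZooKeeper closes the connection after answering, so read until EOF.
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False
    return reply.strip() == b"imok"
```

For the sandbox in this thread the call would be something like `zookeeper_is_ok("sandbox.hortonworks.com")`. A `False` result only says the check failed; it does not say why.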

New Contributor

Exact same problem here. Yes, Zookeeper is running fine. Regards Janos

Expert Contributor

@Janos Geller

Can you attach the following?

  • /var/log/ambari-metrics-collector/ambari-metrics-collector.log
  • /etc/ambari-metrics-collector/conf/ams-site.xml
  • /etc/ams-hbase/conf/hbase-site.xml

New Contributor

@Aravindan Vijayan

Dear Aravindan,

I had to remove a few lines from the beginning of the log file because it was otherwise too big to upload.

Thanks in advance for your help,
Janos
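If the collector log is too large to attach, keeping only its most recent lines usually preserves the part that matters for a startup failure. A minimal sketch, assuming a plain-text log; the function name `keep_last_lines` and the default of 5000 lines are arbitrary choices for illustration.

```python
from collections import deque


def keep_last_lines(src: str, dst: str, max_lines: int = 5000) -> int:
    """Copy only the last `max_lines` lines of `src` into `dst`.

    Returns the number of lines written. Hypothetical helper for illustration.
    """
    # A bounded deque keeps memory use proportional to max_lines,
    # not to the size of the whole log file.
    with open(src, "r", encoding="utf-8", errors="replace") as fh:
        tail = deque(fh, maxlen=max_lines)
    with open(dst, "w", encoding="utf-8") as out:
        out.writelines(tail)
    return len(tail)
```

For example, `keep_last_lines("/var/log/ambari-metrics-collector/ambari-metrics-collector.log", "collector-tail.log")` would write a trimmed copy to the current directory.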

Expert Contributor

@Janos Geller

Please try changing the following ams-site config.

  • Config key - timeline.metrics.service.webapp.address
  • Current Value - 0.0.0.0::host_group_1%:6188
  • Recommended Value - 0.0.0.0:6188

Start or restart the Metrics Collector after this change.
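To see whether a given ams-site.xml carries the malformed value, the property can be read from the file and checked against a plain host:port pattern. This is a rough sketch, assuming the usual Hadoop-style `<configuration>/<property>/<name>/<value>` layout; the helper names and the regular expression are illustrative only and deliberately strict (they do not accept IPv6 literals).

```python
import re
import xml.etree.ElementTree as ET

KEY = "timeline.metrics.service.webapp.address"
# Assumed shape: a hostname or IPv4 address, one colon, then a port number.
HOST_PORT = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]*:(\d{1,5})$")


def read_property(xml_text: str, name: str):
    """Return the value of a Hadoop-style property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def is_valid_host_port(value) -> bool:
    """True if value looks like host:port with a port between 1 and 65535."""
    if not value:
        return False
    match = HOST_PORT.match(value)
    return bool(match) and 0 < int(match.group(1)) <= 65535
```

With the values from this thread, `is_valid_host_port("0.0.0.0::host_group_1%:6188")` is rejected and `is_valid_host_port("0.0.0.0:6188")` is accepted. To check a real file, pass the contents of /etc/ambari-metrics-collector/conf/ams-site.xml to `read_property(text, KEY)`.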

Explorer

@avijayan 

I tried the following.

Ambari-Metrics - Config

timeline.metrics.service.webapp.address - 0.0.0.0::host_group_2%:6188 (default value)

Changed to

timeline.metrics.service.webapp.address - <Metrics_Collector_Hostname>:6188

This worked, and the Metrics Collector is back online! Thanks for the clue.

New Contributor

@Aravindan Vijayan

Unfortunately this didn't help 😞

Expert Contributor

Can you attach the latest log?

New Contributor

Actually, your suggestion to edit ams-site.xml did work (for some reason I had to restart the whole sandbox). I do get a Grafana error when starting Metrics, but after a while it goes away and Metrics seem to work fine.

Thanks for your help on this issue.

New Contributor

amabarimetricelogs.zip

Hi all,

Sorry for the late reply; I was not notified about your answers. I checked my notification settings, but I can't work out why I'm not receiving your comments.

@Aravindan:

Yes, ZooKeeper is up and running (1 instance).

I stopped all Ambari Metrics services, made your change (timeline.metrics.service.webapp.address), and started the Ambari Metrics services again.

I had an error during the start of Grafana, but everything worked fine in the end. Even though there was a failure during the start process, the service is up and running.

I attached a zip file with the following files:

  • grafana.out
  • grafana.log
  • hbase-site.xml
  • ams-site.xml
  • ambari-metrics-collector.log
  • grafana_service_start.log

Regards,

Philippe

New Contributor

Hi @Philippe Kernevez,

Did this get resolved? I'm facing a similar problem in the same (but dockerized) sandbox. I've also tried the suggestions given in your link, but nothing worked. I also tried @Aravindan Vijayan's earlier suggestion, but it didn't work either.

Attaching files as requested by @Aravindan Vijayan in an earlier post.

Regards

New Contributor

Hi Anjal,

As I said, it was solved. But I don't really know how.

Regards,

Philippe

New Contributor

It looks like the Ambari server is unable to sync up properly with metrics updates. I resolved this problem by stopping the metrics services, restarting the ambari-server service, and then bringing the metrics services back online.
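The stop and start steps of that sequence can also be scripted against the Ambari REST API instead of the web UI. The sketch below only builds the request and does not send it. The helper name, base URL, cluster name, and credentials are placeholders that have to be replaced with the values of your own installation.

```python
import base64
import json
import urllib.request


def service_state_request(base_url, cluster, user, password, state,
                          service="AMBARI_METRICS"):
    """Build (but do not send) a PUT request that asks Ambari to change a service state.

    state "INSTALLED" stops the service, "STARTED" starts it.
    Hypothetical helper for illustration; all arguments are placeholders.
    """
    if state not in ("INSTALLED", "STARTED"):
        raise ValueError("state must be 'INSTALLED' or 'STARTED'")
    body = json.dumps({
        "RequestInfo": {"context": f"Set {service} to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/clusters/{cluster}/services/{service}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
        },
    )
```

A possible order of operations: send the request with `state="INSTALLED"` via `urllib.request.urlopen`, wait until the service has actually stopped, run `ambari-server restart` on the Ambari host, then send the request with `state="STARTED"`. Ambari processes these state changes asynchronously, so a successful HTTP response does not mean the service has already changed state.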
