Support Questions

Find answers, ask questions, and share your expertise

Problem with Ambari Metrics

avatar
Expert Contributor

Ambari Metrics works intermittently. It works a few minutes and then stops to show pictures. Sometimes, the Metric Collector just stops, after a manual start it's working but in a few minutes it stops again. What is going wrong?

screen-ams1.png

My settings

HDP 2.5
Ambari 2.4.0.1
No Kerberos
iptables off
hbase.zookeeper.property.tickTime = 6000
Metrics Service operation mode = distributed
hbase.cluster.distributed = true
hbase.zookeeper.property.clientPort = 2181
hbase.rootdir=hdfs://prodcluster/ams/hbase

Logs

ambari-metrics-collector.log

2016-12-22 11:59:23,030 ERROR org.mortbay.log: /ws/v1/timeline/metrics
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
        at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
        ... 37 more
Caused by: org.mortbay.jetty.EofException
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
        at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
        at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
        at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
        ... 41 more
2016-12-22 12:00:05,549 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-22 12:00:19,105 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Thu Dec 22 11:50:00 MSK 2016, endTime = Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:24,077 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Thu Dec 22 12:00:24 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,084 INFO TimelineClusterAggregatorMinute: Last check point time: 1482396900000, lagBy: 324 seconds.
2016-12-22 12:00:24,085 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Thu Dec 22 12:00:24 MSK 2016, startTime = Thu Dec 22 11:55:00 MSK 2016, endTime = Thu Dec 22 12:00:00 MSK 2016

hbase-ams-master-hdp-nn2.hostname.log

2016-12-22 11:22:46,455 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db4 closed
2016-12-22 11:22:46,455 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:23:31,207 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:23:31,208 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:23:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=59, evicted=0, evictedPerRun=0.0
2016-12-22 11:23:46,460 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x12f8ba29 connecting to ZooKeeper ensemble=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181
2016-12-22 11:23:46,461 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181 sessionTimeout=120000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@20fc295f
2016-12-22 11:23:46,464 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp-nn2.hostname/10.255.242.181:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-22 11:23:46,466 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Socket connection established to hdp-nn2.hostname/10.255.242.181:2181, initiating session
2016-12-22 11:23:46,469 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp-nn2.hostname/10.255.242.181:2181, sessionid = 0x3570f2523bb3db5, negotiated timeout = 40000
2016-12-22 11:23:46,495 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3570f2523bb3db5
2016-12-22 11:23:46,499 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db5 closed
2016-12-22 11:23:46,499 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:28:41,485 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=89, evicted=0, evictedPerRun=0.0
2016-12-22 11:29:49,178 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Remove log: hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000001.log
2016-12-22 11:29:49,180 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Removed logs: [hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000002.log]
2016-12-22 11:33:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=119, evicted=0, evictedPerRun=0.0
2016-12-22 11:38:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=149, evicted=0, evictedPerRun=0.0
2016-12-22 11:43:41,485 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=179, evicted=0, evictedPerRun=0.0
2016-12-22 11:44:51,222 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:44:51,223 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:48:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=209, evicted=0, evictedPerRun=0.0
2016-12-22 11:50:51,205 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:50:51,205 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics

I removed folder /var/lib/ambari-metrics-collector/hbase-tmp and restarted AMS as recommended here https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html, but it did not help.

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Hi @Aravindan Vijayan

Yesterday suddenly Ambari Metrics has started working (and still works). The only thing I have changed yesterday - install Apache Atlas, which required restart almost all components, may be it helped. Thanks for your assistance!

View solution in original post

11 REPLIES 11

avatar
Contributor

You need to stop ambari metrics service via ambari and then remove all temp files. Go to Ambari Metrics collector service host. and execute the below command.

 

mv /var/lib/ambari-metrics-collector /tmp/ambari-metrics-collector_OLD

 

Now you can restart ams service again and now you should be good with Ambari Metrics.

avatar
New Contributor

I am having the same problem. Here is ambari-metrics-collector.log.