Support Questions
Find answers, ask questions, and share your expertise

ambari metrics collector creashed

ambari metrics collector creashed

we have ambari with hadoop ( HDP 2.6.4 ) cluster

in ambari we have the ambari-metrics-collector

from time to time metrics collector is crashed and we can see from the log the following

note - this happens on both configuration - embedded and distributed

 

 

java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.helix.controller.stages.ClusterEventBlockingQueue.take(ClusterEventBlockingQueue.java:85)
at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:594)
2019-11-26 00:57:20,639 INFO org.apache.helix.controller.GenericHelixController: END ClusterEventProcessor thread
2019-11-26 00:57:20,641 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys67.com/130.21.6.36:61181. Will not attempt to authenticate using SASL (unknown error)
2019-11-26 00:57:20,642 WARN org.apache.zookeeper.ClientCnxn: Session 0x16df2739ea90013 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2019-11-26 00:57:20,686 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:6188
2019-11-26 00:57:20,690 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: org.apache.phoenix.exception.PhoenixIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD
at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.getTimelineMetrics(TimelineWebServices.java:377)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at

org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1409)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
... 48 more
Caused by: java.io.InterruptedIOException: Giving up trying to location region in meta: thread is interrupted.
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1347)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1149)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1106)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:941)
at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)

 

not clearly why we get

 


2019-11-26 00:57:20,690 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR

 

 


I must to say that we tune the metrics collector parameters according to - https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning

so we not understand why ambari metrics collector failed

machines with ambari metrics collector include the following rpm's

 

ambari-metrics-monitor-2.6.2.2-1.x86_64
ambari-metrics-common-2.6.2.2-1.noarch
ambari-agent-2.6.2.2-1.x86_64
ambari-metrics-collector-2.6.2.2-1.x86_64
ambari-metrics-hadoop-sink-2.6.2.2-1.x86_64
ambari-server-2.6.2.2-1.x86_64

 

 

Michael-Bronson