Ambari Metric Collector Stops

Hi All,

I have the following scenario with the Ambari Metrics Collector on a 27-node HDP 2.6.1 cluster:

1. Flume agents (7) stopped: the Ambari Metrics Collector works without any issues and displays metrics in the Ambari web UI.

2. Flume agents (7) running: the Ambari Metrics Collector stops on its own without writing any error to its logs.

I'm not sure how running Flume agents could stop the Ambari Metrics Collector, but I can see the following exception in /var/log/ambari-metrics-collector/ambari-metrics-collector.out:

INFO: Binding org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices to GuiceManagedComponentProvider with the scope "Singleton"
java.lang.NumberFormatException
        at java.math.BigDecimal.<init>(BigDecimal.java:494)
        at java.math.BigDecimal.<init>(BigDecimal.java:383)
        at java.math.BigDecimal.<init>(BigDecimal.java:806)
        at java.math.BigDecimal.valueOf(BigDecimal.java:1274)
        at org.apache.phoenix.schema.types.PDouble.getMaxLength(PDouble.java:70)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:210)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:173)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:160)


Here are also some entries from /var/log/ambari-metrics-collector/gc.log-201709151029:

2017-09-15T10:31:56.937+0100: 138.481: [CMS-concurrent-reset-start]
2017-09-15T10:31:56.941+0100: 138.486: [CMS-concurrent-reset: 0.005/0.005 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-09-15T10:34:03.552+0100: 265.097: [GC (Allocation Failure) 2017-09-15T10:34:03.552+0100: 265.097: [ParNew: 235968K->26176K(235968K), 0.0243885 secs] 279609K->82718K(2070976K), 0.0245977 secs] [Times: user=0.10 sys=0.01, real=0.03 secs]
2017-09-15T10:37:12.462+0100: 454.006: [GC (Allocation Failure) 2017-09-15T10:37:12.462+0100: 454.007: [ParNew: 235968K->26176K(235968K), 0.0177335 secs] 292510K->89076K(2070976K), 0.0179993 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2017-09-15T10:41:52.432+0100: 733.977: [GC (Allocation Failure) 2017-09-15T10:41:52.432+0100: 733.977: [ParNew: 235968K->20890K(235968K), 0.0307675 secs] 298868K->102991K(2070976K), 0.0309685 secs] [Times: user=0.11 sys=0.01, real=0.03 secs]
2017-09-15T10:46:52.433+0100: 1033.977: [GC (Allocation Failure) 2017-09-15T10:46:52.433+0100: 1033.977: [ParNew: 230682K->22481K(235968K), 0.0078365 secs] 312783K->104582K(2070976K), 0.0080006 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2017-09-15T10:51:52.447+0100: 1333.992: [GC (Allocation Failure) 2017-09-15T10:51:52.447+0100: 1333.992: [ParNew: 232273K->26176K(235968K), 0.0093279 secs] 314374K->109959K(2070976K), 0.0094784 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]

I have also increased the heap size per the HDP recommendations.

Can anyone please help me understand this scenario?

Thanks

Re: Ambari Metric Collector Stops

Super Mentor

@D Giri

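The GC log snippets that you posted are fine. Please do not worry about the "GC (Allocation Failure)" messages: they are routine young-generation (ParNew) collections, triggered when the young generation fills up, and each one in your log completes in well under 0.1 seconds.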
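
If you want to verify that yourself, here is a minimal sketch of a hypothetical helper class (ParNewSummary, not part of AMS) that pulls the whole-heap occupancy and pause time out of a ParNew line. It assumes the CMS/ParNew log format shown in your gc.log. Applied to the lines you posted, the heap after each young collection stays at roughly 80-110 MB of a ~2 GB heap and each pause takes about 8-31 ms, which is healthy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParNewSummary {
    // Hypothetical helper. Skips the "[ParNew: ...]" young-gen block, then captures the
    // whole-heap part that follows it: "<beforeK>K-><afterK>K(<capacityK>K), <pause> secs]"
    private static final Pattern HEAP = Pattern.compile(
        "ParNew: [^\\]]*\\] (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    /** Returns {heapAfterK, heapCapacityK, pauseSeconds}, or null if the line is not a ParNew line. */
    static double[] parse(String line) {
        Matcher m = HEAP.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4))
        };
    }

    public static void main(String[] args) {
        // One of the lines from the posted gc.log
        String line = "2017-09-15T10:34:03.552+0100: 265.097: [GC (Allocation Failure) "
            + "2017-09-15T10:34:03.552+0100: 265.097: [ParNew: 235968K->26176K(235968K), "
            + "0.0243885 secs] 279609K->82718K(2070976K), 0.0245977 secs] "
            + "[Times: user=0.10 sys=0.01, real=0.03 secs]";
        double[] r = parse(line);
        // Heap occupancy after the collection as a share of total heap, and pause in ms
        System.out.printf("heap after GC: %.1f%% of capacity, pause: %.1f ms%n",
            100.0 * r[0] / r[1], r[2] * 1000.0);
    }
}
```

Non-ParNew lines such as the CMS-concurrent-reset entries return null and can simply be skipped.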

The problem seems to be caused by the following NumberFormatException:

java.lang.NumberFormatException
        at java.math.BigDecimal.<init>(BigDecimal.java:494)
        at java.math.BigDecimal.<init>(BigDecimal.java:383)
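
The frames at the top of the trace already hint at a likely cause. Per its Javadoc, BigDecimal.valueOf(double) converts the value through Double.toString(double) and throws NumberFormatException when the value is NaN or infinite. So PDouble.getMaxLength failing inside BigDecimal.valueOf, as in the original post, is consistent with a metric value of NaN or Infinity being written; treat that as an assumption until the full trace and configs confirm it. The minimal sketch below reproduces the behavior in plain Java; the class and helper names are hypothetical.

```java
import java.math.BigDecimal;

public class NonFiniteDoubleDemo {
    // Hypothetical helper: true if BigDecimal.valueOf accepts the value, false if it
    // throws NumberFormatException (documented to happen for NaN and +/-Infinity).
    static boolean convertible(double v) {
        try {
            BigDecimal.valueOf(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 1.0 / 0.0 and 0.0 / 0.0 show how ordinary arithmetic can yield Infinity and NaN
        double[] samples = {42.5, 0.0, Double.NaN, Double.POSITIVE_INFINITY, 1.0 / 0.0, 0.0 / 0.0};
        for (double v : samples) {
            // Double.toString(v) is the string BigDecimal.valueOf parses internally
            System.out.println(Double.toString(v) + " -> "
                + (convertible(v) ? "ok" : "NumberFormatException"));
        }
    }
}
```

Running main prints one line per sample: the finite values convert, while every NaN or Infinity sample reports NumberFormatException, matching the failure in the collector.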

Could you share the ams-hbase-site and ams-site XML files here, along with the complete "/var/log/ambari-metrics-collector/ambari-metrics-collector.out" file? That way we can see the complete stack trace and find out which API is being invoked to read the maxLength. (Attaching "ambari-metrics-collector.log" along with the .out file will be even more useful.)

Please also let us know your AMS version.

Re: Ambari Metric Collector Stops

Thanks @Jay SenSharma for the quick response. As I mentioned earlier, it goes down only when the Flume agents are running; otherwise it works fine.

Ambari Metrics version: 0.1.0

The /var/log/ambari-metrics-collector/ambari-metrics-collector.out file contains only the following:

Sep 15, 2017 11:59:51 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider as a provider class
Sep 15, 2017 11:59:51 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices as a root resource class
Sep 15, 2017 11:59:51 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices as a root resource class
Sep 15, 2017 11:59:51 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Sep 15, 2017 11:59:51 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.11 12/09/2011 10:27 AM'
Sep 15, 2017 11:59:52 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Sep 15, 2017 11:59:52 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
Sep 15, 2017 11:59:52 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices to GuiceManagedComponentProvider with the scope "Singleton"
Sep 15, 2017 11:59:52 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices to GuiceManagedComponentProvider with the scope "Singleton"
java.lang.NumberFormatException
        at java.math.BigDecimal.<init>(BigDecimal.java:494)
        at java.math.BigDecimal.<init>(BigDecimal.java:383)
        at java.math.BigDecimal.<init>(BigDecimal.java:806)
        at java.math.BigDecimal.valueOf(BigDecimal.java:1274)
        at org.apache.phoenix.schema.types.PDouble.getMaxLength(PDouble.java:70)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:210)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:173)
        at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:160)
        at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:1016)
        at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:1000)
        at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
        at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:856)
        at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:593)
        at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:581)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:336)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
        at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:329)
        at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeUpdate(PhoenixPreparedStatement.java:199)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.commitMetrics(PhoenixHBaseAccessor.java:310)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.commitMetricsFromCache(PhoenixHBaseAccessor.java:257)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsCacheCommitterThread.run(MetricsCacheCommitterThread.java:35)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)