Problem with Ambari Metrics

Expert Contributor

Ambari Metrics works intermittently: it works for a few minutes and then stops showing graphs. Sometimes the Metrics Collector simply stops; after a manual start it runs again, but stops within a few minutes. What is going wrong?

screen-ams1.png

My settings

HDP 2.5
Ambari 2.4.0.1
No Kerberos
iptables off
hbase.zookeeper.property.tickTime = 6000
Metrics Service operation mode = distributed
hbase.cluster.distributed = true
hbase.zookeeper.property.clientPort = 2181
hbase.rootdir=hdfs://prodcluster/ams/hbase
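
A minimal sanity check for this setup, assuming the collector endpoint that appears in the logs below (hdp-nn2.hostname:6188) and the hbase.rootdir above:

# Is the AMS HBase rootdir reachable in HDFS?
hadoop fs -ls hdfs://prodcluster/ams/hbase
# Is the Metrics Collector REST endpoint answering? (expects 200)
curl -s -o /dev/null -w '%{http_code}\n' http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics/metadata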

Logs

ambari-metrics-collector.log

2016-12-22 11:59:23,030 ERROR org.mortbay.log: /ws/v1/timeline/metrics
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
        at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
        ... 37 more
Caused by: org.mortbay.jetty.EofException
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
        at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
        at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
        at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
        ... 41 more
2016-12-22 12:00:05,549 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-22 12:00:19,105 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Thu Dec 22 11:50:00 MSK 2016, endTime = Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:24,077 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Thu Dec 22 12:00:24 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,084 INFO TimelineClusterAggregatorMinute: Last check point time: 1482396900000, lagBy: 324 seconds.
2016-12-22 12:00:24,085 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Thu Dec 22 12:00:24 MSK 2016, startTime = Thu Dec 22 11:55:00 MSK 2016, endTime = Thu Dec 22 12:00:00 MSK 2016

hbase-ams-master-hdp-nn2.hostname.log

2016-12-22 11:22:46,455 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db4 closed
2016-12-22 11:22:46,455 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:23:31,207 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:23:31,208 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:23:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=59, evicted=0, evictedPerRun=0.0
2016-12-22 11:23:46,460 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x12f8ba29 connecting to ZooKeeper ensemble=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181
2016-12-22 11:23:46,461 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181 sessionTimeout=120000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@20fc295f
2016-12-22 11:23:46,464 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp-nn2.hostname/10.255.242.181:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-22 11:23:46,466 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Socket connection established to hdp-nn2.hostname/10.255.242.181:2181, initiating session
2016-12-22 11:23:46,469 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp-nn2.hostname/10.255.242.181:2181, sessionid = 0x3570f2523bb3db5, negotiated timeout = 40000
2016-12-22 11:23:46,495 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3570f2523bb3db5
2016-12-22 11:23:46,499 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db5 closed
2016-12-22 11:23:46,499 INFO  [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:28:41,485 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=89, evicted=0, evictedPerRun=0.0
2016-12-22 11:29:49,178 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Remove log: hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000001.log
2016-12-22 11:29:49,180 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Removed logs: [hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000002.log]
2016-12-22 11:33:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=119, evicted=0, evictedPerRun=0.0
2016-12-22 11:38:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=149, evicted=0, evictedPerRun=0.0
2016-12-22 11:43:41,485 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=179, evicted=0, evictedPerRun=0.0
2016-12-22 11:44:51,222 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:44:51,223 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:48:41,484 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=209, evicted=0, evictedPerRun=0.0
2016-12-22 11:50:51,205 INFO  [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times


2016-12-22 11:50:51,205 WARN  [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics

I removed the folder /var/lib/ambari-metrics-collector/hbase-tmp and restarted AMS, as recommended here: https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html, but it did not help.

1 ACCEPTED SOLUTION

Expert Contributor

Hi @Aravindan Vijayan

Yesterday Ambari Metrics suddenly started working (and it still works). The only thing I changed yesterday was installing Apache Atlas, which required restarting almost all components; maybe that helped. Thanks for your assistance!


11 REPLIES

Super Collaborator

@Alena Melnikova

Can you verify if the Ambari and AMS versions are the same using 'rpm -qa | grep ambari'?

How many nodes do you have in your cluster?

Please share the contents of /etc/ambari-metrics-collector/conf/ams-env.sh and /etc/ams-hbase/conf/hbase-env.sh on the Metrics Collector host.

Expert Contributor

Hi @Aravindan Vijayan

I have 7 nodes (2 nn + 5 dn).

Here is the info:

rpm -qa | grep ambari

ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64


cat /etc/ambari-metrics-collector/conf/ams-env.sh

# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ams-hbase/conf/hbase-env.sh

# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [  -n "$additional_cp" ];
then
  export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
  export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi
# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/

Super Collaborator

@Alena Melnikova

If you are trying to clean up AMS data in distributed mode, you have to stop the Metrics Collector first. Then you have to clean up the HBase rootdir in HDFS and delete the AMS znode in the cluster's ZooKeeper service; you should be able to do that with the zkCli utility. The znode to delete will be /ams-hbase-unsecure.
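
A minimal sketch of that sequence, assuming the rootdir from this thread (hdfs://prodcluster/ams/hbase), an unsecured cluster ZooKeeper, and the usual HDP zkCli location:

# 1) Stop the Metrics Collector (e.g., via Ambari).
# 2) Clean the AMS HBase rootdir in HDFS:
hadoop fs -rm -r /ams/hbase/*
# 3) Delete the AMS znode from the cluster ZooKeeper (zkCli path assumed):
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 rmr /ams-hbase-unsecure
# 4) Start the Metrics Collector again.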

Expert Contributor

Hi @Rahul Pathak

I have tried that, but without success.

Here is the ambari-metrics-collector.log:

2016-12-23 08:49:16,046 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-12-23 08:49:16,047 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-12-23 08:49:16,048 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Error creating Metrics Schema in HBase using Phoenix.
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:470)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:94)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:86)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.apache.phoenix.exception.PhoenixIOException: SYSTEM.CATALOG
        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1292)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1257)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1453)
        at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2180)
        at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:865)
        at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:194)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:343)
        at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
        at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:329)
        at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1421)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2378)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
        at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
        at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
        at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
        at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:270)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource.getConnection(DefaultPhoenixDataSource.java:82)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnection(PhoenixHBaseAccessor.java:376)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnectionRetryingOnException(PhoenixHBaseAccessor.java:354)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:398)
        ... 8 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1264)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
        at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
        at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:504)
        at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:720)
        at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:690)
        at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1757)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1712)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1692)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1275)
        ... 31 more
2016-12-23 08:49:16,052 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
2016-12-23 08:49:16,069 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at hdp-nn2.hostname/10.255.242.181
************************************************************/
2016-12-23 08:49:16,115 WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size

Expert Contributor

@Aravindan Vijayan

I did this:

1) Turn on Maintenance mode
2) Stop Ambari Metrics
3) hadoop fs -rmr /ams/hbase/*
4) rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*
5)
[zk: localhost:2181(CONNECTED) 0] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config, ams-hbase-unsecure]
[zk: localhost:2181(CONNECTED) 1] rmr /ams-hbase-unsecure
[zk: localhost:2181(CONNECTED) 2] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config]
6) Start Ambari Metrics 
7) Turn off Maintenance mode

After about 15 minutes I got this log:

2016-12-23 11:35:08,673 ERROR org.mortbay.log: /ws/v1/timeline/metrics/
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
 - with linked exception:
[org.mortbay.jetty.EofException]
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
        at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
        at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
        ... 37 more
Caused by: org.mortbay.jetty.EofException
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
        at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
        at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
        at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
        at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
        at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
        ... 41 more
2016-12-23 11:35:09,796 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 8606 metadata records.
2016-12-23 11:35:09,843 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 7 hosted apps metadata records.
2016-12-23 11:35:25,123 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,152 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,449 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:46,839 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 25694  actions to finish
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: 22847 row(s) updated.
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:41:40,503 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 8014  actions to finish



What should I do next?

Super Collaborator

@Alena Melnikova

Based on the following log statement,

2016-12-23 11:40:46,839 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 25694 actions to finish

I think AMS is receiving more write load than it can handle. It could be either because it is a big cluster and the memory settings are not properly tuned, or because one or more components are sending a huge number of metrics.
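
One quick way to see which component dominates, assuming the collector answers on port 6188 (substitute your Metrics Collector host for the placeholder): count distinct metric names per appid from the metadata endpoint.

curl -s "http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata" |
  python -c 'import json, sys
d = json.load(sys.stdin)
# The metadata response maps each appid to its list of metric descriptors;
# sort apps by how many distinct metrics they report, largest first.
for app, metrics in sorted(d.items(), key=lambda kv: -len(kv[1])):
    print("%6d  %s" % (len(metrics), app))'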

Can I have the following?

1. Number of nodes in the cluster

2. Responses to the following GET calls (see the curl sketch after this list)

http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata

http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/hosts

3. The following config files from the Metrics Collector host:

/etc/ambari-metrics-collector/conf/  ams-env.sh & ams-site.xml

/etc/ams-hbase/conf/ hbase-site.xml & hbase-env.sh
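
A minimal sketch for item 2, assuming the collector listens on port 6188 (substitute your Metrics Collector host for the placeholder):

curl -s "http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata" -o metadata.json
curl -s "http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/hosts" -o hosts.json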

Expert Contributor

Hi @Aravindan Vijayan

I have 7 nodes (2 nn + 5 dn).

Responses to the GET calls:

http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata (it's too long, so I cut it)

{"type":"COUNTER","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.rollRequest","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.GcCount","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_98th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.QueueCallTime_median","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_max","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_median","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.PublishNumOps","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880874,"metricname":"master.Master.exceptions.FailedSanityCheckException","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_mean","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.Sink_timelineDropped","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemNonHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_mean","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_min","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_75th_percentile","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_num_ops","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.SyncTime_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendTime_max","supportsAggregation":true}],"logfeeder":[{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.sol
r.write_logs","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.count","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480943578,"metricname":"filter.error.keyvalue","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"filter.error.grok","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.solr.write_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_lines","supportsAggregation":true}]}
http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/hosts

{"hdp-dn3.hostname":["accumulo","datanode","journalnode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn5.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn2.hostname":["accumulo","datanode","HOST","nodemanager","logfeeder"],"hdp-nn1.hostname":["accumulo","nimbus","resourcemanager","journalnode","HOST","applicationhistoryserver","namenode","hbase","kafka_broker","logfeeder"],"hdp-dn1.hostname":["accumulo","hiveserver2","datanode","hivemetastore","HOST","nodemanager","logfeeder"],"hdp-dn4.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-nn2.hostname":["hiveserver2","hivemetastore","journalnode","resourcemanager","HOST","jobhistoryserver","namenode","ams-hbase","logfeeder"]}

Config files

cat /etc/ambari-metrics-collector/conf/ams-env.sh

# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ambari-metrics-collector/conf/ams-site.xml

  <configuration>
        <property>
      <name>phoenix.query.maxGlobalMemoryPercentage</name>
      <value>25</value>
    </property>
        <property>
      <name>phoenix.spool.directory</name>
      <value>/tmp</value>
    </property>    
    <property>
      <name>timeline.metrics.aggregator.checkpoint.dir</name>
      <value>/var/lib/ambari-metrics-collector/checkpoint</value>
    </property>    
    <property>
      <name>timeline.metrics.aggregators.skip.blockcache.enabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.cache.commit.interval</name>
      <value>3</value>
    </property>    
    <property>
      <name>timeline.metrics.cache.enabled</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.cache.size</name>
      <value>150</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregate.splitpoints</name>
      <value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.daily.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.daily.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.daily.interval</name>
      <value>86400</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.daily.ttl</name>
      <value>63072000</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.hourly.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.hourly.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.hourly.interval</name>
      <value>3600</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.hourly.ttl</name>
      <value>31536000</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.interpolation.enabled</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.minute.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.minute.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.minute.interval</name>
      <value>300</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.minute.ttl</name>
      <value>2592000</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.second.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.second.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.second.interval</name>
      <value>120</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.second.timeslice.interval</name>
      <value>30</value>
    </property>    
    <property>
      <name>timeline.metrics.cluster.aggregator.second.ttl</name>
      <value>259200</value>
    </property>    
    <property>
      <name>timeline.metrics.daily.aggregator.minute.interval</name>
      <value>86400</value>
    </property>    
    <property>
      <name>timeline.metrics.hbase.compression.scheme</name>
      <value>SNAPPY</value>
    </property>    
    <property>
      <name>timeline.metrics.hbase.data.block.encoding</name>
      <value>FAST_DIFF</value>
    </property>    
    <property>
      <name>timeline.metrics.hbase.fifo.compaction.enabled</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.hbase.init.check.enabled</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregate.splitpoints</name>
      <value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.daily.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.daily.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.daily.ttl</name>
      <value>31536000</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.hourly.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.hourly.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.hourly.interval</name>
      <value>3600</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.hourly.ttl</name>
      <value>2592000</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.minute.checkpointCutOffMultiplier</name>
      <value>2</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.minute.disabled</name>
      <value>false</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.minute.interval</name>
      <value>300</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.minute.ttl</name>
      <value>604800</value>
    </property>    
    <property>
      <name>timeline.metrics.host.aggregator.ttl</name>
      <value>86400</value>
    </property>    
    <property>
      <name>timeline.metrics.service.checkpointDelay</name>
      <value>60</value>
    </property>    
    <property>
      <name>timeline.metrics.service.cluster.aggregator.appIds</name>
      <value>datanode,nodemanager,hbase</value>
    </property>    
    <property>
      <name>timeline.metrics.service.default.result.limit</name>
      <value>15840</value>
    </property>    
    <property>
      <name>timeline.metrics.service.handler.thread.count</name>
      <value>20</value>
    </property>    
    <property>
      <name>timeline.metrics.service.http.policy</name>
      <value>HTTP_ONLY</value>
    </property>    
    <property>
      <name>timeline.metrics.service.operation.mode</name>
      <value>distributed</value>
    </property>    
    <property>
      <name>timeline.metrics.service.resultset.fetchSize</name>
      <value>2000</value>
    </property>    
    <property>
      <name>timeline.metrics.service.rpc.address</name>
      <value>0.0.0.0:60200</value>
    </property>    
    <property>
      <name>timeline.metrics.service.use.groupBy.aggregators</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.service.watcher.delay</name>
      <value>30</value>
    </property>    
    <property>
      <name>timeline.metrics.service.watcher.disabled</name>
      <value>true</value>
    </property>    
    <property>
      <name>timeline.metrics.service.watcher.initial.delay</name>
      <value>600</value>
    </property>    
    <property>
      <name>timeline.metrics.service.watcher.timeout</name>
      <value>30</value>
    </property>    
    <property>
      <name>timeline.metrics.service.webapp.address</name>
      <value>hdp-nn2.hostname:6188</value>
    </property>    
    <property>
      <name>timeline.metrics.sink.collection.period</name>
      <value>10</value>
    </property>    
    <property>
      <name>timeline.metrics.sink.report.interval</name>
      <value>60</value>
    </property>    
  </configuration>
cat /etc/ams-hbase/conf/hbase-site.xml

  <configuration>    
    <property>
      <name>dfs.block.access.token.enable</name>
      <value>true</value>
    </property>    
    <property>
      <name>dfs.blockreport.initialDelay</name>
      <value>120</value>
    </property>
    
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
    </property>
    
    <property>
      <name>dfs.client.failover.proxy.provider.prodcluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.client.read.shortcircuit.streams.cache.size</name>
      <value>4096</value>
    </property>
    
    <property>
      <name>dfs.client.retry.policy.enabled</name>
      <value>false</value>
    </property>
    
    <property>
      <name>dfs.cluster.administrators</name>
      <value> hdfs</value>
    </property>
    
    <property>
      <name>dfs.content-summary.limit</name>
      <value>5000</value>
    </property>
    
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50010</value>
    </property>
    
    <property>
      <name>dfs.datanode.balance.bandwidthPerSec</name>
      <value>6250000</value>
    </property>
    
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hdfs/hadoop/hdfs/data</value>
      <final>true</final>
    </property>
    
    <property>
      <name>dfs.datanode.data.dir.perm</name>
      <value>750</value>
    </property>
    
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>65906998272</value>
    </property>
    
    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>0</value>
      <final>true</final>
    </property>
    
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50075</value>
    </property>
    
    <property>
      <name>dfs.datanode.https.address</name>
      <value>0.0.0.0:50475</value>
    </property>
    
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>0.0.0.0:8010</value>
    </property>
    
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>16384</value>
    </property>
    
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>
    
    <property>
      <name>dfs.encrypt.data.transfer.cipher.suites</name>
      <value>AES/CTR/NoPadding</value>
    </property>
    
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/bin/true)</value>
    </property>
    
    <property>
      <name>dfs.ha.namenodes.prodcluster</name>
      <value>nn1,nn2</value>
    </property>
    
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value>
    </property>
    
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>
    
    <property>
      <name>dfs.http.policy</name>
      <value>HTTP_ONLY</value>
    </property>
    
    <property>
      <name>dfs.https.port</name>
      <value>50470</value>
    </property>
    
    <property>
      <name>dfs.internal.nameservices</name>
      <value>prodcluster</value>
    </property>
    
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/hadoop/hdfs/journal</value>
    </property>
    
    <property>
      <name>dfs.journalnode.http-address</name>
      <value>0.0.0.0:8480</value>
    </property>
    
    <property>
      <name>dfs.journalnode.https-address</name>
      <value>0.0.0.0:8481</value>
    </property>
    
    <property>
      <name>dfs.namenode.accesstime.precision</name>
      <value>0</value>
    </property>
    
    <property>
      <name>dfs.namenode.audit.log.async</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.namenode.avoid.read.stale.datanode</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.namenode.avoid.write.stale.datanode</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>/hdfs/hadoop/hdfs/namesecondary</value>
    </property>
    
    <property>
      <name>dfs.namenode.checkpoint.edits.dir</name>
      <value>${dfs.namenode.checkpoint.dir}</value>
    </property>
    
    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>21600</value>
    </property>
    
    <property>
      <name>dfs.namenode.checkpoint.txns</name>
      <value>1000000</value>
    </property>
    
    <property>
      <name>dfs.namenode.fslock.fair</name>
      <value>false</value>
    </property>
    
    <property>
      <name>dfs.namenode.handler.count</name>
      <value>600</value>
    </property>
    
    <property>
      <name>dfs.namenode.http-address.prodcluster.nn1</name>
      <value>hdp-nn1.hostname:50070</value>
    </property>
    
    <property>
      <name>dfs.namenode.http-address.prodcluster.nn2</name>
      <value>hdp-nn2.hostname:50070</value>
    </property>
    
    <property>
      <name>dfs.namenode.https-address.prodcluster.nn1</name>
      <value>hdp-nn1.hostname:50470</value>
    </property>
    
    <property>
      <name>dfs.namenode.https-address.prodcluster.nn2</name>
      <value>hdp-nn2.hostname:50470</value>
    </property>
    
    <property>
      <name>dfs.namenode.inode.attributes.provider.class</name>
      <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
    </property>
    
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hdfs/hadoop/hdfs/namenode</value>
      <final>true</final>
    </property>
    
    <property>
      <name>dfs.namenode.name.dir.restore</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.namenode.rpc-address.prodcluster.nn1</name>
      <value>hdp-nn1.hostname:8020</value>
    </property>
    
    <property>
      <name>dfs.namenode.rpc-address.prodcluster.nn2</name>
      <value>hdp-nn2.hostname:8020</value>
    </property>
    
    <property>
      <name>dfs.namenode.safemode.threshold-pct</name>
      <value>0.99</value>
    </property>
    
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://hdp-dn3.hostname:8485;hdp-nn1.hostname:8485;hdp-nn2.hostname:8485/prodcluster</value>
    </property>
    
    <property>
      <name>dfs.namenode.stale.datanode.interval</name>
      <value>30000</value>
    </property>
    
    <property>
      <name>dfs.namenode.startup.delay.block.deletion.sec</name>
      <value>3600</value>
    </property>
    
    <property>
      <name>dfs.namenode.write.stale.datanode.ratio</name>
      <value>1.0f</value>
    </property>
    
    <property>
      <name>dfs.nameservices</name>
      <value>prodcluster</value>
    </property>
    
    <property>
      <name>dfs.permissions.enabled</name>
      <value>true</value>
    </property>
    
    <property>
      <name>dfs.permissions.superusergroup</name>
      <value>hdfs</value>
    </property>
    
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    
    <property>
      <name>dfs.replication.max</name>
      <value>50</value>
    </property>
    
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
      <final>true</final>
    </property>
    
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
      <final>true</final>
    </property>
    
    <property>
      <name>fs.permissions.umask-mode</name>
      <value>077</value>
    </property>
    
    <property>
      <name>nfs.exports.allowed.hosts</name>
      <value>* rw</value>
    </property>
    
    <property>
      <name>nfs.file.dump.dir</name>
      <value>/tmp/.hdfs-nfs</value>
    </property>
    
  </configuration>
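
Since hbase.rootdir points at hdfs://prodcluster/ams/hbase, a quick sanity check is to confirm the AMS data is reachable through the HA nameservice. A minimal sketch; nn1/nn2 are taken from dfs.ha.namenodes.prodcluster above:

# Confirm the AMS HBase root directory exists and is readable
hdfs dfs -ls hdfs://prodcluster/ams/hbase

# Confirm one NameNode is active and the other standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2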

cat /etc/ams-hbase/conf/hbase-env.sh

# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [  -n "$additional_cp" ];
then
  export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
  export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi

# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/
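
Since HBASE_MANAGES_ZK=false, the AMS HBase depends on the external ZooKeeper quorum, so it is worth confirming the ensemble answers on the configured client port (2181). A minimal check, assuming hdp-nn1.hostname runs one of the ZooKeeper servers (substitute your actual quorum hosts):

# A healthy ZooKeeper server replies 'imok' to the four-letter command 'ruok'
echo ruok | nc hdp-nn1.hostname 2181

# 'stat' additionally prints the server mode (leader/follower) and client counts
echo stat | nc hdp-nn1.hostname 2181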
rpm -qa | grep ambari 
ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64
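
To see whether the collector is actually serving requests between the crashes, a simple probe against its REST API helps. This is a sketch assuming the default collector port 6188; <collector-host> is a placeholder for the host running the Metrics Collector:

# Should return JSON metrics metadata while the collector is up
curl -s http://<collector-host>:6188/ws/v1/timeline/metrics/metadata | head -c 200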

avatar
Expert Contributor

Hi @Aravindan Vijayan

Yesterday, Ambari Metrics suddenly started working again (and it still works). The only thing I changed yesterday was installing Apache Atlas, which required restarting almost all of the components, so maybe that helped. Thanks for your assistance!