<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: ambari metrics collector going down in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164868#M127235</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;What is the size of the cluster?&lt;/P&gt;&lt;P&gt;Can we have the following config items?&lt;/P&gt;&lt;P&gt;/etc/ambari-metrics-collector/conf - ams-site.xml, ams-env.sh&lt;/P&gt;&lt;P&gt;/etc/ams-hbase/conf - hbase-site.xml, hbase-env.sh&lt;/P&gt;&lt;P&gt;Also, the response of http://&amp;lt;AMS_HOST&amp;gt;:6188/ws/v1/timeline/metrics/metadata&lt;/P&gt;</description>
    <pubDate>Wed, 21 Dec 2016 06:13:23 GMT</pubDate>
    <dc:creator>avijayan</dc:creator>
    <dc:date>2016-12-21T06:13:23Z</dc:date>
    <item>
      <title>ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164862#M127229</link>
      <description>&lt;P&gt;The Ambari Metrics Collector is going down because of thread pool exhaustion. How do we increase the thread pool size for the Ambari Metrics HBase? We are running HBase for metrics in distributed mode. The collector goes down within 5 minutes of a restart.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 00:47:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164862#M127229</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-12-21T00:47:55Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164863#M127230</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt; Can you please share the error that you see? I assume this is from the AMS log files. &lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 00:50:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164863#M127230</guid>
      <dc:creator>elserj</dc:creator>
      <dc:date>2016-12-21T00:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164864#M127231</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/223/jelser.html" nodeid="223"&gt;@Josh Elser&lt;/A&gt;, I have attached the metrics log. I also see some strange YARN-related errors in the metrics log for the first time &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10543-metrics-log.txt"&gt;metrics-log.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 01:17:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164864#M127231</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-12-21T01:17:17Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164865#M127232</link>
      <description>&lt;P&gt;I also see errors like this:&lt;/P&gt;&lt;P&gt;ERROR org.apache.hadoop.hbase.client.AsyncProcess: Internal AsyncProcess #1 error for METRIC_RECORD_MINUTE processing for local,61320,1481772798476
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@db14f9e rejected from java.util.concurrent.ThreadPoolExecutor@a8ef1a0[Shutting down, pool size = 10, active threads = 10, queued tasks = 324, completed tasks = 45]
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1256)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.findAllLocationsOrFail(AsyncProcess.java:940)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:857)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1186)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveGlobalFailure(AsyncProcess.java:1153)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1100(AsyncProcess.java:575)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:718)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1186)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveGlobalFailure(AsyncProcess.java:1153)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1100(AsyncProcess.java:575)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:718)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@db14f9e rejected from java.util.concurrent.ThreadPoolExecutor@a8ef1a0[Shutting down, pool size = 10, active threads = 10, queued tasks = 324, completed tasks = 45]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 01:20:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164865#M127232</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-12-21T01:20:22Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164866#M127233</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt; &lt;/P&gt;&lt;P&gt;There are basically two errors here:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1). Address already in use (port conflict)&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;Caused by: java.net.BindException: Problem binding to [0.0.0.0:60200] java.net.BindException: Address already in use; For more details see:  &lt;A href="http://wiki.apache.org/hadoop/BindException" target="_blank"&gt;http://wiki.apache.org/hadoop/BindException&lt;/A&gt;&lt;/PRE&gt;&lt;P&gt;
Please check which process is consuming that port; if there is a conflict, either change the port in the configuration or kill the other process holding it.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2). The second error looks like data corruption. I suggest clearing out the old AMS data.
&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data" target="_blank"&gt;https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;Caused by: java.io.InterruptedIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1769)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1719)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1022)&lt;/PRE&gt;&lt;P&gt;- Shut down AMS, then clear out the "/var/lib/ambari-metrics-collector" directory for a fresh restart:&lt;/P&gt;&lt;P&gt;- From Ambari -&amp;gt; Ambari Metrics -&amp;gt; Config -&amp;gt; Advanced ams-hbase-site, get the "hbase.rootdir" and "hbase-tmp" directories
- Delete or move the hbase-tmp and hbase.rootdir directories to an archive folder&lt;/P&gt;&lt;P&gt;- Then restart AMS.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 01:30:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164866#M127233</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2016-12-21T01:30:43Z</dc:date>
    </item>
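The cleanup steps above can be sketched as a small shell script. This is a hedged sketch, not the official procedure: the directories here are throwaway stand-ins created under a temp dir so the script can run anywhere, while on a real cluster you would substitute the actual hbase.rootdir and hbase.tmp.dir values from Advanced ams-hbase-site and stop the collector first (the stop/start commands are commented out because they require a live cluster).

```shell
# Sketch of the AMS data cleanup described above, using demo directories.
demo=$(mktemp -d)
rootdir="$demo/hbase"          # stand-in for the hbase.rootdir value
tmpdir="$demo/hbase-tmp"       # stand-in for the hbase.tmp.dir value
mkdir -p "$rootdir" "$tmpdir"  # pretend these hold existing AMS data

# 1. Stop the collector so nothing holds the data files open.
# ambari-metrics-collector stop

# 2. Move (do not delete) the data dirs into a timestamped archive,
#    so the old metrics can be restored if needed.
archive="$demo/archive-$(date +%Y%m%d%H%M%S)"
mkdir -p "$archive"
mv "$rootdir" "$tmpdir" "$archive/"

# 3. Restart the collector; it recreates empty data directories.
# ambari-metrics-collector start
ls "$archive"
```

Moving rather than deleting the directories is the safer variant of "clear out": a fresh restart still gets empty data dirs, but nothing is lost if the corruption diagnosis turns out to be wrong.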
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164867#M127234</link>
      <description>&lt;P&gt;
	Hah, yes, it seems like you have a port conflict.&lt;/P&gt;&lt;P&gt;
	You could use a tool like netstat to find which process has already bound port 60200, e.g. `sudo netstat -nape | fgrep 60200`; that shows the pid of the process holding the port. Once you identify the other process, you can determine whether there is a port conflict that needs to be resolved via configuration.&lt;/P&gt;&lt;P&gt;
	One important note is that 60200 is in the &lt;A href="https://en.wikipedia.org/wiki/Ephemeral_port"&gt;ephemeral port range&lt;/A&gt;, which means there may be transient sockets binding that port. If you do not see any service bound to that port now, this is likely what happened, and you can simply try restarting AMS. This is the reason the HBase default ports moved from 600xx to 160xx in recent versions.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 02:14:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164867#M127234</guid>
      <dc:creator>elserj</dc:creator>
      <dc:date>2016-12-21T02:14:13Z</dc:date>
    </item>
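The port check above can be sketched as a script. This is a hedged sketch: it assumes the port from the BindException (60200), prefers `ss` because `netstat` is absent on many newer distros, and the `check_port` helper is a name introduced here for illustration, not part of any Ambari tooling.

```shell
# Sketch: find out whether anything is currently bound to the AMS port.
PORT="${PORT:-60200}"   # port taken from the BindException in the log

check_port() {
  # Prefer ss (iproute2); fall back to netstat where available.
  # Without root, the owning pid may be hidden, but the bind still shows.
  if command -v ss >/dev/null 2>&1; then
    ss -ltn 2>/dev/null | grep ":$PORT " || true
  elif command -v netstat >/dev/null 2>&1; then
    netstat -nape 2>/dev/null | grep ":$PORT " || true
  fi
}

out=$(check_port)
if [ -n "$out" ]; then
  echo "port $PORT is bound by:"
  echo "$out"
else
  echo "nothing bound on $PORT right now (possibly a transient ephemeral socket)"
fi
```

The empty-output case matches the ephemeral-port scenario described above: a short-lived client socket happened to grab the port at collector start time, and a plain restart is worth trying before changing any configuration.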
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164868#M127235</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;What is the size of the cluster?&lt;/P&gt;&lt;P&gt;Can we have the following config items?&lt;/P&gt;&lt;P&gt;/etc/ambari-metrics-collector/conf - ams-site.xml, ams-env.sh&lt;/P&gt;&lt;P&gt;/etc/ams-hbase/conf - hbase-site.xml, hbase-env.sh&lt;/P&gt;&lt;P&gt;Also, the response of http://&amp;lt;AMS_HOST&amp;gt;:6188/ws/v1/timeline/metrics/metadata&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2016 06:13:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164868#M127235</guid>
      <dc:creator>avijayan</dc:creator>
      <dc:date>2016-12-21T06:13:23Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164869#M127236</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/290/avijayan.html" nodeid="290"&gt;@Aravindan Vijayan&lt;/A&gt;, sorry for the delayed reply. Our cluster size is 30 nodes.&lt;/P&gt;&lt;P&gt;I have attached the details you asked for, and this is the output of&lt;/P&gt;&lt;P&gt;http://&amp;lt;AMS_HOST&amp;gt;:6188/ws/v1/timeline/metrics/metadata&lt;/P&gt;&lt;P&gt;{"timestamp":0,"starttime":0,"metrics":{}}&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10834-hbase-site.xml"&gt;hbase-site.xml&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10835-ams-site.xml"&gt;ams-site.xml&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10836-ams-env.txt"&gt;ams-env.txt&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10837-ams-env.txt"&gt;ams-env.txt&lt;/A&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10838-hbase-env.txt"&gt;hbase-env.txt&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Do we need to increase any of the parameters for metrics?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Dec 2016 13:54:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164869#M127236</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-12-28T13:54:54Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164870#M127237</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/290/avijayan.html" nodeid="290"&gt;@Aravindan Vijayan&lt;/A&gt;, I cleared out the Ambari Metrics data and restarted AMS, but the collector went down again with the following error. I guess it is due to a lack of resources. Can you please point out which parameter needs to be increased for our cluster configuration? I gave the cluster details in the previous message: it is a 30-node cluster and we have 256 GB RAM in the 28 slave nodes. I have also attached the entire log from after today's restart.&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/11093-ambari-metrics-collector.zip"&gt;ambari-metrics-collector.zip&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;2017-01-04 02:22:52,065 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 4  actions to finish
2017-01-04 02:22:52,065 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 4  actions to finish
2017-01-04 02:22:52,065 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 4  actions to finish
2017-01-04 02:22:52,066 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 18  actions to finish
2017-01-04 02:22:52,067 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 1879  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 6  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 74  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 121  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 74  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 6  actions to finish
2017-01-04 02:22:53,877 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 74  actions to finish
2017-01-04 02:22:53,880 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 43  actions to finish&lt;/PRE&gt;</description>
      <pubDate>Wed, 04 Jan 2017 16:51:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164870#M127237</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2017-01-04T16:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164871#M127238</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Based on the logs, it seems one or more components are flooding the system with too many metrics; it could be the cluster's HBase service. &lt;/P&gt;&lt;P&gt;Can you check that the last 2 lines in the files mentioned in &lt;A href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_enabling_hbase_region_and_table_metrics.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_enabling_hbase_region_and_table_metrics.html&lt;/A&gt; are &lt;STRONG&gt;not&lt;/STRONG&gt; commented out? &lt;/P&gt;&lt;P&gt;The last 2 lines should look like this (and should not be commented out):&lt;/P&gt;&lt;PRE&gt;*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter

hbase.*.source.filter.exclude=*Regions*&lt;/PRE&gt;&lt;P&gt;Restart the HBase service after these changes. &lt;/P&gt;&lt;P&gt;Also, for a 30-node cluster, AMS should work fine in embedded mode, writing data to local disk. Your cluster's AMS is configured in distributed mode, where AMS HBase writes to cluster HDFS. Do you have a local DataNode on the Metrics Collector host? &lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2017 07:11:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164871#M127238</guid>
      <dc:creator>avijayan</dc:creator>
      <dc:date>2017-01-05T07:11:22Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164872#M127239</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/290/avijayan.html" nodeid="290"&gt;@Aravindan Vijayan&lt;/A&gt;, these 2 lines are not present in either of the 2 files mentioned in the URL you gave, which is equivalent to them being commented out, so region-level metrics are also flooding in. Yes, the metrics collector host is co-hosted with a DataNode, but we are planning to move it to a dedicated admin host.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2017 13:47:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164872#M127239</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2017-01-05T13:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: ambari metrics collector going down</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164873#M127240</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUN&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The instruction here is to disable (exclude) HBase per-region metrics to avoid data flooding. &lt;/P&gt;&lt;P&gt;That can be done by explicitly adding the following lines to the end of the file:&lt;/P&gt;&lt;PRE&gt;*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
hbase.*.source.filter.exclude=*Regions*&lt;/PRE&gt;</description>
      <pubDate>Thu, 05 Jan 2017 14:04:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ambari-metrics-collector-going-down/m-p/164873#M127240</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2017-01-05T14:04:28Z</dc:date>
    </item>
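Appending those two filter lines can be sketched as a script. This is a hedged sketch under stated assumptions: it writes to a throwaway stand-in file rather than the real hadoop-metrics2-hbase.properties (whose exact path depends on the install), and the `add_line` helper is a name introduced here for illustration. The point it demonstrates is making the append idempotent, so rerunning it never duplicates the lines.

```shell
# Sketch: idempotently append the per-region metric exclusions.
demo=$(mktemp -d)
conf="$demo/hadoop-metrics2-hbase.properties"   # stand-in for the real file
touch "$conf"

add_line() {
  # -F: literal match (the lines contain '*' and '.'); -x: whole line.
  grep -qxF "$1" "$conf" || echo "$1" >> "$conf"
}

add_line '*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter'
add_line 'hbase.*.source.filter.exclude=*Regions*'

# Calling it again must not add a duplicate (idempotent).
add_line 'hbase.*.source.filter.exclude=*Regions*'
cat "$conf"
```

After editing the real file, the HBase service needs a restart (as noted above) for the metrics2 filters to take effect.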
  </channel>
</rss>