<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Node in maintenance mode throws stale alert from management nodes in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164089#M53635</link>
    <description>&lt;P&gt;Why is it that in Ambari (2.4.1.0) an alert is thrown for a node as being stale (I just decommissioned it, and it is shut down now) when the server is in maintenance mode? While the node is decommissioning, there is no alert. Is there a way to temporarily take the node out of the equation while it is in maintenance mode, so the management nodes do not complain about the stale node?&lt;/P&gt;</description>
    <pubDate>Tue, 07 Feb 2017 21:42:09 GMT</pubDate>
    <dc:creator>mtdeguzis</dc:creator>
    <dc:date>2017-02-07T21:42:09Z</dc:date>
    <item>
      <title>Node in maintenance mode throws stale alert from management nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164089#M53635</link>
      <description>&lt;P&gt;Why is it that in Ambari (2.4.1.0) an alert is thrown for a node as being stale (I just decommissioned it, and it is shut down now) when the server is in maintenance mode? While the node is decommissioning, there is no alert. Is there a way to temporarily take the node out of the equation while it is in maintenance mode, so the management nodes do not complain about the stale node?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 21:42:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164089#M53635</guid>
      <dc:creator>mtdeguzis</dc:creator>
      <dc:date>2017-02-07T21:42:09Z</dc:date>
    </item>
    <item>
      <title>Re: Node in maintenance mode throws stale alert from management nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164090#M53636</link>
      <description>&lt;P&gt;Can you specify which alert is being triggered? Most likely, it's an alert based on a master service's metrics. For example, if you decommission a DataNode and place that DataNode into Maintenance Mode, then Ambari won't file alerts for it. However, if the NameNode broadcasts a metric that indicates there's a problem with the liveness of the DataNodes, then Ambari will display that alert.&lt;/P&gt;&lt;P&gt;This is because the master service is running on a separate machine and doesn't care about the maintenance mode of the affected slave. Each service is different - some services understand that a decommission means the node shouldn't be reported as stale, and some still create the metric to indicate staleness for a short period of time.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 21:47:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164090#M53636</guid>
      <dc:creator>jonathanhurley</dc:creator>
      <dc:date>2017-02-07T21:47:30Z</dc:date>
    </item>
    <item>
      <title>Re: Node in maintenance mode throws stale alert from management nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164091#M53637</link>
      <description>&lt;P&gt;The alert is "DataNode Health Summary, DataNode Health: [Live=29, Stale=0, Dead=1]". I guess you're right then, there is no way to avoid such a situation. Thank you for the explanation.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 21:51:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164091#M53637</guid>
      <dc:creator>mtdeguzis</dc:creator>
      <dc:date>2017-02-07T21:51:45Z</dc:date>
    </item>
    <item>
      <title>Re: Node in maintenance mode throws stale alert from management nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164092#M53638</link>
      <description>&lt;P&gt;I think this goes back to the whole "dead is bad" theory. If I recall correctly, there was a metric Ambari was monitoring once on HBase - it was for "Dead RegionServers". We incorrectly assumed that "dead" was "bad". Because of this, while decommissioning a RegionServer, alerts would trigger (and not go away for a long time).&lt;/P&gt;&lt;P&gt;In the end, it was determined that this metric wasn't really something which needed alerting on.&lt;/P&gt;&lt;P&gt;HDFS is a little different - I believe that, with the default settings, a DataNode is marked as stale if it hasn't reported in for 30 seconds and marked as dead if it hasn't reported in for about 10 minutes 30 seconds. The problem here is that action is taken by the NameNode in this case - it will begin re-replicating blocks when it believes a DataNode is dead. So, we alert on it since it's something that is actively causing changes in the cluster data.&lt;/P&gt;&lt;P&gt;The NameNode actually has metrics for differentiating "dead" vs "decommissioned dead":&lt;/P&gt;&lt;PRE&gt;"NumLiveDataNodes": 3,
"NumDeadDataNodes": 1,
"NumDecomLiveDataNodes": 0,
"NumDecomDeadDataNodes": 1,&lt;/PRE&gt;&lt;P&gt;In the above example, Ambari won't worry about dead nodes that are known to be decommissioned, but we will worry about those that are unexpected.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 22:03:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-in-maintenance-mode-throws-stale-alert-from-management/m-p/164092#M53638</guid>
      <dc:creator>jonathanhurley</dc:creator>
      <dc:date>2017-02-07T22:03:46Z</dc:date>
    </item>
  </channel>
</rss>

