<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Namenode bad health and checkpoint status issue in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367653#M239960</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103713"&gt;@BrianChan&lt;/a&gt;&amp;nbsp;Both alerts are related. Checkpointing is done by the Standby NameNode, so if the Standby NameNode is not functioning properly, no checkpoints are taken and you will see both of these alerts.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Go through the Standby NameNode logs and check why the checkpoint thread has stopped. The Standby NameNode may be down; if so, you may want to restart it to fix this.&lt;/P&gt;</description>
    <pubDate>Tue, 04 Apr 2023 05:51:31 GMT</pubDate>
    <dc:creator>rki_</dc:creator>
    <dc:date>2023-04-04T05:51:31Z</dc:date>
    <item>
      <title>Namenode bad health and checkpoint status issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367531#M239929</link>
      <description>&lt;P&gt;Hi all, I have the HDFS service running on my CDP Private Cloud Base 7.1.8 cluster with Kerberos enabled.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Recently I ran into two issues with my HDFS NameNode. Here are the screen captures.&lt;/P&gt;&lt;P&gt;The first one:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="namenode bad health.PNG" style="width: 991px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37156i1D7C0B802A69B8C3/image-size/large?v=v2&amp;amp;px=999" role="button" title="namenode bad health.PNG" alt="namenode bad health.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The second one:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Namenode checkpoint stauts issue.PNG" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37157i6E6B8B6838C1DCA7/image-size/large?v=v2&amp;amp;px=999" role="button" title="Namenode checkpoint stauts issue.PNG" alt="Namenode checkpoint stauts issue.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The role log shows the following:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="role log.PNG" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37159iD647338E98F38937/image-size/large?v=v2&amp;amp;px=999" role="button" title="role log.PNG" alt="role log.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could anyone point out the root cause of this issue and how to fix it? Thanks in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know if you need more information.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 07:16:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367531#M239929</guid>
      <dc:creator>BrianChan</dc:creator>
      <dc:date>2026-04-21T07:16:41Z</dc:date>
    </item>
    <item>
      <title>Re: Namenode bad health and checkpoint status issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367653#M239960</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103713"&gt;@BrianChan&lt;/a&gt;&amp;nbsp;Both alerts are related. Checkpointing is done by the Standby NameNode, so if the Standby NameNode is not functioning properly, no checkpoints are taken and you will see both of these alerts.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Go through the Standby NameNode logs and check why the checkpoint thread has stopped. The Standby NameNode may be down; if so, you may want to restart it to fix this.&lt;/P&gt;</description>
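      <!--
The Standby NameNode decides when to checkpoint based on a time threshold and a transaction-count threshold. The following Python sketch is a simplified model, not Hadoop or Cloudera code. It assumes the stock HDFS defaults `dfs.namenode.checkpoint.period` = 3600 seconds and `dfs.namenode.checkpoint.txns` = 1,000,000, and that a checkpoint becomes due as soon as either threshold is reached. The function names `checkpoint_due` and `overdue_factor` are hypothetical. A lag measured in days means the checkpoint thread is either not running or failing, and the Standby logs should show which.

```python
# Hypothetical model of the standby checkpoint trigger, not Hadoop source code.
# Assumes the stock HDFS defaults; check hdfs-site.xml / Cloudera Manager for
# the values actually in effect on your cluster.
DEFAULT_PERIOD_SECS = 3600   # dfs.namenode.checkpoint.period
DEFAULT_TXNS = 1_000_000     # dfs.namenode.checkpoint.txns


def checkpoint_due(secs_since_last: int, uncheckpointed_txns: int,
                   period_secs: int = DEFAULT_PERIOD_SECS,
                   txn_threshold: int = DEFAULT_TXNS) -> bool:
    """True once either the time threshold or the transaction threshold is reached."""
    return secs_since_last >= period_secs or uncheckpointed_txns >= txn_threshold


def overdue_factor(secs_since_last: int,
                   period_secs: int = DEFAULT_PERIOD_SECS) -> float:
    """Number of checkpoint periods that have elapsed since the last checkpoint."""
    return secs_since_last / period_secs


if __name__ == "__main__":
    two_days = 2 * 24 * 3600
    print(checkpoint_due(two_days, 0))  # True
    print(overdue_factor(two_days))     # 48.0
```

Under these assumed defaults, a 2-day lag is 48 missed checkpoint periods. That is far beyond normal jitter, which is consistent with a stopped or failing checkpoint thread on the Standby.
-->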
      <pubDate>Tue, 04 Apr 2023 05:51:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367653#M239960</guid>
      <dc:creator>rki_</dc:creator>
      <dc:date>2023-04-04T05:51:31Z</dc:date>
    </item>
    <item>
      <title>Re: Namenode bad health and checkpoint status issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367961#M240008</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103713"&gt;@BrianChan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You will need to perform the checkpoint manually on the faulty node. If the standby NameNode stays faulty for a long time, edit logs will accumulate. In that case, restarting HDFS or the active NameNode can take a long time or even fail, because on restart the active NameNode has to read a large amount of unmerged edit log.&lt;BR /&gt;&lt;BR /&gt;Is your NN setup active/standby?&lt;BR /&gt;For the steps below, you can also use the Cloudera Manager (CM) UI to perform the tasks.&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Solution 1 (quickest)&lt;/STRONG&gt;&lt;/U&gt;&lt;BR /&gt;I have had occasions when a simple rolling restart of the ZooKeeper (ZK) servers resolved this, but I see your checkpoint lag has grown to more than 2 days.&lt;/P&gt;&lt;P&gt;&lt;U&gt;Solution 2&lt;/U&gt;&lt;BR /&gt;Find out which NN is most up to date by comparing the dates of the files in the metadata directory on both NNs:&lt;/P&gt;&lt;PRE&gt;$ ls -lrt /dfs/nn/current/&lt;/PRE&gt;&lt;P&gt;On the active NN, the one with the latest edit logs, run the following as the hdfs user:&lt;/P&gt;&lt;PRE&gt;$ hdfs dfsadmin -safemode enter&lt;/PRE&gt;&lt;PRE&gt;$ hdfs dfsadmin -saveNamespace&lt;/PRE&gt;&lt;P&gt;Check whether the timestamp of the newly generated fsimage is the current time. If it is, the merge has been executed correctly and is complete. Then leave safe mode:&lt;/P&gt;&lt;PRE&gt;$ hdfs dfsadmin -safemode leave&lt;/PRE&gt;&lt;P&gt;Always perform a manual checkpoint like this to merge the active NameNode's metadata before restarting HDFS or the active NameNode.&lt;BR /&gt;Then restart the standby NameNode. The newly generated files should be shipped and synced to it automatically; this can take a while (typically under 5 minutes), after which your NNs should all be green.&lt;/P&gt;</description>
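      <!--
You can also find the most up-to-date NN without eyeballing file dates. One option is to compare the transaction IDs encoded in the metadata file names. The following Python sketch is hypothetical: `newest_txid` and `more_current` are made-up helper names, not Hadoop or Cloudera Manager APIs. It assumes you have collected the file names from `ls /dfs/nn/current/` on each NN into two lists. It also assumes the directory follows the usual `fsimage_<txid>` and `edits_<first>-<last>` naming. The sketch only reads names and never touches HDFS.

```python
import re
from typing import Iterable, Optional

# Hypothetical helper, not part of Hadoop or Cloudera Manager.
# Assumes the standard file names in a NameNode metadata dir (e.g. /dfs/nn/current):
#   fsimage_<txid>, fsimage_<txid>.md5, edits_<first>-<last>, edits_inprogress_<first>
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")    # does not match the .md5 companions
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")  # finalized edit segments only


def newest_txid(names: Iterable[str]) -> Optional[int]:
    """Highest transaction ID covered by an fsimage or a finalized edits file."""
    best: Optional[int] = None
    for name in names:
        m = FSIMAGE_RE.match(name)
        if m:
            txid = int(m.group(1))
        else:
            m = EDITS_RE.match(name)
            if not m:
                continue  # VERSION, seen_txid, edits_inprogress_*, *.md5, ...
            txid = int(m.group(2))  # last txid in the segment
        if best is None or txid > best:
            best = txid
    return best


def more_current(nn1: Iterable[str], nn2: Iterable[str]) -> str:
    """Return 'nn1', 'nn2' or 'tie' depending on which listing reaches the higher txid."""
    a = newest_txid(nn1)
    b = newest_txid(nn2)
    a = -1 if a is None else a  # a listing with no metadata files loses
    b = -1 if b is None else b
    if a == b:
        return "tie"
    return "nn1" if a > b else "nn2"


if __name__ == "__main__":
    nn1 = ["VERSION", "seen_txid",
           "fsimage_0000000000000004200", "fsimage_0000000000000004200.md5",
           "edits_0000000000000004201-0000000000000004350",
           "edits_inprogress_0000000000000004351"]
    nn2 = ["VERSION", "fsimage_0000000000000001000",
           "edits_0000000000000001001-0000000000000001100"]
    print(newest_txid(nn1), newest_txid(nn2), more_current(nn1, nn2))  # 4350 1100 nn1
```

In this made-up example, nn1 reaches txid 4350 and nn2 only 1100, so nn1 plays the role of the "NN with the latest edit logs" in Solution 2. Transaction IDs do not depend on clock skew between hosts, which is why they can serve as a cross-check on the timestamp comparison.
-->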
      <pubDate>Thu, 06 Apr 2023 21:15:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Namenode-bad-health-and-checkpoint-status-issue/m-p/367961#M240008</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2023-04-06T21:15:42Z</dc:date>
    </item>
  </channel>
</rss>

