NameNode Last Checkpoint script alert definition does not trigger based on uncommitted transactions

Rising Star

Hi Team,

The 'NameNode Last Checkpoint' alert description says "This service-level alert will trigger if the last time that the NameNode performed a checkpoint was too long ago. It will also trigger if the number of uncommitted transactions is beyond a certain threshold."

I got this alert on an HA cluster, saying the last checkpoint happened too long ago. How can I resolve this issue?

HDP: 2.3
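
For readers unfamiliar with the alert, the two conditions in the description quoted above can be sketched roughly as follows. This is a minimal illustration based only on that description text, not the actual Ambari alert script, whose logic may differ between Ambari versions. The function name, the `CheckpointStatus` type, and the warning/critical percentages are hypothetical. The defaults of 3600 seconds and 1,000,000 transactions are the stock hdfs-default.xml values of dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns; a cluster may override them.

```python
from dataclasses import dataclass


@dataclass
class CheckpointStatus:
    """Hypothetical container for the two values the alert looks at."""
    seconds_since_checkpoint: float
    uncommitted_txns: int


def checkpoint_alert_state(status,
                           checkpoint_period_s=3600,   # dfs.namenode.checkpoint.period
                           txn_threshold=1000000,      # dfs.namenode.checkpoint.txns
                           warn_pct=200.0,             # hypothetical threshold
                           crit_pct=400.0):            # hypothetical threshold
    """Return 'OK', 'WARNING', or 'CRITICAL' for a checkpoint status.

    Assumption: either condition alone is enough to raise the alert,
    as the alert description suggests.
    """
    # Elapsed time expressed as a percentage of the configured period.
    pct_of_period = status.seconds_since_checkpoint / checkpoint_period_s * 100.0
    too_many_txns = status.uncommitted_txns > txn_threshold

    if pct_of_period >= crit_pct or too_many_txns:
        return "CRITICAL"
    if pct_of_period >= warn_pct:
        return "WARNING"
    return "OK"
```

With these assumed thresholds, a checkpoint that is two periods old would give a warning, and a large backlog of uncommitted transactions would give a critical state even if the last checkpoint was recent.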

1 ACCEPTED SOLUTION

Restarting the NameNode will fix your issue. Before restarting, run the commands below:

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

Now restart the NameNode (NN) and JournalNodes (JN).

13 REPLIES

Rising Star

Do I need to run any commands manually to update the checkpoint?

Rising Star

Can someone help?

@suresh krish

Which version of Ambari are you using? It seems your issue is fixed in Ambari 2.4.

Details are here:

https://issues.apache.org/jira/browse/AMBARI-15953

Rising Star

Ambari 2.1

Rising Star

What is the actual cause? Is the NameNode really not checkpointing the edits, or is this just an alert that can be ignored?

I'm not sure whether this alert can be ignored, but it appears to be an issue with the Ambari alert script, which was fixed in 2.4. I would recommend creating a support ticket for HDP to get confirmation.
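
One way to tell whether the NameNode is really behind on checkpoints, independently of the Ambari alert, is to look at the values the NameNode itself reports over JMX. The sketch below is only an illustration: it assumes the `FSNamesystem` bean exposes `LastCheckpointTime` (milliseconds since the epoch) and the `NameNodeInfo` bean exposes `JournalTransactionInfo` as a JSON-encoded string, which is how I understand the Hadoop 2.x JMX output. The function name `parse_checkpoint_metrics` is hypothetical.

```python
import json


def parse_checkpoint_metrics(fsnamesystem_jmx, namenodeinfo_jmx, now_ms):
    """Extract checkpoint age and uncommitted transaction count.

    Both arguments are the raw JSON text returned by the NameNode /jmx
    servlet for the respective bean; now_ms is the current time in
    milliseconds since the epoch.
    """
    fs_bean = json.loads(fsnamesystem_jmx)["beans"][0]
    info_bean = json.loads(namenodeinfo_jmx)["beans"][0]

    last_checkpoint_ms = int(fs_bean["LastCheckpointTime"])

    # JournalTransactionInfo is a JSON document stored inside a string,
    # so it needs a second decoding step.
    journal = json.loads(info_bean["JournalTransactionInfo"])
    last_written = int(journal["LastAppliedOrWrittenTxId"])
    last_checkpointed = int(journal["MostRecentCheckpointTxId"])

    return {
        "seconds_since_checkpoint": (now_ms - last_checkpoint_ms) / 1000.0,
        "uncommitted_txns": last_written - last_checkpointed,
    }
```

The two JSON inputs could be fetched from a URL such as http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem (and the same with name=NameNodeInfo), where the host is a placeholder and 50070 is the default NameNode HTTP port in Hadoop 2.x. If the checkpoint age and transaction backlog both look healthy while the alert still fires, that would point toward the alert script rather than HDFS.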

Rising Star

@SBandaru

I have performed the above steps, but after some time these alerts keep coming back. Can you please suggest something to fix this issue permanently?

Contributor

@Yukti Agrawal, I have run into the same problem: after performing the above steps, the alerts come back after some time. Did you solve this problem? I set dfs.namenode.checkpoint.period to 1 hour, but it does not seem to have worked: when I checked, the fsimage file is not being generated automatically every hour. Thanks.
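
The manual fsimage check described above can be scripted. The sketch below is a minimal example, assuming the usual Hadoop 2.x layout in which checkpoints are written to the `current` subdirectory of dfs.namenode.name.dir as files named `fsimage_<txid>` with a `.md5` sidecar. The helper names are hypothetical. Note that in an HA setup the Standby NameNode is the one that performs checkpoints, so the standby's name directory is the first place to look.

```python
import os
import re
import time

# Matches finished images only, not .md5 sidecars or fsimage.ckpt_* temp files.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def newest_fsimage(current_dir):
    """Return (txid, mtime, path) of the fsimage with the highest txid, or None."""
    best = None
    for name in os.listdir(current_dir):
        match = FSIMAGE_RE.match(name)
        if not match:
            continue
        path = os.path.join(current_dir, name)
        entry = (int(match.group(1)), os.path.getmtime(path), path)
        if best is None or entry[0] > best[0]:
            best = entry
    return best


def fsimage_is_stale(current_dir, checkpoint_period_s=3600, now=None):
    """True if no fsimage exists or the newest one is older than the period."""
    now = time.time() if now is None else now
    newest = newest_fsimage(current_dir)
    if newest is None:
        return True
    return (now - newest[1]) > checkpoint_period_s
```

For example, `fsimage_is_stale("/hadoop/hdfs/namenode/current")` could be run on each NameNode host; that path is only an assumed example, so substitute the value of dfs.namenode.name.dir from your own hdfs-site.xml.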