Created 05-31-2016 11:47 AM
Hi Team,
The 'NameNode Last Checkpoint' alert description says "This service-level alert will trigger if the last time that the NameNode performed a checkpoint was too long ago. It will also trigger if the number of uncommitted transactions is beyond a certain threshold."
I got this alert on an HA cluster, saying the last checkpoint happened too long ago. How can I resolve this issue?
HDP: 2.3
Created 05-31-2016 04:49 PM
Restarting the NameNode will fix your issue. Before restarting, run the commands below:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
Now restart the NameNode (NN) and JournalNode (JN).
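For anyone who wants to script the three commands above, here is a minimal sketch. The function name manual_checkpoint is made up for illustration, and it assumes the script runs as the HDFS superuser (for example via sudo su hdfs). It only wraps the commands from this answer and stops if one of them fails.

```shell
# Sketch only: wraps the three dfsadmin commands from the answer above.
# manual_checkpoint is a hypothetical helper name, not part of HDFS.
manual_checkpoint() {
  # Enter safe mode so the namespace is read-only while it is saved.
  hdfs dfsadmin -safemode enter || return 1

  # Save the current namespace to a new fsimage.
  if ! hdfs dfsadmin -saveNamespace; then
    echo "saveNamespace failed; the NameNode is still in safe mode" >&2
    return 1
  fi

  # Leave safe mode so clients can write again.
  hdfs dfsadmin -safemode leave
}

# Usage (not run automatically):
#   manual_checkpoint && echo "checkpoint saved, now restart NN and JN"
```

Note that if saveNamespace fails, the sketch deliberately leaves the NameNode in safe mode so you can look at the error before deciding what to do.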
Created 05-31-2016 12:43 PM
Do I need to run any commands manually to update the checkpoint?
Created 05-31-2016 03:16 PM
Can someone help?
Created 05-31-2016 03:31 PM
Which version of Ambari are you using? It seems your issue is fixed in Ambari 2.4.
Details are here:
Created 05-31-2016 04:35 PM
Ambari 2.1
Created 05-31-2016 04:36 PM
What is the actual cause? Is it really not checkpointing the edits, or is it just an alert that can be ignored?
Created 05-31-2016 05:08 PM
I'm not sure whether we can ignore these alerts, but it seems to be an issue with the Ambari alert script, which was fixed in 2.4. I would recommend opening a support ticket with HDP to get confirmation.
Created 11-14-2016 09:50 AM
I have performed the above steps, but after some time these alerts keep coming back. Can you please suggest something to fix this issue permanently?
Created 01-07-2017 04:53 AM
@Yukti Agrawal, I have hit the same problem: after performing the above steps, the alerts come back after some time. Did you solve this problem? I set dfs.namenode.checkpoint.period to 1 hour, but it does not seem to work; I checked, and the fsimage file is not being generated automatically every hour. Thanks.
Created 03-09-2017 07:46 AM
This does not seem to really resolve the problem. The alert still shows up again later. What is the final fix?
Created 06-01-2016 11:58 AM
Is this a test system? Are the values for the following modified from the default?
You could go to the NameNode current folder and check when the last fsimage was created. Was the cluster down for long? You may want to review the above two parameters, check the timestamps in the current folder, and find out why automatic checkpointing is not happening.
You could use the steps provided by @Sri Bandaru to do a checkpoint manually if you can get downtime.
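To make the "check when the last fsimage was created" step concrete, here is a rough sketch. The helper names newest_fsimage and fsimage_age_seconds are made up for illustration, and the directory in the usage comment is only an example; use whatever your dfs.namenode.name.dir points to. It assumes a Linux host with GNU stat.

```shell
# Sketch only: find the newest fsimage_<txid> file in a NameNode
# "current" directory. Both helper names are hypothetical.
newest_fsimage() {
  nf_dir=$1
  nf_newest=""
  for nf_file in "$nf_dir"/fsimage_*; do
    [ -f "$nf_file" ] || continue
    # Skip the .md5 checksum files that sit next to each image.
    case "$nf_file" in *.md5) continue ;; esac
    if [ -z "$nf_newest" ] || [ "$nf_file" -nt "$nf_newest" ]; then
      nf_newest=$nf_file
    fi
  done
  if [ -z "$nf_newest" ]; then
    echo "no fsimage found in $nf_dir" >&2
    return 1
  fi
  printf '%s\n' "$nf_newest"
}

# Print how many seconds ago the newest fsimage was written.
fsimage_age_seconds() {
  fa_file=$(newest_fsimage "$1") || return 1
  fa_now=$(date +%s)
  fa_mtime=$(stat -c %Y "$fa_file")   # GNU stat, assumes Linux
  echo $((fa_now - fa_mtime))
}

# Usage (example path, adjust to your dfs.namenode.name.dir):
#   fsimage_age_seconds /hadoop/hdfs/namenode/current
```

If the age is much larger than your configured checkpoint period, automatic checkpointing is probably not happening on that node.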
Created 11-14-2016 09:47 AM
@vpoornalingam I have checked the above two values: dfs.namenode.checkpoint.period is set to 6 hours. Does this cause the above-mentioned alerts?
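A quick way to print the effective values from the command line, as a sketch. dfs.namenode.checkpoint.period is the property named in this thread; dfs.namenode.checkpoint.txns is my assumption for the transaction threshold that the alert description refers to. The function name show_checkpoint_settings is made up. Keep in mind that hdfs getconf reads the configuration files on the host where you run it, so run it on the NameNode host.

```shell
# Sketch only: print the checkpoint-related settings seen by this host.
# show_checkpoint_settings is a hypothetical helper name.
show_checkpoint_settings() {
  for sc_key in dfs.namenode.checkpoint.period dfs.namenode.checkpoint.txns; do
    # hdfs getconf -confKey prints the configured value of one property.
    printf '%s = %s\n' "$sc_key" "$(hdfs getconf -confKey "$sc_key")"
  done
}

# Usage (not run automatically):
#   show_checkpoint_settings
```

The period is expressed in seconds, so a 6 hour setting would show up as 21600.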
Created 04-17-2019 03:23 PM
Hello,
Restarting the Standby NameNode did not resolve the issue. Then, when I tried to restart the Active NameNode, Ambari told me to SSH into the Active NameNode server and run:
sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
Safe mode is ON in xx.xxx.xxx.xxx:8020
Safe mode is ON in xx.xxx.xxx.xxx:8020
root@xxxxx:~# sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'
Save namespace successful for xx.xxx.xxx.xxx:8020
Save namespace successful for xx.xxx.xxx.xxx:8020
After that, restarting both NameNodes cleared the alerts.
Regards.