NameNode Last Checkpoint script alert definition does not trigger based on uncommitted transactions
Labels: Apache Hadoop
Created ‎05-31-2016 11:47 AM
Hi team,
The 'NameNode Last Checkpoint' alert description says "This service-level alert will trigger if the last time that the NameNode performed a checkpoint was too long ago. It will also trigger if the number of uncommitted transactions is beyond a certain threshold."
I got this alert on an HA cluster, saying that the last checkpoint happened too long ago. How can I resolve this issue?
HDP: 2.3
Created ‎05-31-2016 04:49 PM
Restarting the NameNode will fix your issue. Before restarting, run the commands below:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
Now restart the NameNode (NN) and the JournalNodes (JN).
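The manual checkpoint sequence above can be wrapped in a small shell function. This is only a sketch, not an official procedure: the function name `manual_checkpoint` and the `HDFS_CMD` override are hypothetical, added so the sequence can be dry-run with a stub, and it assumes you run it as the HDFS superuser. It tries to leave safe mode even when `-saveNamespace` fails, so the cluster is not left read-only by accident.

```shell
# Hypothetical override: lets the sequence be tested with a stub command.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Force a checkpoint: enter safe mode, save the namespace, leave safe mode.
# Returns non-zero if any step fails.
manual_checkpoint() {
  # Do nothing further if safe mode cannot be entered.
  "$HDFS_CMD" dfsadmin -safemode enter || return 1

  local rc=0
  # Write a new fsimage from the current in-memory namespace.
  "$HDFS_CMD" dfsadmin -saveNamespace || rc=$?
  # Always try to leave safe mode, even if saveNamespace failed.
  "$HDFS_CMD" dfsadmin -safemode leave || rc=$?
  return "$rc"
}
```

To use it, run something like `sudo su hdfs -l`, source the file, and call `manual_checkpoint` before restarting the NameNode.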
Created ‎03-09-2017 07:46 AM
This does not seem to really resolve the problem. The alert still shows up the next time. What is the permanent fix?
Created ‎06-01-2016 11:58 AM
Is this a test system? Have the values of the following properties been changed from their defaults?
- dfs.namenode.checkpoint.period, set to 1 hour (3600 seconds) by default, specifies the maximum delay between two consecutive checkpoints.
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode that will force an urgent checkpoint, even if the checkpoint period has not been reached.
You could go to the NameNode's current folder and check when the last fsimage was created. Was the cluster down for a long time? You may want to review the two parameters above, check the timestamps in the current folder, and find out why automatic checkpointing is not happening.
You could use the steps provided by @Sri Bandaru to do a checkpoint manually if you can get downtime.
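The checks described above can be sketched in shell. The function names and the `HDFS_CMD` override are hypothetical. The directory passed to `latest_fsimage` is assumed to be the `current` folder under one of your `dfs.namenode.name.dir` locations, and fsimage files are assumed to follow the usual `fsimage_<txid>` naming with `.md5` companion files.

```shell
# Hypothetical override: lets the functions be tested with a stub command.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Print the effective values of the two checkpoint properties.
show_checkpoint_settings() {
  local key
  for key in dfs.namenode.checkpoint.period dfs.namenode.checkpoint.txns; do
    printf '%s=%s\n' "$key" "$("$HDFS_CMD" getconf -confKey "$key")"
  done
}

# Print the most recently modified fsimage file in the given "current"
# directory, skipping the .md5 checksum files. Prints nothing if none exist.
latest_fsimage() {
  local current_dir="$1"
  ls -t "$current_dir"/fsimage_* 2>/dev/null | grep -v '\.md5$' | head -n 1
}
```

For example, `ls -l "$(latest_fsimage /path/to/namenode/current)"` shows the timestamp of the last checkpoint on that host. Replace the path with your own NameNode directory. In an HA setup, check both NameNode hosts, since the Standby NameNode is the one that performs checkpoints.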
Created ‎11-14-2016 09:47 AM
@vpoornalingam I have checked the two values above: dfs.namenode.checkpoint.period is set to 6 hours. Could this cause the alert mentioned above?
Created ‎04-17-2019 03:23 PM
Hello,
Restarting the Standby NameNode did not resolve the issue. Then, when I tried to restart the Active NameNode, Ambari told me to SSH into the Active NameNode server and:
- Enter Safe Mode
sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
Safe mode is ON in xx.xxx.xxx.xxx:8020
Safe mode is ON in xx.xxx.xxx.xxx:8020
- Then save the namespace
root@xxxxx:~# sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'
Save namespace successful for xx.xxx.xxx.xxx:8020
Save namespace successful for xx.xxx.xxx.xxx:8020
After that, restarting both NameNodes cleared the alerts.
Regards.
