NameNode last checkpoint alert

Rising Star

Hi,

I'm looking at a checkpoint alert in a NameNode HA environment. It says the last checkpoint was completed 22 hours ago.

I'm running the checkpoint manually from the command line. How can I make it happen automatically, and how can we suppress these checkpoint alerts in the UI?

Last Checkpoint: [22 hours, 19 minutes, 45507 transactions]

1 ACCEPTED SOLUTION

Rising Star

It's working now. The checkpointing period is 6-7 hours, and the NN was down during that period.

Thanks

5 REPLIES

Master Mentor

@Vinay K

Ambari relies on a NameNode JMX call to find out the "LastCheckpointTime".

Something like this: https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2....

# curl "http://hdfcluster1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep 'LastCheckpointTime'

For example, if the above JMX call returns the epoch time '1521523640579' (a value in milliseconds), drop the last three digits and convert it to a human-readable time to find out exactly when the last checkpoint happened on the NameNode.

# date -d @1521523640

NOTE-1: If the clocks on your Ambari cluster hosts are not synchronized, the last checkpoint computation may come out wrong.

NOTE-2: Every cluster node (including the Ambari Server host) should be able to resolve the NameNode JMX URL. Otherwise, the host where the alert is executed may not be able to make the JMX call to the NameNode, and the alert may report unknown results.
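If you want just the number rather than the whole grep line, here is a minimal sketch. The function name `extract_last_checkpoint_ms` is made up for illustration, and it assumes the JMX response prints the attribute as `"LastCheckpointTime" : <digits>` like the output in this thread.

```shell
#!/usr/bin/env bash
# Read NameNode JMX JSON on stdin and print only the LastCheckpointTime
# value (epoch milliseconds). Prints nothing if the attribute is absent.
extract_last_checkpoint_ms() {
  sed -n 's/.*"LastCheckpointTime"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage against a live NameNode (placeholder host, not executed here).
# curl -s suppresses the progress meter so only the JSON reaches the pipe:
# curl -s "http://hdfcluster1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" \
#   | extract_last_checkpoint_ms
```

The result is still in milliseconds, so it needs to be divided by 1000 before it is passed to `date -d @...`.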

Rising Star

@Jay Kumar SenSharma

Below is the output I got:

[root@slave0 centos]#curl "http://slave1.dl.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep 'LastCheckpointTime'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1427    0  1427    0     0   183k      0 --:--:-- --:--:-- --:--:--  199k
"LastCheckpointTime" : 1521460953000,

[root@slave0 centos]# date -d @1521460953000

Fri Mar 7 00:00:00 IST 50183

It looks like the automatic checkpoint is not happening, even though the time is in sync between the servers.

Master Mentor

@Vinay K

In your epoch time command, please remove the last 3 digits to get the accurate date (the JMX value is in milliseconds, while date expects seconds):

# date -d @1521460953
Mon Mar 19 12:02:33 UTC 2018
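To avoid trimming the digits by hand, the conversion can be wrapped in a small helper. This is only a sketch: `ms_to_utc` is a hypothetical name, and it assumes GNU date (the `-d @seconds` form used above). Passing the 13-digit value straight to date treats milliseconds as seconds, which is why the earlier attempt landed in the year 50183.

```shell
#!/usr/bin/env bash
# Convert an epoch timestamp in milliseconds to a UTC date string.
# Integer division by 1000 drops the millisecond part.
ms_to_utc() {
  local ms=$1
  date -u -d "@$(( ms / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
}

ms_to_utc 1521460953000   # prints 2018-03-19 12:02:33 UTC
```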

Master Mentor

@Vinay K

So if your NameNode shows that the last checkpoint time is around "Mon Mar 19 12:02:33 UTC 2018", then Ambari is probably showing the right alert: "Last Checkpoint: [22 hours, 19 minutes, 45507 transactions]".

You should therefore check on the NameNode side whether checkpointing is failing to happen at the regular interval. Also check the value of the following property, and look in the NameNode log for any checkpointing-related warnings or errors.

dfs.namenode.checkpoint.period

Specifies the number of seconds between two periodic checkpoints.
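As a rough way to compare the last checkpoint against that property, something like the sketch below could be used. `checkpoint_overdue` is a hypothetical helper, not part of Hadoop or Ambari, and the commented wiring assumes the hdfs client is available on the node so that `hdfs getconf -confKey` can read the configured value.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) if the last checkpoint is older than the period.
# Arguments: last checkpoint (epoch ms), current time (epoch s), period (s).
checkpoint_overdue() {
  local last_ms=$1 now_s=$2 period_s=$3
  local age_s=$(( now_s - last_ms / 1000 ))
  [ "$age_s" -gt "$period_s" ]
}

# Example wiring on a cluster node (commented out, needs the hdfs client):
# period=$(hdfs getconf -confKey dfs.namenode.checkpoint.period)
# if checkpoint_overdue 1521460953000 "$(date +%s)" "$period"; then
#   echo "last checkpoint is older than dfs.namenode.checkpoint.period"
# fi
```

This only compares elapsed time with the period. It does not reproduce Ambari's own alert thresholds.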
