NameNode last checkpoint alert

Expert Contributor

Hi,

I'm looking at a checkpoint alert in a NameNode HA environment. It says the last checkpoint was completed 22 hours ago.

I'm currently running checkpoints manually from the command line. How can I make them happen automatically, and how can we ignore these checkpoint alerts in the UI?

Last Checkpoint: [22 hours, 19 minutes, 45507 transactions]

Re: NameNode last checkpoint alert

Super Mentor

@Vinay K

Ambari relies on a NameNode JMX call to find out the "LastCheckpointTime".

Something like this: https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2....

# curl "http://hdfcluster1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep 'LastCheckpointTime'

For example, if the above JMX call returns the epoch time '1521523640579' (a value in milliseconds), drop the last three digits and convert it to human-readable time to find out when the last checkpoint actually happened on the NameNode.

# date -d @1521523640
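For anyone who wants to script the two steps above, here is a minimal sketch. The function names are hypothetical (they are not part of Ambari or Hadoop), it assumes GNU date, and it assumes the JMX output contains a line of the form "LastCheckpointTime" : <number> as shown in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; names are not from Ambari or Hadoop.

# Read JMX JSON on stdin and print the LastCheckpointTime value (epoch milliseconds).
extract_last_checkpoint_ms() {
  grep -o '"LastCheckpointTime"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Convert epoch milliseconds to a UTC timestamp.
# Assumes GNU date, which accepts "-d @SECONDS"; integer division drops the milliseconds.
ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
}

# Possible usage against a NameNode (host name taken from the example above):
#   ms=$(curl -s "http://hdfcluster1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" \
#          | extract_last_checkpoint_ms)
#   ms_to_utc "$ms"
```

Passing -s to curl suppresses the progress meter, so only the JSON reaches the pipe.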

NOTE-1: If the clocks on your Ambari cluster hosts are not in sync, the last checkpoint computation may come out wrong.

NOTE-2: Every cluster node (including the Ambari Server host) should be able to resolve the NameNode JMX URL. Otherwise, the host where the alert is executed may not be able to make the JMX call to the NameNode, and the alert may report unknown results.

Re: NameNode last checkpoint alert

Expert Contributor

@Jay Kumar SenSharma

Below is the output I got:

[root@slave0 centos]# curl "http://slave1.dl.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep 'LastCheckpointTime'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1427    0  1427    0     0   183k      0 --:--:-- --:--:-- --:--:--  199k
"LastCheckpointTime" : 1521460953000,

[root@slave0 centos]# date -d @1521460953000

Fri Mar 7 00:00:00 IST 50183

It looks like automatic checkpointing is not happening, even though the time is in sync between the servers.

Re: NameNode last checkpoint alert

Super Mentor

@Vinay K

The JMX value is in milliseconds, while date -d @ expects seconds. In your epoch time command, please remove the last 3 digits to get the accurate date:

# date -d @1521460953
Mon Mar 19 12:02:33 UTC 2018

Re: NameNode last checkpoint alert

Super Mentor

@Vinay K

So if your NameNode shows that the last checkpoint time is around "Mon Mar 19 12:02:33 UTC 2018", then Ambari is probably showing the right alert: "Last Checkpoint: [22 hours, 19 minutes, 45507 transactions]".

You should therefore check on the NameNode side whether checkpointing is happening at regular intervals. Also, please check the following property value and the NameNode log for any checkpointing-related warnings or errors.

dfs.namenode.checkpoint.period

Specifies the number of seconds between two periodic checkpoints.
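As a rough way to compare the checkpoint age against that property, here is a small sketch. The helper names are hypothetical, and this is not the logic Ambari's alert itself uses; it only checks whether more time has passed than one configured period. The commented usage assumes that hdfs getconf prints the period as a plain number of seconds, which may not hold on every Hadoop version.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Seconds elapsed between the last checkpoint (epoch milliseconds, as reported by JMX)
# and a reference time (epoch seconds).
checkpoint_age_seconds() {
  last_ms=$1
  now_s=$2
  echo $(( now_s - last_ms / 1000 ))
}

# Print an age in seconds as "H hours, M minutes".
format_age() {
  echo "$(( $1 / 3600 )) hours, $(( ($1 % 3600) / 60 )) minutes"
}

# Succeeds (exit status 0) when the age is greater than the checkpoint period, both in seconds.
checkpoint_overdue() {
  [ "$1" -gt "$2" ]
}

# Possible usage on a cluster node (assumes $last_ms holds the JMX value):
#   period=$(hdfs getconf -confKey dfs.namenode.checkpoint.period)
#   age=$(checkpoint_age_seconds "$last_ms" "$(date +%s)")
#   checkpoint_overdue "$age" "$period" && echo "Last checkpoint: $(format_age "$age") ago"
```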

Re: NameNode last checkpoint alert

Expert Contributor

It's working now. The checkpointing period is 6-7 hours, and the NameNode was down during that period.

Thanks
