
Checkpoint is not occurring


Dear all, I recently enabled HA on my NameNode.
I started to see an issue with my checkpoint process: no checkpoint has occurred for the past 5 hours.

Here are my observations. Have you seen this case before, or am I hitting a bug?

Kindly share your advice on how to resolve this issue.
 
As per the checkpoint process, when the updated fsimage is downloaded to the NameNode from the Standby NameNode, "fsimage.ckpt_txid" must be renamed to "fsimage_txid". That is not happening in my case.

I do not see any file named "fsimage_txid" on my NameNode; they all look like "fsimage.ckpt_txid".
So I compared "fsimage.ckpt_txid" with "fsimage_txid", and both have the same checksum value.

fsimage.ckpt_txid is from namenode
fsimage_txid is from secondary-namenode
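
To spot the symptom described above, a small helper can list every "fsimage.ckpt_txid" file that has no finalized "fsimage_txid" sibling in the same directory. This is only a sketch: the function name pending_checkpoints is made up for illustration, and the directory argument is whatever your NameNode's current/ directory is.

```shell
#!/bin/sh
# Hypothetical helper: print the txid of every fsimage.ckpt_<txid> file
# that has no matching fsimage_<txid> file in the same directory.
pending_checkpoints() {
    dir=$1
    for f in "$dir"/fsimage.ckpt_*; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=${f##*/}                      # strip the directory part
        txid=${name#fsimage.ckpt_}         # strip the file name prefix
        [ -e "$dir/fsimage_$txid" ] || printf '%s\n' "$txid"
    done
}

# Example call (path taken from the post above):
#   pending_checkpoints /mnt/sdb/name/current
```

Any txid it prints belongs to an image that was downloaded but never renamed.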
 
namenode:
=========
root@namenode:/mnt/sdb/name/current# cksum fsimage.ckpt_0000000000604392126
3708522794 2148716968 fsimage.ckpt_0000000000604392126
 
secondary-namenode:
================
root@secondary-namenode:/mnt/sdd/name/current# cksum fsimage_0000000000604392126
3708522794 2148716968 fsimage_0000000000604392126
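
The manual comparison above can be scripted. The sketch below compares the CRC and byte count that cksum reports for two files and ignores the file names; same_image is a hypothetical helper name, and it assumes both copies are reachable from one machine (for example after fetching one of them with wget).

```shell
#!/bin/sh
# Hypothetical helper: succeed when two files have the same cksum CRC
# and the same size. Reading from stdin keeps the file name out of the output.
same_image() {
    a=$(cksum < "$1") || return 2          # 2 = first file unreadable
    b=$(cksum < "$2") || return 2          # 2 = second file unreadable
    [ "$a" = "$b" ]
}

# Example call (file names from the post above, both copied to one directory):
#   same_image fsimage.ckpt_0000000000604392126 fsimage_0000000000604392126 \
#       && echo "identical" || echo "different"
```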
 
NOTE: I did not see any network issue; I am able to download the fsimage using the "wget" command.

I am using CDH 4.1.3 and Cloudera Enterprise 4.6.3.

Best Regards,
Bommuraj

Accepted Solutions

Re: Checkpoint is not occurring

Thank you, Harsh, for your email! I was hitting the issue below. I increased "dfs.image.transfer.timeout" and that fixed it.

https://issues.apache.org/jira/browse/HDFS-4301

Checkpointing was working fine, but the issue started when my fsimage size reached 2.1 GB.

Best Regards,
Bommuraj
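
For anyone sizing the timeout, here is a rough back-of-the-envelope sketch. It assumes the timeout has to cover the whole image transfer (which matches the behaviour reported here once the image reached about 2.1 GB) and that "dfs.image.transfer.timeout" is given in milliseconds; please verify both against the documentation for your version. The function name transfer_timeout_ms, the throughput figure, and the headroom factor are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate a transfer timeout in milliseconds.
#   $1 = image size in bytes
#   $2 = assumed sustained throughput in bytes per second
#   $3 = integer headroom factor (e.g. 2 doubles the estimate)
transfer_timeout_ms() {
    size=$1
    rate=$2
    factor=$3
    # Round the transfer time up to a whole millisecond, then add headroom.
    echo $(( (size * 1000 + rate - 1) / rate * factor ))
}

# Example call: the 2148716968-byte image from the cksum output above,
# an assumed 20 MiB/s link, and 2x headroom:
#   transfer_timeout_ms 2148716968 20971520 2
```

The result would be a candidate value for the "dfs.image.transfer.timeout" property in hdfs-site.xml; measure your real transfer rate rather than trusting the assumed one.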

Re: Checkpoint is not occurring

It is difficult to say whether you are hitting a bug without looking at the relevant Checkpointer entries in the Standby NameNode (SBN) logs.

There may be issues transferring the file between the SBN and the NN, probably because of timeouts or similar failures.
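
One way to pull the relevant entries out of a log is a simple grep. This is a sketch only: checkpoint_log_lines is a hypothetical helper, the log path in the example is an assumption (it varies by installation), and the pattern simply matches any line that mentions "checkpoint" or the TransferFsImage class name.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention checkpointing or
# fsimage transfer. $1 is the path to a NameNode log file.
checkpoint_log_lines() {
    grep -iE 'checkpoint|TransferFsImage' "$1"
}

# Example call (the log path is an assumption, adjust for your install):
#   checkpoint_log_lines /var/log/hadoop-hdfs/namenode.log | tail -n 50
```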
