
Checkpoint is not occurring

Dear all, I recently enabled HA on my NameNode, and I have started to see an issue with the checkpoint process: no checkpoint has occurred in the past 5 hours.

Here are my observations. Have you seen this case before, or am I hitting a bug?

Kindly share your advice on how to resolve this issue.

As per the checkpoint process, when the updated fsimage is downloaded to the NameNode from the Standby NameNode, "fsimage.ckpt_txid" must be renamed to "fsimage_txid", but that is not happening in my case.
I do not see any file named "fsimage_txid" on my NameNode; they all look like "fsimage.ckpt_txid".
So I compared "fsimage.ckpt_txid" and "fsimage_txid", and both have the same checksum value.
"fsimage.ckpt_txid" is from the NameNode:
root@namenode:/mnt/sdb/name/current# cksum fsimage.ckpt_0000000000604392126
3708522794 2148716968 fsimage.ckpt_0000000000604392126
root@secondary-namenode:/mnt/sdd/name/current# cksum fsimage_0000000000604392126
3708522794 2148716968 fsimage_0000000000604392126
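A quick way to confirm which checkpoints were never finalized is to list the "fsimage.ckpt_txid" files that have no matching "fsimage_txid" next to them. This is only a minimal sketch: the function name is made up for illustration, and it assumes the naming pattern shown in the listings above.

```python
import re
from pathlib import Path

# Naming pattern taken from the listings in this thread: fsimage.ckpt_<txid>
CKPT_RE = re.compile(r"^fsimage\.ckpt_(\d+)$")


def unfinalized_checkpoints(current_dir):
    """Return txids that have fsimage.ckpt_<txid> but no fsimage_<txid>.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    names = {p.name for p in Path(current_dir).iterdir() if p.is_file()}
    pending = []
    for name in sorted(names):
        match = CKPT_RE.match(name)
        # A checkpoint counts as finalized once the renamed file exists.
        if match and f"fsimage_{match.group(1)}" not in names:
            pending.append(match.group(1))
    return pending
```

For example, calling unfinalized_checkpoints("/mnt/sdb/name/current") on the NameNode would list the txids whose rename never happened.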
NOTE: I did not see any network issue; I am able to download the fsimage using the "wget" command.
I am using CDH 4.1.3 and Cloudera Enterprise 4.6.3.
Best Regards,

Re: Checkpoint is not occurring

It is difficult to say whether you are hitting a bug without looking at the relevant Checkpointer entries in the Standby NameNode (SBN) logs.

There may be issues with transferring the file between the SBN and the NN, probably because of timeouts or something similar.
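To narrow the SBN log down to the relevant entries, something like the following could help. It is only a sketch: the keywords are an assumption based on the HDFS class names StandbyCheckpointer and TransferFsImage plus Java's SocketTimeoutException, and the helper name is made up. The log file location depends on your installation, so it is left to the caller.

```python
import re

# Keywords are an assumption: they match the StandbyCheckpointer and
# TransferFsImage class names and Java socket timeout errors.
PATTERN = re.compile(
    r"Checkpointer|TransferFsImage|SocketTimeoutException", re.IGNORECASE
)


def checkpoint_log_lines(lines):
    """Keep only log lines that mention checkpointing or image transfer."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

It accepts any iterable of lines, so an open log file can be passed in directly.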

Re: Checkpoint is not occurring

Thank you, Harsh, for your reply! I was hitting a timeout during the fsimage transfer. I increased "dfs.image.transfer.timeout" and that fixed the issue. Checkpointing had been working fine; the problem started when my fsimage size reached 2.1 GB.

Best Regards, Bommuraj
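For anyone sizing this setting, here is a rough back-of-the-envelope sketch. It rests on two assumptions that should be checked against your own version: that the default of "dfs.image.transfer.timeout" is 60000 ms, and that the timeout effectively bounds the whole image transfer. The function names and the safety factor are made up for illustration.

```python
import math

MIB = 1024 * 1024


def min_throughput_mib_per_s(image_bytes, timeout_ms):
    """Throughput needed to move the whole image before the timeout expires."""
    return image_bytes / MIB / (timeout_ms / 1000.0)


def suggested_timeout_ms(image_bytes, throughput_mib_per_s, safety_factor=2.0):
    """Timeout that leaves headroom at a given sustained throughput.

    safety_factor is an arbitrary illustrative margin, not a Hadoop default.
    """
    seconds = image_bytes / MIB / throughput_mib_per_s
    return int(math.ceil(seconds * safety_factor * 1000))


# Image size in bytes, taken from the cksum output earlier in the thread.
image_bytes = 2148716968
# 60000 ms is the assumed default; verify it for your release.
print(round(min_throughput_mib_per_s(image_bytes, 60000), 1), "MiB/s needed")
```

Under those assumptions, an image of this size needs a sustained rate of roughly 34 MiB/s to finish within 60 seconds, which would explain why checkpoints started failing only once the fsimage grew to about this size.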