Checkpoint is not occurring
Labels: HDFS
Explorer
Created 06-23-2014 05:37 PM
Dear all, I recently enabled HA on my NameNode.
Since then I have started to see an issue with the checkpoint process: no checkpoint has occurred for the past 5 hours.
Here are my observations. Have you seen this case before, or am I hitting a bug?
Kindly share your advice on how to resolve this issue.
As I understand the checkpoint process,
when the updated fsimage is downloaded to the NameNode from the standby NameNode,
"fsimage.ckpt_txid" must be renamed to "fsimage_txid", but that is not happening in my case.
I do not see any file named "fsimage_txid" on my NameNode; all of them look like "fsimage.ckpt_txid".
So I compared "fsimage.ckpt_txid" and "fsimage_txid", and both have the same checksum value.
fsimage.ckpt_txid is from the NameNode.
fsimage_txid is from the secondary NameNode.
namenode:
=========
root@namenode:/mnt/sdb/name/current# cksum fsimage.ckpt_0000000000604392126
3708522794 2148716968 fsimage.ckpt_0000000000604392126
secondary-namenode:
===================
root@secondary-namenode:/mnt/sdd/name/current# cksum fsimage_0000000000604392126
3708522794 2148716968 fsimage_0000000000604392126
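For readers who want to repeat this comparison in a script, here is a minimal sketch. It swaps in MD5 for the CRC that `cksum` prints, and it assumes both files are reachable from one machine (for example after copying one of them over). The function names `file_md5` and `same_content` are made up for this illustration and are not part of any Hadoop tool.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have the same size and the same MD5 digest."""
    # Cheap size check first, so multi-GB files are only hashed when needed.
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False
    return file_md5(path_a) == file_md5(path_b)
```

If the two images live on different hosts, running `file_md5` on each host and comparing the printed digests gives the same answer without copying 2 GB across the network.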
NOTE: I did not see any network issue; I am able to download the fsimage using the "wget" command.
I am using CDH 4.1.3 and Cloudera Enterprise 4.6.3.
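The symptom described above (an `fsimage.ckpt_<txid>` file with no finalized `fsimage_<txid>` next to it) can be spotted with a small scan of the name directory. This is only a sketch: `unfinalized_checkpoints` is a hypothetical helper, and the file-name pattern is taken from the listings in this post rather than from any HDFS specification.

```python
import re
from pathlib import Path

# File-name pattern assumed from the listings in this thread.
CKPT_PATTERN = re.compile(r"^fsimage\.ckpt_(\d+)$")


def unfinalized_checkpoints(current_dir):
    """Return txids that have fsimage.ckpt_<txid> but no fsimage_<txid> in current_dir."""
    directory = Path(current_dir)
    pending = []
    for entry in sorted(directory.iterdir()):
        match = CKPT_PATTERN.match(entry.name)
        # A checkpoint counts as pending when its renamed counterpart is missing.
        if match and not (directory / f"fsimage_{match.group(1)}").exists():
            pending.append(match.group(1))
    return pending
```

On the NameNode from this post it would be called as `unfinalized_checkpoints("/mnt/sdb/name/current")`.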
Best Regards,
BOMmuraj
1 ACCEPTED SOLUTION
Explorer
Created 07-21-2014 10:36 AM
Thank you, Harsh, for your email!
I was hitting the issue below. I increased "dfs.image.transfer.timeout" and that fixed the problem.
https://issues.apache.org/jira/browse/HDFS-4301
Checkpointing was working fine, but the issue started when my fsimage size reached 2.1 GB.
Best Regards,
Bommuraj
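A rough way to see why a growing image could start tripping the timeout is to estimate how long the transfer takes. The sketch below is back-of-the-envelope arithmetic only. The image size is the byte count reported by `cksum` earlier in this thread; the 30 MB/s throughput, the 60-second timeout, and the safety factor are all assumptions chosen for illustration, so check the actual values on your own cluster and version.

```python
import math


def transfer_seconds(image_bytes, bytes_per_second):
    """Estimated time to move the image at a steady throughput."""
    return image_bytes / bytes_per_second


def suggested_timeout_ms(image_bytes, bytes_per_second, safety_factor=3.0):
    """Timeout in milliseconds that leaves headroom over the estimated transfer time."""
    return math.ceil(transfer_seconds(image_bytes, bytes_per_second) * safety_factor * 1000)


IMAGE_BYTES = 2148716968               # size reported by cksum in the question
ASSUMED_THROUGHPUT = 30 * 1000 * 1000  # hypothetical: 30 MB/s between the two nodes
ASSUMED_TIMEOUT_MS = 60_000            # assumption: a 60 s timeout before the change

seconds = transfer_seconds(IMAGE_BYTES, ASSUMED_THROUGHPUT)
print(f"estimated transfer time: {seconds:.1f} s")
print("exceeds assumed timeout:", seconds * 1000 > ASSUMED_TIMEOUT_MS)
print("suggested timeout (ms):", suggested_timeout_ms(IMAGE_BYTES, ASSUMED_THROUGHPUT))
```

Under these assumed numbers the transfer takes longer than the timeout, which would match a checkpoint that worked while the image was small and stopped once it passed roughly 2 GB. The resulting value is the kind of number one would put in "dfs.image.transfer.timeout", which is expressed in milliseconds.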
2 REPLIES
Mentor
Created 07-19-2014 10:23 PM
It is difficult to say whether you are hitting a bug without looking at the relevant Checkpointer entries in the Standby NameNode (SBN) logs.
There may be issues with transferring the file between the SBN and the NN, probably because of timeouts or some other cause.
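To narrow a large SBN log down to the entries worth reading, a simple keyword filter is often enough. The sketch below is not tied to any specific log format: the keywords are guesses at what checkpoint and image-transfer messages are likely to contain, and both function names are made up for this example.

```python
# Keywords are assumptions, matched case-insensitively; adjust them to your logs.
KEYWORDS = ("checkpoint", "transferfsimage", "timed out", "timeout")


def checkpoint_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, with trailing newlines removed."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def checkpoint_lines_from_file(path):
    """Apply the same filter to a log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return checkpoint_lines(handle)
```

Pass `checkpoint_lines_from_file` the path of the Standby NameNode log on your system; the location varies by installation, so none is assumed here.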
