
Checkpoint Status on name node

SOLVED

New Contributor

I keep getting the following health error message:

 

The filesystem checkpoint is 22 hour(s), 40 minute(s) old. This is 2,267.75% of the configured checkpoint period of 1 hour(s). Critical threshold: 400.00%. 10,775 transactions have occurred since the last filesystem checkpoint. This is 1.08% of the configured checkpoint transaction target of 1,000,000.
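(For reference: both percentages in the alert are plain ratios. A minimal sketch of the arithmetic in Python; the small gap to the reported 2,267.75% is the seconds the alert counts but the message rounds away.)

    # Checkpoint age as a percentage of the configured checkpoint period.
    age_minutes = 22 * 60 + 40                 # 22 hour(s), 40 minute(s)
    period_minutes = 60                        # configured period: 1 hour(s)
    print(age_minutes / period_minutes * 100)  # ~2266.67; critical threshold is 400.00

    # Transactions since the last checkpoint vs. the transaction target.
    print(10775 / 1000000 * 100)               # ~1.08% of the 1,000,000 target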

 

What is causing this, and how can I get it to stop?

 

Logs:

 

Number of transactions: 8 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 6 SyncTimes(ms): 132 
Number of transactions: 8 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 7 SyncTimes(ms): 155 
Finalizing edits file /dfs/nn/current/edits_inprogress_0000000000000021523 -> /dfs/nn/current/edits_0000000000000021523-0000000000000021530
Starting log segment at 21531
Rescanning after 30000 milliseconds
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
list corrupt file blocks returned: 0
list corrupt file blocks returned: 0
BLOCK* allocateBlock: /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2014_10_09-07_29_47. BP-941526827-192.168.0.1-1412692043930 blk_1073744503_3679{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a7270ad4-959d-4756-b731-83457af7c6a3:NORMAL|RBW]]}
BLOCK* addStoredBlock: blockMap updated: 192.168.0.102:50010 is added to blk_1073744503_3679{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-eaca52a9-2713-4901-b978-e331c17800fc:NORMAL|RBW]]} size 0
DIR* completeFile: /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2014_10_09-07_29_47 is closed by DFSClient_NONMAPREDUCE_592472068_72
BLOCK* addToInvalidates: blk_1073744503_3679 192.168.0.102:50010 
BLOCK* BlockManager: ask 192.168.0.102:50010 to delete [blk_1073744503_3679]
Rescanning after 30001 milliseconds
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).

 

1 ACCEPTED SOLUTION

Re: Checkpoint Status on name node

Contributor

I found the fix for my issue.

 

After the NameNode was reformatted, checkpointing on the SecondaryNameNode was not happening because of the old namespaceID and blockpoolID in the VERSION file.

 

After deleting the files under /data/dfs/snn, I restarted the NameNode and the SecondaryNameNode, and checkpointing worked fine again.
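For anyone hitting the same symptom: before deleting anything, you can confirm the stale-checkpoint-directory situation by comparing the two VERSION files. A minimal sketch in Python, assuming the NameNode metadata dir /dfs/nn seen in the logs above and the /data/dfs/snn checkpoint dir from this fix; adjust the paths for your cluster.

    # Compare namespaceID/blockpoolID/clusterID between the NameNode and
    # SecondaryNameNode VERSION files; a mismatch means the SNN checkpoint
    # directory is stale (e.g. left over from before a NameNode format).
    def read_version(path):
        props = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    key, _, value = line.partition("=")
                    props[key] = value
        return props

    nn = read_version("/dfs/nn/current/VERSION")         # dfs.namenode.name.dir
    snn = read_version("/data/dfs/snn/current/VERSION")  # SNN checkpoint dir

    for key in ("namespaceID", "blockpoolID", "clusterID"):
        status = "OK" if nn.get(key) == snn.get(key) else "MISMATCH: stale SNN dir"
        print(key, nn.get(key), snn.get(key), status)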

 

 

13 REPLIES

Re: Checkpoint Status on name node

New Contributor

No one knows?

Re: Checkpoint Status on name node

You can get this error if the Secondary or Standby NameNode is not performing checkpoints correctly. Please verify the health of the following roles:
- Standby NameNode (if HDFS-HA is enabled)
- Secondary NameNode (if HDFS-HA is *not* enabled)
Regards,
Gautam Gopalakrishnan
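One quick way to verify is the NameNode's JMX servlet, which exposes the checkpoint metrics this alert is based on. A minimal sketch in Python; the hostname is a placeholder, 50070 is the default NameNode HTTP port on clusters of this vintage, and the attribute names are Hadoop's standard FSNamesystem metrics.

    # Fetch checkpoint-related FSNamesystem metrics from the NameNode JMX servlet.
    import json
    from urllib.request import urlopen

    url = ("http://namenode.example.com:50070/jmx"   # placeholder NameNode host
           "?qry=Hadoop:service=NameNode,name=FSNamesystem")
    bean = json.load(urlopen(url))["beans"][0]
    print("TransactionsSinceLastCheckpoint:", bean.get("TransactionsSinceLastCheckpoint"))
    print("LastCheckpointTime (epoch ms):", bean.get("LastCheckpointTime"))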

Re: Checkpoint Status on name node

New Contributor

This is what it states.

 

The filesystem checkpoint is 2 day(s), 59 minute(s) old. This is 4,898.53% of the configured checkpoint period of 1 hour(s). Critical threshold: 400.00%. 23,261 transactions have occurred since the last filesystem checkpoint. This is 2.33% of the configured checkpoint transaction target of 1,000,000.

Re: Checkpoint Status on name node

Contributor

It seems like HA settings are enabled.

 

Please check whether HDFS > Configuration > "Filesystem Checkpoint Age Monitoring Thresholds" is specified. If it is, change it to Never as shown below, then save the settings and you will not get the message again.

 

 

[Attached screenshot: HA.png]

Re: Checkpoint Status on name node

Master Guru

Please never disable that check. Checkpoints are essential to HDFS operation, and you do not want to be in a position where checkpoints are failing for a technical reason and you are never notified about it.

 

Instead, look at your Standby or Secondary NameNode to figure out what the error is, and/or seek help armed with that information.
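If the logs alone are not conclusive, you can also force a checkpoint in the foreground so any failure surfaces immediately. A minimal sketch using the standard hdfs dfsadmin commands; note that saveNamespace requires safe mode, which blocks writes while it is on, so avoid this on a busy production cluster.

    # Force a checkpoint: -saveNamespace writes a fresh fsimage but requires
    # the NameNode to be in safe mode first (HDFS writes are blocked meanwhile).
    import subprocess

    for cmd in (["hdfs", "dfsadmin", "-safemode", "enter"],
                ["hdfs", "dfsadmin", "-saveNamespace"],
                ["hdfs", "dfsadmin", "-safemode", "leave"]):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)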


Re: Checkpoint Status on name node

Contributor

What are the files under snn? :)


Re: Checkpoint Status on name node

Contributor
Deleting the dir is unsafe.
After I restarted the HDFS cluster, the error message went away.

Re: Checkpoint Status on name node

Hi Harsh,

 

I am getting the below exception on the NameNode, though it doesn't affect my services. But on one occasion there was no automatic failover even though it was enabled. I found the following error logs:

...

 

Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket false

Initial Ticket false
Auth Time = Wed Feb 03 13:49:37 CET 2016
Start Time = Wed Feb 03 13:49:40 CET 2016
End Time = Wed Feb 03 23:49:37 CET 2016
Renew Till = null
Client Addresses  Null

2016-02-03 14:49:49,093 ERROR org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Exception in doCheckpoint
java.io.IOException: Exception during image upload: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:221)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:353)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$700(StandbyCheckpointer.java:260)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:280)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:276)
Caused by: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:298)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:222)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:207)

...
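For what it's worth: "Server not found in Kerberos database (7)" during image upload usually means the HTTP/<host> service principal the standby requests does not match what the KDC has, and a forward/reverse DNS mismatch between the NameNode hosts is the most common cause. A minimal sketch to check DNS consistency; the hostnames are placeholders for your two NameNodes.

    # A forward/reverse DNS mismatch makes SPNEGO request a ticket for
    # HTTP/<wrong-name>, which the KDC then cannot find. Both NameNode hosts
    # should resolve consistently in both directions.
    import socket

    for host in ("nn1.example.com", "nn2.example.com"):  # placeholder NN hosts
        ip = socket.gethostbyname(host)
        reverse = socket.gethostbyaddr(ip)[0]
        status = "OK" if reverse == host else "MISMATCH: fix DNS or /etc/hosts"
        print(host, "->", ip, "->", reverse, status)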