Checkpoint node configuration parameters

Super Collaborator

The checkpointing process is governed by two configuration parameters:

dfs.namenode.checkpoint.period -> 10 hr, which sets the time interval between checkpoints.

Also, dfs.namenode.checkpoint.txns is set to 1 million by default. What does this configuration do? The Apache wiki says it sets the number of uncheckpointed transactions.

Since the first parameter already triggers a checkpoint every 10 hours, why is it necessary to set a value for uncheckpointed transactions?

1 ACCEPTED SOLUTION

@Viswa

dfs.namenode.checkpoint.period – The number of seconds between two periodic checkpoints.

dfs.namenode.checkpoint.txns – The standby will create a checkpoint of the namespace every ‘dfs.namenode.checkpoint.txns’ transactions, regardless of whether ‘dfs.namenode.checkpoint.period’ has expired.

A real life analogy is how we take our car for regular maintenance service.

dfs.namenode.checkpoint.period >> number of months (say 6 months)

dfs.namenode.checkpoint.txns >> number of miles driven (say 5000 miles)

You will have to take your car for service if:

It has been 6 months since your car’s last service, even if you have driven less than 5000 miles

OR

It has not been 6 months since your car’s most recent service, but you have already driven 5000 miles.

Thus a new checkpoint will be created when either the checkpoint period elapses or the number of uncheckpointed transactions reaches the configured limit, whichever happens first.
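The "whichever happens first" rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Hadoop's actual code: the function name `should_checkpoint` and its arguments are hypothetical, and the default values are assumed to mirror a 1 hour period and 1 million transactions.

```python
def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_seconds=3600, txns_threshold=1_000_000):
    """Return True when either trigger has fired.

    Hypothetical helper for illustration only:
      period_seconds  stands in for dfs.namenode.checkpoint.period
      txns_threshold  stands in for dfs.namenode.checkpoint.txns
    """
    period_elapsed = seconds_since_last >= period_seconds
    too_many_txns = uncheckpointed_txns >= txns_threshold
    return period_elapsed or too_many_txns


# Quiet cluster: the period expires with few transactions pending.
print(should_checkpoint(3600, 1_200))
# Busy cluster: the transaction limit is hit 10 minutes after the last checkpoint.
print(should_checkpoint(600, 1_000_000))
# Neither condition met yet.
print(should_checkpoint(600, 999_999))
```

The two conditions are joined with `or`, which is why the transaction limit still matters even when a period is configured: on a busy cluster it caps how much edit log can pile up between checkpoints.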

This article gives a good overview of the HDFS metadata directories and how the above two properties fit into the ecosystem.

4 REPLIES

Super Collaborator

Let me put it as a scenario. Suppose dfs.namenode.checkpoint.period -> 1 hr and dfs.namenode.checkpoint.txns -> 1 million, and checkpoints are created at 2 PM and 3 PM. If 1 million uncheckpointed transactions accumulate by 10 minutes past 3 PM, a new checkpoint is created at 3:10 PM. The next checkpoint then happens either 1 hr later, i.e. at 4:10 PM, or earlier if 1 million uncheckpointed transactions accumulate before 4:10 PM.

Please let me know if I have understood this correctly.

@Viswa - you are absolutely correct with this.
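For anyone who wants to trace that timeline, here is a rough Python simulation of the scenario. It is a sketch under stated assumptions, not how the NameNode is implemented: `simulate_checkpoints` and `burst_after_three` are made-up names, the date is arbitrary, the one-minute polling step is a simplification, and the burst of 100,000 transactions per minute between 3:00 PM and 3:10 PM is invented to reproduce the 1 million transactions from the question.

```python
from datetime import datetime, timedelta


def simulate_checkpoints(start, end, period, txn_threshold, txns_in_minute):
    """Walk the clock one minute at a time and record checkpoint times.

    `start` is treated as the moment of an initial checkpoint.
    `txns_in_minute(now)` returns how many transactions arrived in the
    minute ending at `now` (a hypothetical workload function).
    """
    checkpoints = [start]
    last_checkpoint = start
    pending = 0  # uncheckpointed transactions
    now = start
    while now < end:
        now += timedelta(minutes=1)
        pending += txns_in_minute(now)
        # Either trigger is enough: period elapsed OR transaction limit reached.
        if now - last_checkpoint >= period or pending >= txn_threshold:
            checkpoints.append(now)
            last_checkpoint = now
            pending = 0
    return checkpoints


day = datetime(2024, 1, 1)  # arbitrary date, only the time of day matters
three_pm = day.replace(hour=15)


def burst_after_three(now):
    # Assumed workload: 100,000 txns per minute from 3:01 PM through 3:10 PM.
    if three_pm < now <= three_pm + timedelta(minutes=10):
        return 100_000
    return 0


times = simulate_checkpoints(
    start=day.replace(hour=14),
    end=day.replace(hour=16, minute=30),
    period=timedelta(hours=1),
    txn_threshold=1_000_000,
    txns_in_minute=burst_after_three,
)
print([t.strftime("%H:%M") for t in times])
# ['14:00', '15:00', '15:10', '16:10']
```

Note that the period is measured from the most recent checkpoint (3:10 PM), not from the original hourly schedule, which is why the next one lands at 4:10 PM rather than 4:00 PM.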

@Viswa - Kindly accept the answer if it has helped you.