Checkpoint node configuration parameters

Expert Contributor

The checkpointing process is controlled by two configuration parameters:

dfs.namenode.checkpoint.period -> set to 10 hours here, which defines the time between checkpoints.

Also, dfs.namenode.checkpoint.txns is set to 1 million by default. What does this setting do? The Apache wiki says it sets the number of uncheckpointed transactions.

If the first parameter already triggers a checkpoint every 10 hours, why is it necessary to set a value for uncheckpointed transactions?

1 ACCEPTED SOLUTION

Re: Checkpoint node configuration parameters

@Viswa

dfs.namenode.checkpoint.period – The number of seconds between two periodic checkpoints.

dfs.namenode.checkpoint.txns – The standby will create a checkpoint of the namespace every ‘dfs.namenode.checkpoint.txns’ transactions, regardless of whether ‘dfs.namenode.checkpoint.period’ has expired.

A real life analogy is how we take our car for regular maintenance service.

dfs.namenode.checkpoint.period >> number of months (say 6 months)

dfs.namenode.checkpoint.txns >> number of miles driven (say 5000 miles)

You will have to take your car for service if:

It has been 6 months since your car’s last service, even if you have driven fewer than 5000 miles

OR

It has not been 6 months since your car’s most recent service, but you have already driven 5000 miles.

Thus a new checkpoint is created when either the checkpoint period has elapsed or the number of uncheckpointed transactions reaches dfs.namenode.checkpoint.txns, whichever happens first.
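
The "whichever happens first" rule can be sketched in a few lines of Python. This is only an illustration of the decision, not the NameNode's actual code: the function name should_checkpoint and its arguments are made up for this example, and the defaults shown (3600 seconds and 1,000,000 transactions) are the stock Hadoop defaults for the two properties.

```python
# Illustrative only: a hypothetical helper, not the NameNode's actual code.
DEFAULT_PERIOD_SECONDS = 3600   # default dfs.namenode.checkpoint.period
DEFAULT_TXNS = 1_000_000        # default dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_seconds=DEFAULT_PERIOD_SECONDS,
                      txns_limit=DEFAULT_TXNS):
    """Return True when either threshold has been reached."""
    period_expired = seconds_since_last >= period_seconds
    txns_reached = uncheckpointed_txns >= txns_limit
    return period_expired or txns_reached


print(should_checkpoint(3600, 10))         # period expired, few transactions
print(should_checkpoint(600, 1_000_000))   # transaction limit hit early
print(should_checkpoint(600, 10))          # neither threshold reached
```

The two conditions are joined with a plain "or", which is why a busy cluster checkpoints more often than the period alone would suggest, while an idle one still checkpoints once per period.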

This article gives a good overview of the HDFS metadata directories and how the above two properties fit into the ecosystem.

Re: Checkpoint node configuration parameters

Expert Contributor

Let me put it as a scenario. Suppose dfs.namenode.checkpoint.period is 1 hour and dfs.namenode.checkpoint.txns is 1 million, and checkpoints are created at 2:00 PM and 3:00 PM. If 1 million uncheckpointed transactions accumulate by 3:10 PM, a new checkpoint is created at 3:10 PM. The next checkpoint then happens either 1 hour later, at 4:10 PM, or earlier if another 1 million uncheckpointed transactions accumulate before 4:10 PM.

Please let me know if I have understood this correctly.

Re: Checkpoint node configuration parameters

@Viswa - you are absolutely correct with this.
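
For anyone who wants to check the timeline, here is a rough simulation of that scenario in Python. It is a sketch under stated assumptions: the simulate and burst functions, the one-minute step, the calendar date, and the burst of 100,000 transactions per minute between 3:01 PM and 3:10 PM are all invented for illustration. Only the two thresholds come from the thread.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(hours=1)   # dfs.namenode.checkpoint.period = 3600 seconds
TXNS_LIMIT = 1_000_000        # dfs.namenode.checkpoint.txns


def simulate(start, end, txns_per_minute, last_checkpoint):
    """Step minute by minute and return a list of (time, reason) checkpoints."""
    checkpoints = []
    pending = 0  # uncheckpointed transactions since the last checkpoint
    now = start
    while now <= end:
        pending += txns_per_minute(now)
        if pending >= TXNS_LIMIT:
            reason = "txns"
        elif now - last_checkpoint >= PERIOD:
            reason = "period"
        else:
            reason = None
        if reason:
            checkpoints.append((now.strftime("%H:%M"), reason))
            last_checkpoint = now  # both counters restart from this checkpoint
            pending = 0
        now += timedelta(minutes=1)
    return checkpoints


def burst(now):
    """Hypothetical load: 100,000 txns per minute from 15:01 to 15:10, else idle."""
    return 100_000 if now.hour == 15 and 1 <= now.minute <= 10 else 0


day = datetime(2024, 1, 1)  # arbitrary date, only the time of day matters
timeline = simulate(start=day.replace(hour=14),
                    end=day.replace(hour=16, minute=30),
                    txns_per_minute=burst,
                    last_checkpoint=day.replace(hour=13))
for time, reason in timeline:
    print(time, reason)
```

The run yields checkpoints at 14:00 and 15:00 (period), 15:10 (transaction limit), and 16:10 (period), which matches the timeline described above: the transaction-triggered checkpoint restarts the one-hour clock.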

Re: Checkpoint node configuration parameters

@Viswa - Kindly accept the answer if it has helped you.
