Created 03-09-2017 04:19 PM
The checkpointing process is controlled by two configuration parameters:
dfs.namenode.checkpoint.period -> 10 hr, which sets the time between checkpoints.
Also, dfs.namenode.checkpoint.txns is set to 1 million by default. What does this configuration do? The Apache wiki says it sets the number of uncheckpointed transactions.
If the first configuration already triggers a checkpoint every 10 hrs, why is it necessary to set a value for uncheckpointed transactions?
Created 03-09-2017 09:04 PM
dfs.namenode.checkpoint.period – The number of seconds between two periodic checkpoints.
dfs.namenode.checkpoint.txns – The standby will create a checkpoint of the namespace every ‘dfs.namenode.checkpoint.txns’ transactions, regardless of whether ‘dfs.namenode.checkpoint.period’ has expired.
A real-life analogy is how we take a car in for regular maintenance service.
dfs.namenode.checkpoint.period >> number of months (say 6 months)
dfs.namenode.checkpoint.txns >> number of miles driven (say 5000 miles)
You will have to take your car for service if:
It has been 6 months since your car’s last service, even if you have driven fewer than 5000 miles
OR
It has not been 6 months since your car’s most recent service, but you have already driven 5000 miles.
Thus a new checkpoint is created when either the checkpoint period elapses or the number of uncheckpointed transactions reaches the limit, whichever happens first.
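To make the "whichever happens first" rule concrete, here is a minimal Python sketch. It is only an illustration of the rule described above, not Hadoop's actual code: the function and argument names are hypothetical, and the defaults reflect the values from the question (a 10 hr period expressed in seconds, and 1 million transactions).

```python
def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_seconds=10 * 3600, txn_threshold=1_000_000):
    """Return True if either checkpoint condition is met.

    period_seconds stands in for dfs.namenode.checkpoint.period and
    txn_threshold for dfs.namenode.checkpoint.txns (hypothetical names).
    """
    period_expired = seconds_since_last >= period_seconds
    too_many_txns = uncheckpointed_txns >= txn_threshold
    # Either condition alone is enough to trigger a checkpoint.
    return period_expired or too_many_txns


# Quiet cluster: 10 hours have passed, only a few transactions.
print(should_checkpoint(10 * 3600, 500))      # True
# Busy cluster: 5 minutes have passed, 1 million transactions.
print(should_checkpoint(300, 1_000_000))      # True
# Neither limit reached yet.
print(should_checkpoint(300, 500))            # False
```

So the transaction limit matters on a busy cluster: without it, the edit log could grow very large in the hours between two time-based checkpoints.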
This article gives a nice understanding of HDFS metadata directories and how the above two properties fit into the ecosystem.
Created 03-09-2017 09:17 PM
Let me put it as a scenario. Suppose dfs.namenode.checkpoint.period -> 1 hr and dfs.namenode.checkpoint.txns -> 1 million, and checkpoints are created at 2 PM and 3 PM. If 1 million uncheckpointed transactions have accumulated by 10 minutes past 3 PM, a new checkpoint is created at 3:10 PM. The next checkpoint would then come either 1 hr later, i.e. at 4:10 PM, or earlier if 1 million uncheckpointed transactions accumulate before 4:10 PM.
Please let me know if I have understood this correctly.
Created 03-09-2017 09:28 PM
@Viswa - you are absolutely correct with this.
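For anyone who wants to see that timeline play out, the scenario can be reproduced with a small simulation. This is a simplified model under stated assumptions, not how the NameNode is implemented: time moves in one-minute steps, the previous checkpoint is assumed to have happened at 1 PM, the burst of 100,000 transactions per minute after 3 PM is made up for illustration, and all names are hypothetical.

```python
def simulate_checkpoints(txns_per_minute, period_min=60, txn_threshold=1_000_000):
    """Return the minute offsets at which a checkpoint fires.

    txns_per_minute[i] is the number of transactions arriving during
    minute i + 1, counted from the last checkpoint at minute 0.
    """
    checkpoints = []
    last_checkpoint = 0   # minute of the most recent checkpoint
    pending = 0           # uncheckpointed transactions so far
    for minute, txns in enumerate(txns_per_minute, start=1):
        pending += txns
        if minute - last_checkpoint >= period_min or pending >= txn_threshold:
            checkpoints.append(minute)
            last_checkpoint = minute  # the period is measured from here again
            pending = 0               # the transaction count starts over
    return checkpoints


# Quiet from 1 PM to 3 PM, a 10-minute burst after 3 PM, then quiet again.
load = [0] * 120 + [100_000] * 10 + [0] * 60
minutes = simulate_checkpoints(load)
# Minute 0 is 1 PM (13:00), so convert offsets to clock times.
print([f"{13 + m // 60}:{m % 60:02d}" for m in minutes])
# ['14:00', '15:00', '15:10', '16:10']
```

The 3:10 PM checkpoint resets both counters, which is why the next time-based checkpoint lands at 4:10 PM rather than 4 PM.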
Created 03-14-2017 12:05 AM
@Viswa - Kindly accept the answer if it has helped you.