Can anyone tell me the actual difference between the dfsadmin commands 'fetchImage' and 'saveNamespace'?
I have read the difference in the Hadoop documentation. My requirement is to take NameNode metadata backups regularly. Can someone tell me which is the better method for that, saveNamespace or fetchImage?
Use fetchImage if you just want a backup.
saveNamespace triggers a checkpoint (it merges the fsimage with the edit logs and writes a new image), which takes more time and requires the NameNode to be in safe mode.
However, what do you want the backups for? Checkpointing already takes care of that if you have a SecondaryNameNode or a Standby NameNode.
If you don't have one of those in your cluster, you can take a backup with fetchImage.
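For a regular backup, the fetchImage call can be wrapped in a small script and run from cron. The following is only a minimal sketch: the function names and the timestamped directory layout are made up for illustration, and it assumes the `hdfs` client is on the PATH of a user allowed to run dfsadmin commands.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def fetch_image_command(target_dir):
    """Build the dfsadmin call that downloads the latest fsimage into target_dir."""
    return ["hdfs", "dfsadmin", "-fetchImage", str(target_dir)]


def backup_fsimage(backup_root, runner=subprocess.run, now=None):
    """Fetch the latest fsimage into a timestamped subdirectory of backup_root.

    The directory layout (backup_root/YYYYmmdd-HHMMSS) is a hypothetical
    convention chosen for this sketch, not something Hadoop prescribes.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the hdfs command fails.
    runner(fetch_image_command(target), check=True)
    return target
```

Calling `backup_fsimage("/some/backup/dir")` from a scheduled job would then leave one fsimage copy per run. The `runner` parameter exists only so the command can be inspected without a live cluster.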
Please correct me if my understanding is wrong. fetchImage fetches the latest fsimage into a local directory. Does "latest fsimage" mean that it also includes a checkpoint? (A new fsimage can only be created by checkpointing, right?)
The NameNode has an fsimage and edit logs.
During checkpointing (which is triggered automatically either when the number of entries in the edit log exceeds a specified limit or when a specified time has passed since the last checkpoint), the edit logs are merged into the fsimage to generate a new fsimage.
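The trigger rule described here can be written down as a tiny model. This is a sketch of the rule, not Hadoop's actual code: the function name is invented, the default values mirror the stock settings of `dfs.namenode.checkpoint.txns` (1000000) and `dfs.namenode.checkpoint.period` (3600 seconds), and the exact comparison Hadoop uses internally may differ.

```python
def checkpoint_due(uncheckpointed_txns, seconds_since_last,
                   txn_limit=1_000_000, period_seconds=3600):
    """Return True if either checkpoint condition is met.

    txn_limit and period_seconds stand in for dfs.namenode.checkpoint.txns
    and dfs.namenode.checkpoint.period; the names here are illustrative.
    """
    too_many_edits = uncheckpointed_txns >= txn_limit
    too_much_time = seconds_since_last >= period_seconds
    return too_many_edits or too_much_time
```

With these defaults, a quiet cluster still checkpoints about once an hour, while a busy one checkpoints as soon as a million transactions have piled up in the edit log.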
fetchImage just gives you the latest fsimage without merging the edits. So you have a saved state that is not fully up to date, but you can still do a partial recovery after a failure. You lose every edit recorded in the edit log since the last merge.
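If losing those edits is not acceptable, one option is to force a checkpoint first and fetch the image afterwards. The sketch below only shows the order of the dfsadmin calls; the function names are hypothetical, and it assumes you can afford to put the NameNode into safe mode (which blocks writes) for the duration.

```python
import subprocess


def consistent_backup_commands(target_dir):
    """Return the dfsadmin calls, in order, for a checkpoint followed by a fetch."""
    dfsadmin = ["hdfs", "dfsadmin"]
    return [
        dfsadmin + ["-safemode", "enter"],            # saveNamespace requires safe mode
        dfsadmin + ["-saveNamespace"],                # merge edits into a new fsimage
        dfsadmin + ["-fetchImage", str(target_dir)],  # download that fsimage
        dfsadmin + ["-safemode", "leave"],
    ]


def run_consistent_backup(target_dir, runner=subprocess.run):
    """Run the sequence, always attempting to leave safe mode at the end."""
    enter, save, fetch, leave = consistent_backup_commands(target_dir)
    runner(enter, check=True)
    try:
        runner(save, check=True)
        runner(fetch, check=True)
    finally:
        # Leave safe mode even if saving or fetching failed.
        runner(leave, check=True)
```

Whether the pause in writes is worth it depends on the cluster. On a cluster with a SecondaryNameNode or Standby NameNode, regular checkpoints already keep the fsimage reasonably fresh.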