New Contributor
Posts: 4
Registered: ‎09-09-2017

Hdfs metadata backup strategy

We have an active NameNode and a standby NameNode running. Could you please explain the best practice for backing up HDFS metadata so that the backup is consistent and there is no service loss or data loss? Can we take the backup while the services are running, and if yes, could you please summarize how?
Posts: 776
Registered: ‎05-16-2016

Re: Hdfs metadata backup strategy

You could write a Python or shell script that backs up your fsimage on a regular basis, and schedule it with cron to automate it.

Yes, you can take a copy of your fsimage without interrupting the service.
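
A minimal sketch of such a script, not a definitive implementation. It assumes the `hdfs` client is on the PATH of the user running it and that `hdfs dfsadmin -fetchImage` is available on your version. The names BACKUP_ROOT and RETENTION_DAYS, the default path, the 14-day retention and the `run` argument are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy the latest fsimage from the NameNode into a dated local directory.
# BACKUP_ROOT and RETENTION_DAYS are hypothetical names, not Hadoop settings.
set -euo pipefail

BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/hdfs-meta}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"

backup_fsimage() {
    local dest
    dest="${BACKUP_ROOT}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest"
    # Download the most recent fsimage; send tool output to stderr so that
    # the only thing printed on stdout is the backup directory.
    hdfs dfsadmin -fetchImage "$dest" >&2
    echo "$dest"
}

prune_old_backups() {
    # Delete dated backup directories older than RETENTION_DAYS days.
    find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d \
        -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
}

# Only do work when called with the (hypothetical) "run" argument.
if [ "${1:-}" = "run" ]; then
    backup_fsimage
    prune_old_backups
fi
```

To automate it, a crontab entry such as `0 2 * * * /path/to/backup_fsimage.sh run` would run it once a day at 02:00. Both the path and the daily schedule are placeholders; pick a frequency that matches how much metadata change you can afford to replay or lose.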

New Contributor
Posts: 4
Registered: ‎09-09-2017

Re: Hdfs metadata backup strategy

Hi Csguna,

Thanks for your help. Please correct me if my understanding is wrong.


When we say we can copy the fsimage without interrupting the service, I assume that means using the command below:


hdfs dfsadmin -fetchImage backup_dir


My understanding is that a backup produced with the above command will be consistent, because on startup the NameNode process reads the fsimage file, loads it into memory, and then applies any edits present in the JournalNodes that are newer than the fsimage. On the other hand, if the JournalNodes are not available, it is possible to lose the changes that occurred in the interim.


Are there any best practices for how frequently we should take metadata backups on production Hadoop clusters?


Could you also please help me understand what will happen in the scenario below?

"When the standby NameNode is down for a long duration, who performs the checkpointing operations?" Is there any risk of losing data or of inconsistency if the active NameNode also crashes during this time?

