
Hdfs metadata backup strategy



New Contributor
We have an active NameNode and a standby NameNode running. Could you please explain the best practice for an HDFS metadata backup strategy that keeps the backup consistent and avoids both service loss and data loss? Can we take a backup while the services are running, and if so, could you please summarize how?

Re: Hdfs metadata backup strategy


You could write a Python or shell script that backs up your fsimage on a regular basis, and put it in crontab to automate it.

Yes, you can take a copy of your fsimage without interrupting the service.
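A scripted backup along those lines might look like the minimal sketch below. It relies on `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage from the NameNode into a local directory. The function name `backup_fsimage`, the dated directory layout, the 14-day retention default, and the paths in the sample crontab entry are all assumptions made for illustration, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: fetch the latest fsimage into a timestamped local directory.
# Assumed (hypothetical) names: backup_fsimage, the directory layout,
# and the default retention of 14 days.
#
# Example crontab entry (hypothetical paths), running daily at 02:00:
#   0 2 * * * /usr/local/bin/backup_fsimage.sh /backup/hdfs-meta 14
set -euo pipefail

backup_fsimage() {
    local backup_root="$1"        # local directory that holds all backups
    local keep_days="${2:-14}"    # delete backup directories older than this
    local target
    target="${backup_root}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target"

    # Download the most recent fsimage from the NameNode.
    # The NameNode keeps serving clients while this runs.
    hdfs dfsadmin -fetchImage "$target"

    # Treat a missing fsimage_* file as a failed backup.
    if ! ls "$target"/fsimage_* >/dev/null 2>&1; then
        echo "no fsimage found in $target" >&2
        return 1
    fi

    # Prune backup directories older than keep_days.
    find "$backup_root" -mindepth 1 -maxdepth 1 -type d \
        -mtime "+${keep_days}" -exec rm -rf {} +

    # Print the location of the new backup.
    echo "$target"
}

# Run only when a backup directory is passed on the command line.
if [ "$#" -ge 1 ]; then
    backup_fsimage "$@"
fi
```

The script must run as a user that is allowed to execute `dfsadmin` commands. Copying the resulting directory to a separate host or storage system is left out of the sketch, but is worth doing so the backup does not live only on a cluster node.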

Re: Hdfs metadata backup strategy

New Contributor

Hi Csguna,

Thanks for your help. Please correct me if my understanding is wrong.


When we say we can copy the fsimage without interrupting the service, I assume that means using the command below:


hdfs dfsadmin -fetchImage backup_dir


The backup produced by the above command will be consistent because, on startup, the NameNode process reads the fsimage file, loads it into memory, and then applies any edits present in the JournalNodes that are newer than the fsimage. On the other hand, if the JournalNodes are not available, it is possible to lose the changes that occurred in the interim.


Are there any best practices for how frequently we should take metadata backups on production Hadoop clusters?


Could you also please help me understand what happens in the scenario below?

"When the Standby NameNode is down for a long duration, who performs the checkpointing operations?" Is there any risk of losing data or ending up with inconsistency if the Active NameNode also crashes during this time?