
How to reconstruct HDFS from snapshot files which are taken by Cloudera Manager

Explorer

Hi all,

 
Cloudera Manager 5 offers a way to take and then restore snapshots of the whole HDFS through its web page. Here is the link about it. I would like to save the snapshot files on my own storage devices and then restore the whole HDFS from those snapshot files myself. Here are the action items and questions for achieving this goal.
 
1: Take snapshot.
    Done. I defined the snapshot policy and took the HDFS snapshot through the Cloudera Manager web page.
 
2: Copy snapshot files to my storage device
    Where are the snapshot files stored, in HDFS or in Cloudera Manager? Can I copy the snapshot files to my storage device?
 
3: Construct HDFS from these snapshot files
    I would prefer to reconstruct all the files in HDFS from the snapshot files myself instead of calling commands from Cloudera Manager. So what is the format of the snapshot files? How can I reconstruct HDFS files from them? If that is not feasible, how can I restore HDFS with the help of Cloudera Manager?
 
Thanks,
Jack Chen
3 REPLIES

Re: How to reconstruct HDFS from snapshot files which are taken by Cloudera Manager

Master Collaborator

@JackChen I apologize for the delay on this one, but I'll attempt to answer your question.  HDFS snapshots, I believe, are very similar to HBase snapshots, so I think some of the same principles apply.  In essence, when you make a snapshot of a specific directory in HDFS, it just creates a .snapshot directory underneath the directory that you snapshotted.  Inside that .snapshot directory will be some small metadata files that reference the contents of the directory that you snapshotted.  The HDFS Snapshots wiki I linked above explains it in detail.
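
To make the location concrete, here is a rough sketch of how the snapshot path is put together. The directory /data and the snapshot name s1 are placeholders I made up, not values from your cluster, and snapshot_path is just a hypothetical helper. As far as I understand it, listing that path shows the directory's contents as they were when the snapshot was taken.

```shell
# Placeholder values: substitute your own snapshottable directory and snapshot name.
SNAPSHOT_DIR="/data"
SNAPSHOT_NAME="s1"

# Hypothetical helper: build the HDFS path under which a snapshot is exposed.
# $1 = snapshottable directory, $2 = snapshot name
snapshot_path() {
  dir="${1%/}"   # drop a trailing slash so the root directory "/" also works
  printf '%s/.snapshot/%s\n' "$dir" "$2"
}

# Print the path for the placeholder values above.
snapshot_path "$SNAPSHOT_DIR" "$SNAPSHOT_NAME"

# On a real cluster you could then inspect it with, for example:
#   hdfs lsSnapshottableDir                              # list snapshottable directories
#   hdfs dfs -ls "$(snapshot_path /data s1)"             # browse one snapshot
```

If the snapshottable directory is the HDFS root, the same helper yields /.snapshot/s1.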

 

So, for your question #2, yes, you can just use a regular "hadoop fs -get" on those snapshotted files to copy them out of HDFS and onto local storage.
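
A minimal sketch of that copy step might look like the following. The function name, the HADOOP_BIN override, and the example paths are all assumptions for illustration; the only real command involved is "hadoop fs -get".

```shell
# Hypothetical helper: copy one snapshot out of HDFS onto local storage.
# $1 = snapshottable directory, $2 = snapshot name, $3 = local destination
# Set HADOOP_BIN=echo to preview the arguments without touching the cluster.
copy_snapshot_out() {
  src="${1%/}/.snapshot/$2"
  "${HADOOP_BIN:-hadoop}" fs -get "$src" "$3"
}

# Example with placeholder paths (not executed here):
#   copy_snapshot_out /data s1 /mnt/backup/data-s1
```

Running it once with HADOOP_BIN=echo is a cheap way to check the source and destination paths before starting a large copy.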

 

Once you have copied the data out of HDFS, you can always restore it manually by just putting whatever file you want back in HDFS.  This adds a lot of manual work to the picture, but should accomplish your goal.
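
As a sketch of what that manual restore could look like, assuming placeholder paths and hypothetical helper names: the first function puts a local copy back with "hadoop fs -put", and the second copies a file straight from the snapshot path inside HDFS with "hadoop fs -cp", which avoids the round trip through local storage.

```shell
# Hypothetical helper: put a locally saved file or directory back into HDFS.
# $1 = local source, $2 = HDFS destination
# Note: -put refuses to overwrite an existing file unless -f is added.
restore_from_local() {
  "${HADOOP_BIN:-hadoop}" fs -put "$1" "$2"
}

# Hypothetical helper: copy a path from a snapshot back to its live location.
# $1 = snapshottable directory, $2 = snapshot name, $3 = path relative to $1
restore_from_snapshot() {
  dir="${1%/}"
  "${HADOOP_BIN:-hadoop}" fs -cp "$dir/.snapshot/$2/$3" "$dir/$3"
}

# Examples with placeholder paths (not executed here):
#   restore_from_local /mnt/backup/data-s1/report.csv /data/report.csv
#   restore_from_snapshot /data s1 report.csv
```

Either way the restore is file by file or directory by directory, so it is still manual work, just scripted.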

Re: How to reconstruct HDFS from snapshot files which are taken by Cloudera Manager

Explorer

@Clint

Thanks for your response. How can I manually restore HDFS from snapshot files? Is there a command or API to restore it? I know there is an API and a command to create snapshots. In the Cloudera Manager web console, it seems that the restore deletes all the current files in the snapshottable HDFS directory and then copies the files from the snapshot back into that directory. But if my snapshottable directory is the HDFS root directory, the manual copy will take a huge amount of time.

 

Thanks,

Jack Chen

Re: How to reconstruct HDFS from snapshot files which are taken by Cloudera Manager

Explorer

@clint