Backup & restore strategy for HDFS metadata.

We have recently upgraded to CDH5.4 with the help of a Cloudera engineer. We built our cluster as a POC, and it is running fine at the moment. We are now devising a backup strategy for HDFS metadata and would like to test it. Could you please help me understand the following:

a. How much space would it need, assuming the HDFS NameNode has 100GB of NameNode directory data (fsimage, edit logs, etc.)?

b. What security privileges does the user need to take the backup?

c. How can we test the backup strategy?

d. How can we ensure that the NameNode directory is free of corruption?

How frequently should these backups be taken? Looking forward to your advice.

Re: Backup & restore strategy for HDFS metadata.

You only need to back up the fsimage file periodically, plus the VERSION file from the NameNode's current/ directory after every upgrade.

You can obtain the former via the 'hdfs dfsadmin -fetchImage' command.
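
A minimal sketch of how that command might be scheduled, assuming a bash host with the HDFS client configured and a job that runs as an HDFS superuser. BACKUP_ROOT, KEEP and NN_NAME_DIR are hypothetical settings of this sketch, not Hadoop parameters. It keeps timestamped copies of the fetched fsimage, optionally copies the VERSION file, and prunes old copies.

```shell
#!/usr/bin/env bash
# Sketch of a periodic HDFS metadata backup (illustrative, not a Cloudera tool).
# BACKUP_ROOT, KEEP and NN_NAME_DIR are hypothetical settings of this script.
set -euo pipefail

BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/hdfs-meta}"  # local backup location
KEEP="${KEEP:-7}"                                     # number of backups to retain
NN_NAME_DIR="${NN_NAME_DIR:-}"  # optional: NameNode name dir readable on this host

backup_fsimage() {
  local dest
  dest="$BACKUP_ROOT/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$BACKUP_ROOT"
  # Plain mkdir refuses an existing directory, so a backup with the
  # same timestamp is never overwritten or deleted below.
  mkdir "$dest" || return 1

  # Download the most recent fsimage from the NameNode into $dest.
  if ! hdfs dfsadmin -fetchImage "$dest"; then
    echo "fetchImage failed; discarding $dest" >&2
    rm -rf -- "$dest"
    return 1
  fi

  # VERSION only changes on upgrade, but copying it every time is cheap.
  if [[ -n "$NN_NAME_DIR" ]]; then
    cp -- "$NN_NAME_DIR/current/VERSION" "$dest/" || { rm -rf -- "$dest"; return 1; }
  fi
}

prune_old_backups() {
  local backups=() d i excess
  for d in "$BACKUP_ROOT"/*/; do
    if [[ -d "$d" ]]; then backups+=("${d%/}"); fi
  done
  # Glob results are sorted, and the timestamped names sort oldest-first.
  excess=$(( ${#backups[@]} - KEEP ))
  for (( i = 0; i < excess; i++ )); do
    rm -rf -- "${backups[i]}"
  done
}

# Only act when invoked with --run, so the functions can be sourced safely.
if [[ "${1:-}" == "--run" ]]; then
  backup_fsimage
  prune_old_backups
fi
```

For example, an entry such as `0 */6 * * * /usr/local/bin/hdfs-meta-backup.sh --run` in the hdfs user's crontab would take a backup every six hours (path and schedule are placeholders). Any namespace change made after the most recent image is not in the backup, so pick the interval based on how much change you could afford to lose.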

> a. How much space would it need, assuming the HDFS NameNode has 100GB of NameNode directory data (fsimage, edit logs, etc.)?

You typically need only as much space as the fsimage currently takes, since the edit logs are not part of this backup. Alternatively, plan capacity based on the rate at which the fsimage has been growing.
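
To put numbers on that, here is a sketch that reads the newest fsimage size from a NameNode name directory and scales it by the number of copies you keep plus a growth allowance. The function names, copy count and growth percentage are hypothetical inputs of this sketch. A NameNode directory can hold more than one fsimage as well as the edit logs, so its total (the 100GB in the question) overstates what a single backup needs.

```shell
#!/usr/bin/env bash
# Rough capacity planning for retained fsimage copies (illustrative sketch).

# Print the size in bytes of the newest fsimage_<txid> in a name directory.
latest_fsimage_bytes() {
  local name_dir=$1 f newest=""
  for f in "$name_dir"/current/fsimage_*; do
    # Skip checksum files; zero-padded txids make the last match the newest.
    if [[ -f "$f" && "$f" != *.md5 ]]; then newest=$f; fi
  done
  if [[ -z "$newest" ]]; then
    echo "no fsimage found under $name_dir/current" >&2
    return 1
  fi
  echo $(( $(wc -c < "$newest") ))
}

# bytes needed = image size x copies kept x (100 + growth %) / 100
estimate_backup_bytes() {
  local image_bytes=$1 copies=$2 growth_pct=$3
  echo $(( image_bytes * copies * (100 + growth_pct) / 100 ))
}
```

For example, `estimate_backup_bytes "$(latest_fsimage_bytes /data/dfs/nn)" 7 50` estimates seven retained copies with 50% headroom; /data/dfs/nn is a placeholder for your dfs.namenode.name.dir.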

> b. What security privileges does the user need to take the backup?

The user invoking the command must be an HDFS superuser: the 'hdfs' user, or a member of the group named by 'dfs.permissions.superusergroup' in the NameNode configuration (older releases call this property 'dfs.permissions.supergroup').
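
A sketch of checking this before scheduling the job. It assumes the NameNode runs as the OS user "hdfs" (the usual packaged default; strictly, the superuser is whichever user started the NameNode). It uses `hdfs getconf -confKey`, which reads the client-side configuration, and `hdfs groups`, which reports group membership as the NameNode resolves it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: is a given user an HDFS superuser?
# Assumption: the NameNode process runs as the "hdfs" OS user.

is_hdfs_superuser() {
  local user=$1 supergroup line g
  local -a user_groups
  [[ "$user" == "hdfs" ]] && return 0

  # Current key name; dfs.permissions.supergroup is its deprecated alias.
  # This reads the client configuration, which should match the NameNode's.
  supergroup=$(hdfs getconf -confKey dfs.permissions.superusergroup) || return 1

  # "hdfs groups <user>" prints "<user> : g1 g2 ..." as seen by the NameNode.
  line=$(hdfs groups "$user") || return 1
  read -r -a user_groups <<< "${line#*:}"
  for g in "${user_groups[@]}"; do
    [[ "$g" == "$supergroup" ]] && return 0
  done
  return 1
}
```

Once a suitable user is confirmed, the fetch would run as that user, for example `sudo -u hdfs hdfs dfsadmin -fetchImage /var/backups/hdfs-meta` (the target path is a placeholder). On a Kerberos-enabled cluster, obtain a ticket for that principal with kinit first.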

> c. How can we test the backup strategy?

You can start a NameNode instance whose name directory contains the two files. If the NameNode web UI comes up and shows the full file summary, the backup is valid.
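
A minimal sketch of preparing such a directory, assuming the backup holds one fsimage_<txid> file and a VERSION file. stage_name_dir and its arguments are hypothetical. One further assumption worth verifying on your release: the NameNode looks for an fsimage_<txid>.md5 checksum next to the image, and fetchImage may not write one, so the sketch generates it with md5sum.

```shell
#!/usr/bin/env bash
# Sketch: lay out a backup (fsimage_<txid> + VERSION) as a NameNode name
# directory (<target>/current/...) for a throwaway test NameNode.

stage_name_dir() {
  local backup=$1 target=$2 f image="" name
  for f in "$backup"/fsimage_*; do
    # Skip checksum files; the last remaining match is the newest image.
    if [[ -f "$f" && "$f" != *.md5 ]]; then image=$f; fi
  done
  if [[ -z "$image" || ! -f "$backup/VERSION" ]]; then
    echo "backup $backup needs an fsimage_<txid> and a VERSION file" >&2
    return 1
  fi
  mkdir -p "$target/current" || return 1
  cp -- "$image" "$backup/VERSION" "$target/current/" || return 1
  name=${image##*/}
  # Assumed: the NameNode accepts md5sum's "<hex>  <file>" line format.
  ( cd "$target/current" && md5sum -- "$name" > "$name.md5" )
}
```

You would then point dfs.namenode.name.dir at the target directory in a test-only configuration directory and start the NameNode from it on a separate test host, for example with `hdfs --config /path/to/test-conf namenode`. With no DataNodes reporting, the NameNode will stay in safe mode; what you check is that the image loads and that the file and directory counts in the web UI summary look right.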

> d. How can we ensure that the NameNode directory is free of corruption?

The command used to obtain the fsimage will fail with an error if it cannot fetch the file in a consistent state.
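
The fetch only guarantees a consistent copy at the moment it is taken. To also catch a backup that is later damaged on backup storage, one option (a general practice, not something fetchImage does for you) is to record checksums when the backup is written and verify them before a restore. Minimal sketch assuming GNU coreutils; the SHA256SUMS file name and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Record a SHA-256 checksum for every file in a backup directory.
record_checksums() {
  local dir=$1
  # Exclude SHA256SUMS itself: the redirect may create it before find runs.
  ( cd "$dir" && find . -maxdepth 1 -type f ! -name SHA256SUMS -print0 \
      | xargs -0 -r sha256sum > SHA256SUMS )
}

# Succeeds only if every recorded file still matches its checksum.
verify_checksums() {
  local dir=$1
  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```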

Re: Backup & restore strategy for HDFS metadata.

Hi Harish,
Could you please explain the best practice for a backup strategy in a normal scenario, one that is consistent and causes no service loss? You mentioned that the fsimage alone is sufficient for a backup, but what about the edit logs if the service is lost right after a backup is taken?
Would we lose some data in this scenario?