
namenode format effect

Expert Contributor

I understand that formatting the namenode will not delete the data, but that the namenode will no longer know about that data.

But let's say I executed the namenode format command. Will hadoop fs -ls then show the same data that was there before formatting, or will it show empty directories?

1 ACCEPTED SOLUTION

Contributor

Hi Sameer,

If you format a namenode, you erase all metadata on that namenode, so the namenode will no longer be aware of the data stored on the datanodes. When you format, the namenode also gets a new namespaceID.

If you then run hadoop fs -ls /, you will only see data belonging to the new namespaceID, so you will get an empty file structure.
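You can see this for yourself by comparing the namespaceID recorded on disk on each side. A minimal sketch, where /data/dfs/nn and /data/dfs/dn are assumed storage directories (check dfs.namenode.name.dir and dfs.datanode.data.dir in your hdfs-site.xml for the real paths):

    # namespaceID the namenode was given by the format (assumed path)
    grep namespaceID /data/dfs/nn/current/VERSION
    # namespaceID the datanodes still carry from before the format (assumed path)
    grep namespaceID /data/dfs/dn/current/VERSION

If the two IDs differ, the datanodes' blocks belong to the old namespace and the newly formatted namenode will not report them.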


6 REPLIES


Contributor

To expand on one aspect of that point: if you stored sensitive information on HDFS, it is still physically present on disk, just no longer easily accessible. So if you format HDFS to destroy that information, you will not succeed. Your best bet is to treat the disks that back HDFS with a common overwrite procedure, as you would when removing sensitive information from local file systems.
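As a minimal sketch of that overwrite procedure, assuming the datanode stores its blocks under /data/dfs/dn (an assumption; use the dfs.datanode.data.dir value from your config) and that GNU shred is available:

    # stop the DataNode service first, then overwrite and remove each block file
    # (3 overwrite passes; note shred's caveats on journaling filesystems)
    sudo find /data/dfs/dn -type f -name 'blk_*' -exec shred -n 3 -u {} \;

You would need to repeat this on every datanode, since HDFS keeps multiple replicas of each block.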

Explorer

Hey @joao_ascenso!

After formatting the namenode, I am not able to see my file structure, as you mentioned in your answer. How can I recover the namenode metadata? The blocks are still present on the datanodes; can you help me recover them so that I can see them in HDFS with hdfs dfs -ls as before?

 

Super Collaborator

Adding to the above: in this scenario the datanode service may fail with an "Incompatible namespaceIDs" error, because the datanode still carries the old namespaceID.

To resolve this, you need to re-register the datanode with the newly formatted namenode.
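For illustration, a hedged sketch of one common way to do that. The path and the service name are assumptions, and note this discards the old block data on that datanode:

    # on the affected datanode, with the DataNode service stopped
    sudo rm -rf /data/dfs/dn/current            # assumed dfs.datanode.data.dir
    sudo systemctl start hadoop-hdfs-datanode   # service name varies by distribution

On restart, the datanode registers with the namenode and picks up the new namespaceID.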

Contributor

The documentation (https://wiki.apache.org/hadoop/GettingStartedWithHadoop) implies that the data is gone, which is what most people would expect by comparison with file systems such as ext4 or NTFS. However, this is not the case with HDFS.

In HDFS, data is stored by each datanode as blocks on the underlying local filesystem (e.g., ext4). The datanode only knows about blocks, and knows nothing about HDFS files or HDFS directories. This page https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html explains the architecture, especially the section on "Data Replication".
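You can see those raw blocks directly on a datanode's local filesystem. A small illustration, assuming the data directory is /data/dfs/dn (an assumption; check dfs.datanode.data.dir):

    # list a few block files and their checksum (.meta) companions
    find /data/dfs/dn/current -name 'blk_*' | head -5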

If you need to delete the entire filesystem, you should first delete all the directories with an HDFS command such as "hdfs dfs -rm -r -skipTrash" before running "hdfs namenode -format"; see the sketch below. Or use Christian's suggestion above and repeatedly overwrite the files in each datanode's data directories - but that may be a lot of work if you have a large cluster.
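Putting that together, a sketch of the full teardown sequence, assuming you run it as the HDFS superuser and can stop the namenode (for example via your cluster manager):

    hdfs dfs -rm -r -skipTrash '/*'   # remove every HDFS path, bypassing the trash
    # stop the NameNode service here
    hdfs namenode -format             # re-initialize the namenode metadata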

Community Manager

@bvishal, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager

