Created 05-19-2016 07:50 AM
I understand that formatting the namenode will not delete the data blocks, but the namenode will no longer know about that data.
But let's say I execute the namenode format command. Will hadoop fs -ls then show the same data that was there before formatting, or will it show empty directories?
Created 05-19-2016 09:23 AM
Hi Sameer,
If you format a namenode, you will erase all metadata on that namenode, so the namenode will no longer be aware of the data stored on the datanodes. When you format, the namenode will also get a new NamespaceId.
If you run hadoop fs -ls / you will get data related to the new NamespaceId, thus you will get an empty file structure.
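To make the sequence concrete, here is a minimal sketch of what this looks like on the command line. The commands are real HDFS CLI invocations, but they must be run against an actual cluster; the behavior described in the comments is what the answer above explains:

```shell
# Formatting wipes the namenode's fsimage/edit logs, so the
# namespace starts empty under a fresh NamespaceId:
hdfs namenode -format

# Listing the root now shows the new, empty namespace:
hadoop fs -ls /

# The old block files still sit on the datanodes' local disks,
# but no metadata maps them back to HDFS paths anymore.
```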
Created 05-19-2016 09:38 AM
And to be more detailed on one aspect of that point: if you stored sensitive information on HDFS, it is still physically present on disk, just no longer easily accessible. So if you format HDFS in order to destroy the information, you will not succeed. Your best bet is to treat the disks that back HDFS with a common overwrite procedure, as you would when removing sensitive information from local file systems.
Created 01-24-2023 03:34 AM
Hey @joao_ascenso!
After formatting the namenode, I am not able to see my file structure, as you mentioned in your answer. How can I recover the namenode metadata? The blocks are still present on the datanodes; can you help me recover them so that I can see them with the hdfs dfs -ls command like before?
Created 05-19-2016 10:01 AM
Adding to the above: in this scenario the datanode service may fail with an "Incompatible clusterIDs" error, because the datanode still carries the old namespace/cluster ID.
To resolve this you need to re-register the datanode under the new ID.
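A hedged sketch of one common way to re-align the IDs. All paths here are hypothetical (adjust to your own dfs.namenode.name.dir and dfs.datanode.data.dir settings), and the clusterID value is a placeholder:

```shell
# 1. Read the new clusterID the freshly formatted namenode generated
#    (path is illustrative -- use your dfs.namenode.name.dir):
grep clusterID /data/nn/current/VERSION

# 2. On each datanode, either wipe the old block storage
#    (this destroys the old blocks -- only if you do NOT need the data):
rm -rf /data/dn/current

#    ... or edit the datanode's VERSION file so its clusterID matches
#    the namenode's, then restart the datanode service
#    (CID-xxxx is a placeholder for the value from step 1):
sed -i 's/^clusterID=.*/clusterID=CID-xxxx/' /data/dn/current/VERSION
```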
Created 05-19-2016 10:11 AM
The documentation (https://wiki.apache.org/hadoop/GettingStartedWithHadoop) implies that the data is gone, which is what most people would expect by comparison with file systems such as ext4 or NTFS. However, this is not the case with HDFS.
In HDFS, data is stored by each datanode as blocks on the real (ext4) filesystem. The datanode only knows about blocks, and knows nothing about HDFS files or HDFS directories. This page https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html explains the architecture, especially the section on "Data Replication".
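You can see this on a datanode's local disk. A hypothetical example, assuming dfs.datanode.data.dir is /data/dn (your path will differ):

```shell
# The datanode's storage directory holds raw block files plus
# checksum metadata; nothing here records which HDFS path a
# block belonged to -- only the namenode's metadata knew that:
find /data/dn/current -name 'blk_*'
#   blk_1073741825           <- block data (illustrative name)
#   blk_1073741825_1001.meta <- checksum metadata for that block
```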
If you need to delete the entire filesystem, you should first delete all the directories using an HDFS command such as "hdfs dfs -rm -r -skipTrash" before running "hdfs namenode -format". Or use Christian's suggestion above and overwrite the files in each datanode's data directories - but that may be a lot of work if you have a large cluster.
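As a sketch, the "delete first, then format" sequence would look like this (run against a live cluster; the glob deletes everything under the root, so use with care):

```shell
# Delete all top-level directories, bypassing the trash,
# so the datanodes actually free the underlying blocks:
hdfs dfs -rm -r -skipTrash '/*'

# Then reformat the namenode:
hdfs namenode -format

# Note: -rm only unlinks the blocks. For true destruction of
# sensitive data you still need to overwrite the underlying
# disks, as Christian notes above.
```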
Created 01-24-2023 05:34 AM
@bvishal, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
Regards,
Vidya Sargur,