Created on 11-15-2019 03:35 PM - last edited on 11-15-2019 03:58 PM by ask_bill_brooks
Hello,
I have a basic question about namenode format. If we format the namenode, it will clean up all the metadata in the namenode, such as the directory structure. After the format completes, if we restart the namenode, will it rebuild the directory structure from datanode information? Or is it lost?
Created 11-15-2019 10:50 PM
You are not far from the truth! The NameNode holds the metadata of the HDFS files, i.e. permissions, locations, etc. This metadata is stored in serialized form in a single file (fsimage), plus an edits file that logs all the changes made to the file system. The fsimage is kept both on disk and in memory. All changes to the file system are reflected in memory and periodically written to disk. Details on how to fetch the fsimage and edits files are given in HDFS File System Metadata Backup.
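To make the fsimage plus edits idea concrete, here is a toy sketch in Python. It is only a mental model: the real fsimage and edits files are binary and far richer, and the `replay` function, the operation names and the sample paths are all made up for illustration.

```python
# Toy model only: a "checkpoint" dict plus a list of logged changes.
# None of these names come from Hadoop; they are hypothetical.

def replay(fsimage, edits):
    """Return the namespace obtained by applying logged edits to a checkpoint."""
    namespace = dict(fsimage)  # copy so the checkpoint itself is left untouched
    for op, path, meta in edits:
        if op == "create":
            namespace[path] = meta
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "chmod":
            namespace[path] = {**namespace[path], "perm": meta["perm"]}
    return namespace


fsimage = {"/data/a.txt": {"owner": "hdfs", "perm": "644"}}
edits = [
    ("create", "/data/b.txt", {"owner": "hdfs", "perm": "600"}),
    ("chmod", "/data/a.txt", {"perm": "640"}),
    ("delete", "/data/b.txt", None),
]

print(replay(fsimage, edits))
```

The point of the sketch is that the current namespace is always "last checkpoint + everything logged since", which is why both files matter for a backup.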
If you format the namenode, the basic information about ownership, permissions and location is deleted from the namenode directory, which is specified in hdfs-site.xml as dfs.namenode.name.dir. The namenode metadata will be gone, but your data on the datanodes stays intact: formatting a namenode does not format the datanodes.
On the other hand, the namenode will no longer receive heartbeats from the datanodes, nor know where your data is, because -format assigns a new namespace ID to the namenode.
***
You will need to change the namespaceID on your datanodes to make them work. The new value is in the namenode's VERSION file, under /hadoop/hdfs/namenode/current
[root@nanyuki current]# cat VERSION
#Fri Nov 15 21:29:31 CET 2019
namespaceID=107632589
clusterID=CID-72e79d8b-ea16-4d5c-9920-6b579e5c26b0
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2067995211-192.168.0.101-1537740712051
layoutVersion=-63
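If you want to compare the IDs without eyeballing them, a minimal Python sketch along these lines could help. It treats VERSION as plain key=value text with # comments, as in the listing above. The function names are my own, the datanode text below is a hypothetical example and not from a real node, and since the keys present in a datanode's VERSION file can differ between Hadoop versions, the comparison only looks at keys found on both sides.

```python
def parse_version(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def mismatched_ids(namenode_props, datanode_props,
                   keys=("namespaceID", "clusterID")):
    """Return the keys present on both sides whose values differ."""
    return [
        k for k in keys
        if k in namenode_props and k in datanode_props
        and namenode_props[k] != datanode_props[k]
    ]


# Namenode values copied from the listing above.
namenode_text = """#Fri Nov 15 21:29:31 CET 2019
namespaceID=107632589
clusterID=CID-72e79d8b-ea16-4d5c-9920-6b579e5c26b0
cTime=0
storageType=NAME_NODE
"""

# Hypothetical datanode content, made up for this example.
datanode_text = """clusterID=CID-00000000-0000-0000-0000-000000000000
storageType=DATA_NODE
"""

print(mismatched_ids(parse_version(namenode_text), parse_version(datanode_text)))
```

In practice you would read the two texts from the VERSION files on the respective hosts instead of using inline strings.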
Once the new namespaceID has been updated on all the datanodes, the namenode will start receiving heartbeats from them again. Each datanode then reports the blocks it holds, and that is the information the namenode has to work with. Keep in mind that these reports list blocks, not file names or directories, so the directory structure itself has to come from a copy of the fsimage and edits files.
HTH
Created 11-16-2019 10:29 AM
Thank you for the information. What if I have multiple directories configured for dfs.namenode.name.dir?
Default: /hadoop/hdfs/namenode
In addition I can configure
/backup/namenode
/goldbackup/namenode
In this case I will have multiple copies of the fsimage and edit log, and if one of the directories is corrupted I can use another one. Does that make sense?
Created 11-16-2019 01:51 PM
Yes, having multiple copies is a good backup strategy, so long as the mount points are on physically different disks that don't share disk controllers.
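As a rough first check, something like the Python sketch below can flag directories that sit on the same filesystem. It only compares the device ID reported by os.stat, so it cannot tell whether two different filesystems share a physical disk or a controller; you still have to verify that part yourself. The function name is my own, and the paths are the ones from your post, split the same way the comma-separated dfs.namenode.name.dir value would be.

```python
import os
from collections import defaultdict


def dirs_sharing_a_device(paths):
    """Group paths by the device ID of their filesystem and
    return only the groups that contain more than one path."""
    by_device = defaultdict(list)
    for p in paths:
        by_device[os.stat(p).st_dev].append(p)
    return [group for group in by_device.values() if len(group) > 1]


# Value as it would appear in dfs.namenode.name.dir (comma-separated).
name_dirs = "/hadoop/hdfs/namenode,/backup/namenode,/goldbackup/namenode".split(",")

# Skip directories that do not exist on this host so the check does not fail.
existing = [p for p in name_dirs if os.path.isdir(p)]
print(dirs_sharing_a_device(existing))
```

An empty list means no two of the existing directories share a filesystem; any group that is printed is a set of directories that would be lost together if that filesystem fails.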
Please, can you share feedback on the outcome of the earlier procedure? I have not tried adding additional fsimage and edits locations after the creation of the cluster. I am wondering whether you could start up the namenode unless you formatted it, which is now a different story altogether.
Happy hadooping
Created 11-16-2019 05:02 PM