Support Questions

ask_bill_brooks · 11-15-2019

Hello,

I have a basic question about NameNode format. If we format the NameNode, will it clean up all the metadata in the NameNode, such as the directory structure? After it completes, if we restart the NameNode, will it rebuild the directory structure from the DataNode information, or is it lost?

Shelton · 11-15-2019

@mokkan

You are not far from the truth! The NameNode holds the metadata of the HDFS files, i.e. permissions, location, etc. This metadata is stored in serialized form in a single file (fsimage), alongside an edits file that logs all the changes made to the file system. The fsimage is kept both on disk and in memory. All changes to the file system are reflected in memory and periodically transferred to disk. Details on how to fetch the fsimage and edits files are given in HDFS File System Metadata Backup.
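To make the fsimage/edits relationship concrete, here is a toy model in Python. It is a sketch under heavy simplifications, not HDFS code: the namespace is a plain dict of path to owner/permission, the edit operations (mkdir, chmod, delete) are made up for illustration, and the real files are binary. Loading means starting from the image and replaying the edits; a checkpoint folds the edits into a new image and starts an empty log.

```python
# Toy model of how a NameNode combines fsimage and edits (illustrative only:
# the real files are binary and track far more than owner and permission).


def replay(fsimage: dict, edits: list) -> dict:
    """Return the namespace obtained by applying the edit log to the image."""
    namespace = {path: dict(attrs) for path, attrs in fsimage.items()}  # copy, don't mutate the image
    for op, path, *args in edits:
        if op == "mkdir":
            owner, permission = args
            namespace[path] = {"owner": owner, "permission": permission}
        elif op == "chmod":
            namespace[path]["permission"] = args[0]
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return namespace


def checkpoint(fsimage: dict, edits: list) -> tuple:
    """Fold the edits into a new image and start an empty edit log."""
    return replay(fsimage, edits), []


fsimage = {"/": {"owner": "hdfs", "permission": "755"}}
edits = [
    ("mkdir", "/data", "mokkan", "750"),  # hypothetical operations
    ("chmod", "/data", "700"),
]
print(replay(fsimage, edits)["/data"])  # {'owner': 'mokkan', 'permission': '700'}

fsimage, edits = checkpoint(fsimage, edits)
print(sorted(fsimage), edits)  # ['/', '/data'] []
```

In this model, formatting corresponds to discarding both the image and the edit log and starting again from an empty root, which is why the directory tree is gone from the metadata directory afterwards.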

If you format the NameNode, the basic information about ownership, permissions and location is deleted from the NameNode directory, which is specified in hdfs-site.xml as dfs.namenode.name.dir. The NameNode metadata will be gone, but the data on your DataNodes stays intact: formatting a NameNode does not format the DataNodes.
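A quick way to see which directories hold the NameNode metadata is to read dfs.namenode.name.dir from hdfs-site.xml. The sketch below assumes the usual configuration/property/name/value layout of Hadoop config files and splits the value on commas, since the property accepts a comma-separated list. The function name and the sample file are illustrative only, and real values may also be file:// URIs, which this sketch returns unchanged.

```python
# Sketch: list the NameNode metadata directories configured in hdfs-site.xml.
import xml.etree.ElementTree as ET


def namenode_name_dirs(hdfs_site_xml: str) -> list:
    """Return the comma-separated entries of dfs.namenode.name.dir, or [] if unset."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.namenode.name.dir":
            value = prop.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


# Sample file mirroring the default path mentioned in this thread.
SAMPLE = """<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/hdfs/namenode</value>
  </property>
</configuration>"""

print(namenode_name_dirs(SAMPLE))  # ['/hadoop/hdfs/namenode']
```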

On the other hand, the NameNode will no longer receive heartbeats from the DataNodes, nor know where your data is, because -format assigns a new namespace ID to the NameNode.
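The mismatch can be pictured with a toy check. This is not HDFS code: format_namenode and accepts are hypothetical helpers, the random draw only stands in for however Hadoop actually generates the ID, and real DataNodes also compare other fields (for example clusterID). The idea is the same, though: after -format the NameNode carries an ID that the existing DataNodes do not know.

```python
# Toy illustration, not HDFS code: after -format the NameNode has a fresh
# namespaceID, so a DataNode still carrying the old one is turned away.
import random


def format_namenode() -> int:
    """Stand-in for -format: draw a new random namespaceID."""
    return random.randint(1, 2**31 - 1)


def accepts(namenode_id: int, datanode_id: int) -> bool:
    """Toy rule: the NameNode only accepts DataNodes with its own namespaceID."""
    return namenode_id == datanode_id


old_id = 107632589          # example value taken from the VERSION file in this thread
datanode_id = old_id        # the DataNode still remembers the old namespace
new_id = format_namenode()
while new_id == old_id:     # rule out the (very unlikely) identical draw
    new_id = format_namenode()

print(accepts(old_id, datanode_id))  # True: before the format
print(accepts(new_id, datanode_id))  # False: after the format
```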

***
You will need to change the namespaceID on your DataNodes to make them work again. The new value is in the NameNode's VERSION file under /hadoop/hdfs/namenode/current:

[root@nanyuki current]# cat VERSION
#Fri Nov 15 21:29:31 CET 2019
namespaceID=107632589
clusterID=CID-72e79d8b-ea16-4d5c-9920-6b579e5c26b0
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2067995211-192.168.0.101-1537740712051
layoutVersion=-63
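The VERSION file shown above is a simple properties file: a timestamp comment followed by key=value lines. A minimal Python parser is enough to pull out namespaceID when comparing NameNode and DataNode copies. It is a sketch that ignores Java properties features such as escapes and line continuations, which these files do not normally use; parse_version is a hypothetical helper name.

```python
# Sketch: parse an HDFS storage VERSION file (key=value lines, '#' comments).


def parse_version(text: str) -> dict:
    """Return the key=value pairs of a VERSION file as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


VERSION = """#Fri Nov 15 21:29:31 CET 2019
namespaceID=107632589
clusterID=CID-72e79d8b-ea16-4d5c-9920-6b579e5c26b0
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2067995211-192.168.0.101-1537740712051
layoutVersion=-63"""

info = parse_version(VERSION)
print(info["namespaceID"], info["storageType"])  # 107632589 NAME_NODE
```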

Once the new namespaceID has been updated on all the DataNodes, the NameNode will start receiving heartbeats from them. Each DataNode reports the files it holds during the heartbeat, and that is eventually the information the NameNode uses to rebuild its metadata.
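The update step itself amounts to rewriting one line of a VERSION file on each DataNode. Which VERSION file holds namespaceID on a DataNode depends on the Hadoop version (newer releases typically keep it in a per-block-pool VERSION file), so locate it on your own nodes; the sample content below is made up, and set_namespace_id is a hypothetical helper that only transforms text.

```python
# Sketch of the "update namespaceID on each DataNode" step, applied to the
# text of a VERSION file. Stop the DataNode and back up the file before
# writing the result back.
import re


def set_namespace_id(version_text: str, new_id: str) -> str:
    """Return version_text with its single namespaceID line set to new_id."""
    updated, count = re.subn(
        r"^namespaceID=.*$",
        lambda _match: f"namespaceID={new_id}",  # function avoids backslash escapes in new_id
        version_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one namespaceID line, found {count}")
    return updated


# Made-up DataNode VERSION content, for illustration only.
dn_version = "namespaceID=123456789\nstorageType=DATA_NODE\n"
new_text = set_namespace_id(dn_version, "107632589")  # ID from the NameNode VERSION above
print(new_text.splitlines()[0])  # namespaceID=107632589
```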

HTH

mokkan · 11-16-2019

Thank you for the information. What if I configure multiple directories for dfs.namenode.name.dir?

Default: /hadoop/hdfs/namenode

I can configure

/backup/namenode

/goldbackup/namenode

In this case I will have multiple copies of the fsimage and edit log, so if one of the directories is corrupted, I can use another one. Does that make sense?
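If one of the configured directories does get damaged, the first job is to find a copy that still looks complete. The sketch below treats a directory as usable when its current/ subdirectory contains a VERSION file and at least one fsimage_* file (ignoring the .md5 checksum files), which matches the usual layout of a NameNode storage directory in Hadoop 2 and later. It is a presence check only: it does not verify checksums or prove that the copies agree, and usable_name_dirs is a hypothetical helper.

```python
# Sketch: find copies of the NameNode metadata that still look complete.
# Presence check only: no checksum verification and no comparison of the
# copies' transaction IDs.
import tempfile
from pathlib import Path


def usable_name_dirs(name_dirs: list) -> list:
    """Return the entries whose current/ holds a VERSION file and an fsimage_* file."""
    usable = []
    for d in name_dirs:
        current = Path(d) / "current"
        if not (current / "VERSION").is_file():
            continue
        # Skip the fsimage_*.md5 checksum companions; we want the image itself.
        images = [p for p in current.glob("fsimage_*") if p.suffix != ".md5"]
        if images:
            usable.append(d)
    return usable


# Demo on throwaway directories: one complete copy, one missing its fsimage.
with tempfile.TemporaryDirectory() as tmp:
    gold = Path(tmp, "goldbackup", "namenode")
    backup = Path(tmp, "backup", "namenode")
    for d in (gold, backup):
        (d / "current").mkdir(parents=True)
        (d / "current" / "VERSION").write_text("namespaceID=107632589\n")
    (gold / "current" / "fsimage_0000000000000000042").write_bytes(b"")
    found = usable_name_dirs([str(backup), str(gold)])

print(found == [str(gold)])  # True
```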

Shelton · 11-16-2019

@mokkan

Yes, having multiple copies is a good backup strategy, as long as the mount points are on physically different disks that don't share disk controllers.
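Whether two mount points are really on separate disks can be partly checked from the OS. The sketch below groups directories by os.stat().st_dev and reports any that share a device. This is a heuristic with clear limits: st_dev identifies the mounted filesystem, so two partitions of the same physical disk (or LVM volumes on it) still show different values, and nothing here can see which controller a disk hangs off. shared_devices is a hypothetical helper.

```python
# Heuristic: flag NameNode metadata directories that share a device.
# st_dev identifies the mounted filesystem, not the physical disk or controller.
import os
import tempfile
from collections import defaultdict


def shared_devices(name_dirs: list) -> dict:
    """Group directories by st_dev; keep only groups with more than one entry."""
    by_device = defaultdict(list)
    for d in name_dirs:
        by_device[os.stat(d).st_dev].append(d)
    return {dev: dirs for dev, dirs in by_device.items() if len(dirs) > 1}


# Demo: two fresh subdirectories of one temp dir are on the same device.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "backup")
    b = os.path.join(tmp, "goldbackup")
    os.mkdir(a)
    os.mkdir(b)
    clashes = shared_devices([a, b])

print(list(clashes.values()) == [[a, b]])  # True
```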

Could you please share feedback on the outcome of the earlier procedure? I have not tried adding additional fsimage and edits locations after the cluster was created, and I wonder whether you could start up the NameNode without formatting it, since formatting would be a different story altogether.

Happy hadooping

mokkan · 11-16-2019

Thank you for the info. Yes, I created a backup in another directory, and I was about to restart the NameNode from that image.

Cloudera Community

namenode format question