Created on 07-25-2017 07:14 PM - last edited on 08-25-2019 06:29 PM by ask_bill_brooks
Hi,
I am trying to understand how NameNode metadata is written to its directories. If the NameNode metadata directory list contains two directories, is the metadata written to both directories in parallel for redundancy, so that two copies of the metadata are present?
Thanks.
Created 07-25-2017 08:18 PM
If you are just looking for redundancy, it is achieved by writing NameNode metadata to JournalNodes (typically three); both the active and the standby NameNode point to the same JournalNodes. When the active NameNode goes down, ZooKeeper simply needs to make the standby node active, and it is already pointing to the same data, which is replicated on the three JournalNodes.
If you don't have JournalNodes and you have only one NameNode, then your NameNode metadata is written only to that NameNode's own metadata directories, and here it is recommended that you use a RAID 10 array so that one disk failure does not result in data loss.
To answer your question of whether two copies of the metadata are present: it depends. If you are using RAID 10, then your disk array is making a copy of the blocks, but that's not really a copy in the sense you are asking about.
If High Availability is enabled and you are using JournalNodes, then you do have three copies of the metadata available on three different nodes.
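As a rough illustration of where these settings live, the sketch below reads a made-up hdfs-site.xml excerpt and lists the configured metadata directories. The property names (dfs.namenode.name.dir, dfs.namenode.shared.edits.dir, dfs.journalnode.edits.dir) are the standard HDFS ones; every path, host name, and the "mycluster" nameservice are assumptions for the example only. Note that the description of dfs.namenode.name.dir in hdfs-default.xml says that when the value is a comma-delimited list, the name table is replicated in all of the listed directories for redundancy.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml excerpt. The property names are real HDFS
# settings; the paths, hosts and nameservice ID are invented for illustration.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/hdfs/namenode,/data/2/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/1/hdfs/journal</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def split_dirs(value):
    """Split a comma-delimited directory list into clean entries."""
    return [d.strip() for d in value.split(",") if d.strip()]


props = read_properties(HDFS_SITE)
name_dirs = split_dirs(props["dfs.namenode.name.dir"])
journal_dirs = split_dirs(props["dfs.journalnode.edits.dir"])

print("NameNode metadata directories:", name_dirs)
print("JournalNode edits directories:", journal_dirs)
print("Shared edits URI:", props["dfs.namenode.shared.edits.dir"])
```

On a real cluster you would point this at the actual hdfs-site.xml (or look up the same properties in Ambari) rather than an inline string.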
Created 07-25-2017 09:17 PM
Thanks so much for the information. But I am still confused about the NameNode directories and dfs.journalnode.edits.dir.
Where do the actual edits and fsimage files exist?
Created 07-25-2017 10:12 PM
These directories exist on the JournalNodes, if that's what you are using, or on whatever disk you specify in Ambari for the NameNode when you do your install. I think you will find the following link helpful.
https://hortonworks.com/blog/hdfs-metadata-directories-explained/
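To make the edits versus fsimage distinction concrete, here is a small sketch that sorts the file names found under a metadata directory's current/ subdirectory into checkpoint (fsimage) files and edit log segments. It assumes the usual naming scheme (fsimage_<txid>, fsimage_<txid>.md5, edits_<start>-<end>, edits_inprogress_<txid>); the sample listing and transaction IDs are invented, and a JournalNode directory would normally contain only the edits files.

```python
import re

# File name patterns, assuming the usual HDFS metadata naming scheme.
PATTERNS = {
    "fsimage": re.compile(r"^fsimage_\d+$"),
    "fsimage_checksum": re.compile(r"^fsimage_\d+\.md5$"),
    "edits_finalized": re.compile(r"^edits_\d+-\d+$"),
    "edits_inprogress": re.compile(r"^edits_inprogress_\d+$"),
}


def classify(filenames):
    """Group file names by metadata type; unmatched names go to 'other'."""
    groups = {kind: [] for kind in PATTERNS}
    groups["other"] = []
    for name in filenames:
        for kind, pattern in PATTERNS.items():
            if pattern.match(name):
                groups[kind].append(name)
                break
        else:
            groups["other"].append(name)
    return groups


# Hypothetical listing of <dfs.namenode.name.dir>/current on a NameNode.
sample = [
    "VERSION",
    "seen_txid",
    "fsimage_0000000000000000042",
    "fsimage_0000000000000000042.md5",
    "edits_0000000000000000001-0000000000000000042",
    "edits_inprogress_0000000000000000043",
]

groups = classify(sample)
for kind, names in groups.items():
    print(kind, names)
```

To inspect a real directory, replace the sample list with os.listdir() on the current/ subdirectory of one of the configured paths.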