What is a Backup Node in Hadoop? How does it work, and what are the roles and responsibilities of the Backup Node in Apache Hadoop?
A Backup Node acts as a checkpoint node.
It keeps an up-to-date copy of the NameNode metadata (FsImage and EditLogs) in memory, periodically saves it to an FsImage file on the local filesystem, and resets the edits, keeping itself synchronized with the active NameNode. Whenever the NameNode starts up, it reads the backed-up FsImage file from the local filesystem to recover the last saved state, then replays the edits to apply the more recent changes and bring the namespace fully up to date.
One Backup Node is managed per NameNode; if a Backup Node is present, there is no need for a Checkpoint Node.
NameNode in Hadoop stores Metadata. Two files associated with metadata are FsImage and EditLogs.
FsImage stores inode details such as modification time, access time, etc.
EditLogs contain all the recent modifications made to the file system since the last checkpoint.
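As a rough illustration (a toy Python model, not actual HDFS internals), the relationship between FsImage and EditLogs at NameNode startup can be sketched like this: load the last saved snapshot, then replay each logged mutation on top of it.

```python
# Toy model of how a NameNode reconstructs its namespace at startup:
# load the last saved FsImage, then replay every entry in the EditLog.
# Illustrative sketch only; real HDFS structures are far richer.

def load_fsimage(fsimage):
    # FsImage: a snapshot of the namespace (path -> inode metadata)
    return dict(fsimage)

def replay_edits(namespace, edit_log):
    # EditLog: an ordered list of mutations since the last checkpoint
    for op, path, meta in edit_log:
        if op == "create":
            namespace[path] = meta
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "modify":
            namespace[path].update(meta)
    return namespace

fsimage = {"/data/a.txt": {"mtime": 100, "atime": 100}}
edits = [
    ("create", "/data/b.txt", {"mtime": 200, "atime": 200}),
    ("modify", "/data/a.txt", {"atime": 300}),
    ("delete", "/data/b.txt", None),
]

namespace = replay_edits(load_fsimage(fsimage), edits)
print(namespace)  # {'/data/a.txt': {'mtime': 100, 'atime': 300}}
```

The key point the sketch captures: the FsImage alone is stale; only FsImage plus a full replay of the EditLog yields the current namespace.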
Backup node provides the same checkpointing functionality as the Checkpoint node (Checkpoint node is a node which periodically creates checkpoints of the namespace. Checkpoint Node downloads fsimage and edits from the active NameNode merges them locally, and uploads the new image back to the active NameNode).
In Hadoop, the Backup node keeps an in-memory, up-to-date copy of the file system namespace, which is always synchronized with the active NameNode state. There is no need for this node to download the fsimage and edits files from the active NameNode in order to create a checkpoint, as would be required with a Checkpoint node or Secondary NameNode, because it already has an up-to-date state of the namespace in memory. The Backup node's checkpoint process is therefore more efficient: it only needs to save the namespace into the local fsimage file and reset the edits. The NameNode supports one Backup node at a time, and no Checkpoint nodes may be registered while a Backup node is in use.
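The difference between the two checkpointing flows can be sketched in a minimal runnable model (illustrative only; the class and method names below are invented for the example, not HDFS APIs):

```python
# Contrast: a Checkpoint node must download, merge, and upload; a Backup
# node only persists what it already holds in memory. Toy model only.

class ActiveNameNode:
    def __init__(self):
        self.fsimage = {"/a": 1}      # last checkpointed snapshot
        self.edits = [("/b", 2)]      # mutations since that snapshot

    def download_fsimage(self):
        return dict(self.fsimage)

    def download_edits(self):
        return list(self.edits)

    def upload_fsimage(self, image):
        self.fsimage = image
        self.edits = []

def checkpoint_node_cycle(nn):
    # Checkpoint node: fetch state from the active NameNode, merge
    # locally, then ship the new image back.
    image = nn.download_fsimage()
    image.update(dict(nn.download_edits()))
    nn.upload_fsimage(image)

class BackupNode:
    def __init__(self):
        # Namespace already kept in sync with the active NameNode.
        self.in_memory_namespace = {"/a": 1, "/b": 2}
        self.local_fsimage = {}
        self.edits = [("/b", 2)]

    def checkpoint(self):
        # Backup node: no download or merge; just save the in-memory
        # namespace to the local fsimage and reset the edits.
        self.local_fsimage = dict(self.in_memory_namespace)
        self.edits = []

nn = ActiveNameNode()
checkpoint_node_cycle(nn)
print(nn.fsimage)        # {'/a': 1, '/b': 2}

bn = BackupNode()
bn.checkpoint()
print(bn.local_fsimage)  # {'/a': 1, '/b': 2}
```

Both cycles end with the same merged image; the Backup node simply skips the network round trips, which is why its checkpointing is described as more efficient.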
Prior to Hadoop 2.0, the Checkpoint Node's job was to create periodic checkpoints of the filesystem metadata held by the primary node, by merging the edits file with the fsimage file in local memory. Each such merge of the fsimage file is termed a checkpoint. The resulting fsimage file is then uploaded back to the primary node.
Backup nodes are rarely seen in practice and should be considered deprecated in favor of HA. In an HA setup, at any point in time exactly one of the NameNodes is in the Active state and the other is in the Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby maintains enough state to provide a fast failover if necessary.
The Backup Node in Hadoop can be started with the command below on the dedicated node configured for it in the cluster.
$ hdfs namenode -backup
The two configuration variables below specify the addresses of the Backup node and its web interface.
1. dfs.namenode.backup.address (default 0.0.0.0:50100) - The Backup node server address and port. If the port is 0, the server will start on a free port.
2. dfs.namenode.backup.http-address (default 0.0.0.0:50105) - The Backup node HTTP server address and port. If the port is 0, the server will start on a free port.
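In a typical setup these properties would be placed in hdfs-site.xml; a minimal fragment using the default values mentioned above might look like this:

```xml
<!-- Backup node RPC address and port (0 = pick a free port) -->
<property>
  <name>dfs.namenode.backup.address</name>
  <value>0.0.0.0:50100</value>
</property>

<!-- Backup node HTTP (web UI) address and port -->
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>0.0.0.0:50105</value>
</property>
```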
Hope it helps. Thanks.