Is there a difference between the Secondary NameNode and the Standby node? I got confused when I read this note in the Hortonworks documentation:
Secondary NameNode is not required in HA configuration because the Standby node also performs the tasks of the Secondary NameNode.
@yassine sihi, there are two different concepts. HDFS can be deployed in two modes. 1) Without HA 2) With HA.
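Without HA, HDFS has a NameNode and a Secondary NameNode. The Secondary NameNode periodically takes a checkpoint of the NameNode's metadata by merging the edit log into the fsimage, which keeps the edit log from growing without bound. It is not a hot backup: if the NameNode fails, the Secondary NameNode's last checkpoint can help rebuild the metadata, but anything written after that checkpoint may be lost, and the Secondary NameNode does not take over automatically.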
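To make the periodic snapshot (checkpoint) idea concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the set-based "fsimage", the `(op, path)` edit tuples, and the function names `apply_edit` and `checkpoint` are all made up for illustration. It only shows the general idea of folding an edit log into a saved image, and why a periodic copy lags behind the live NameNode.

```python
# Toy model only: names and data structures here are hypothetical,
# not the real HDFS fsimage/edit log formats.

def apply_edit(namespace, edit):
    """Apply one (op, path) edit to a set of paths, in place."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown edit op: {op!r}")


def checkpoint(fsimage, edits):
    """Merge an edit log into a copy of the image.

    Returns the new image and an empty edit log, which is roughly
    what a checkpoint achieves: a fresh image and a short log.
    """
    merged = set(fsimage)
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []


# The checkpointing node merges what it has seen so far.
image, log = checkpoint({"/a"}, [("create", "/b"), ("delete", "/a")])

# Edits made on the NameNode after the checkpoint are not in `image`
# until the next checkpoint runs.
later_edits = [("create", "/c")]
live_namespace, _ = checkpoint(image, later_edits)
missing_from_checkpoint = live_namespace - image
```

Here `missing_from_checkpoint` holds the paths that would be lost if only the last checkpoint survived, which is why the checkpoint copy alone is not a full backup.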
In HA mode, HDFS has two NameNodes: one acts as the Active NameNode and the other as the Standby NameNode. The Standby NameNode's duties are similar to the Secondary NameNode's: it keeps track of the Active NameNode's activity and takes checkpoints periodically. If the Active NameNode fails, the Standby NameNode takes control and becomes active (automatically, when automatic failover is configured), so users do not notice the NameNode failure. This is how high availability is achieved.
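The active/standby behaviour can be sketched the same way. This is again a toy model under stated assumptions, not how HDFS is implemented: the classes `NameNode` and `HAPair` and the node names `nn1`/`nn2` are hypothetical, and the direct call from active to standby stands in for the shared edit log that a real standby reads from.

```python
# Toy model only: class and node names are hypothetical.

class NameNode:
    def __init__(self, name, state):
        self.name = name
        self.state = state          # "active", "standby" or "failed"
        self.namespace = set()

    def apply(self, edit):
        op, path = edit
        if op == "create":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)
        else:
            raise ValueError(f"unknown edit op: {op!r}")


class HAPair:
    def __init__(self):
        self.active = NameNode("nn1", "active")
        self.standby = NameNode("nn2", "standby")

    def write(self, edit):
        # Every edit reaches the standby as well, so it stays current
        # instead of lagging until the next periodic checkpoint.
        self.active.apply(edit)
        self.standby.apply(edit)

    def failover(self):
        # The standby is promoted; the old active is marked failed
        # and can be repaired later.
        failed = self.active
        failed.state = "failed"
        self.active = self.standby
        self.active.state = "active"
        self.standby = failed


pair = HAPair()
pair.write(("create", "/a"))
pair.write(("create", "/b"))
pair.failover()
```

After `failover()`, `pair.active` is the former standby and already holds both paths, which is the difference from restoring a possibly stale checkpoint.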
Here is a simple breakdown of the failure cases.
Issue 1. With NameNode/Secondary NameNode, if the disk holding the NameNode metadata on the NameNode server is completely destroyed, the Secondary NameNode does not become the NameNode, and you cannot fully recover the NameNode metadata.
Issue 2. With NameNode/Secondary NameNode, if the NameNode service is down, you cannot run Hadoop MR jobs or YARN applications, or access the HDFS filesystem.
Issue 3. With NameNode HA (Active/Standby NameNode), if the disk holding the NameNode metadata on the Active NameNode server is completely destroyed, the Standby NameNode becomes the Active NameNode within 30 to 40 seconds, which gives you time to recover the damaged former NameNode server.
Issue 4. With NameNode HA (Active/Standby NameNode), if the Active NameNode service goes down, the Standby NameNode becomes the Active NameNode within 30 to 40 seconds, and you can still run MR jobs and YARN applications and access the HDFS filesystem.
The Secondary NameNode is one of the most poorly named components in Hadoop. Its name suggests that it is a backup for the NameNode, but in reality it is not. Many Hadoop beginners get confused about what exactly the SecondaryNameNode does and why it is present in HDFS.
This article explains exactly how the SecondaryNameNode works: http://blog.madhukaraphatak.com/secondary-namenode---what-it-really-do/
Put very simply: the Standby NameNode is like streaming replication with integrated failover, while the SecondaryNameNode is like timed log shipping.