How does NameNode HA with QJM actually work?

I read about QJM in the official Hadoop documentation, but it gives no clear explanation of the control flow.

What I understand so far:

  • When the NameNode fails, the ZooKeeper Failover Controller (ZKFC) detects it and makes the Standby NameNode the Active NameNode.

  • Only the Active NameNode writes the edit log to the JournalNodes.

  • The JournalNodes sync the edit log to the Standby NameNode.

Can anyone please explain how the real flow works?

For example

  • step 1: when a client connects, the request goes here ....
  • step 2: this component takes care of the request ....
  • step 3: if it fails, the request goes there instead ....

In that style, can anyone please explain the complete flow of QJM?

Thanks in advance

1 ACCEPTED SOLUTION

In HA there are two NameNodes: one Active and one in the Standby state. The Active NameNode handles all client requests and namespace operations in the cluster, while the Standby acts as a slave that keeps its state up to date so it can take over. To stay synchronized, both NameNodes communicate with a group of daemons called JournalNodes (JNs).

Whenever the Active NameNode modifies the namespace, it logs a record of the modification to a majority of the JNs. The Standby continuously watches the JNs for changes to the edit log, reads the new edits, and applies them to its own namespace. With QJM, the JNs serve as the shared edits storage. When a failover happens, the Standby makes sure it has read all edits from the JNs before it takes over as Active.
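To make the quorum write and the Standby's catch-up concrete, here is a minimal Python sketch. It is a toy model, not Hadoop code: `JournalNode`, `quorum_write`, and `tail_edits` are hypothetical names, edits are plain `(op, path)` tuples, and the Standby simply merges whatever the reachable JNs hold, which is simpler than what the real QJM does.

```python
class JournalNode:
    """Toy stand-in for a JournalNode: stores edits by transaction id."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = {}  # txid -> (op, path)

    def write(self, txid, record):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.edits[txid] = record


def quorum_write(journal_nodes, txid, record):
    """Active NN side: succeed only if a majority of JNs stored the edit."""
    acks = 0
    for jn in journal_nodes:
        try:
            jn.write(txid, record)
            acks += 1
        except ConnectionError:
            pass  # a minority of JNs may be down
    majority = len(journal_nodes) // 2 + 1
    if acks < majority:
        raise RuntimeError(f"only {acks} acks, need {majority}")
    return acks


def tail_edits(journal_nodes, namespace, last_applied_txid):
    """Standby NN side: apply edits newer than last_applied_txid, in order."""
    available = {}
    for jn in journal_nodes:
        if jn.up:
            available.update(jn.edits)
    for txid in sorted(t for t in available if t > last_applied_txid):
        op, path = available[txid]
        if op == "mkdir":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        last_applied_txid = txid
    return last_applied_txid


# Three JNs, one of them down: writes still succeed with 2 of 3 acks.
jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", up=False)]
quorum_write(jns, 1, ("mkdir", "/data"))
quorum_write(jns, 2, ("mkdir", "/tmp"))
quorum_write(jns, 3, ("delete", "/tmp"))

standby_namespace = set()
last_txid = tail_edits(jns, standby_namespace, 0)
print(standby_namespace, last_txid)  # {'/data'} 3
```

With three JNs the Active can lose one and keep writing; if a second one goes down, `quorum_write` raises because no majority is left.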

The Standby also does the work of a Secondary NameNode, namely checkpointing the namespace, so an HA cluster does not need a Secondary NameNode.
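The main Secondary NameNode task the Standby takes over is checkpointing: merging the edits accumulated since the last fsimage into a new fsimage, so the edit log does not have to be replayed from the beginning. A rough sketch of that idea, assuming the namespace is just a set of paths and edits are `(op, path)` tuples keyed by transaction id (the function name `checkpoint` is hypothetical):

```python
def checkpoint(fsimage, image_txid, edits):
    """Merge edits newer than image_txid into a copy of fsimage."""
    new_image = set(fsimage)
    new_txid = image_txid
    for txid, (op, path) in sorted(edits.items()):
        if txid <= image_txid:
            continue  # already reflected in the old image
        if op == "mkdir":
            new_image.add(path)
        elif op == "delete":
            new_image.discard(path)
        new_txid = txid
    return new_image, new_txid


old_image = {"/data"}  # image taken at txid 3
edits = {3: ("mkdir", "/data"), 4: ("mkdir", "/logs"), 5: ("delete", "/data")}
new_image, new_txid = checkpoint(old_image, 3, edits)
print(new_image, new_txid)  # {'/logs'} 5
```

Edits at or below the new image's transaction id are then no longer needed for recovery, which is what keeps the edit log from growing without bound.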

Hope this helps you. @karthik nedunchezhiyan

6 REPLIES

Yes, I saw that, but it didn't explain the actual workflow.

What happens if the edit log on the JournalNodes becomes large?

Will the Standby NameNode send a new fsimage to the Active NameNode?

How does the client find the Active NameNode?

New Contributor

@karthik nedunchezhiyan A simplified explanation of the process:

With NN HA enabled there are two NNs, one Active and one Standby.

1) DataNodes send heartbeats and block reports to both NNs, so both the Active and the Standby know where the blocks are placed.

2) The JournalNodes hold the shared edits. On every write operation the Active NN sends the edit to the JNs, which record it; the Standby never writes to the JNs. Once the edits are on the JNs, the Standby reads them and applies them to its own copy of the namespace.

3) This way the Standby's namespace stays closely in sync with the Active's.

4) ZooKeeper is responsible for holding the lock for the Active NN.

5) There are two ZooKeeper Failover Controllers (ZKFCs), one per NN, which are responsible for monitoring the health of the NNs.

6) When ZooKeeper stops hearing from the ZKFC of the Active NN (its session expires), it releases the lock. The other ZKFC then acquires the lock, and the Standby NN becomes the Active NN.
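Steps 4 to 6 can be sketched as a small simulation. This is only an illustration under simplifying assumptions: `LockService` and `FailoverController` are hypothetical classes, the lock is a plain Python attribute rather than a real ZooKeeper node, and steps a real failover involves, such as fencing the old Active, are left out.

```python
class LockService:
    """Toy stand-in for ZooKeeper: one lock, held by at most one session."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, session):
        if self.holder is None:
            self.holder = session
        return self.holder == session

    def expire(self, session):
        # A session that stops communicating loses the lock.
        if self.holder == session:
            self.holder = None


class FailoverController:
    """Toy ZKFC: monitors one NameNode and competes for the lock."""

    def __init__(self, nn_name, lock):
        self.nn_name = nn_name
        self.lock = lock
        self.nn_healthy = True
        self.state = "standby"

    def tick(self):
        if not self.nn_healthy:
            self.lock.expire(self.nn_name)
            self.state = "down"
        elif self.lock.try_acquire(self.nn_name):
            self.state = "active"
        else:
            self.state = "standby"
        return self.state


lock = LockService()
zkfc1 = FailoverController("nn1", lock)
zkfc2 = FailoverController("nn2", lock)

print(zkfc1.tick(), zkfc2.tick())  # active standby

zkfc1.nn_healthy = False           # nn1 fails
print(zkfc1.tick(), zkfc2.tick())  # down active
```

Because only one session can hold the lock at a time, at most one NN is marked Active in this model, which is the property the ZooKeeper lock is there to provide.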
