Created on 04-12-2016 09:06 PM - edited 08-17-2019 12:47 PM
Many of us know that the role of the JournalNodes is to keep both NameNodes in sync and to avoid an HDFS split-brain scenario by allowing only the Active NameNode to write to the journals. Have you ever wondered how this works? Here you go!
JournalNodes form a distributed system for storing edits. The Active NameNode acts as a client: it writes edits to the JournalNodes and treats an edit as committed only once it has been replicated to a majority (quorum) of them. The Standby NameNode needs to read the edits to stay in sync with the Active one, and it can read them from any of the replicas stored on the JournalNodes.
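To make the commit rule concrete, here is a minimal sketch of the quorum arithmetic, assuming an edit counts as committed once a strict majority of JournalNodes has acknowledged it. The function names are hypothetical and are not part of any Hadoop API.

```python
def majority(num_journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a strict majority."""
    return num_journal_nodes // 2 + 1


def is_committed(acks: int, num_journal_nodes: int) -> bool:
    """An edit is treated as committed once a majority has acknowledged it."""
    return acks >= majority(num_journal_nodes)


def tolerated_failures(num_journal_nodes: int) -> int:
    """How many JournalNodes can be down while writes still reach a majority."""
    return num_journal_nodes - majority(num_journal_nodes)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "JournalNodes: quorum =", majority(n),
              ", tolerated failures =", tolerated_failures(n))
```

This is also why JournalNodes are normally deployed in odd numbers: going from 3 to 4 nodes raises the quorum size without raising the number of failures that can be tolerated.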
ZKFC makes sure that only one NameNode is active at a time. However, when a failover occurs, it is still possible that the previous Active NameNode could serve read requests to clients, and those responses may be out of date until that NameNode shuts down when it tries to write to the JournalNodes. For this reason, we should configure fencing methods even when using the Quorum Journal Manager.
To support fencing, the Quorum Journal Manager (QJM) uses epoch numbers. An epoch number is an integer that only ever increases and is unique once assigned. The NameNode generates an epoch number using a simple algorithm and includes it in every RPC request it sends to the JournalNodes. When you configure NameNode HA, the first Active NameNode gets epoch 1. On every failover or restart, the epoch number is increased. A NameNode with a higher epoch number is considered newer than any NameNode with a lower epoch number.
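The epoch assignment can be sketched roughly as follows. This is a simplified illustration, assuming the NameNode asks the JournalNodes for the highest epoch they have seen so far and proposes the next integer; the function name and the list-of-integers input are hypothetical, not the real RPC interface.

```python
from typing import Iterable


def next_epoch(epochs_seen_by_journal_nodes: Iterable[int]) -> int:
    """Propose an epoch strictly greater than any epoch reported by the
    JournalNodes that responded (assumed here to be a quorum)."""
    return max(epochs_seen_by_journal_nodes) + 1


if __name__ == "__main__":
    # Freshly formatted JournalNodes have seen no writer yet (epoch 0),
    # so the first Active NameNode ends up with epoch 1.
    print(next_epoch([0, 0, 0]))
    # After many failovers, one JournalNode may be slightly behind.
    print(next_epoch([112, 113, 113]))
```

Because the new epoch is always one more than the largest value reported, two NameNodes can never hold the same epoch and the newer writer always wins the comparison.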
Each JournalNode stores an epoch number locally, called the promised epoch. Whenever a JournalNode receives an RPC request with an epoch number from a NameNode, it compares that epoch number with its promised epoch. If the request comes from a newer NameNode, meaning its epoch number is greater than the promised epoch, the JournalNode records the new epoch number as its promised epoch. If the request comes from a NameNode with an older epoch number, the JournalNode simply rejects the request.
A NameNode whose request is rejected this way logs a warning like the following:
WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal <journal-node-hostname>:<port> failed to write txns 2397121201-2397121201. Will try to write to this JN again after the next log roll. org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 112 is less than the last promised epoch 113
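The promised-epoch check on the JournalNode side can be sketched like this. It is a toy model under stated assumptions, not the actual Hadoop implementation: the class and method names are hypothetical, and only the wording of the error message is taken from the log line above.

```python
class JournalNodeSketch:
    """Toy model of a JournalNode that tracks a single promised epoch."""

    def __init__(self, promised_epoch: int = 0) -> None:
        self.promised_epoch = promised_epoch

    def check_request(self, request_epoch: int) -> None:
        """Reject requests from older writers; remember newer epochs."""
        if request_epoch < self.promised_epoch:
            raise IOError(
                f"IPC's epoch {request_epoch} is less than the "
                f"last promised epoch {self.promised_epoch}"
            )
        if request_epoch > self.promised_epoch:
            # A newer NameNode has taken over: record its epoch as the promise.
            self.promised_epoch = request_epoch


if __name__ == "__main__":
    jn = JournalNodeSketch(promised_epoch=112)
    jn.check_request(113)          # new Active NameNode after a failover
    try:
        jn.check_request(112)      # old Active NameNode still trying to write
    except IOError as err:
        print("rejected:", err)
```

In this model the old Active NameNode (epoch 112) is fenced off as soon as the JournalNode has promised epoch 113 to the new writer, which matches the situation shown in the warning.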
Created on 04-13-2016 04:39 PM
Great Article Kuldeep.
Created on 04-13-2016 06:51 PM
@Kuldeep Kulkarni - Nice one!
Created on 04-13-2016 07:02 PM
Thanks @jramakrishnan
Created on 04-13-2016 07:02 PM
Thanks @Mayur Bhokase
Created on 03-18-2019 03:20 PM
How do we resolve this if we are getting it in a production environment?
Created on 08-21-2019 12:58 AM
Thanks a lot. I wondered how QJM works until I found this article.
Created on 10-12-2020 07:15 AM
We are facing the same issue in our production environment. It takes one of the NameNodes down all the time with the error below.
IPC's epoch 24 is less than the last promised epoch 25
How can we resolve this in production without downtime?
Thanks
Mahesh
Created on 11-03-2020 06:06 AM
IPC's epoch 112 is less than the last promised epoch 113
How to resolve the above issue?
Created on 11-29-2020 12:09 AM - edited 11-29-2020 12:12 AM
When we restart the JournalNode quorum, the epoch number changes. We usually see these errors when the JournalNodes are not in sync.
Check the writer epoch in the current directory of each JournalNode process. If one of the JournalNodes is behind, we can manually copy the files from a working JournalNode and it will pick up from there.
This should happen automatically when we restart the JournalNodes; if it does not, follow the procedure above.
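As a rough aid for the check described above, the following sketch compares the epoch files across JournalNode directories. It assumes each JournalNode's current directory contains plain-text files named last-promised-epoch and last-writer-epoch, and that the directories have been made readable from one machine (for example copied or mounted). Verify both assumptions against your own cluster before relying on it; the function names and directory layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Assumed file names inside each JournalNode's "current" directory.
EPOCH_FILES = ("last-promised-epoch", "last-writer-epoch")


def read_epochs(current_dir: str) -> Dict[str, Optional[int]]:
    """Read the epoch files from one JournalNode directory (None if missing)."""
    epochs: Dict[str, Optional[int]] = {}
    for name in EPOCH_FILES:
        path = Path(current_dir) / name
        epochs[name] = int(path.read_text().strip()) if path.is_file() else None
    return epochs


def find_lagging(epochs_by_node: Dict[str, Dict[str, Optional[int]]],
                 key: str = "last-writer-epoch") -> List[str]:
    """Return the nodes whose epoch is missing or lower than the highest seen."""
    known = [e[key] for e in epochs_by_node.values() if e[key] is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(node for node, e in epochs_by_node.items()
                  if e[key] is None or e[key] < newest)
```

A JournalNode flagged by find_lagging is only a candidate for the manual copy procedure; matching epochs alone do not prove that the edit files themselves are identical.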