Support Questions

Find answers, ask questions, and share your expertise

How to debug the issue "IPC's epoch X is less than the last promised epoch Y"?

Explorer

There's a classic exception in HDFS HA with ZKFC: "IPC's epoch X is less than the last promised epoch Y". What are the suggested steps to troubleshoot this problem? How do I find the root cause? What are the possible reasons? Thanks.

1 ACCEPTED SOLUTION


Hello @Xiaobing Zhou,

This may indicate that either a NameNode or JournalNodes were unresponsive for a period of time. This can lead to a cascading failure, whereby a NameNode HA failover occurs, the other NameNode becomes active, the previous NameNode thinks it is still active, and then QJM rejects that NameNode for not operating within the same "epoch" (logical period of time). This is by design, as QJM is intended to prevent 2 NameNodes from mistakenly acting as active in a split-brain scenario.
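To make the "epoch" mechanism concrete, here is a minimal conceptual sketch in Python (not Hadoop's actual Java implementation; the class and method names are invented purely for illustration) of how a JournalNode-like service promises an epoch to a newly active writer and then rejects writes from a stale one:

```python
# Conceptual sketch only -- a simplified stand-in for QJM's epoch-based fencing.
class JournalNodeSketch:
    def __init__(self):
        self.last_promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        """A NameNode becoming active must obtain a new, higher epoch."""
        if epoch <= self.last_promised_epoch:
            raise ValueError(f"Proposed epoch {epoch} is not higher than "
                             f"{self.last_promised_epoch}")
        self.last_promised_epoch = epoch

    def journal(self, epoch, txn):
        """Edits are accepted only from the writer holding the promised epoch."""
        if epoch < self.last_promised_epoch:
            # This is the condition behind the error message in question.
            raise IOError(f"IPC's epoch {epoch} is less than the last "
                          f"promised epoch {self.last_promised_epoch}")
        self.edits.append((epoch, txn))

jn = JournalNodeSketch()
jn.new_epoch(1)            # NameNode X becomes active with epoch 1
jn.journal(1, "edit-001")  # X writes edits normally
jn.new_epoch(2)            # failover: NameNode Y becomes active with epoch 2
try:
    jn.journal(1, "edit-002")  # X, still believing it is active, is rejected
except IOError as e:
    print(e)  # "IPC's epoch 1 is less than the last promised epoch 2"
```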

There are multiple potential reasons for unresponsiveness in the NameNode/JournalNode interaction. Reviewing logs from the NameNodes and JournalNodes would likely reveal more details. There are several common causes to watch for:

  1. A long stop-the-world garbage collection pause may surpass the timeout threshold for the call. Garbage collection logging would show what kind of garbage collection activity the process is doing. You might also see log messages from the "JvmPauseMonitor" (a log-scanning sketch that looks for these messages follows this list). Consider reviewing the article NameNode Garbage Collection Configuration: Best Practices and Rationale to make sure your cluster's heap and garbage collection settings match best practices.
  2. In environments that integrate with LDAP for resolution of users' group memberships, load problems on the LDAP infrastructure can cause delays. In extreme cases, we have seen such timeouts at the JournalNodes cause edit logging calls to fail, which causes a NameNode abort and an HA failover. See Hadoop and LDAP: Usage, Load Patterns and Tuning for a more detailed description and potential mitigation steps.
  3. It is possible that there is a failure in network connectivity between the NameNode and the JournalNodes. This tends to be rare, because NameNodes and JournalNodes tend to be colocated on the same host or placed relatively close to one another in the network topology. Still, it is worth verifying that basic network connectivity between all NameNode hosts and all JournalNode hosts is working (a basic connectivity-check sketch also follows this list).
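As a starting point for the log review mentioned above, a small script along these lines can help correlate JvmPauseMonitor warnings with epoch rejections across the NameNode and JournalNode logs. This is only a sketch: the log paths and the exact message wording below are assumptions and vary by distribution and configuration, so adjust them to your environment.

```python
import glob
import re

# Patterns to look for; the exact wording may differ between Hadoop versions.
PATTERNS = {
    "jvm_pause": re.compile(r"JvmPauseMonitor.*[Pp]ause"),
    "epoch_rejected": re.compile(r"epoch \d+ is less than the last promised epoch \d+"),
}

# Hypothetical log locations -- point these at your actual log directories.
LOG_GLOBS = [
    "/var/log/hadoop/hdfs/*namenode*.log*",
    "/var/log/hadoop/hdfs/*journalnode*.log*",
]

for pattern in LOG_GLOBS:
    for path in glob.glob(pattern):
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                for name, regex in PATTERNS.items():
                    if regex.search(line):
                        print(f"{path}:{lineno} [{name}] {line.rstrip()}")
```

Lining up the timestamps of the matches (for example, pause warnings shortly before an epoch rejection) usually points to which of the causes above is in play.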
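For the network connectivity check in item 3, a quick TCP-level test from each NameNode host toward every JournalNode can rule out basic reachability problems. The hostnames below are placeholders, and 8485 is the default JournalNode RPC port (dfs.journalnode.rpc-address); substitute whatever your cluster actually uses.

```python
import socket

# Hypothetical JournalNode hosts -- replace with your own.
JOURNALNODES = ["jn1.example.com", "jn2.example.com", "jn3.example.com"]
PORT = 8485  # default JournalNode RPC port; check dfs.journalnode.rpc-address

for host in JOURNALNODES:
    try:
        with socket.create_connection((host, PORT), timeout=5):
            print(f"{host}:{PORT} reachable")
    except OSError as e:
        print(f"{host}:{PORT} NOT reachable ({e})")
```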


3 REPLIES

Master Guru

@Xiaobing Zhou

One major reason could be the following: suppose you are seeing these errors on NameNode X, which was the active NameNode. It became unresponsive for some reason (perhaps a network connectivity problem, or it was busy processing DataNode reports, or something else) and could not communicate with the ZKFC, so fencing occurred and Y became the active NameNode. When X becomes responsive again, it still assumes it is the active NameNode and tries to send write requests to the JournalNodes. Because Y is already active, the last promised epoch value has been incremented, and the JournalNodes simply reject the write requests from X.
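One quick way to see this symptom is to ask each NameNode which HA state it believes it is in via its JMX servlet. The hostnames below are placeholders, and the HTTP port depends on your version and configuration (50070 is the Hadoop 2.x default, 9870 in 3.x); the NameNodeStatus bean should report "active" or "standby":

```python
import json
from urllib.request import urlopen

# Hypothetical NameNode web addresses -- replace with your own host:port pairs.
NAMENODES = ["nn1.example.com:50070", "nn2.example.com:50070"]

for nn in NAMENODES:
    url = f"http://{nn}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
    try:
        with urlopen(url, timeout=10) as resp:
            beans = json.load(resp).get("beans", [])
        state = beans[0].get("State", "unknown") if beans else "unknown"
        print(f"{nn}: {state}")
    except OSError as e:
        print(f"{nn}: unreachable ({e})")
```

If the previously active NameNode still reports "active" after a failover, that lines up with the scenario above, and the JournalNodes' rejection is the fencing working as intended.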

Please read detailed information about this at the link below.

https://community.hortonworks.com/articles/27225/how-qjm-works-in-namenode-ha.html

Hope this information helps.

Happy Hadooping!! 🙂

Explorer

Thank you @Chris Nauroth and @Kuldeep Kulkarni for the answer. It's quite clear.