SYMPTOM: The Standby NameNode process on the second of our four management nodes is not running. Checking the log files, I found an exception related to an Oozie job.
ERROR: The NameNode log showed the following:
2016-12-20 09:20:17,286 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://node1:8480/getJournal?jid=namenodeha&segmentTxId=16740759&storageInfo=-63%3A1400038789%3A0%3ACID-031f35b2-59c9-42f9-8942-550aee3d39e6, http://node1:8480/getJournal?jid=namenodeha&segmentTxId=16740759&storageInfo=-63%3A1400038789%3A0%3ACID-031f35b2-59c9-42f9-8942-550aee3d39e6' to transaction ID 16713078
2016-12-20 09:20:17,287 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://node1:8480/getJournal?jid=namenodeha&segmentTxId=16740759&storageInfo=-63%3A1400038789%3A0%3ACID-031f35b2-59c9-42f9-8942-550aee3d39e6' to transaction ID 16713078
2016-12-20 09:20:18,287 INFO namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: 48858/805951 transactions completed. (6%)
2016-12-20 09:20:18,485 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(242)) - Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/apps/hive/warehouse, snapshotName=oozie-snapshot-2016_12_16-08_01, RpcClientId=1f566cee-d0eb-4a84-a615-40cdd31bc772, RpcCallId=1]
2016-12-20 09:20:18,599 ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode.
2016-12-20 09:20:18,601 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-12-20 09:20:18,602 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
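
In a long NameNode log the failing edit operation is easy to miss among the INFO lines. The following is a minimal sketch, not part of the original troubleshooting, that pulls out every "Encountered exception on operation" line and splits its bracketed fields. The function name `find_failed_edit_ops` and the regular expression are illustrative assumptions based only on the line format shown in the excerpt above; other Hadoop versions may format this line differently.

```python
import re

# Assumed format, taken from the FSEditLogLoader ERROR line in the excerpt above.
FAILED_OP = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"namenode\.FSEditLogLoader .*?Encountered exception on operation "
    r"(?P<op>\w+) \[(?P<fields>.*)\]\s*$"
)


def find_failed_edit_ops(lines):
    """Return one dict per edit-log operation that failed during replay.

    `lines` is any iterable of log lines (a list, or an open file object).
    The function name is hypothetical; it is not a Hadoop API.
    """
    failures = []
    for line in lines:
        match = FAILED_OP.match(line)
        if match is None:
            continue
        # Fields look like "key=value, key=value"; split each on the first "=".
        fields = dict(
            pair.split("=", 1)
            for pair in match.group("fields").split(", ")
            if "=" in pair
        )
        failures.append(
            {"timestamp": match.group("ts"), "op": match.group("op"), **fields}
        )
    return failures


# Two lines copied from the log excerpt above, used as sample input.
SAMPLE_LOG = [
    "2016-12-20 09:20:18,287 INFO namenode.FSEditLogLoader "
    "(FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: "
    "48858/805951 transactions completed. (6%)",
    "2016-12-20 09:20:18,485 ERROR namenode.FSEditLogLoader "
    "(FSEditLogLoader.java:loadEditRecords(242)) - Encountered exception on "
    "operation DeleteSnapshotOp [snapshotRoot=/apps/hive/warehouse, "
    "snapshotName=oozie-snapshot-2016_12_16-08_01, "
    "RpcClientId=1f566cee-d0eb-4a84-a615-40cdd31bc772, RpcCallId=1]",
]

if __name__ == "__main__":
    for failure in find_failed_edit_ops(SAMPLE_LOG):
        print(failure["timestamp"], failure["op"], failure.get("snapshotName"))
```

To scan a real log, pass an open file object instead of `SAMPLE_LOG`; the log file location depends on the installation and is not given in this post.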
ROOT CAUSE: The edit logs were suspected to be corrupted, which prevented the Standby NameNode from starting. Replicating the metadata from the primary NameNode to the standby did not work.