One of the JournalNodes intermittently gets out of sync

Master Collaborator

Hi,

I have 3 JournalNodes (JNs): 2 on physical servers and the 3rd on a virtual server with 6 vcores.

Recently, the JournalNode on the VM has been getting out of sync for a few seconds from time to time. I checked the VM's resources and parameters, and nothing looks out of the ordinary. What I do see in the Cloudera Manager metrics is that the journal write bytes are sometimes higher than at other times.

Here is what I see:

The active NameNode was out of sync with this JournalNode.

===============

org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1659311573 expecting nextTxId=1659311555
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:485)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:371)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
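
The exception message itself shows how far this JournalNode had fallen behind: the txid it was asked to write minus the txid it expected next. A minimal Python sketch for pulling that gap out of a log line follows; the regular expression is only an assumption based on the single message shown above, and `txid_gap` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one JournalOutOfSyncException message in this
# thread; other Hadoop versions may word the message differently.
_OUT_OF_SYNC = re.compile(r"Can't write txid (\d+) expecting nextTxId=(\d+)")


def txid_gap(log_line):
    """Return (received, expected, gap), or None if the line does not match."""
    match = _OUT_OF_SYNC.search(log_line)
    if match is None:
        return None
    received = int(match.group(1))
    expected = int(match.group(2))
    return received, expected, received - expected


line = (
    "org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: "
    "Can't write txid 1659311573 expecting nextTxId=1659311555"
)
print(txid_gap(line))  # (1659311573, 1659311555, 18)
```

For the trace above, the gap is 18 transactions. Running the same helper over a JournalNode log could show whether the gaps cluster around particular times.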


1 ACCEPTED SOLUTION

Master Collaborator

When I checked the jobs and queries that ran just before the alert on the JN, I found one Hive query that runs over 6 months of data and recreates the Hive table from scratch, which generated a large share of the edit logs. I contacted the query owner, and he reduced the query's window from 6 months to 2 months, which solved the issue for us.


11 REPLIES

Champion
If I recall correctly, edit logs are just filled up to a certain size, and then writing moves on to the next one.

In CM, the NameNode metrics for Transactions, Edit Log Syncs, and Average Edit Log Sync Time would be better indicators.

I'm not sure whether these are exposed by default.
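
Outside CM, similar counters can usually be read from the NameNode's `/jmx` HTTP endpoint. The Python sketch below is a rough illustration only: the bean name `Hadoop:service=NameNode,name=NameNodeActivity` and the fields `TransactionsNumOps`, `SyncsNumOps`, and `SyncsAvgTime` are my understanding of the stock Hadoop metrics and should be verified against your version, the host and port are hypothetical, and the sample numbers are made up purely to exercise the parser.

```python
import json
from urllib.request import urlopen

# Assumed bean and field names; check them against your Hadoop release.
BEAN = "Hadoop:service=NameNode,name=NameNodeActivity"
FIELDS = ("TransactionsNumOps", "SyncsNumOps", "SyncsAvgTime")


def extract_edit_log_metrics(payload):
    """Pick the edit-log related counters out of a parsed /jmx response."""
    for bean in payload.get("beans", []):
        if bean.get("name") == BEAN:
            return {field: bean.get(field) for field in FIELDS}
    return {}


def fetch_jmx(base_url):
    """Fetch the bean from a NameNode web UI address.

    base_url is hypothetical, e.g. "http://namenode.example.com:9870".
    """
    with urlopen(base_url + "/jmx?qry=" + BEAN, timeout=10) as resp:
        return json.load(resp)


# Made-up payload in the shape the /jmx endpoint returns, for illustration.
sample = {
    "beans": [
        {
            "name": BEAN,
            "TransactionsNumOps": 1200,
            "SyncsNumOps": 300,
            "SyncsAvgTime": 4.5,
        }
    ]
}
metrics = extract_edit_log_metrics(sample)
print(metrics)
```

Sampling these values periodically and comparing them with the times of the out-of-sync alerts may help show whether the alerts line up with bursts of edit-log activity.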
