Support Questions

Fawze · ‎02-18-2017

Hi,

I have 3 JNs, 2 on physical servers and the 3rd on virtual server with 6 Vcores.

Recently from time to time the vm server get out of sync for few seconds, I checked the vm resources and parmeters and nothing looks out of the rodinary, what is see in Cloudera manager metrics that the journal write bytes sometime are higher than different times

here what i see:

The active NameNode was out of sync with this JournalNode.

===============

org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1659311573 expecting nextTxId=1659311555
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:485)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:371)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

Fawze · ‎03-05-2017

When i checked the job/the query that occur prior to the alert on the JN, i found one hive query that runs on a data of 6 months and recreate the hive table from new, which resulted in a good percentage of edit logs, i contacted the query owner and he reduced the his running window from 6 months to 2 months which solve for us the issue.

View solution in original post

mbigelow · ‎02-20-2017

If I recall correctly, edits logs will just be filled up to a certain size and then move to the next.

In CM, the Namenode metrics for Transaction, Edit Log Syncs and Average Edit Log Sync Time would be better.

Not sure if these are exposed by default.

Fawze · ‎03-05-2017

When i checked the job/the query that occur prior to the alert on the JN, i found one hive query that runs on a data of 6 months and recreate the hive table from new, which resulted in a good percentage of edit logs, i contacted the query owner and he reduced the his running window from 6 months to 2 months which solve for us the issue.

Cloudera Community

Support Questions

Intermittently one of the journal nodes get out of Sync