
Intermittently one of the JournalNodes gets out of sync

Master Collaborator

Hi,

 

I have 3 JNs, 2 on physical servers and the 3rd on a virtual server with 6 vCores.

 

Recently, from time to time, the VM server gets out of sync for a few seconds. I checked the VM resources and parameters and nothing looks out of the ordinary. What I see in the Cloudera Manager metrics is that the journal write bytes are sometimes higher than at other times.

 

Here is what I see:

 

The active NameNode was out of sync with this JournalNode.

 

===============

 

org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1659311573 expecting nextTxId=1659311555
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:485)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:371)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

 

 

 

1 ACCEPTED SOLUTION

Master Collaborator

When I checked the jobs/queries that ran prior to the alert on the JN, I found one Hive query that runs over 6 months of data and recreates the Hive table from scratch, which accounted for a good percentage of the edit logs. I contacted the query owner and he reduced his running window from 6 months to 2 months, which solved the issue for us.


11 REPLIES

Champion

Even though the VM looks fine, it is probably a resource constraint on the VM that is causing this issue.

 

The NameNode writes each edit to its own local directory and to all of the JN edits directories.  It simply sounds like the VM isn't keeping up or getting the job done in time.
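As a quick reference, here is a minimal sketch (the actual values come from your own hdfs-site.xml) of reading back those two write targets with the HDFS getconf tool:

# Shared edits URI the NameNode writes to (the qjournal:// list of JournalNodes)
hdfs getconf -confKey dfs.namenode.shared.edits.dir

# Local directory where each JournalNode stores its copy of the edits
hdfs getconf -confKey dfs.journalnode.edits.dir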

 

Examine the contents of the JN edits directory on each node and you will find that the VM does not contain all of the necessary edits.  You can manually copy the edits_* files to the VM node to get it back in sync and see if it happens again.  I do recommend using the same hardware for all three Master nodes that run the JN and ZK instances.  Otherwise, you will often find yourself just barely maintaining the quorum to stay running.

 

dfs.namenode.shared.edits.dir

dfs.journalnode.edits.dir
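If it helps, here is a rough way to compare the edit segments across the three JNs; the edits path (/data/dfs/jn/mycluster/current) and the hostnames (jn1, jn2, jn3-vm) below are placeholders, not values from this thread:

# List the newest segments on each JournalNode; the lagging node will be
# missing the most recent edits_* / edits_inprogress_* files
for h in jn1 jn2 jn3-vm; do
  echo "== $h =="
  ssh "$h" 'ls -lt /data/dfs/jn/mycluster/current | head -10'
done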

Master Collaborator
Indeed, it's happening for a few seconds and then the VM gets back in sync. It happens from time to time, so I suspect that one job or Hive query that writes a lot of blocks and files may be causing the issue.

Do you think I should examine this again? Should I check the contents of the files themselves? Do you think migrating the JN role from the VM to a stronger node with 12 vCores could solve the issue?

Champion
I do think that you need to move the JN to the same or similar hardware to what the others are on.

You don't need to check the contents of the files themselves. Since it is happening every few seconds, it is just lagging behind and then catching up. So if you want to run any real loads on the cluster, it needs to be moved to better hardware.

Master Collaborator

Is it common to add a JN on a DataNode/NodeManager server?

 

In my cluster, the 2 NNs are physical, the CM server and the application server that hosts MySQL and Oozie are VMs, and all the DataNodes are physical.

Champion
No, typically worker nodes run just the processes that do the work: DataNode, Impala daemon, NodeManager.

In theory you could, placing it on the OS disk (not on any HDFS disks), but you will eventually run into contention between the OS, logs, and the edits; it may only be workable on a small cluster.

My minimum, for a production cluster and/or HA, is three large, physical servers for the Master roles.

The DBs (although I prefer to have the HMS DB on the Master nodes as well), gateway roles, and CM can all be on VMs.

Where is your third ZK instance? That one will also have I/O contention issues on a VM or on a DataNode.

Master Collaborator

My 3rd ZK was on the same VM, but after I ran into this issue I moved the ZK to another OpenStack server, moved the Spark History Server to one of the NNs to reduce the load on the VM, and increased the vCores for the VM to 6, but I still have the same issue.

Master Collaborator

The interesting thing I noticed is that when this happens, some jobs that run once a day are writing a relatively large amount of data to HDFS at the same time, with a good number of reducers (between 400 and 1100), which makes me suspect that the blocks written by these jobs at that time are causing the VM to lag. I'm trying to find a way to prove this.
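What I'm considering, as a rough sketch (the edits path below is a placeholder; the real one is the value of dfs.journalnode.edits.dir plus the nameservice ID), is to look at the segments written during that job's run window and count the operations in them with the offline edits viewer:

# Segment sizes and timestamps; compare against the job's run window
ls -lh /data/dfs/jn/mycluster/current/edits_*

# Operation counts in the newest finalized segment
seg=$(ls /data/dfs/jn/mycluster/current/edits_0* | tail -1)
hdfs oev -p stats -i "$seg" -o /tmp/edits-stats.txt
cat /tmp/edits-stats.txt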

Champion
That is probably the source of the spike in edits being written to the JNs. You could try to address it to reduce the impact.

Master Collaborator

Do you think looking at the edit log size when this occurs would be a good indication?