Created on 12-23-2016 05:25 PM
SYMPTOM: Ambari raises a critical alert reporting that the connection to the JournalNode service failed. The alert text is:
2016-06-30 18:50:39,865 [CRITICAL] [HDFS] [journalnode_process] (JournalNode Process) Connection failed to http://jn1.example.com:8480 (Execution of 'curl -k --negotiate -u : -b /var/lib/ambari-agent/tmp/cookies/f8ed47d4-f63e-482c-be70-36755387ca4b -c /var/lib/ambari-agent/tmp/cookies/f8ed47d4-f63e-482c-be70-36755387ca4b -w '%{http_code}' http://jn.example.com:8480 --connect-timeout 5 --max-time 7 -o /dev/null 1>/tmp/tmpE9v3mg 2>/tmp/tmpKOSncN' returned 28. % Total % Received % Xferd Average Speed Time Time Time Current
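The alert shows curl returning 28, which is curl's exit code for an operation that timed out. To reproduce the check by hand from the Ambari agent host, a minimal Python sketch along these lines may help. The function name `probe_http` is hypothetical, and unlike the alert's curl command it performs no Kerberos negotiation, so on a secured cluster a 401 response would still count as "reachable".

```python
import socket
import urllib.error
import urllib.request


def probe_http(url, timeout=7.0):
    """Return the HTTP status code, or None if the endpoint cannot be reached in time.

    Hypothetical helper: the default timeout mirrors the alert's --max-time 7.
    """
    # Ignore proxy settings from the environment; a JournalNode is an internal host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status (for example 401), so it is reachable.
        return err.code
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # Timeouts and refused connections: the condition the alert reports.
        return None
```

For example, `probe_http("http://jn1.example.com:8480")` returning `None` would match the failure in the alert.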
ERROR: The JournalNode log shows the following:
2016-07-01 10:21:29,390 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/phadcluster01/current/edits_inprogress_0000000002510372012 while determining its valid length. Position was 712704
java.io.IOException: Can't scan a pre-transactional edit log.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4959)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:346)
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:520)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.startLogSegment(JournalNodeRpcServer.java:161)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.startLogSegment(QJournalProtocolServerSideTranslatorPB.java:186)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25425)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
ROOT CAUSE: The log entry below suggests that the in-progress edit log segment (edits_inprogress) on this JournalNode was corrupted:
2016-07-01 10:21:16,007 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/phadcluster01/current/edits_inprogress_0000000002510372012 while determining its valid length. Position was 712704 java.io.IOException: Can't scan a pre-transactional edit log.
RESOLUTION: The following steps resolved the issue:
1. Stop the affected JournalNode.
2. Back up the existing JournalNode directory metadata.
3. Copy a working edits_inprogress file from another JournalNode.
4. Change the ownership of the copied file to hdfs:hadoop.
5. Restart the JournalNode.
The JournalNode then started successfully and no further errors appeared in the log.
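The backup, copy, and ownership steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `replace_inprogress_edits` and all of its parameters are hypothetical, the known-good file is assumed to have already been fetched from a healthy JournalNode to a local path, and the local JournalNode must be stopped before running it.

```python
import shutil
from pathlib import Path


def replace_inprogress_edits(current_dir, good_edits, backup_dir, user=None, group=None):
    """Back up a JournalNode 'current' directory, then drop in a known-good edits_inprogress file.

    Hypothetical helper; run it only while the local JournalNode is stopped.
    """
    current_dir = Path(current_dir)
    good_edits = Path(good_edits)
    backup_dir = Path(backup_dir)

    # Back up the whole directory first. copytree raises if backup_dir
    # already exists, so an earlier backup is never overwritten.
    shutil.copytree(current_dir, backup_dir)

    # Overwrite the corrupt segment with the copy taken from a healthy JournalNode.
    target = current_dir / good_edits.name
    shutil.copy2(good_edits, target)

    # Restore ownership (hdfs:hadoop in this article); skipped if neither is given.
    if user or group:
        shutil.chown(target, user=user, group=group)
    return target
```

With the paths from the log above, a call might look like `replace_inprogress_edits("/hadoop/hdfs/journal/phadcluster01/current", "/tmp/edits_inprogress_0000000002510372012", "/tmp/jn_current_backup", user="hdfs", group="hadoop")`, where the two `/tmp` paths are placeholders.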