Member since
08-23-2018
15
Posts
0
Kudos Received
0
Solutions
08-29-2018
05:36 AM
@Geoffrey Shelton Okot Thanks for your answer. I checked NTP and it is working fine, but I still need to know the reason for the NameNode failure.
08-27-2018
02:41 PM
All JournalNodes have the same logs as given above. Also, how do I check the JournalNode GC logs?
08-27-2018
10:24 AM
In my Hadoop cluster I have:
- 1 active NameNode
- 1 standby NameNode
- 3 JournalNodes
- 4 DataNodes
As per my analysis, the active NameNode went down because it could not write edit logs to a majority of the JournalNodes. The standby NameNode did not take over after the active NameNode failed because passwordless access between the NameNodes was not enabled.
Logs in Active Namenode <code>2018-07-22 00:49:05,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:07,490 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:08,491 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:08,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:09,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:10,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:11,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:11,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:12,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:13,496 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:14,498 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:14,512 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:15,498 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:16,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:17,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:17,518 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:18,502 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:19,503 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:20,504 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]
2018-07-22 00:49:20,524 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:49:21,489 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485], stream=QuorumOutputStream starting at txid 203478))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1266)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)
at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1300)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5836)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1122)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2018-07-22 00:49:21,491 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 203478
2018-07-22 00:49:21,494 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-07-22 00:49:21,496 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at ACTIVE_NAMENODE_IP/ACTIVE_NAMENODE_IP
/
Logs in Standby Namenode <code>2018-07-22 00:43:51,605 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:54,609 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:55,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:43:56,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:57,338 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:43:57,615 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:58,339 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:43:59,340 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:00,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:00,621 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:01,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:02,343 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:03,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:03,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:04,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:05,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:06,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:06,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:07,348 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:08,350 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-22 00:44:09,336 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1508)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1532)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:214)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
Logs in Journal Node <code>2018-07-22 02:43:04,209 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client ACTIVE_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
2018-07-22 02:43:04,212 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client STANDBY_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
NOTE: In the logs I have replaced the IP addresses of the active NameNode, the standby NameNode and the JournalNodes with ACTIVE_NAMENODE_IP, STANDBY_NAMENODE_IP and JOURNALNODE_IP respectively. So what is the reason behind the NameNode failure?
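To make the quorum reasoning concrete, here is a minimal Python sketch of how the "Waited ... ms" lines in the NameNode logs above can be read. It is not part of Hadoop: the function names (`majority`, `acks_in_line`, `quorum_reached`) are hypothetical, and it assumes the log lines have exactly the format quoted in this post and that a write needs acknowledgements from a majority of the JournalNodes.

```python
import re

# Matches the QuorumJournalManager progress lines quoted in the logs above.
# Assumption: the message format is exactly as it appears in this post.
WAIT_RE = re.compile(
    r"Waited (?P<waited>\d+) ms \(timeout=(?P<timeout>\d+) ms\) "
    r"for a response for (?P<op>\w+)\. "
    r"(?:Succeeded so far: \[(?P<ok>[^\]]*)\]|No responses yet\.)"
)


def majority(total_journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    return total_journal_nodes // 2 + 1


def acks_in_line(line):
    """Number of JournalNodes that had responded according to one log line.

    Returns None if the line is not a QuorumJournalManager wait message.
    """
    match = WAIT_RE.search(line)
    if match is None:
        return None
    ok = match.group("ok")
    if not ok:
        # Either "No responses yet." or an empty "Succeeded so far: []".
        return 0
    return len([host for host in ok.split(",") if host.strip()])


def quorum_reached(line, total_journal_nodes=3):
    """True if the line reports enough responses for a majority."""
    acks = acks_in_line(line)
    if acks is None:
        return None
    return acks >= majority(total_journal_nodes)


if __name__ == "__main__":
    active = ("2018-07-22 00:49:20,504 WARN org.apache.hadoop.hdfs.qjournal."
              "client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) "
              "for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]")
    print("majority of 3 JournalNodes:", majority(3))
    print("acks reported:", acks_in_line(active))
    print("quorum reached:", quorum_reached(active))
```

With 3 JournalNodes a majority is 2, while the active NameNode log only ever lists one JournalNode under "Succeeded so far", which matches the "Timed out waiting 20000ms for a quorum of nodes to respond" error that follows it.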
Labels: Apache Hadoop
04-16-2018
10:44 AM
Which of these two statements is true: "The DataNode stores a single ".meta" file corresponding to each block replica" or "For each block replica hosted by a DataNode, there is a corresponding metadata file"?
04-16-2018
10:19 AM
@Tabrez Basha Syed @Chris Nauroth I have a question about the .meta file: how do I open and view a .meta file?
03-27-2018
09:43 AM
Labels: Apache Hadoop