Active Namenode is Down in Hadoop

In my Hadoop cluster I have:

1 active name node

1 standby name node

3 journal nodes

4 data nodes

From my analysis, the active NameNode went down because it could not write edit logs to a majority of the JournalNodes. The standby NameNode did not take over after the active NameNode failed because passwordless access between the NameNodes was not enabled.
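
To make the "majority of the JournalNodes" condition concrete, here is a minimal sketch of the quorum arithmetic. The function names are made up for illustration and are not part of Hadoop; the only assumption is that a write needs acknowledgements from a strict majority of the configured JournalNodes.

```python
def quorum_size(num_journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a strict majority."""
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return num_journal_nodes // 2 + 1


def has_write_quorum(acks: int, num_journal_nodes: int) -> bool:
    """True if enough JournalNodes acknowledged the edit (hypothetical helper)."""
    return acks >= quorum_size(num_journal_nodes)


if __name__ == "__main__":
    # Cluster from this question: 3 JournalNodes, and the active NameNode
    # log lists a single node under "Succeeded so far".
    print(quorum_size(3))          # 2
    print(has_write_quorum(1, 3))  # False
```

With 3 JournalNodes, 2 acknowledgements are needed, so a single responding JournalNode is not enough for the active NameNode to flush its edits.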

Logs in Active Namenode

2018-07-22 00:49:05,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:07,490 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:08,491 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:08,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:09,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:10,493 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:11,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:11,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:12,495 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:13,496 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:14,498 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:14,512 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:15,498 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:16,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:17,500 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:17,518 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:18,502 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:19,503 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:20,504 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [JOURNALNODE_IP:8485]

2018-07-22 00:49:20,524 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:49:21,489 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485], stream=QuorumOutputStream starting at txid 203478))

java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1266)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)

at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1300)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5836)

at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1122)

at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)

at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

2018-07-22 00:49:21,491 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 203478

2018-07-22 00:49:21,494 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

2018-07-22 00:49:21,496 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

/

SHUTDOWN_MSG: Shutting down NameNode at ACTIVE_NAMENODE_IP/ACTIVE_NAMENODE_IP

/
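
For anyone reading logs like the ones above, the following is a rough sketch of how the QuorumJournalManager "Waited ... ms" lines could be summarised. The regular expression is written against the lines quoted in this post only and may not match other Hadoop versions; `parse_wait_line` is a hypothetical helper name.

```python
import re

# Pattern based on the QuorumJournalManager progress lines quoted in this post.
WAIT_RE = re.compile(
    r"Waited (?P<waited>\d+) ms \(timeout=(?P<timeout>\d+) ms\) "
    r"for a response for (?P<op>\w+)\. "
    r"(?:Succeeded so far: \[(?P<ok>[^\]]*)\]|No responses yet\.)"
)


def parse_wait_line(line):
    """Return operation, wait time, timeout and ack count, or None if no match."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    ok = m.group("ok")  # None when the line says "No responses yet."
    nodes = [n.strip() for n in ok.split(",") if n.strip()] if ok else []
    return {
        "op": m.group("op"),
        "waited_ms": int(m.group("waited")),
        "timeout_ms": int(m.group("timeout")),
        "acks": len(nodes),
    }


if __name__ == "__main__":
    sample = (
        "2018-07-22 00:49:20,504 WARN "
        "org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: "
        "Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. "
        "Succeeded so far: [JOURNALNODE_IP:8485]"
    )
    print(parse_wait_line(sample))
```

Run over the active NameNode log, every sendEdits line reports one acknowledgement until the 20000 ms timeout expires, which matches the FATAL "Timed out waiting 20000ms for a quorum of nodes to respond" entry.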

Logs in Standby Namenode

2018-07-22 00:43:51,605 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:53,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:54,609 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:55,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:56,336 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:56,347 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:57,338 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:57,615 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:58,339 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:59,340 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:43:59,353 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:00,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:00,621 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:01,342 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:02,343 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:02,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:03,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:03,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:04,345 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:05,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16012 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:05,365 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:06,347 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17013 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:06,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:07,348 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18014 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:08,350 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19015 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.

2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:08,371 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: JOURNALNODE_IP/JOURNALNODE_IP:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2018-07-22 00:44:09,336 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [JOURNALNODE_IP:8485, JOURNALNODE_IP:8485, JOURNALNODE_IP:8485]. Skipping.

java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1508)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1532)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:214)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)

at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)

at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
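
Since both NameNodes keep retrying connections to port 8485, one thing that could be checked is plain TCP reachability of each JournalNode from the NameNode hosts. The sketch below uses only the Python standard library. The helper names are hypothetical, and it only tests whether the port accepts a connection, not whether the JournalNode is healthy.

```python
import socket


def can_connect(host: str, port: int = 8485, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def reachable_count(hosts, port: int = 8485, timeout_s: float = 3.0) -> int:
    """Count how many of the given hosts accept a TCP connection on the port."""
    return sum(can_connect(h, port, timeout_s) for h in hosts)
```

Calling `reachable_count` with the three JournalNode addresses from each NameNode host would show whether at least 2 of the 3 are reachable, which is the minimum for a quorum.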

Logs in Journal Node

2018-07-22 02:43:04,209 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client ACTIVE_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

at sun.nio.ch.IOUtil.read(IOUtil.java:197)

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)

at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)

at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)

at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)

at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)

at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)

2018-07-22 02:43:04,212 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client STANDBY_NAMENODE_IP threw exception [java.io.IOException: Connection reset by peer]

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

at sun.nio.ch.IOUtil.read(IOUtil.java:197)

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)

at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)

at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)

at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)

at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)

at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)

Note

In the logs, I have replaced the IP addresses of the active NameNode, the standby NameNode, and the JournalNodes with ACTIVE_NAMENODE_IP, STANDBY_NAMENODE_IP, and JOURNALNODE_IP respectively.