
Standby Namenode is going down regularly

Contributor

Hi everyone,

I have a 6-node cluster, and my standby NameNode keeps going down. When I start it again, it comes back up without any issue.

I need to fix this permanently. Can you please help?

Please find the log below

2018-07-01 22:44:01,939 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,948 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 11 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from IP:8258 Call#4620178 Retry#0: output error

2018-07-01 22:44:01,949 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 11 on 8020 caught an exception

java.nio.channels.ClosedChannelException

at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)

at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)

at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2909)

at org.apache.hadoop.ipc.Server.access$2100(Server.java:138)

at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1223)

at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1295)

at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2266)

at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1375)

at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:734)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2391)

2018-07-01 22:44:01,948 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for hbase/server4.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,963 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for hbase/server5.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,993 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(771)) - Number of transactions: 43 Total time for transactions(ms): 22 Number of transactions batched in Syncs: 0 Number of syncs: 42 SyncTimes(ms): 907 357

2018-07-01 22:44:02,144 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)

at org.apache.hadoop.ipc.Client.call(Client.java:1498)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy11.journal(Unknown Source)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2018-07-01 22:44:02,169 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP1:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

[stack trace identical to the one above]

2018-07-01 22:44:02,177 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP2:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

[stack trace identical to the one above]

2018-07-01 22:44:02,182 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [IP1:8485, IP2:8485, IP:8485], stream=QuorumOutputStream starting at txid 157817766))

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:

IP2:8485: IPC's epoch 518 is less than the last promised epoch 519

[same server-side stack trace as in the first WARN above]

IP:8485: IPC's epoch 518 is less than the last promised epoch 519

[same server-side stack trace as in the first WARN above]

IP1:8485: IPC's epoch 518 is less than the last promised epoch 519

[same server-side stack trace as in the first WARN above]

at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)

at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:707)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:641)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2691)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2556)

at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)

at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:408)

at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

2018-07-01 22:44:02,182 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 157817766

2018-07-01 22:44:02,199 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(516)) - ==> JVMShutdownHook.run()

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(517)) - JVMShutdownHook: Signalling async audit cleanup to start.

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(521)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.

2018-07-01 22:44:02,245 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(492)) - RangerAsyncAuditCleanup: Starting cleanup

2018-07-01 22:44:02,251 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(310)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=03:01.906 minutes, events=114, succcessCount=114, totalEvents=3188810, totalSuccessCount=3188810

2018-07-01 22:44:02,251 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:30

2018-07-01 22:44:02,252 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.hdfs

2018-07-01 22:44:02,252 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.hdfs

2018-07-01 22:44:03,967 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(395)) - chooseUnderReplicatedBlocks selected 12 blocks at priority level 2; Total=12 Reset bookmarks? false

2018-07-01 22:44:03,967 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1647)) - BLOCK* neededReplications = 3922, pendingReplications = 0.

2018-07-01 22:44:03,967 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1654)) - Blocks chosen but could not be replicated = 12; of which 12 have no target, 0 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.

2018-07-01 22:44:04,580 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS)

2018-07-01 22:44:04,609 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol

2018-07-01 22:44:04,797 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS)

2018-07-01 22:44:04,817 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol

2018-07-01 22:44:04,826 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1272)) - Stopping services started for active state

2018-07-01 22:44:04,826 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2018-07-01 22:44:04,832 INFO namenode.FSNamesystem (FSNamesystem.java:run(5115)) - LazyPersistFileScrubber was interrupted, exiting

2018-07-01 22:44:04,843 INFO namenode.FSNamesystem (FSNamesystem.java:run(5029)) - NameNodeEditLogRoller was interrupted, exiting

2018-07-01 22:44:08,757 ERROR impl.CloudSolrClient (CloudSolrClient.java:requestWithRetryOnStaleState(903)) - Request to collection ranger_audits failed due to (403) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://server2.covert.com:8983/solr/ranger_audits_shard1_replica1: Expected mime type application/octet-stream but got text/html. [HTML error page: HTTP ERROR 403: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34)), while accessing /solr/ranger_audits_shard1_replica1/update]


Re: Standby Namenode is going down regularly

Mentor

@kanna k

You have a corrupt JournalNode. Please follow this HCC doc to resolve the issue.

Assuming that this is happening on a single JournalNode, you can try the following (a command sketch follows the list):

  1. As a precaution, stop HDFS. This will shut down all JournalNodes as well.
  2. On the node in question, move the edits directory (/hadoop/hdfs/journal/xxxxx/current) to an alternate location.
  3. Copy the edits directory (/hadoop/hdfs/journal/xxxxx/current) from a functioning JournalNode to this node.
  4. Start HDFS.
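
A minimal command sketch of those steps, assuming the default journal path used above (keep xxxxx as whatever your namespace directory is actually called) and a placeholder hostname good-jn for a functioning JournalNode:

  # On the affected JournalNode, after HDFS has been stopped (step 1).
  # Step 2: move the bad storage aside rather than deleting it.
  mv /hadoop/hdfs/journal/xxxxx/current /hadoop/hdfs/journal/xxxxx/current.bad.$(date +%F)

  # Step 3: copy the directory from a functioning JournalNode (good-jn is a placeholder).
  scp -r good-jn:/hadoop/hdfs/journal/xxxxx/current /hadoop/hdfs/journal/xxxxx/
  chown -R hdfs:hadoop /hadoop/hdfs/journal/xxxxx/current   # default HDP owner; adjust if yours differs

  # Step 4: start HDFS again and watch the JournalNode log for further write errors.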

HTH

Re: Standby Namenode is going down regularly

Contributor

In my cluster, both NameNodes are going down.

Assume that server1 hosts the active NameNode and server2 hosts the standby NameNode.

Sometimes the active NameNode goes down and the standby NameNode takes over as active.

Sometimes the standby NameNode goes down.

How do I find the corrupted JournalNode, where do I get the JournalNode data (fsimage, edit logs) from, and where do I need to paste it?
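
One way to spot the out-of-sync JournalNode, assuming the default journal layout from the reply above: each JournalNode persists its last committed transaction ID and last promised epoch as small text files, so comparing them across the three JournalNodes shows which one is behind. (Note that JournalNodes hold only edit logs; the fsimage lives in the NameNodes' own metadata directories.)

  # Run on each of the three JournalNodes; the path is an assumption based on the default layout.
  cat /hadoop/hdfs/journal/xxxxx/current/committed-txid
  cat /hadoop/hdfs/journal/xxxxx/current/last-promised-epoch
  # The JournalNode whose committed-txid lags the other two is the broken one;
  # restore its current/ directory from either healthy JournalNode as described above.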

Re: Standby Namenode is going down regularly

Mentor

@kanna k

If the active/standby NameNodes switch status unexpectedly, then have a look at the NTPD setup! Are your nodes in sync? Kerberos is very sensitive to time, so ensure your cluster clocks are in sync (a quick check is sketched below).
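
A quick way to verify clock sync on each node, assuming ntpd is the time service in use (on chronyd systems, chronyc tracking gives similar information):

  # Run on every node in the cluster.
  ntpstat      # one-line summary: synchronised to an NTP server, or not
  ntpq -p      # peer list; the line starting with '*' is the selected time source
  # Kerberos tolerates only a small clock skew (300 seconds by default), so nodes drifting
  # beyond that will start failing authentication, which can show up as GSS errors like
  # the ones at the end of the log above.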