
Standby Namenode is going down regularly


Contributor

Hi everyone,

I have a 6-node cluster and my Standby NameNode keeps going down. When I restart it, it comes back up without any issue.

I need to fix this permanently. Can you please help?

Please find the log below:

2018-07-01 22:44:01,939 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,948 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 11 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from IP:8258 Call#4620178 Retry#0: output error

2018-07-01 22:44:01,949 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 11 on 8020 caught an exception

java.nio.channels.ClosedChannelException

at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)

at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)

at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2909)

at org.apache.hadoop.ipc.Server.access$2100(Server.java:138)

at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1223)

at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1295)

at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2266)

at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1375)

at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:734)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2391)

2018-07-01 22:44:01,948 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for hbase/server4.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,963 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for hbase/server5.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2018-07-01 22:44:01,993 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(771)) - Number of transactions: 43 Total time for transactions(ms): 22 Number of transactions batched in Syncs: 0 Number of syncs: 42 SyncTimes(ms): 907 357

2018-07-01 22:44:02,144 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)

at org.apache.hadoop.ipc.Client.call(Client.java:1498)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy11.journal(Unknown Source)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2018-07-01 22:44:02,169 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP1:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)

at org.apache.hadoop.ipc.Client.call(Client.java:1498)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy11.journal(Unknown Source)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2018-07-01 22:44:02,177 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal IP2:8485 failed to write txns 157817808-157817808. Will try to write to this JN again after the next log roll.

org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)

at org.apache.hadoop.ipc.Client.call(Client.java:1498)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy11.journal(Unknown Source)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)

at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2018-07-01 22:44:02,182 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [IP1:8485, IP2:8485, IP:8485], stream=QuorumOutputStream starting at txid 157817766))

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:

IP2:8485: IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

IP:8485: IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

IP1:8485: IPC's epoch 518 is less than the last promised epoch 519

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)

at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)

at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)

at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)

at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)

at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)

at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)

at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)

at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)

at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:707)

at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:641)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2691)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2556)

at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)

at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:408)

at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

2018-07-01 22:44:02,182 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 157817766

2018-07-01 22:44:02,199 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(516)) - ==> JVMShutdownHook.run()

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(517)) - JVMShutdownHook: Signalling async audit cleanup to start.

2018-07-01 22:44:02,239 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(521)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.

2018-07-01 22:44:02,245 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(492)) - RangerAsyncAuditCleanup: Starting cleanup

2018-07-01 22:44:02,251 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(310)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=03:01.906 minutes, events=114, succcessCount=114, totalEvents=3188810, totalSuccessCount=3188810

2018-07-01 22:44:02,251 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:30

2018-07-01 22:44:02,252 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.hdfs

2018-07-01 22:44:02,252 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.hdfs

2018-07-01 22:44:03,967 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(395)) - chooseUnderReplicatedBlocks selected 12 blocks at priority level 2; Total=12 Reset bookmarks? false

2018-07-01 22:44:03,967 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1647)) - BLOCK* neededReplications = 3922, pendingReplications = 0.

2018-07-01 22:44:03,967 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1654)) - Blocks chosen but could not be replicated = 12; of which 12 have no target, 0 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.

2018-07-01 22:44:04,580 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS)

2018-07-01 22:44:04,609 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol

2018-07-01 22:44:04,797 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS)

2018-07-01 22:44:04,817 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for nn/server2.covert.com@COVERTHADOOP.NET (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol

2018-07-01 22:44:04,826 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1272)) - Stopping services started for active state

2018-07-01 22:44:04,826 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(659)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

2018-07-01 22:44:04,832 INFO namenode.FSNamesystem (FSNamesystem.java:run(5115)) - LazyPersistFileScrubber was interrupted, exiting

2018-07-01 22:44:04,843 INFO namenode.FSNamesystem (FSNamesystem.java:run(5029)) - NameNodeEditLogRoller was interrupted, exiting

2018-07-01 22:44:08,757 ERROR impl.CloudSolrClient (CloudSolrClient.java:requestWithRetryOnStaleState(903)) - Request to collection ranger_audits failed due to (403) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://server2.covert.com:8983/solr/ranger_audits_shard1_replica1: Expected mime type application/octet-stream but got text/html. <html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

<title>Error 403 GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))</title>

</head>

<body><h2>HTTP ERROR 403</h2>

<p>Problem accessing /solr/ranger_audits_shard1_replica1/update. Reason:

<pre> GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>


Re: Standby Namenode is going down regularly

Mentor

@kanna k

You have a corrupt JournalNode. Please follow this HCC doc to resolve the issue.

Assuming this is happening on a single JournalNode, you can try the following:

  1. As a precaution, stop HDFS. This will shut down all JournalNodes as well.
  2. On the node in question, move the edits directory (/hadoop/hdfs/journal/xxxxx/current) to an alternate location.
  3. Copy the edits directory (/hadoop/hdfs/journal/xxxxx/current) from a functioning JournalNode to this node.
  4. Start HDFS.
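The steps above can be sketched as a small shell script. This is only a hedged illustration: `JOURNAL_DIR` and `GOOD_JN` are hypothetical values (look up the real edits path in `dfs.journalnode.edits.dir` in hdfs-site.xml), and with `DRY_RUN=1` the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Hedged sketch of the JournalNode recovery steps above.
# JOURNAL_DIR and GOOD_JN are illustrative assumptions -- substitute the
# real values from dfs.journalnode.edits.dir and your own cluster.
JOURNAL_DIR="/hadoop/hdfs/journal/mycluster/current"  # hypothetical path
GOOD_JN="jn2.example.com"                             # hypothetical healthy JournalNode
DRY_RUN=1                                             # set to 0 to actually execute

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Step 1: stop HDFS first (via Ambari or your usual tooling) -- not shown here.
# Step 2: move the suspect edits directory aside as a backup.
run mv "$JOURNAL_DIR" "${JOURNAL_DIR}.bad.$(date +%Y%m%d)"
# Step 3: copy the edits directory from a functioning JournalNode.
run scp -r "${GOOD_JN}:${JOURNAL_DIR}" "$JOURNAL_DIR"
# Step 4: fix ownership for the hdfs service user, then start HDFS again.
run chown -R hdfs:hadoop "$JOURNAL_DIR"
```

With `DRY_RUN=1` the script only prints the three commands, so it is safe to review before running anything for real.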

HTH

Re: Standby Namenode is going down regularly

Contributor

In my cluster, both NameNodes are going down.

Assume server1 hosts the Active NameNode and server2 hosts the Standby NameNode.

Sometimes the Active NameNode goes down and the Standby takes over as Active; sometimes the Standby NameNode goes down.

How do I find the corrupted JournalNode, where do I get the JournalNode data (fsimage, edit logs) from, and where do I need to place it?

Re: Standby Namenode is going down regularly

Mentor

@kanna k

If the Active/Standby NameNodes switch status unexpectedly, have a look at your NTPD setup! Are your nodes in sync? Kerberos is very sensitive to time, so ensure your cluster's clocks are synchronized.
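As a quick way to check the point above, you can inspect each node's clock. This is a generic sketch (which tool applies depends on whether the host runs ntpd or chrony, so both are probed):

```shell
#!/bin/sh
# Print this host's UTC time; run on every node and compare by eye.
# Kerberos typically rejects requests skewed by more than about 5 minutes.
date -u +"%Y-%m-%dT%H:%M:%SZ"

# Probe whichever time-sync daemon is installed (host-dependent).
if command -v ntpq >/dev/null 2>&1; then
  ntpq -p            # ntpd: peers and offsets in ms
elif command -v chronyc >/dev/null 2>&1; then
  chronyc tracking   # chrony: system clock offset from the reference
else
  echo "no ntpq/chronyc found; check your time-sync setup"
fi
```

If the reported offsets differ by more than a few seconds across nodes, fix time sync before chasing anything else.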

Re: Standby Namenode is going down regularly

New Contributor

Hello, I recently ran into a similar problem. It happens when I use Hive to insert data into a table. My cluster is hdp2.7.2, newly built. When I checked my NameNode log I found this problem, but my Active and Standby NameNodes are normal; there is no issue at all.

 

2020-05-13 09:33:15,484 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 746 on 8020 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2910)
at org.apache.hadoop.ipc.Server.access$2100(Server.java:138)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1223)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1295)
at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2266)
at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1375)
at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:734)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2391)
2020-05-13 09:33:15,484 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 28 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 10.100.1.9:48350 Call#3590 Retry#0: output error
2020-05-13 09:33:15,485 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 28 on 8020 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2910)
at org.apache.hadoop.ipc.Server.access$2100(Server.java:138)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1223)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1295)
at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2266)
at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1375)
at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:734)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2391)
2020-05-13 09:33:15,484 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 219 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.100.1.12:52988 Call#3987 Retry#0: output error
2020-05-13 09:33:15,484 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 176 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.100.1.9:54838 Call#71 Retry#0: output error
2020-05-13 09:33:15,485 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 176 on 8020 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2910)
at org.apache.hadoop.ipc.Server.access$2100(Server.java:138)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1223)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1295)
at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2266)
at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1375)
at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:734)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2391)
2020-05-13 09:33:15,485 WARN ipc.Server (Server.java:processResponse(1273)) - IPC Server handler 500 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 10.100.1.12:52988 Call#3988 Retry#0: output error
2020-05-13 09:33:15,485 INFO ipc.Server (Server.java:run(2402)) - IPC Server handler 219 on 8020 caught an exception