Support Questions

Find answers, ask questions, and share your expertise

HDFS NameNode roles failing to start after host restarts

Explorer

Hello All,

I'm troubleshooting the following issue on our Cloudera Nutch cluster and would appreciate any help the community can offer:

We have two NameNode roles and three JournalNode roles running; however, both NameNode roles are failing to start and reporting the error below (IP addresses obfuscated). This occurred after a restart of the underlying hosts.
Any recommendations for a recovery path from this error would be greatly appreciated.

Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [x.x.x.95:8485, x.x.x.86:8485, x.x.x.130:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
x.x.x.130:8485: null [success]
2 exceptions thrown:
x.x.x.95:8485: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

x.x.x.86:8485: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnclosedSegment(QuorumJournalManager.java:345)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:455)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1201)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1717)
	at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1590)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1351)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

3 REPLIES

Community Manager

@idodds Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our HDFS experts, @blizano and @pajoshi, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator



Rising Star

Hello @idodds,

Your NameNode is failing to reach a quorum of JournalNodes (2/3).

Could you check and share any errors or warnings you are seeing on the two remote JournalNode hosts?
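For reference, here is a minimal sketch of pulling the WARN/ERROR lines out of a JournalNode role log. The sample log content and the helper name are illustrative only; point this at the actual JN log file on each host:

```python
def tail_problems(lines, limit=50):
    """Return the last `limit` lines that contain WARN or ERROR."""
    hits = [line.rstrip("\n") for line in lines
            if " WARN" in line or " ERROR" in line]
    return hits[-limit:]

# Hypothetical sample lines standing in for a real JournalNode log
sample = [
    "Jul 21, 8:33:30.310 AM INFO org.apache.hadoop.hdfs.qjournal.server.Journal ok\n",
    "Jul 21, 8:33:30.358 AM WARN org.apache.hadoop.ipc.Server handler trouble\n",
]
print(tail_problems(sample))
# ['Jul 21, 8:33:30.358 AM WARN org.apache.hadoop.ipc.Server handler trouble']
```

On a real host you would open the JN log file and pass its lines to the helper.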

Thank you,

Parth Joshi

New Contributor

Hi, thank you for responding. I'm replying on behalf of @idodds. Both nodes report the same or similar errors, shown below:

Jul 21, 8:33:30.310 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Updating lastPromisedEpoch from 172 to 173 for client /x.y.z.30
Jul 21, 8:33:30.312 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Scanning storage FileJournalManager(root=/dfs/journal-edits/nutch-nameservice1)
Jul 21, 8:33:30.329 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Latest log is EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false)
Jul 21, 8:33:30.339 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
getSegmentInfo(256541217): EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false) -> startTxId: 256541217 endTxId: 256541842 isInProgress: true
Jul 21, 8:33:30.340 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Prepared recovery for segment 256541217: segmentState { startTxId: 256541217 endTxId: 256541842 isInProgress: true } lastWriterEpoch: 38 lastCommittedTxId: 256541843
Jul 21, 8:33:30.358 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
getSegmentInfo(256541217): EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false) -> startTxId: 256541217 endTxId: 256541842 isInProgress: true
Jul 21, 8:33:30.358 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Synchronizing log startTxId: 256541217 endTxId: 256541843 isInProgress: true: old segment startTxId: 256541217 endTxId: 256541842 isInProgress: true is not the right length
Jul 21, 8:33:30.358 AM	WARN	org.apache.hadoop.ipc.Server	
IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.acceptRecovery from x.y.z.30:37022 Call#17 Retry#0
java.lang.IllegalAccessError: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
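For what it's worth, an `IllegalAccessError` out of `com.google.common.collect.Ranges` is usually a symptom of mixed Guava versions on the JournalNode classpath: the older `Ranges` class is loaded from one jar while `Range` resolves from a newer jar whose constructor visibility differs. A minimal sketch (the jar names and helper are illustrative, not taken from this cluster) of flagging duplicate artifact versions given a list of jar filenames:

```python
import re
from collections import defaultdict

def find_conflicts(jars):
    """Group jar filenames by artifact and report artifacts that
    appear in more than one version (a classic classpath conflict)."""
    versions = defaultdict(set)
    for jar in jars:
        m = re.match(r"(?P<artifact>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$", jar)
        if m:
            versions[m.group("artifact")].add(m.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# Hypothetical classpath listing with two Guava versions present
jars = ["guava-11.0.2.jar", "guava-17.0.jar", "hadoop-hdfs-2.6.0.jar"]
print(find_conflicts(jars))  # {'guava': ['11.0.2', '17.0']}
```

If two Guava jars do turn up on the JN hosts, removing the stray one (or aligning every role on the version shipped with your distribution) is the usual fix.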