HDFS NameNode roles failing to start after host restarts

Explorer

Hello All,

I'm troubleshooting the following issue with our Cloudera Nutch cluster and would appreciate any help the community can offer:

We have two NameNode roles and three JournalNode roles. However, both NameNode roles are failing to start, reporting the error below (IP addresses obfuscated). This began after a restart of the underlying hosts.
Any recommendations for a recovery path from this error would be greatly appreciated.

Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [x.x.x.95:8485, x.x.x.86:8485, x.x.x.130:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
x.x.x.130:8485: null [success]
2 exceptions thrown:
x.x.x.95:8485: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

x.x.x.86:8485: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnclosedSegment(QuorumJournalManager.java:345)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:455)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1408)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1201)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1717)
	at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1590)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1351)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
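
For context on the "quorum size 2/3" part of the message: with three JournalNodes, QJM needs a simple majority (2 of 3) to recover the unfinalized segment, so the single successful response from x.x.x.130 is not enough on its own. A quick sketch of that arithmetic, with the counts taken from the trace above (just an illustration, nothing you need to run):

public class QuorumCheck {
    public static void main(String[] args) {
        int journalNodes = 3;                 // JournalNode roles in this cluster
        int required = journalNodes / 2 + 1;  // simple majority: 2 of 3
        int successful = 1;                   // only x.x.x.130 answered successfully in the trace
        System.out.printf("required = %d, successful = %d -> %s%n",
                required, successful,
                successful >= required ? "quorum reached" : "quorum NOT reached");
    }
}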

3 REPLIES

Community Manager

@idodds Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our HDFS experts @blizano and @pajoshi, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Rising Star

Hello @idodds,

Your NameNode is failing to reach a quorum of JournalNodes (2/3).

Could you check and share any errors or warnings you are seeing on the two failing JournalNode hosts?
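
In case it helps while gathering those, here is a minimal sketch of one way to pull the ERROR/WARN lines out of a JournalNode role log. The log path below is only a placeholder; point it at wherever your JournalNode logs actually live:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class JnLogScan {
    public static void main(String[] args) throws IOException {
        // Placeholder path: substitute the actual JournalNode role log file.
        String log = args.length > 0
                ? args[0]
                : "/var/log/hadoop-hdfs/journalnode.log";
        try (Stream<String> lines = Files.lines(Paths.get(log))) {
            // Print only the ERROR and WARN lines.
            lines.filter(l -> l.contains("ERROR") || l.contains("WARN"))
                 .forEach(System.out::println);
        }
    }
}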

Thank you 

Parth Joshi

New Contributor

Hi, thank you for responding. I'm replying on behalf of @idodds. Both of the failing nodes report the same (or very similar) errors, shown below:

Jul 21, 8:33:30.310 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Updating lastPromisedEpoch from 172 to 173 for client /x.y.z.30
Jul 21, 8:33:30.312 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Scanning storage FileJournalManager(root=/dfs/journal-edits/nutch-nameservice1)
Jul 21, 8:33:30.329 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Latest log is EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false)
Jul 21, 8:33:30.339 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
getSegmentInfo(256541217): EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false) -> startTxId: 256541217 endTxId: 256541842 isInProgress: true
Jul 21, 8:33:30.340 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Prepared recovery for segment 256541217: segmentState { startTxId: 256541217 endTxId: 256541842 isInProgress: true } lastWriterEpoch: 38 lastCommittedTxId: 256541843
Jul 21, 8:33:30.358 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
getSegmentInfo(256541217): EditLogFile(file=/dfs/journal-edits/nutch-nameservice1/current/edits_inprogress_0000000000256541217,first=0000000000256541217,last=0000000000256541842,inProgress=true,hasCorruptHeader=false) -> startTxId: 256541217 endTxId: 256541842 isInProgress: true
Jul 21, 8:33:30.358 AM	INFO	org.apache.hadoop.hdfs.qjournal.server.Journal	
Synchronizing log startTxId: 256541217 endTxId: 256541843 isInProgress: true: old segment startTxId: 256541217 endTxId: 256541842 isInProgress: true is not the right length
Jul 21, 8:33:30.358 AM	WARN	org.apache.hadoop.ipc.Server	
IPC Server handler 1 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.acceptRecovery from x.y.z.30:37022 Call#17 Retry#0
java.lang.IllegalAccessError: tried to access method com.google.common.collect.Range.<init>(Lcom/google/common/collect/Cut;Lcom/google/common/collect/Cut;)V from class com.google.common.collect.Ranges
	at com.google.common.collect.Ranges.create(Ranges.java:76)
	at com.google.common.collect.Ranges.closed(Ranges.java:98)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.txnRange(Journal.java:872)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:806)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:206)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:261)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
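
For what it's worth, the java.lang.IllegalAccessError above (com.google.common.collect.Ranges calling into com.google.common.collect.Range) usually points to two incompatible Guava versions ending up on the JournalNode classpath, which could also explain why this only surfaced after the host restarts. Below is a minimal sketch to check which jar each of the two classes actually resolves from; the class names are taken from the stack trace, but running it against the output of "hadoop classpath" is an assumption about your environment:

// GuavaProbe.java - prints which jar each of the two conflicting Guava
// classes is loaded from. Compile and run it with the same classpath the
// JournalNode uses, for example:
//   javac GuavaProbe.java
//   java -cp "$(hadoop classpath):." GuavaProbe
public class GuavaProbe {
    private static void locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src =
                    c.getProtectionDomain().getCodeSource();
            System.out.println(className + " -> "
                    + (src != null ? src.getLocation() : "<bootstrap/unknown>"));
        } catch (Throwable t) {
            System.out.println(className + " -> not loadable: " + t);
        }
    }

    public static void main(String[] args) {
        // Both classes appear in the IllegalAccessError above; if they
        // resolve from different guava-*.jar files, the JournalNode is
        // mixing Guava versions.
        locate("com.google.common.collect.Range");
        locate("com.google.common.collect.Ranges");
    }
}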