
Failed to start namenode. java.io.IOException: Timed out waiting for getJournalCTime() response


Hi,

 

I'm upgrading CDH from 5.13.0 to 6.3.1 and cannot get past the "Upgrade HDFS Metadata" step.

 

On the third step, after "Starting the JournalNodes." and "Starting metadata upgrade on Active NameNode of nameservice nameservice1.", the upgrade hangs at "Waiting for NameNode (master02) to start responding to RPCs.".

 

The wizard just freezes, but according to the log it will never continue, because the NameNode process has failed: a quorum of JournalNodes could not be reached, and master03.ib (10.12.0.3) is not responding.

 

What can I do? What could cause this? Can I run the remaining steps manually?

 

The log says the following:


6:43:03.610 PM WARN QuorumJournalManager

Waited 55044 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]

6:43:04.611 PM WARN QuorumJournalManager

Waited 56045 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]

6:43:05.611 PM WARN QuorumJournalManager

Waited 57046 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]

6:43:06.613 PM WARN QuorumJournalManager

Waited 58047 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]

6:43:07.614 PM WARN QuorumJournalManager

Waited 59048 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]

6:43:08.673 PM INFO FSNamesystem

FSNamesystem write lock held for 60244 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1604)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1111)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 60244

6:43:08.675 PM WARN FSNamesystem

Encountered exception loading fsimage
java.io.IOException: Timed out waiting for getJournalCTime() response
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:678)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:683)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:443)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:310)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)

6:43:08.698 PM INFO ContextHandler

Stopped o.e.j.w.WebAppContext@52045dbe{/,null,UNAVAILABLE}{/hdfs}

6:43:08.704 PM INFO AbstractConnector

Stopped ServerConnector@34997338{HTTP/1.1,[http/1.1]}{master02.ib:9870}

6:43:08.705 PM INFO ContextHandler

Stopped o.e.j.s.ServletContextHandler@4d722ac9{/static,file:///opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop-hdfs/webapps/static/,UNAVAILABLE}

6:43:08.705 PM INFO ContextHandler

Stopped o.e.j.s.ServletContextHandler@2320fa6f{/logs,file:///var/log/hadoop-hdfs/,UNAVAILABLE}

6:43:08.709 PM INFO MetricsSystemImpl

Stopping NameNode metrics system...

6:43:08.710 PM INFO MetricsSystemImpl

NameNode metrics system stopped.

6:43:08.710 PM INFO MetricsSystemImpl

NameNode metrics system shutdown complete.

6:43:08.711 PM ERROR NameNode

Failed to start namenode.
java.io.IOException: Timed out waiting for getJournalCTime() response
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:678)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:683)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:443)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:310)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)

6:43:08.714 PM INFO ExitUtil

Exiting with status 1: java.io.IOException: Timed out waiting for getJournalCTime() response

6:43:08.717 PM INFO NameNode

SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master02.ib/10.12.0.2
************************************************************/

Thanks and best regards,

Oleh

1 ACCEPTED SOLUTION


Solved by copying the /dfs/jn folder from master01.ib (one of the in-sync nodes) to master03.ib.
