Created on 02-13-2020 09:54 AM - last edited on 02-13-2020 10:00 AM by VidyaSargur
Hi,
I'm upgrading CDH from 5.13.0 to 6.3.1 and cannot proceed past the "Upgrade HDFS Metadata" step.
On the third sub-step, after "Starting the JournalNodes." and "Starting metadata upgrade on Active NameNode of nameservice nameservice1.", the upgrade hangs at "Waiting for NameNode (master02) to start responding to RPCs.".
The wizard just freezes, but according to the log records the process has actually failed: a quorum of JournalNodes could not be reached, because master03.ib (10.12.0.3) is not responding.
What can I do? What could cause this issue? Can I run the remaining steps manually?
The log says the following:
6:43:03.610 PM WARN QuorumJournalManager
Waited 55044 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]
6:43:04.611 PM WARN QuorumJournalManager
Waited 56045 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]
6:43:05.611 PM WARN QuorumJournalManager
Waited 57046 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]
6:43:06.613 PM WARN QuorumJournalManager
Waited 58047 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]
6:43:07.614 PM WARN QuorumJournalManager
Waited 59048 ms (timeout=60000 ms) for a response for getJournalCTime. Succeeded so far: [10.12.0.2:8485,10.12.0.1:8485]
6:43:08.673 PM INFO FSNamesystem
FSNamesystem write lock held for 60244 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1604)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1111)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 60244
6:43:08.675 PM WARN FSNamesystem
Encountered exception loading fsimage
java.io.IOException: Timed out waiting for getJournalCTime() response
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:678)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:683)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:443)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:310)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
6:43:08.698 PM INFO ContextHandler
Stopped o.e.j.w.WebAppContext@52045dbe{/,null,UNAVAILABLE}{/hdfs}
6:43:08.704 PM INFO AbstractConnector
Stopped ServerConnector@34997338{HTTP/1.1,[http/1.1]}{master02.ib:9870}
6:43:08.705 PM INFO ContextHandler
Stopped o.e.j.s.ServletContextHandler@4d722ac9{/static,file:///opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop-hdfs/webapps/static/,UNAVAILABLE}
6:43:08.705 PM INFO ContextHandler
Stopped o.e.j.s.ServletContextHandler@2320fa6f{/logs,file:///var/log/hadoop-hdfs/,UNAVAILABLE}
6:43:08.709 PM INFO MetricsSystemImpl
Stopping NameNode metrics system...
6:43:08.710 PM INFO MetricsSystemImpl
NameNode metrics system stopped.
6:43:08.710 PM INFO MetricsSystemImpl
NameNode metrics system shutdown complete.
6:43:08.711 PM ERROR NameNode
Failed to start namenode.
java.io.IOException: Timed out waiting for getJournalCTime() response
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:678)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:683)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:443)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:310)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1084)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:709)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:727)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:950)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:929)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1653)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1720)
6:43:08.714 PM INFO ExitUtil
Exiting with status 1: java.io.IOException: Timed out waiting for getJournalCTime() response
6:43:08.717 PM INFO NameNode
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master02.ib/10.12.0.2
************************************************************/
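A quick way to double-check which JournalNode is unreachable from the NameNode host is to probe each RPC endpoint directly. This is just a sketch: it assumes bash (for the `/dev/tcp` redirection), GNU coreutils `timeout`, and the default JournalNode RPC port 8485 from the log above.

```shell
# probe HOST PORT -> exit 0 if a TCP connection succeeds within 3 seconds
probe() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# The three JournalNode addresses from the QuorumJournalManager warnings;
# 10.12.0.3 (master03.ib) is the one that never answers.
for host in 10.12.0.1 10.12.0.2 10.12.0.3; do
    if probe "$host" 8485; then
        echo "$host:8485 OK"
    else
        echo "$host:8485 NOT RESPONDING"
    fi
done
```

A TCP connect succeeding only tells you the port is open; if the port is open but the JournalNode still times out, its own log on that host is the next place to look.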
Thanks and best regards,
Oleh
Created on 02-13-2020 12:19 PM - last edited on 02-13-2020 04:23 PM by ask_bill_brooks
Solved by copying the /dfs/jn
folder from master01.ib (one of the in-sync nodes) to master03.ib.
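The repair above can be sketched as the steps below. This is a demo on throwaway paths under a temp directory so it is safe to run anywhere; the real paths and hosts are noted in comments, and on the actual cluster you would stop the JournalNode role on master03.ib first, copy over the network (scp/rsync), fix ownership with `chown -R hdfs:hdfs`, and restart the role.

```shell
# Stand-ins for the real directories:
#   HEALTHY <- /dfs/jn on master01.ib (in sync)
#   STALE   <- /dfs/jn on master03.ib (the broken JournalNode)
WORK=$(mktemp -d)
HEALTHY="$WORK/master01/dfs/jn"
STALE="$WORK/master03/dfs/jn"
mkdir -p "$HEALTHY/nameservice1/current" "$STALE/nameservice1/current"
echo "namespaceID=12345" > "$HEALTHY/nameservice1/current/VERSION"

# 1. Move the stale directory aside rather than deleting it.
mv "$STALE" "${STALE}.bak"
# 2. Copy the healthy JournalNode directory into place
#    (on a real cluster: scp/rsync from master01.ib, then chown -R hdfs:hdfs).
cp -a "$HEALTHY" "$STALE"
# 3. Sanity-check that the metadata came across.
cat "$STALE/nameservice1/current/VERSION"
```

Keeping the `.bak` copy means the original state of master03.ib can be restored if the JournalNode still refuses to join the quorum after the restart.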