Created 06-13-2019 06:42 PM
Hi all,
We have an Ambari cluster, HDP version 2.6.4.
On one of the master machines we can't start the NameNode, and we noticed the following:
$ ls /hadoop/hdfs/journal/hdfsha/current/ | grep edits_inprogress
edits_inprogress_0000000000018783114.empty
We don't have the edits_inprogress_xxxxxx file; the only file present is edits_inprogress_0000000000018783114.empty.
Any idea how to recover the edits_inprogress_xxxxxx file?
2019-06-13 19:45:42,473 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:183)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1951)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1807)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1656)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
2019-06-13 19:45:42,476 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
Created 06-13-2019 09:38 PM
Yes, it's possible to recover from this situation, which sometimes happens in a NameNode HA setup. JournalNodes are a distributed system for storing edits. The Active NameNode, acting as a client, writes edits to the JournalNodes and commits only once they are replicated to a quorum of the JournalNodes. The Standby NameNode reads the edits to stay in sync with the Active one; it can read from any of the replicas stored on the JournalNodes.
ZKFC makes sure that only one NameNode is active at a time. However, when a failover occurs, it is still possible for the previous Active NameNode to serve out-of-date read requests to clients until it shuts down while trying to write to the JournalNodes. For this reason, fencing methods should be configured even when using the Quorum Journal Manager.
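If you want to double-check which fencing methods are currently configured on the cluster, a quick way (assuming the HDFS client configs are in place on the node; dfs.ha.fencing.methods is the standard property name) is:
$ hdfs getconf -confKey dfs.ha.fencing.methods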
For fencing, the Quorum Journal Manager uses epoch numbers. An epoch number is an integer that only ever increases and is unique once assigned. The NameNode generates an epoch number with a simple algorithm and sends it along with its RPC requests to the JournalNodes. When you configure NameNode HA, the first Active NameNode gets epoch value 1. On each failover or restart the epoch number is incremented, so a NameNode with a higher epoch number is considered newer than any NameNode with an earlier epoch number.
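You can see this bookkeeping yourself: each JournalNode keeps the epoch in small text files inside its journal directory. A minimal check, assuming the same /hadoop/hdfs/journal/hdfsha path as on this cluster:
$ cat /hadoop/hdfs/journal/hdfsha/current/last-promised-epoch
$ cat /hadoop/hdfs/journal/hdfsha/current/last-writer-epoch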
Now let's proceed with the real case. Note the hostname of the healthy NameNode.
You will need to proceed as follows, assuming you are logged on as root. Here is how to fix one corrupted JournalNode's edits:
# su - hdfs
1) Put both NameNodes in safe mode (NN HA)
$ hdfs dfsadmin -safemode enter
Sample output
Safe mode is ON in namenode1/xxx.xxx.xx.xx:8020
Safe mode is ON in namenode2/xxx.xxx.xx.xx:8020
2) Save Namespace
$ hdfs dfsadmin -saveNamespace
3) On the non-working NameNode, change directory to /hadoop/hdfs/journal/hdfsha/current/. Get the epoch and note the number; it should be lower than the one on the working NameNode (cross-check):
$ cat last-promised-epoch
4) On the non-working NameNode, back up all the files in the journal dir /hadoop/hdfs/journal/hdfsha/current/*. They should look like the listing below:
-rw-r--r-- 1 hdfs hadoop 1019566 Jun 10 09:45 edits_0000000000000928232-0000000000000935461
-rw-r--r-- 1 hdfs hadoop 1014516 Jun 10 15:45 edits_0000000000000935462-0000000000000942657
-rw-r--r-- 1 hdfs hadoop 1017540 Jun 10 21:46 edits_0000000000000942658-0000000000000949874
-rw-r--r-- 1 hdfs hadoop 1048576 Jun 10 23:36 edits_0000000000000949875-0000000000000952088
-rw-r--r-- 1 hdfs hadoop 1048576 Jun 13 22:27 edits_inprogress_0000000000000952089
-rw-r--r-- 1 hdfs hadoop  277083 Jun 10 21:46 fsimage_0000000000000949874
-rw-r--r-- 1 hdfs hadoop      62 Jun 10 21:46 fsimage_0000000000000949874.md5
-rw-r--r-- 1 hdfs hadoop  276740 Jun 13 22:13 fsimage_0000000000000952088
-rw-r--r-- 1 hdfs hadoop      62 Jun 13 22:13 fsimage_0000000000000952088.md5
-rw-r--r-- 1 hdfs hadoop       7 Jun 13 22:13 seen_txid
-rw-r--r-- 1 hdfs hadoop     206 Jun 13 22:13 VERSION
5) While in the current directory, back up all the files. Note the (.) indicating the current dir:
$ tar -zcvf editsbck.tar.gz .
6) Move the editsbck.tar.gz to a safe location
$ scp editsbck.tar.gz /home/bronson
7) Back up or move any directory therein, e.g.:
$ mv paxos paxos.bck
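Before wiping anything in the next step, it doesn't hurt to confirm the backup archive is readable. A quick check (adjust the path if you copied editsbck.tar.gz somewhere other than /home/bronson):
$ tar -tzf /home/bronson/editsbck.tar.gz | head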
8) Delete all files in /hadoop/hdfs/journal/hdfsha/current/ on the bad node. Remember you have a backup, editsbck.tar.gz:
$ rm -rf /hadoop/hdfs/journal/hdfsha/current/*
9) Zip or tar the journal dir on a working JournalNode. While in /hadoop/hdfs/journal/hdfsha/current/ on that node, run (again note the (.) for the current dir):
$ tar -zcvf good_editsbck.tar.gz .
10) Copy good_editsbck.tar.gz to the non-working JournalNode, to the same path as on the working node, /hadoop/hdfs/journal/hdfsha/current/:
# scp good_editsbck.tar.gz root@namenode2:/hadoop/hdfs/journal/hdfsha/current/
11) Untar the files:
# tar -xvzf /hadoop/hdfs/journal/hdfsha/current/good_editsbck.tar.gz -C /hadoop/hdfs/journal/hdfsha/current/
12) Change ownership to hdfs; the -R makes it recursive in case there are subdirectories:
# chown -R hdfs:hadoop /hadoop/hdfs/journal/hdfsha/current/*
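As an optional sanity check, you can compare the edits file names on the restored node against the good node; run the same command on both and the checksums should match (a rough check only, it compares file names, not contents):
$ ls /hadoop/hdfs/journal/hdfsha/current/ | grep ^edits | sort | md5sum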
Log on to the unhealthy NameNode.
13) Restart the JournalNodes.
Start all 3 JournalNodes. Note that I run the commands as root. If a JournalNode is already running, the start command will report:
journal node running as process xxxx. Stop it first.
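To check whether a JournalNode process is already running before you stop or start it, a simple process check is enough (generic, not HDP-specific):
# ps -ef | grep -i [j]ournalnode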
14) Stop the JournalNode:
# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh stop journalnode"
15) Start the JournalNode:
# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode"
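Once started, you can confirm the JournalNode is listening on its RPC port; 8485 is the default for dfs.journalnode.rpc-address, so adjust if you changed it:
# netstat -tlnp | grep 8485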
Restart HDFS from the Ambari UI.
After a few minutes the alerts should clear and you should see healthy Active and Standby NameNodes. All should be fine now; NameNode failover should occur transparently and any remaining alerts should gradually disappear.
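You can also confirm the HA state from the command line with hdfs haadmin -getServiceState; nn1 and nn2 below are placeholders for the NameNode service IDs defined in dfs.ha.namenodes.<nameservice> on your cluster:
$ hdfs haadmin -getServiceState nn1
active
$ hdfs haadmin -getServiceState nn2
standby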
HTH
Created 06-13-2019 10:05 PM
@Geoffrey Shelton Okot thank you so much for your effort and your time. I will run through your steps tomorrow. Even though I haven't done the tests yet, I will mark your answer as accepted, and if I have any comments I will share them. Again, many thanks.
Created 06-14-2019 06:06 AM
@Geoffrey Shelton Okot - about cat last-promised-epoch: I have the number 31 on the non-working node. Do you mean that I need to decrease it to 30?
Created 06-14-2019 06:34 AM
Can you confirm that the other two JournalNodes have a last-promised-epoch of 30 (see the quick check below)? If that was the case when the failure occurred, it's okay to replace the contents of /hadoop/hdfs/journal/hdfsha/current/* with the contents from the good (active) NameNode's JournalNode.
Then proceed with the subsequent steps
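A quick way to compare the epoch on all three JournalNodes from one host (jn1, jn2 and jn3 are placeholders for your actual JournalNode hostnames):
# for h in jn1 jn2 jn3; do echo -n "$h: "; ssh $h cat /hadoop/hdfs/journal/hdfsha/current/last-promised-epoch; done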
Created 06-14-2019 06:43 AM
On the good NameNode we have the number 31.
On the bad NameNode we also have the number 31.
On the other JournalNode we have the number 28.
Created 06-14-2019 06:44 AM
So according to this info, do you recommend setting the value as is (31), or something else?
Created 06-14-2019 07:57 AM
Use the JournalNode that is healthy (the one on the active NameNode). After saving the namespace, also wipe out the other JournalNode, the one which had edits_inprogress_0000000000018783114.empty. Remember to back up/zip all the JournalNodes' directories as good practice.
Once you have copied the good edits to all three destinations, proceed. When you start the NameNodes after starting the JournalNodes, one should become active and the other standby thanks to ZKFailover.
Created 06-14-2019 07:30 PM
Is all good?
Created 06-16-2019 04:02 AM
@Geoffrey Shelton Okot no, both NameNodes started as standby and then the NameNode stopped.