Created 06-13-2019 06:42 PM
Hi all,
We have an Ambari cluster, HDP version 2.6.4.
On one of the master machines we can't start the NameNode, and we noticed the following:
$ ls /hadoop/hdfs/journal/hdfsha/current/ | grep edits_inprogress
edits_inprogress_0000000000018783114.empty
We don't have the edits_inprogress_xxxxxx file; the only file present is edits_inprogress_0000000000018783114.empty.
Any idea how to recover the edits_inprogress_xxxxxx file?
2019-06-13 19:45:42,473 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:183)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1521)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1951)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1807)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1656)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
2019-06-13 19:45:42,476 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
Created 06-13-2019 09:38 PM
Yes, it's possible to recover from this situation, which sometimes happens in a NameNode HA setup. JournalNodes are a distributed system for storing edits. The Active NameNode, acting as a client, writes edits to the JournalNodes and commits only once they are replicated to a quorum of the JournalNodes. The Standby NameNode reads the edits to stay in sync with the Active one; it can read from any of the replicas stored on the JournalNodes.
ZKFC makes sure that only one NameNode is active at a time. However, when a failover occurs, it is still possible for the previous Active NameNode to serve out-of-date read requests to clients until it shuts down while trying to write to the JournalNodes. For this reason, fencing methods should be configured even when using the Quorum Journal Manager.
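If you want to double-check which fencing methods are currently configured on the cluster, a quick way (assuming the HDFS client configs are in place on the node; dfs.ha.fencing.methods is the standard property name) is:
$ hdfs getconf -confKey dfs.ha.fencing.methods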
For fencing, the Quorum Journal Manager uses epoch numbers. An epoch number is an integer that only ever increases and is unique once assigned. The NameNode generates an epoch number with a simple algorithm and sends it along with its RPC requests to the JournalNodes. When you configure NameNode HA, the first Active NameNode gets epoch value 1. On each failover or restart the epoch number is incremented, so a NameNode with a higher epoch number is considered newer than any NameNode with an earlier epoch number.
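You can see this bookkeeping yourself: each JournalNode keeps the epoch in small text files inside its journal directory. A minimal check, assuming the same /hadoop/hdfs/journal/hdfsha path as on this cluster:
$ cat /hadoop/hdfs/journal/hdfsha/current/last-promised-epoch
$ cat /hadoop/hdfs/journal/hdfsha/current/last-writer-epoch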
Now let's proceed with the real case. Note the hostname of the healthy NameNode.
You will need to proceed as follows, assuming you are logged on as root. Here is how to fix one corrupted JournalNode's edits:
# su - hdfs
1) Put both NameNodes in safe mode (NN HA)
$ hdfs dfsadmin -safemode enter
Sample output
Safe mode is ON in namenode1/xxx.xxx.xx.xx:8020
Safe mode is ON in namenode2/xxx.xxx.xx.xx:8020
2) Save Namespace
$ hdfs dfsadmin -saveNamespace
3) On the non-working NameNode, change directory to /hadoop/hdfs/journal/hdfsha/current/. Get the epoch and note the number; it should be lower than the one on the working NameNode (cross-check):
$ cat last-promised-epoch
4) On the non-working NameNode, back up all the files in the journal dir /hadoop/hdfs/journal/hdfsha/current/*. They should look like the listing below:
-rw-r--r-- 1 hdfs hadoop 1019566 Jun 10 09:45 edits_0000000000000928232-0000000000000935461
-rw-r--r-- 1 hdfs hadoop 1014516 Jun 10 15:45 edits_0000000000000935462-0000000000000942657
-rw-r--r-- 1 hdfs hadoop 1017540 Jun 10 21:46 edits_0000000000000942658-0000000000000949874
-rw-r--r-- 1 hdfs hadoop 1048576 Jun 10 23:36 edits_0000000000000949875-0000000000000952088
-rw-r--r-- 1 hdfs hadoop 1048576 Jun 13 22:27 edits_inprogress_0000000000000952089
-rw-r--r-- 1 hdfs hadoop  277083 Jun 10 21:46 fsimage_0000000000000949874
-rw-r--r-- 1 hdfs hadoop      62 Jun 10 21:46 fsimage_0000000000000949874.md5
-rw-r--r-- 1 hdfs hadoop  276740 Jun 13 22:13 fsimage_0000000000000952088
-rw-r--r-- 1 hdfs hadoop      62 Jun 13 22:13 fsimage_0000000000000952088.md5
-rw-r--r-- 1 hdfs hadoop       7 Jun 13 22:13 seen_txid
-rw-r--r-- 1 hdfs hadoop     206 Jun 13 22:13 VERSION
5) While in the current directory, back up all the files. Note the (.) indicating the current dir:
$ tar -zcvf editsbck.tar.gz .
6) Move the editsbck.tar.gz to a safe location
$ scp editsbck.tar.gz /home/bronson
7) Back up or move any directory therein, e.g.:
$ mv paxos paxos.bck
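Before wiping anything in the next step, it doesn't hurt to confirm the backup archive is readable. A quick check (adjust the path if you copied editsbck.tar.gz somewhere other than /home/bronson):
$ tar -tzf /home/bronson/editsbck.tar.gz | head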
8) Delete all files in /hadoop/hdfs/journal/hdfsha/current/ on the bad node. Remember you have a backup, editsbck.tar.gz:
$ rm -rf /hadoop/hdfs/journal/hdfsha/current/*
9) Zip or tar the journal dir on a working JournalNode. While in /hadoop/hdfs/journal/hdfsha/current/ on that node, run (again note the (.) for the current dir):
$ tar -zcvf good_editsbck.tar.gz .
10) Copy good_editsbck.tar.gz to the non-working JournalNode, to the same path as on the working node, /hadoop/hdfs/journal/hdfsha/current/:
# scp good_editsbck.tar.gz root@namenode2:/hadoop/hdfs/journal/hdfsha/current/
11) Untar the files:
# tar -xvzf /hadoop/hdfs/journal/hdfsha/current/good_editsbck.tar.gz -C /hadoop/hdfs/journal/hdfsha/current/
12) Change ownership to hdfs; the -R makes it recursive in case there are subdirectories:
# chown -R hdfs:hadoop /hadoop/hdfs/journal/hdfsha/current/*
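As an optional sanity check, you can compare the edits file names on the restored node against the good node; run the same command on both and the checksums should match (a rough check only, it compares file names, not contents):
$ ls /hadoop/hdfs/journal/hdfsha/current/ | grep ^edits | sort | md5sum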
Log on to the unhealthy NameNode.
13) Restart the JournalNodes.
Start all 3 JournalNodes. Note that I run the commands as root. If a JournalNode is already running, the start command will report:
journal node running as process xxxx. Stop it first.
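To check whether a JournalNode process is already running before you stop or start it, a simple process check is enough (generic, not HDP-specific):
# ps -ef | grep -i [j]ournalnode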
14) Stop the JournalNode:
# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh stop journalnode"
15) Start the JournalNode:
# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode"
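Once started, you can confirm the JournalNode is listening on its RPC port; 8485 is the default for dfs.journalnode.rpc-address, so adjust if you changed it:
# netstat -tlnp | grep 8485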
Restart HDFS from the Ambari UI.
After a few minutes the alerts should clear and you should see healthy Active and Standby NameNodes. All should be fine now; NameNode failover should occur transparently and any remaining alerts should gradually disappear.
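You can also confirm the HA state from the command line with hdfs haadmin -getServiceState; nn1 and nn2 below are placeholders for the NameNode service IDs defined in dfs.ha.namenodes.<nameservice> on your cluster:
$ hdfs haadmin -getServiceState nn1
active
$ hdfs haadmin -getServiceState nn2
standby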
HTH
Created 06-13-2019 10:05 PM
@Geoffrey Shelton Okot thank you so much for your effort and your time. I will run through your steps tomorrow. Even though I haven't done the tests yet, I will mark your answer as accepted, and if I have any comments I will share them. Again, many thanks.
Created 06-14-2019 06:06 AM
@Geoffrey Shelton Okot - about cat last-promised-epoch: I have the number 31 on the non-working node. Do you mean that I need to decrease it to 30?
Created 06-14-2019 06:34 AM
Can you confirm that the other two JournalNodes have a last-promised-epoch of 30 (see the quick check below)? If that was the case when the failure occurred, it's okay to replace the contents of /hadoop/hdfs/journal/hdfsha/current/* with the contents from the good (active) NameNode's JournalNode.
Then proceed with the subsequent steps
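A quick way to compare the epoch on all three JournalNodes from one host (jn1, jn2 and jn3 are placeholders for your actual JournalNode hostnames):
# for h in jn1 jn2 jn3; do echo -n "$h: "; ssh $h cat /hadoop/hdfs/journal/hdfsha/current/last-promised-epoch; done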
Created 06-14-2019 06:43 AM
On the good NameNode we have the number 31.
On the bad NameNode we also have the number 31.
On the other JournalNode we have the number 28.
Created 06-14-2019 06:44 AM
So according to this info, do you recommend setting the value as is (31), or something else?
Created 06-14-2019 07:57 AM
Use the JournalNode that is healthy (the one on the active NameNode). After saving the namespace, also wipe out the other JournalNode, the one which had edits_inprogress_0000000000018783114.empty. Remember to back up/zip all the JournalNodes' directories as good practice.
Once you have copied the good edits to all three destinations, proceed. When you start the NameNodes after starting the JournalNodes, one should become active and the other standby thanks to ZKFailover.
Created 06-14-2019 07:30 PM
Is all good?
Created 06-16-2019 04:02 AM
@Geoffrey Shelton Okot no, both NameNodes started as standby and then the NameNode stopped.