Member since 01-15-2019
11 Posts
0 Kudos Received
0 Solutions
05-28-2019
02:30 AM
@Geoffrey Shelton Okot Hi,

          node02   node03   node04
edits_*   10414    10414    10412
epoch     684      684      684

NN failover tested OK. ZKFC status OK.
05-26-2019
09:25 AM
@Geoffrey Shelton Okot: Hi, the checkpointing issue remains. The last checkpoint was done 18 hours ago, and no fsimage has been written since. Also, the good node, i.e. node04, now has two edits_inprogress files. I don't know what to make of this.
05-25-2019
03:16 PM
Also, I found an edits_inprogress.empty file in the namenode/current directory on node02. 😞 @Geoffrey Shelton Okot
05-25-2019
02:46 PM
@Geoffrey Shelton Okot: Completed the steps on node02 & node03 using the healthy set of edits from node04. (There were 2 edits_inprogress files on node02 & one edits_inprogress.empty on node03.) There were some permission issues, but I got through them.

However, while restarting the HDFS components through Ambari, the NameNode on node03 got stuck and was not able to start. After a long time the operation completed, but the standby NN on node03 was in a stopped state. The errors are as attached. (Unable to upload the file, hence the screenshot.)

Both NNs are up and running for now. I am observing the cluster for another 24 hours to see whether the issue has been resolved. Will keep you posted about the checkpointing status.

Thanks a lot for your invaluable help.

Thanks and Regards, Farhana.
05-22-2019
08:12 PM
@Geoffrey Shelton Okot: Currently node03 is the active NN. Last promised epoch on all 3 nodes: 591. Number of files in /hadoop/hdfs/journal/XXXXcluster/current:
node02 : 10281
node03 : 10284
node04 : 10284

Yes, still experiencing the same problem. There are two edits_inprogress files in the ..../journal/../current directory on node02. The standby NN goes into a stopped state frequently, and checkpointing does not occur every 14400 seconds as configured in checkpoint.period.

Last night the standby NN (node02) was in a stopped state and I had to start it manually from Ambari. Last checkpoint at that point: Tue May 21 2019 17:14:59, at NTP time Wed May 22 2019 1:03:36. As for today, last checkpoint: Wed May 22 2019 10:18:33, at NTP time Wed May 22 2019 19:10:52.
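For reference, the gap between each last checkpoint and the NTP time quoted above can be worked out with a small script. This is only a sketch: the function name `checkpoint_lag` is made up for illustration, the timestamp format is assumed from how the times are written in this post, and it only compares against the time-based period (14400 s), not any transaction-count trigger.

```python
from datetime import datetime

# Checkpoint period quoted in the post (seconds).
CHECKPOINT_PERIOD_S = 14400

# Assumed format of the timestamps as written in the post,
# e.g. "Tue May 21 2019 17:14:59".
TIME_FORMAT = "%a %b %d %Y %H:%M:%S"


def checkpoint_lag(last_checkpoint, now, period_s=CHECKPOINT_PERIOD_S):
    """Return (lag in seconds, True if the lag exceeds the period)."""
    last = datetime.strptime(last_checkpoint, TIME_FORMAT)
    current = datetime.strptime(now, TIME_FORMAT)
    lag = (current - last).total_seconds()
    return lag, lag > period_s


if __name__ == "__main__":
    samples = [
        ("Tue May 21 2019 17:14:59", "Wed May 22 2019 1:03:36"),
        ("Wed May 22 2019 10:18:33", "Wed May 22 2019 19:10:52"),
    ]
    for last, now in samples:
        lag, overdue = checkpoint_lag(last, now)
        print(f"{last} -> {now}: {lag / 3600:.2f} h, overdue={overdue}")
```

Both gaps come out well above the 4-hour period, which matches the observation that the time-based checkpoint is being missed.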
05-21-2019
07:04 PM
@Geoffrey Shelton Okot : Hi, I have a question about the number of files present in the current directory on each of the JNs. (I checked a couple of times within a 5-minute period; the number of files on node03 & node04 stays the same, and only node02 differs.)
node02 : 11040
node03 : 11043
node04 : 11043

Does this imply that the edits on node02 alone are corrupted, while 03 & 04 are OK? Kindly confirm whether this conjecture is correct, and whether I should go ahead with the changes on both node02 & node04 or just node02. (Last promised epoch on all 3 nodes: 577.) Thanks.
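A quick way to spot which JN disagrees with the others is to compare each node's count against the majority value. The sketch below uses the counts from this post; the function name `find_outliers` is hypothetical. Note that a lower count by itself only shows that a node differs from the majority, which could also mean it is lagging, so it is a hint rather than proof of corruption.

```python
from collections import Counter


def find_outliers(counts):
    """Return {node: (value, difference from majority)} for nodes that
    do not match the most common value across all nodes."""
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return {
        node: (value, value - majority)
        for node, value in counts.items()
        if value != majority
    }


if __name__ == "__main__":
    # File counts in the journal "current" directory, as reported above.
    file_counts = {"node02": 11040, "node03": 11043, "node04": 11043}
    # Last promised epoch, as reported above.
    epochs = {"node02": 577, "node03": 577, "node04": 577}

    print("file count outliers:", find_outliers(file_counts))
    print("epoch outliers:", find_outliers(epochs))
```

The same check only makes sense when at least two nodes agree; if all three counts differ there is no majority to compare against.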
05-21-2019
05:20 PM
Hi, just saw your comment. Will apply the mentioned changes and report back with updates. Thank you.
05-20-2019
08:53 AM
@Geoffrey Shelton Okot : Number of files present in the current directory (/hadoop/hdfs/journal/XXXXcluster/current) of each journal node:
node02 : 11078
node03 : 11082
node04 : 11081

Last promised epoch:
node02 : 542
node03 : 542
node04 : 542

As asked, attaching a screenshot of the path to the edits_000000 files (on node02).
05-18-2019
05:08 AM
@Geoffrey Shelton Okot : Thanks for responding. Yes, the NameNode is HA enabled.
05-17-2019
04:04 PM
I have been facing frequent NN failover & checkpointing issues on my 6-node cluster on VMs. Mostly the standby stays in a stopped state after a failover until it is started manually. I tried increasing the QJM timeouts, which helped with the failover. However, the checkpointing issue remains.

I have 3 journal nodes, on node02, node03 and node04. I found multiple edits_inprogress files on one of the journal nodes (node02); the other two (node03 & node04) each had one edits_inprogress.empty file. After taking a backup, I deleted the extra edits_inprogress files on node02 (leaving the most recent one) & the edits_inprogress.empty files on the other two, then restarted all the JNs one by one.

However, when I checked the current directory on node02 later, it unfortunately had two edits_inprogress files yet again. I can't understand this behavior. How is this journal node ending up with more than one edits_inprogress file, and what is causing it on this particular node (node02)? Kindly help.

Attachment: edits_inprogress.png

@Jay Kumar SenSharma @Geoffrey Shelton Okot
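To make the check repeatable on each JN, the edits files in a journal current directory can be grouped with a short read-only script. This is a rough sketch: the function name `classify_edit_files` is hypothetical, and the grouping assumes the naming seen in this thread (names starting with `edits_inprogress`, names ending in `.empty`, and everything else starting with `edits_` treated as finalized segments). It only lists files and does not modify anything.

```python
import sys
from pathlib import Path


def classify_edit_files(current_dir):
    """Group edits_* files in a directory into finalized, in-progress
    and .empty files, based purely on their names."""
    groups = {"finalized": [], "inprogress": [], "empty": []}
    for entry in sorted(Path(current_dir).iterdir()):
        name = entry.name
        if not entry.is_file() or not name.startswith("edits_"):
            continue  # skip subdirectories and unrelated files
        if name.endswith(".empty"):
            groups["empty"].append(name)
        elif name.startswith("edits_inprogress"):
            groups["inprogress"].append(name)
        else:
            groups["finalized"].append(name)
    return groups


if __name__ == "__main__":
    # Pass the journal "current" directory as the first argument.
    groups = classify_edit_files(sys.argv[1])
    for kind, names in groups.items():
        print(f"{kind}: {len(names)}")
    if len(groups["inprogress"]) > 1:
        print("more than one in-progress file:", groups["inprogress"])
```

Running it against the same directory on node02, node03 and node04 gives three summaries that can be compared side by side.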
Labels:
- Apache Hadoop