Posted 03-08-2022 12:40 AM
Hi, I have an HA Cloudera setup. The primary NameNode is up, but the standby NameNode keeps going down within a few seconds of restarting.

I was originally facing job failures in production, and the error below was showing in the job error logs:

The directory item limit of /user/spark/applicationHistory is exceeded: limit=1048576 items=1048576

So I moved some old files (about 5 years old) out of /user/spark/applicationHistory to another location and did a rolling restart of the HDFS service from Cloudera Manager, and the jobs started running again. A few days later, however, the standby NameNode failures started. Please let me know how to resolve this issue.

I have tried the steps below but am still facing the same problem:
1. Put the active NN in safemode
2. Do a saveNamespace operation on the active NN
3. Leave safemode
4. Log in to the standby NN
5. hdfs namenode -bootstrapStandby -force
6. Start the failed standby NameNode

Logs from the failed NameNode server

DataNode .out file log:

failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

NameNode .out file log:

FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
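The steps 1-6 above correspond to the following commands (a sketch of what I ran; assumes they are executed as the hdfs superuser, and that the standby NameNode role is stopped before bootstrapping):

```shell
# On the active NameNode: enter safemode, checkpoint the namespace, leave safemode
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
sudo -u hdfs hdfs dfsadmin -safemode leave

# On the standby NameNode host (standby role stopped):
# re-copy the latest fsimage from the active NN, overwriting local metadata
sudo -u hdfs hdfs namenode -bootstrapStandby -force

# Then start the standby NameNode role again (e.g. from Cloudera Manager)
```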
java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than the current value (=346057041), where newValue=346057040
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.resetLastInodeId(FSNamesystem.java:657)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:280)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:140)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:848)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
Caused by: java.lang.IllegalStateException: Cannot skip to less than the current value (=346057041), where newValue=346057040
	at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.resetLastInodeId(FSNamesystem.java:655)
	... 13 more
2022-03-08 02:11:50,893 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2022-03-08 02:11:50,895 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************

JournalNode .out log:

2022-03-07 16:42:32,655 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_inprogress_0000000015143419542 -> /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_0000000015143419542-0000000015145668341
2022-03-07 17:11:46,618 INFO org.apache.hadoop.hdfs.server.common.Storage: Purging no-longer needed file 15140407066
2022-03-07 17:11:46,630 INFO org.apache.hadoop.hdfs.server.common.Storage: Purging no-longer needed file 15139990404
2022-03-07 17:12:37,716 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_inprogress_0000000015145668342 -> /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_0000000015145668342-0000000015145759436
2022-03-07 19:43:48,992 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 15146089648-15146089648 took 1311ms
2022-03-07 22:40:51,859 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 15146467897-15146467897 took 1119ms
2022-03-08 02:11:48,661 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_inprogress_0000000015145759437 -> /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_0000000015145759437-0000000015146939052
2022-03-08 02:39:00,995 WARN org.apache.hadoop.hdfs.qjournal.server.Journal: Sync of transaction range 15148810390-15148810519 took 1044ms
2022-03-08 02:42:32,734 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_inprogress_0000000015146939053 -> /opt/hadoop/dfs/jn/bbda1-prod-cdh-01-ns/current/edits_0000000015146939053-0000000015149060700

Thanks
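P.S. For context, this is how I checked the item count against the limit that produced the original error (the property name and its 1048576 default are from hdfs-default.xml):

```shell
# Count entries under the Spark history dir
# (output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATH)
hdfs dfs -count /user/spark/applicationHistory

# The limit is dfs.namenode.fs-limits.max-directory-items (default 1048576);
# check the effective value on the cluster
hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items
```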
Labels:
- Cloudera Manager