- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
hadoop cluster + Unable to start standby Namenode
- Labels:
-
HDFS
-
Hortonworks Data Platform (HDP)
Created on ‎02-03-2024 02:20 PM - edited ‎02-03-2024 02:23 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
we have HDP Hadoop cluster with two name-node services ( one active name-node and the secondary is the standby name-node )
due of unexpected electricity failure , the standby name-node failed to start with the flowing exception , while the active name-node starting successfully
2024-02-02 08:47:11,497 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /hadoop/hdfs/namenode/in_use.lock acquired by nodename 36146@master1.delax.com
2024-02-02 08:47:11,891 INFO namenode.FSImage (FSImage.java:loadFSImageFile(745)) - Planning to load image: FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000052670667141, cpktTxId=0000000052670667141)
2024-02-02 08:47:11,897 ERROR namenode.FSImage (FSImage.java:loadFSImage(693)) - Failed to load image from FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000052670667141, cpktTxId=0000000052670667141)
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:221)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:898)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:882)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:755)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:686)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1077)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:724)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:697)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
2024-02-02 08:47:12,238 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(726)) - Encountered exception loading fsimage
java.io.IOException: Failed to load FSImage file, see error(s) above for more info.
we can see from above exception - `Failed to load image from FSImageFile` , and seems it is as results of when machine failed because unexpected shutdown
as I understand one of the options to recover the standby name-node could be with the following procedure:
1. Put Active NN in safemode
sudo -u hdfs hdfs dfsadmin -safemode enter
2. Do a savenamespace operation on Active NN
sudo -u hdfs hdfs dfsadmin -saveNamespace
3. Leave Safemode
sudo -u hdfs hdfs dfsadmin -safemode leave
4. Login to Standby NN
5. Run below command on Standby namenode to get latest fsimage that we saved in above steps.
sudo -u hdfs hdfs namenode -bootstrapStandby -force
we glad to receive any suggestions , or if my above suggestion is good enough for our problem
Created ‎02-05-2024 05:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
=> If above steps still gives you issues then you can simply execute step 5 or below Cmd from Standby NN
Note: Step 1 to step 3 is process of creating new fsimage but if your Active NN is already up and running then I would directly login in to Standby and then perform bootstrapStandby operation
Created ‎02-04-2024 10:00 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Approach you mentioned involves further downtime
If your active NN is up and running then you can simply copy the latest fsimage from active NN data dir path to Standby NN data dir path and then try to start the standby NN once again
Created ‎02-04-2024 10:43 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
lets say I copy the fsimage from active to standby namenode and then still we have a problem to start the namenode then can I do the steps as already mentioned?
Created ‎02-05-2024 05:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
=> If above steps still gives you issues then you can simply execute step 5 or below Cmd from Standby NN
Note: Step 1 to step 3 is process of creating new fsimage but if your Active NN is already up and running then I would directly login in to Standby and then perform bootstrapStandby operation
