Created 01-24-2018 04:31 PM
We have an old Ambari cluster, version 2.6.
From the logs (under /var/log/hadoop/hdfs), we can see the error "No valid image files found".
I am not sure about my solution, but does this mean that we need to delete the edits_inprogress_XXXXX files under /hadoop/hdfs/journal/hdfsha/current and then restart the standby NameNode service?
<code>2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
2018-01-24 16:10:27,829 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-01-24 16:10:27,845 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29
************************************************************/</code>
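Before deleting anything, it may help to confirm what is actually on disk. A minimal check (assuming the journal path above and the default NameNode metadata directory /hadoop/hdfs/namenode, which may differ in your configuration) would be:

<code># JournalNode edits for the nameservice "hdfsha" (path taken from the question above)
ls -l /hadoop/hdfs/journal/hdfsha/current/ | grep edits_inprogress

# fsimage files in the NameNode metadata directory (assumed default /hadoop/hdfs/namenode)
ls -l /hadoop/hdfs/namenode/current/ | grep fsimage</code>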
Created 01-24-2018 05:48 PM
I also tried running hadoop namenode -format on the master03 machine, but we got this:
<code>Could not format one or more JournalNodes. 1 exceptions thrown:
Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty</code>

The complete log from hadoop namenode -format:

<code>18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty.
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558)
        at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)</code>
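For context, the format is rejected because at least one JournalNode (the one answering on port 8485) still has data under its storage directory. A rough way to see what each JournalNode is holding (the host names below are placeholders; use the hosts from dfs.namenode.shared.edits.dir) would be:

<code># Placeholder JournalNode hosts; replace with the hosts from dfs.namenode.shared.edits.dir
for jn in master01 master02 master03; do
  echo "== $jn =="
  ssh "$jn" 'ls -l /data/hadoop/hdfs/journal/hdfsha/current | head'
done</code>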
Created 01-24-2018 09:06 PM
Are you getting the following error while starting both NameNodes, or only while starting the Active NameNode?
<code>ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)</code>
.
This can usually happen when the dfs.namenode.name.dir directory (default path: /hadoop/hdfs/namenode) is empty, for example when the files are missing because of a disk issue.
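One hedged way to confirm which directory is actually configured and whether it really contains fsimage files (the path below is only the assumed default) is:

<code># Print the configured NameNode metadata directory
hdfs getconf -confKey dfs.namenode.name.dir

# Check whether the assumed default location actually contains fsimage files
ls -l /hadoop/hdfs/namenode/current/</code>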
If this is the case, and the Active NameNode is already running (this must be true), then you can try the following:
Try running the following commands:

<code># su - hdfs
# hdfs namenode -bootstrapStandby</code>
NOTE: Please run this command ONLY on the Standby NameNode. DO NOT run it on the Active NameNode. This command will try to recover all metadata on the Standby NameNode.
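Before running the bootstrap, it is worth confirming that one NameNode is really active. This is only a sketch: nn1/nn2 below are placeholder NameNode IDs; use the IDs defined under dfs.ha.namenodes.<nameservice> in your hdfs-site.xml:

<code># Run as the hdfs user; nn1/nn2 are placeholder NameNode IDs
su - hdfs -c 'hdfs haadmin -getServiceState nn1'
su - hdfs -c 'hdfs haadmin -getServiceState nn2'</code>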
.
- Now try to start the Standby NameNode from Ambari
- Also please restart the ZKFailoverController from Ambari (a hedged example of doing this via the Ambari REST API follows below)
.
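The ZKFC restart can simply be done from the Ambari UI. If you prefer the Ambari REST API, the following is only a sketch: the Ambari URL, credentials, cluster name, and host are all placeholders.

<code># All values below are placeholders: replace the Ambari URL, credentials,
# cluster name and host with your own.
AMBARI=http://ambari-server.example.com:8080
CLUSTER=MYCLUSTER
HOST=master03.sys573.com

# Stop the ZKFC host component ...
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop ZKFC"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components/ZKFC"

# ... then start it again
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start ZKFC"},"Body":{"HostRoles":{"state":"STARTED"}}}' \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components/ZKFC"</code>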
Created 01-24-2018 09:13 PM
@Jay I get it on both nodes.
Created 01-24-2018 09:17 PM
@Jay I ran hdfs namenode -bootstrapStandby on the standby, but I get:
<code>Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)</code>
and that is because both NameNodes are down - I can't start the NameNode on either machine.
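As a quick check (host name and port taken from the retry message above), you can at least verify whether anything is listening on the NameNode RPC port:

<code># Is the NameNode RPC port reachable from the standby?
nc -zv master01.sys57.com 8020

# Or, on master01 itself, check whether anything is listening on 8020
ss -ltnp | grep ':8020'</code>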
Created 01-25-2018 04:45 PM
@Jay as we agreed yesterday, the "No valid image files found" error in the log indicates that the fsimage files are missing - can you please approve this?