ambari cluster + No valid image files found
Labels: Apache Ambari, Apache Hadoop
Created ‎01-24-2018 04:31 PM
We have an old Ambari cluster, version 2.6.
In the logs (under /var/log/hadoop/hdf) we see the error "No valid image files found".
I am not sure about my solution, but does this mean that we need to delete the edits_inprogress_XXXXX files under /hadoop/hdfs/journal/hdfsha/current and then restart the standby NameNode service?
2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
    at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
2018-01-24 16:10:27,829 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-01-24 16:10:27,845 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29
************************************************************/
Created ‎01-24-2018 05:48 PM
I also tried hadoop namenode -format on the master03 machine, but we got this:
<code>Could not format one or more JournalNodes. 1 exceptions thrown:
Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty</code>
The complete log from hadoop namenode -format:
<code>18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty.
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558)
    at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)</code>
Created ‎01-24-2018 09:06 PM
Are you getting the following error while starting both NameNodes, or only while starting the Active NameNode?
ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
    at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
This usually happens when the dfs.namenode.name.dir directory (default path: /hadoop/hdfs/namenode) is empty, for example because a disk issue has left the fsimage files missing.
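One way to check whether this is the situation is to count the fsimage files in the NameNode metadata directory. The sketch below assumes the usual layout, in which checkpoint images are stored as current/fsimage_<txid> next to .md5 checksum files. The function name count_fsimage is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many fsimage checkpoint files exist
# under <name dir>/current. The .md5 checksum sidecar files are skipped.
count_fsimage() {
    count=0
    for f in "$1"/current/fsimage_*; do
        case "$f" in
            *.md5) continue ;;
        esac
        # If nothing matched, the glob stays literal and this test fails.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example (path is the default mentioned above; adjust to your dfs.namenode.name.dir):
#   count_fsimage /hadoop/hdfs/namenode
```

A result of 0 would be consistent with the "No valid image files found" error. If dfs.namenode.name.dir lists several directories, each one needs to be checked separately.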
If this is the case and the Active NameNode is already running (this must be true), then you can try the following commands:
# su - hdfs
# hdfs namenode -bootstrapStandby
NOTE: Run this command ONLY on the Standby NameNode. DO NOT run it on the Active NameNode. It will try to recover all metadata on the Standby NameNode.
- Now try to start Standby NameNode from Ambari
- Also please Restart ZKFailoverController from Ambari
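Because the bootstrap only works while the other NameNode is active, it can help to check that first. This is a minimal sketch, assuming the peer state is read with hdfs haadmin -getServiceState and that nn1 is the peer's NameNode ID. Both nn1 and the helper name peer_is_active are placeholders, not values taken from this cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the reported HA state is exactly "active".
peer_is_active() {
    [ "$1" = "active" ]
}

# Example use on the Standby NameNode host (nn1 is an assumed NameNode ID):
#   state=$(hdfs haadmin -getServiceState nn1 2>/dev/null)
#   if peer_is_active "$state"; then
#       hdfs namenode -bootstrapStandby
#   else
#       echo "Peer NameNode is not active; nothing to bootstrap from" >&2
#   fi
```

If the state cannot be read at all, for example because both NameNodes are down, the check fails and the bootstrap is skipped.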
Created ‎01-24-2018 09:13 PM
@Jay I get the error on both nodes.
Created ‎01-24-2018 09:17 PM
@Jay I ran hdfs namenode -bootstrapStandby on the standby, but I get:
Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
That is because both NameNodes are down. I cannot start the NameNode on either machine.
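The retry message means nothing is answering on the other NameNode's RPC port. A quick way to confirm that from the standby host is a TCP connect test. The sketch below assumes bash and the coreutils timeout command are installed; port_reachable is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within 3 seconds. Relies on bash's /dev/tcp redirection and on the
# coreutils `timeout` command (both assumed to be present).
port_reachable() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example with the host and port from the message above:
#   port_reachable master01.sys57.com 8020 || echo "NameNode RPC port is not reachable"
```

A failed check here only shows that the port is closed or unreachable; the NameNode log on that host is still needed to see why the process is not running.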
Created ‎01-25-2018 04:45 PM
@Jay as we agreed yesterday, the "No valid image files found" message in the log indicates that the fsimage files are missing. Can you please confirm this?
