ambari cluster + No valid image files found

We have an old Ambari cluster, version 2.6.

From the logs (under /var/log/hadoop/hdf), we can see the error "No valid image files found".

I am not sure about my solution, but does this mean that we need to delete the edits_inprogress_XXXXX files under /hadoop/hdfs/journal/hdfsha/current and then restart the standby NameNode service?

2018-01-24 16:10:27,826 ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:618)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
2018-01-24 16:10:27,829 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-01-24 16:10:27,845 INFO  namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master03.sys573.com/102.14.22.29
************************************************************/
Michael-Bronson

I also tried hadoop namenode -format on the master03 machine,

but we got this:

Could not format one or more JournalNodes. 1 exceptions thrown:
Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty

The complete log from hadoop namenode -format:


18/01/24 17:36:19 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
8485: Directory /data/hadoop/hdfs/journal/hdfsha is in an inconsistent state: Can't format the storage directory because the current directory is not empty.
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.checkEmptyCurrent(Storage.java:482)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:558)
        at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:185)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:145)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
Michael-Bronson

@Michael Bronson

Are you getting the following error while starting both NameNodes, or only while starting the Active NameNode?

ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)

This usually happens when the dfs.namenode.name.dir directory (default path: /hadoop/hdfs/namenode) is empty, for example when a disk issue has left the files missing.
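A quick way to check whether this is the cause is to look for fsimage files in the NameNode storage directory. The sketch below is only an illustration: has_fsimage is a hypothetical helper (not a Hadoop command), and it assumes a single local directory with the usual current/ subdirectory. If dfs.namenode.name.dir lists several directories, each one would need to be checked.

```shell
#!/usr/bin/env bash
# Sketch: report whether a NameNode storage directory holds any fsimage file.
# has_fsimage is a hypothetical helper, not part of Hadoop.
has_fsimage() {
    local name_dir="$1"
    local f
    for f in "$name_dir"/current/fsimage_*; do
        # Skip an unexpanded glob (no match) and anything that is not a regular file.
        [ -f "$f" ] || continue
        # Skip the .md5 checksum companions; only real image files count.
        case "$f" in
            *.md5) continue ;;
        esac
        return 0
    done
    return 1
}

# Example usage (the path is the default quoted above; adjust it to your dfs.namenode.name.dir):
# if has_fsimage /hadoop/hdfs/namenode; then
#     echo "fsimage present"
# else
#     echo "no fsimage found"
# fi
```

If the helper reports no fsimage on a NameNode host, that matches the "No valid image files found" exception in the log.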

If this is the case and the Active NameNode is already running (this must be true), you can try the following command:

# su - hdfs 
# hdfs namenode -bootstrapStandby 

NOTE: Run this command ONLY on the Standby NameNode. DO NOT run it on the Active NameNode. The command will try to recover all metadata on the Standby NameNode.

- Now try to start the Standby NameNode from Ambari.
- Also restart the ZKFailoverController from Ambari.
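Because the bootstrap only works when the other NameNode is running as Active, it may help to check that first. The following is a minimal sketch under stated assumptions: can_bootstrap_standby is a hypothetical guard function, and "nn1" is an assumed NameNode ID that must be replaced with the real ID of the other NameNode from dfs.ha.namenodes.<nameservice>.

```shell
#!/usr/bin/env bash
# Hypothetical guard: allow bootstrapStandby only if the other NameNode reports "active".
can_bootstrap_standby() {
    local peer_state="$1"
    [ "$peer_state" = "active" ]
}

# Example usage on the Standby host, as the hdfs user.
# "nn1" is an assumed NameNode ID; take the real one from dfs.ha.namenodes.<nameservice>.
# peer_state="$(hdfs haadmin -getServiceState nn1 2>/dev/null)"
# if can_bootstrap_standby "$peer_state"; then
#     hdfs namenode -bootstrapStandby
# else
#     echo "Other NameNode is not active (state: ${peer_state:-unreachable}); not bootstrapping." >&2
# fi
```

If the other NameNode is down or itself in standby, the guard refuses to run the bootstrap, since there is no running Active NameNode to copy the metadata from.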


@Jay I get it on both nodes.

Michael-Bronson

@Jay I ran hdfs namenode -bootstrapStandby on the standby, but I get:

Retrying connect to server: master01.sys57.com/100.4.3.21:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)

and that is because both NameNodes are down. I cannot start the NameNode on either machine.

Michael-Bronson

@Jay as we agreed yesterday, the "no image found" message in the log indicates that the fsimage files are missing. Can you please confirm this?

Michael-Bronson