Created on 01-24-2018 08:06 PM - edited 08-17-2019 09:46 PM
In our Ambari cluster we have a very strange problem.
We restarted all the master servers (master01-03),
and on each master we started the services from the beginning, in the correct order.
First, on all masters, we started the ZooKeeper server.
Then, on all masters, we started the JournalNode.
But we noticed that on the last master machine the JournalNode restarts every 10-20 seconds,
while on all the other machines the JournalNode is stable.
Please advise: why does this happen?
Created 01-24-2018 09:58 PM
You are getting this error:
ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode. java.io.FileNotFoundException: No valid image files found
So can you please check whether the following directory has any fsimage files in it? Also, please check whether the fsimage files have proper read permissions, as shown here.
Example:
# ls -l /hadoop/hdfs/namenode/current/fsimage*
-rw-r--r--. 1 hdfs hadoop 195873 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213
-rw-r--r--. 1 hdfs hadoop     62 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213.md5
-rw-r--r--. 1 hdfs hadoop 195873 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519
-rw-r--r--. 1 hdfs hadoop     62 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519.md5
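If it helps, the same check can be scripted. This is only a rough sketch: the function name check_fsimage is made up for illustration, and the directory is passed as an argument because your NameNode directory may differ from the example path. It reports whether any fsimage files exist, whether each is readable by the user running the script, and whether its .md5 companion is present.

```shell
#!/usr/bin/env bash
# check_fsimage is a hypothetical helper name, not part of Hadoop or Ambari.
# Usage: check_fsimage <namenode "current" directory>
check_fsimage() {
    local dir="$1"
    local found=0
    local f
    for f in "$dir"/fsimage_*; do
        # Skip the unexpanded glob (no matches) and the checksum files.
        [ -e "$f" ] || continue
        case "$f" in *.md5) continue ;; esac
        found=1
        # -r tests readability for the current user; root can read everything,
        # so run this as the hdfs user for a meaningful permission check.
        if [ -r "$f" ]; then
            echo "OK $f"
        else
            echo "UNREADABLE $f"
        fi
        [ -e "$f.md5" ] || echo "NO_MD5 $f"
    done
    if [ "$found" -eq 0 ]; then
        echo "MISSING no fsimage files in $dir"
        return 1
    fi
    return 0
}
```

For example, `check_fsimage /hadoop/hdfs/namenode/current` returns a non-zero exit code when no fsimage file is found, which matches the "No valid image files found" situation.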
Created 01-24-2018 10:21 PM
You can grep the logs for the word "fsimage" or "current"; if the files were deleted, there may be an entry logged for the deletion.
Also, it looks like your fsimage file is missing, so I do not see any way to fix this issue right now unless there is an fsimage backup stored somewhere that we can restore. (Otherwise we are in a data loss situation.)
Is there any mount point or filesystem issue that could have caused the partition holding the fsimage to disappear? Maybe you can check with your storage admin to find out what happened to that file.
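One quick way to look into the mount point question is to see which filesystem actually backs the NameNode directory. A minimal sketch, where backing_mount is just an illustrative name and the path in the usage line is the example path from above, not necessarily yours:

```shell
#!/usr/bin/env bash
# backing_mount is a hypothetical helper name.
# Prints the device and the mount point that hold the given directory,
# taken from the POSIX-format output of df (line 2, columns 1 and 6).
backing_mount() {
    df -P "$1" | awk 'NR==2 {print $1, $6}'
}
```

Running `backing_mount /hadoop/hdfs/namenode` on each master and comparing the output may help: if the directory is expected to live on a dedicated disk but the problematic master reports the root filesystem instead, the dedicated mount is probably not mounted on that host.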
Created 01-24-2018 10:22 PM
@Jay is it possible to take an fsimage file from another cluster and use it on the problematic cluster?
Created 01-24-2018 10:23 PM
Created 01-24-2018 10:33 PM
So I really do not understand who could have deleted them. Only root can delete them, or maybe the hdfs user? Is there any way to recover them?
Created 01-24-2018 10:37 PM
As root, I ran this:
grep fsimage /var/log/audit/audit.log
but got no results!
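An empty grep does not necessarily prove that nothing was deleted. As far as I know, auditd only records file deletions when a watch rule covers the directory, so without such a rule there would be nothing to find. The sketch below counts matches for several words at once; count_matches is an illustrative name, and the auditctl/ausearch lines in the comments (including the key name fsimage_watch) are assumptions about how a watch could be set up for the future, not something taken from this cluster.

```shell
#!/usr/bin/env bash
# count_matches is a hypothetical helper name.
# Usage: count_matches <logfile> <word> [<word> ...]
# Prints how many lines of the log contain each word (fixed-string match).
count_matches() {
    local log="$1"
    shift
    local pattern
    for pattern in "$@"; do
        printf '%s: %s matching line(s)\n' "$pattern" \
            "$(grep -c -F -- "$pattern" "$log")"
    done
}

# Assumption: to have future deletions recorded, a watch rule would be needed,
# for example (as root, key name chosen for illustration):
#   auditctl -w /hadoop/hdfs/namenode/current -p wa -k fsimage_watch
#   ausearch -k fsimage_watch
```

For example, `count_matches /var/log/audit/audit.log fsimage current` shows both counts in one run.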