Created on 01-24-2018 08:06 PM - edited 08-17-2019 09:46 PM
In our Ambari cluster we have a very strange problem.
We restarted all the master servers (master01-03),
and on each master server we started the services from the beginning, in the right order.
First, on all masters, we started the ZooKeeper server.
Then, on all masters, we started the JournalNode.
But we noticed that on the last master machine the JournalNode restarts every 10-20 seconds,
while on all the other machines the JournalNode is stable.
Please advise: why does this happen?
Created 01-24-2018 09:46 PM
Now the JournalNode has stopped restarting since I started the metrics, but when I start the NameNode we get:
ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode. java.io.FileNotFoundException: No valid image files found
Created 01-24-2018 09:50 PM
In the GUI I searched for
Custom hadoop-metrics2.properties, and I see only "Add Property ...", that's all!
Created 01-24-2018 10:02 PM
Can you redeploy HA and see whether you missed any steps during the HA enabling process? Please follow the steps suggested by Hortonworks.
Created 01-24-2018 09:51 PM
@Jay But as you know, the main problem now is that we can't start the NameNode on either machine.
Created 01-24-2018 09:52 PM
@Jay What we see in the log when we start the NameNode on master01/03 is:
ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode. java.io.FileNotFoundException: No valid image files found
Created 01-24-2018 09:58 PM
Since you are getting this error:
ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode. java.io.FileNotFoundException: No valid image files found
can you please check whether the following directory has any fsimage file in it? Also check whether the fsimage files have proper read permission, as in the example below.
Example:
# ls -l /hadoop/hdfs/namenode/current/fsimage*
-rw-r--r--. 1 hdfs hadoop 195873 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213
-rw-r--r--. 1 hdfs hadoop     62 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213.md5
-rw-r--r--. 1 hdfs hadoop 195873 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519
-rw-r--r--. 1 hdfs hadoop     62 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519.md5
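The same check can be scripted. The following is only a minimal sketch: the function name check_fsimage_dir is hypothetical (it is not part of Hadoop or Ambari), and the readability test applies to whichever user runs it, so it is only meaningful when run as the hdfs user rather than root.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop or Ambari command): list the fsimage
# files in a NameNode "current" directory and say whether each is readable.
# Run it as the hdfs user; root can read every file, so the readability
# test would always pass for root.
check_fsimage_dir() {
    dir="$1"
    found=0
    for f in "$dir"/fsimage_*; do
        # When the glob matches nothing the pattern stays unexpanded,
        # so skip anything that does not exist.
        [ -e "$f" ] || continue
        # Skip the .md5 checksum companions; only count the images.
        case "$f" in *.md5) continue ;; esac
        found=$((found + 1))
        if [ -r "$f" ]; then
            echo "OK $f"
        else
            echo "UNREADABLE $f"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "MISSING no fsimage files in $dir"
        return 1
    fi
    return 0
}
```

For example, check_fsimage_dir /hadoop/hdfs/namenode/current uses the path from the example above. If your NameNode metadata lives elsewhere, the configured location is the value of dfs.namenode.name.dir, which "hdfs getconf -confKey dfs.namenode.name.dir" should print.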
Created 01-24-2018 10:01 PM
I get this:
# ls -l /hadoop/hdfs/journal/hdfsha/current/fsimage*
ls: cannot access /hadoop/hdfs/journal/hdfsha/current/fsimage*: No such file or directory
Created 01-24-2018 10:03 PM
It looks like someone deleted the "fsimage" file from the NameNode host. I am not aware of any Hadoop bug that would cause this file to be deleted.
So please check the operating system audit log to find out who deleted the file, and when.
# less /var/log/audit/audit.log
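Because the audit log can be very large, filtering it is usually easier than paging through it. This is a rough sketch only: search_audit_log is a hypothetical helper name, and it assumes the plain-text log at the path above. Note that auditd records file deletions only if an audit rule covering that directory was already in place when the file was removed, so an empty result does not prove that nothing was deleted.

```shell
#!/bin/sh
# Hypothetical helper: print audit records that mention a given string.
# $1 is the text to look for; $2 is the log file and defaults to the
# audit log path mentioned in this post.
search_audit_log() {
    pattern="$1"
    logfile="${2:-/var/log/audit/audit.log}"
    # -F matches the pattern as a fixed string, not a regular expression.
    grep -F -- "$pattern" "$logfile"
}
```

For example, search_audit_log fsimage prints every record that mentions an fsimage path. The ausearch tool from the audit package can also filter by file name with its -f option.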
Created 01-24-2018 10:08 PM
Oh no! Is there any option for backup files, or a way to restore this?
Created 01-24-2018 10:11 PM
This is a huge file. What should I search for with grep?