JournalNode (HDFS) restarting all the time

In our Ambari cluster, we have a very strange problem.

We restarted all the servers (master01-03), and on each master server we started the services from the beginning, in the correct order:

First, on all masters, we started the ZooKeeper server.

Then, on all masters, we started the JournalNode.

However, we noticed that on the last master machine the JournalNode restarts every 10-20 seconds, while on all the other machines the JournalNode is stable.
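A sketch for checking the start order described above and for watching the JournalNode flap. It assumes the hostnames master01..master03 from this thread and the stock default ports (ZooKeeper clientPort 2181, JournalNode RPC port 8485); the function names are hypothetical. The "ruok" four-letter command is answered with "imok" by a healthy ZooKeeper; on ZooKeeper 3.5+ it has to be allowed through 4lw.commands.whitelist.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: masters are master01..master03, ZooKeeper listens
# on 2181 and the JournalNode RPC on 8485 (defaults); adjust for your cluster.

# Send ZooKeeper's "ruok" command; a healthy server replies "imok".
zk_ruok() {
  local host="$1" port="${2:-2181}" reply
  reply=$(timeout 5 bash -c '
    exec 3<>"/dev/tcp/$0/$1" || exit 1
    printf "ruok" >&3
    cat <&3
  ' "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}

# Succeeds if something accepts TCP connections on host:port.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Check ZooKeeper on every master first, then the JournalNode port.
check_masters() {
  local host rc=0
  for host in "$@"; do
    if zk_ruok "$host" 2181; then
      echo "$host: zookeeper imok"
    else
      echo "$host: zookeeper NOT ok"; rc=1
    fi
  done
  for host in "$@"; do
    if port_open "$host" 8485; then
      echo "$host: journalnode 8485 open"
    else
      echo "$host: journalnode 8485 closed"; rc=1
    fi
  done
  return "$rc"
}

# Example: check_masters master01 master02 master03
# To see the JournalNode on the last master come and go, poll it:
#   while sleep 5; do port_open master03 8485 && echo up || echo down; done
```

A port that alternates between open and closed every few polls matches the "restarting every 10-20 seconds" symptom; the reason for each restart would then be in that host's JournalNode log.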

Please advise: why is this happening?

(Screenshots: 58382-capture.png, 58381-capture.png)

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

As you are getting the error:

ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found

So can you please check whether the following directory contains any fsimage files, and whether those files have proper read permissions, as in this example?

Example:

# ls -l /hadoop/hdfs/namenode/current/fsimage*

-rw-r--r--. 1 hdfs hadoop 195873 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213
-rw-r--r--. 1 hdfs hadoop     62 Jan 22 20:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002711213.md5
-rw-r--r--. 1 hdfs hadoop 195873 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519
-rw-r--r--. 1 hdfs hadoop     62 Jan 23 02:05 /hadoop/hdfs/namenode/current/fsimage_0000000000002718519.md5
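A slightly more thorough version of the check above, as a sketch: it lists the fsimage files in a NameNode metadata directory, confirms each one is readable, and verifies it against its .md5 companion with md5sum. The path /hadoop/hdfs/namenode/current is only the example from this thread; on a real cluster it is the current/ subdirectory of whatever dfs.namenode.name.dir points to (hdfs getconf -confKey dfs.namenode.name.dir prints it). The function name check_fsimage_dir is hypothetical, and the checksum step assumes the .md5 files use the "<hash> *<filename>" layout that md5sum -c accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check fsimage files in a NameNode metadata dir.
# Usage: check_fsimage_dir /hadoop/hdfs/namenode/current
# Returns 0 if at least one readable, checksum-consistent fsimage exists,
# 1 if none are found or one is bad, 2 if the directory does not exist.
check_fsimage_dir() {
  local dir="$1" found=0 f

  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 2
  fi

  for f in "$dir"/fsimage_*; do
    # Skip the unexpanded pattern (no matches) and the .md5 companions.
    [ -e "$f" ] || continue
    case "$f" in *.md5) continue ;; esac
    found=1

    if [ ! -r "$f" ]; then
      echo "not readable: $f" >&2
      return 1
    fi

    # Assumption: "<hash> *<name>" format, which md5sum -c understands.
    if [ -f "$f.md5" ]; then
      (cd "$dir" && md5sum -c --status "$(basename "$f").md5") || {
        echo "checksum mismatch: $f" >&2
        return 1
      }
    fi
    echo "ok: $f"
  done

  if [ "$found" -eq 0 ]; then
    echo "no fsimage files in $dir" >&2
    return 1
  fi
}
```

Run it from a shell started as the hdfs user (for example sudo -u hdfs bash), since for root the readability test always succeeds and would hide a permission problem.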


24 replies

Now the JournalNode has stopped restarting when I start the masters, but when I start the NameNode we get:

ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
Michael-Bronson

In the GUI, I searched for Custom hadoop-metrics2.properties, and I see only "Add Property ...". That's all!

Michael-Bronson

Contributor

Can you redeploy HA and check whether any steps were missed during the HA enabling process? Please follow the steps suggested by Hortonworks:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_hadoop-high-availability/content/ch_HA-N...

@Jay But as you know, the main problem now is that we can't start the NameNode on either machine.

Michael-Bronson

@Jay What we see in the log when we start the NameNode on master01/03 is:

ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
Michael-Bronson

I get this:

ls -l /hadoop/hdfs/journal/hdfsha/current/fsimage*
ls: cannot access /hadoop/hdfs/journal/hdfsha/current/fsimage*: No such file or directory
Michael-Bronson

Master Mentor

@Michael Bronson

It looks like someone deleted the "fsimage" file from the NameNode host. I am not aware of any Hadoop bug that would cause this file to be deleted.

So please check the operating system audit log to find out who deleted the file and when:


# less /var/log/audit/audit.log
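Since the audit log can be large, a sketch for narrowing it down: every audit record carries msg=audit(<timestamp>:<serial>), and the SYSCALL record that names the user (uid/auid) and program (comm/exe) shares its serial with the PATH record that names the file. The function below (a hypothetical name) prints the serials of records mentioning fsimage, which can then be expanded with ausearch -i -a <serial>. This only finds something if auditd was already recording that directory, for example through a watch rule such as auditctl -w /hadoop/hdfs/namenode -p wa; without a rule, a deletion is usually not logged at all.

```shell
#!/usr/bin/env bash
# Sketch: list audit event serials for records that mention fsimage files.
# Assumption: an audit watch covered the NameNode directory before the deletion.
fsimage_event_ids() {
  local log="${1:-/var/log/audit/audit.log}"
  # msg=audit(1516650300.123:4567) -> 4567, de-duplicated and sorted.
  grep 'fsimage' "$log" |
    sed -n 's/.*msg=audit([0-9.]*:\([0-9]*\)).*/\1/p' |
    sort -un
}

# Example: show each full event in readable form (user, command, time):
#   for id in $(fsimage_event_ids); do ausearch -i -a "$id"; done
```

In the expanded events, PATH records with nametype=DELETE are the ones of interest; the matching SYSCALL line shows who ran the command and which executable it was.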

Oh no! Is there any option for backup files, or any way to restore this?

Michael-Bronson

This is a huge file. What do I need to search for with grep?

Michael-Bronson