How to start namenode with filenotfoundexception?

New Contributor

I am running into an issue starting my namenode (HDP 3.1): it fails with a FileNotFoundException. It complains about a file in /apps/hbase/data/WALs/. I ran hdfs fsck / and the report shows that the filesystem is healthy. I am not sure why this file doesn't exist, or why the namenode cares whether it exists. Is there a way to force the namenode to start with the file missing?

Re: How to start namenode with filenotfoundexception?

Mentor

@scott powers

The directory it's looking for holds the WALs (hbase write-ahead logs). Can you try tricking it by running the commands below as user hdfs?

$ hdfs dfs -mkdir -p /apps/hbase/data/WALs 
$ hdfs dfs -chown -R hbase:hdfs /apps/hbase/data/WALs

First restart hbase, then try starting the namenode.

Please revert

Re: How to start namenode with filenotfoundexception?

New Contributor

@Geoffrey Shelton Okot

Sorry that it was unclear: there is a specific file in that directory that it complains about. So in that case do you suggest

`hdfs dfs -touch thefilename`

then changing the owner?

Re: How to start namenode with filenotfoundexception?

Mentor

@scott powers

Okay, I understand now. Yes:

hdfs dfs -touch thefilename

Do that for every file it complains about, set the ownership, and then try restarting the namenode.

Re: How to start namenode with filenotfoundexception?

New Contributor

Interestingly, restarting hbase deletes the directory and file that I created. If I create it and restart only the namenode, it still says the file is missing, even though I can see it with hdfs dfs -ls. Maybe there is some issue with a 0-length file?
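To check the 0-length hypothesis from the client side, something like the sketch below could help. It relies on the standard `hdfs dfs -test -e` (path exists) and `hdfs dfs -test -z` (file is zero length) checks. The function name `classify_hdfs_path` and the `HDFS_CMD` override are hypothetical helpers introduced here for illustration, not anything from the thread. This only reports what the HDFS client sees; it does not explain what the namenode reads at startup.

```shell
# Hypothetical helper: classify an HDFS path as missing, empty, or non-empty.
# HDFS_CMD is an assumed override hook so the function can be tried without a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

classify_hdfs_path() {
  # $1 is the HDFS path to inspect
  if ! "$HDFS_CMD" dfs -test -e "$1"; then
    echo "missing"      # -test -e returned non-zero: path does not exist
  elif "$HDFS_CMD" dfs -test -z "$1"; then
    echo "empty"        # -test -z returned zero: file has zero length
  else
    echo "non-empty"
  fi
}
```

Run it as the hdfs user (after kinit on a kerberized cluster), passing the full path of the file named in the exception, for example `classify_hdfs_path /apps/hbase/data/WALs/thefilename`.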

Re: How to start namenode with filenotfoundexception?

Mentor

@scott powers

What type of cluster is this: DEV, TEST, etc.? Is it Kerberized or not? How many hbase masters are in your cluster?

The contents of that directory look like this:

/apps/hbase/data/WALs/{host_name},16020,1549312261942/{host_name}%2C16020%2C1549312261942..meta.15495826629.meta
/apps/hbase/data/WALs/{host_name},16020,1549312261942/{host_name}%2C16020%2C1549312261942.default.15495857667

Can you paste the master log from just before the error happened? Anything mentioning MasterProcWALs is of particular interest.

 /var/log/hbase/hbase-hbase-master-xxx.log

Do you have any hbase-dependent services, like atlas?

Test 1

Shutdown and Restart the cluster.

Test 2

Can you rename the directories MasterProcWALs and WALs found in /usr/var/lib/ambari-metrics-collector/hbase?

# mv MasterProcWALs XXXMasterProcWALs
# mv WALs XXXWALs

Now restart hbase and then the namenode.

Please revert

Re: How to start namenode with filenotfoundexception?

Mentor

@scott powers

Any updates?

Re: How to start namenode with filenotfoundexception?

New Contributor

This is a production cluster that is kerberized. There have been a bunch of ongoing problems over the last week, of which this is one. I have 2 hbase masters in the cluster at the moment. I don't have any hbase dependent services, just an internally developed service that we can easily recreate the data for.

Given that I have been having issues all week I am hesitant to restart the cluster.

The only error I found in the hbase log is

2019-02-08 14:53:35,713 WARN  [Thread-18] wal.WALProcedureStore: Unable to read tracker for hdfs://cluster/apps/hbase/data/MasterProcWALs/pv2-00000000000000000336.log
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat$InvalidWALDataException: Missing trailer: size=19 startPos=19
  at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readTrailer(ProcedureWALFormat.java:183)
  at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.readTrailer(ProcedureWALFile.java:93)
  at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.readTracker(ProcedureWALFile.java:100)
  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLog(WALProcedureStore.java:1386)
  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLogs(WALProcedureStore.java:1335)
  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:416)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:714)
  at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1398)
  at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:857)
  at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2225)
  at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:568)
  at java.lang.Thread.run(Thread.java:745)

Re: How to start namenode with filenotfoundexception?

New Contributor

It seems that the issue was not about hbase after all, but was related to the namenode itself. Performing a bootstrapStandby after taking a backup resolved the issue. My biggest concern is that this was done without putting the cluster into safe mode, since the second namenode could not start up.
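For anyone landing here later, a rough sketch of that kind of sequence follows. It only prints the commands so they can be reviewed before anything is run. The safe mode and saveNamespace steps are an assumption about what a more cautious version would look like, not what was actually done here. The function name `print_bootstrap_plan` and the backup directory argument are hypothetical. On a kerberized cluster each step also needs a valid ticket for the hdfs user.

```shell
# Hypothetical helper: print (not execute) a possible re-bootstrap sequence.
# $1 is an assumed local directory for the fsimage backup.
print_bootstrap_plan() {
  backup_dir="$1"
  cat <<EOF
# On the healthy (active) namenode host, as the hdfs user:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -fetchImage ${backup_dir}
# On the namenode that fails to start (it must be stopped), as the hdfs user:
hdfs namenode -bootstrapStandby
# Back on the active namenode, once the standby is up:
hdfs dfsadmin -safemode leave
EOF
}
```

Calling `print_bootstrap_plan /tmp/nn-backup` prints the list for review. Note that safe mode makes HDFS read-only while it is active, so on a production cluster the timing needs to be planned.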