
Standby NameNode can't start in Ambari cluster

We are trying to start the "Standby NameNode (HDFS)" on the master01 machine in an Ambari cluster (version 2.6), but it will not start.

We get the following log:

ERROR namenode.NameNode (NameNode.java:main(1774)) - Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 13361263
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:203)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:693)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:289)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1045)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:703)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:688)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:752)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:976)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1701)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 13361262; expected file to go up to 13361312

What could be the problem, and how do we fix it so the service will start?
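
For reference, one way to confirm which edit-log segment is truncated (a sketch only; the metadata directory /hadoop/hdfs/namenode/current is taken from the replies below and may differ on your cluster, and the segment file name placeholder must be replaced with the real one):

# List the edit-log segments around the failing transaction (13361263).
ls -l /hadoop/hdfs/namenode/current/ | grep edits | tail

# Dump the suspect segment with the Offline Edits Viewer; replace <starttxid>-<endtxid>
# with the real file name. A truncated segment will fail or stop early.
hdfs oev -i /hadoop/hdfs/namenode/current/edits_<starttxid>-<endtxid> -o /tmp/edits.xml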

Michael-Bronson
1 ACCEPTED SOLUTION

Super Mentor

@uri ben-ari

1. Was this Node working fine earlier?

2. Do you have the correct "/etc/hosts" entries? Are any of the hostnames in upper case or mixed case?

3. Is this a kerberized cluster?



14 REPLIES



Was this Node working fine earlier - yes

Michael-Bronson

Do you have the correct "/etc/hosts"? - yes

Michael-Bronson

Is this a kerberized cluster? - What do you mean? We have 3 master machines + 2 worker machines.
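
For reference, "Kerberized" here means Hadoop security is configured for Kerberos authentication. A quick sketch to check, assuming the usual Ambari client config path /etc/hadoop/conf:

# Prints the authentication mode from core-site.xml:
#   "simple"   -> the cluster is not Kerberized
#   "kerberos" -> the cluster is Kerberized
grep -A1 'hadoop.security.authentication' /etc/hadoop/conf/core-site.xml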

Michael-Bronson

Super Mentor

@uri ben-ari

If it is a test cluster, then you may try the following (at your own risk):

1. If your Active NameNode is running fine, then you can try to bring it out of safe mode with forceExit on the Active NN.

2. Then take a backup of the directory "/hadoop/hdfs/namenode/current"

3. After taking the backup mentioned in the previous step, please remove the directory contents "/hadoop/hdfs/namenode/current/*".

4. Perform the bootstrapStandby.

.

Can you please show me how to bring safe mode out with forceExit on the Active NN, and how to do the bootstrapStandby?

Michael-Bronson

Super Mentor

@uri ben-ari

You can try something like this:

# su - hdfs 
# hdfs dfsadmin -safemode leave
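
For reference, a sketch of the full sequence from the numbered list above, under the assumptions used later in this thread (Active NameNode on master02, Standby on master01, metadata in /hadoop/hdfs/namenode/current); adjust paths and hosts for your cluster and run at your own risk:

# On the Active NameNode (master02): bring the namespace out of safe mode.
su - hdfs
hdfs dfsadmin -safemode forceExit    # or: hdfs dfsadmin -safemode leave

# On the Standby NameNode (master01), with its NameNode process stopped:
# back up the metadata directory before touching anything, then clear it.
cp -rp /hadoop/hdfs/namenode/current /hadoop/hdfs/namenode/current.bak
rm -rf /hadoop/hdfs/namenode/current/*

# Still on the Standby (master01): rebuild its metadata from the Active NameNode.
hdfs namenode -bootstrapStandby

# Finally, start the Standby NameNode again from Ambari.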

.

My active node is on master02, so do I need to do the steps on the master02 machine? (including the backup)

Michael-Bronson

Hi Jay, still waiting for your answer. Do you mean to run hdfs dfsadmin -safemode leave on the NameNode that is running (master02)? (The standby NameNode is on master01.) Second, should the backup of the directory "/hadoop/hdfs/namenode/current" be taken on the active NameNode? I ask because it seems more logical to do this on the standby (the master01 machine).

Michael-Bronson

We found this workaround - hadoop namenode -recover. Is it another solution for our problem?
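
For reference, a sketch of how that recovery tool is typically run; it prompts interactively and can discard transactions, so back up the metadata directory first and treat it as a last resort rather than the recommended fix here:

# Stop the affected NameNode (e.g. via Ambari) before running the recovery tool.
su - hdfs
# Walks the edit logs and prompts on corrupt entries; skipping entries can lose transactions.
hdfs namenode -recover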

Michael-Bronson

Hi Jay, just to clarify your instructions (given that the Standby NameNode is on master01 and the Active NameNode is on master02), can you approve these steps?

su - hdfs

hdfs dfsadmin -safemode leave (on master02)

cp -rp /hadoop/hdfs/journal/hdfsha/current /hadoop/hdfs/journal/hdfsha/current.orig (on master02)

rm -f /hadoop/hdfs/journal/hdfsha/current/* (on master02)

hdfs namenode -bootstrapStandby (on master01)
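
For reference, a sketch of how to verify the cluster after the bootstrap and restart; the NameNode IDs nn1 and nn2 below are assumptions, so check dfs.ha.namenodes.hdfsha in hdfs-site.xml for the real IDs:

# Report which NameNode is active and which is standby.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Confirm HDFS is out of safe mode and healthy.
hdfs dfsadmin -safemode get
hdfs dfsadmin -report | head -20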

Michael-Bronson

Contributor

Hi @mike_bronson7,

Were you able to solve the issue with those steps?

 

Thank you

New Contributor

I had created a new HA-enabled cluster on EC2 instances. It came up without any issues. I then installed Kerberos on two machines, as master and slave, and started the Kerberos-enable wizard in Ambari. At the last step of starting the services, the NameNode start failed on both nodes and gave the same error as above. How did this error occur on a new setup, and how can I prevent it from coming up in my next environment setup?

