Namenode Txid Error

Expert Contributor

Dear Team,

The following error occurs when the NameNode service starts.

ERROR namenode.NameNode (NameNode.java:main(1657)) - Failed to start namenode. java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 293929 but unable to find any edit logs containing txid 221561

Regards,

Nilesh

1 ACCEPTED SOLUTION


In HDFS, the NameNode metadata consists of fsimage files (checkpoints of the entire file system state) and edit logs (a sequence of transactions to be applied that alter the base file system state represented in the most recent checkpoint). There are various consistency checks performed by the NameNode when it reads these metadata files. The error message indicates that one of these consistency checks has failed.

Specifically, the NameNode separately records, in a file named seen_txid, the last transaction ID known to have been written to the edit logs. If the edit logs available when the NameNode loads metadata at startup do not contain the transactions up to that ID, the NameNode aborts.
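To make the check above concrete, here is a minimal Python sketch that looks for the same kind of gap by reading only the file names in a metadata directory. It is an approximation, not the NameNode's actual logic: it assumes the usual on-disk names under `current/` (`fsimage_<txid>`, `edits_<start>-<end>`, `edits_inprogress_<start>`, `seen_txid`), it inspects a single local directory only (no JournalNodes or shared edits), and it does not validate file contents. The function names `find_first_missing_txid` and `check_metadata_dir` are made up for this illustration.

```python
import re
from pathlib import Path

# File name patterns assumed for the "current" folder of a metadata directory.
FINALIZED_EDITS = re.compile(r"^edits_(\d+)-(\d+)$")
INPROGRESS_EDITS = re.compile(r"^edits_inprogress_(\d+)$")
FSIMAGE = re.compile(r"^fsimage_(\d+)$")  # skips fsimage_<txid>.md5


def find_first_missing_txid(file_names, seen_txid):
    """Return the first txid not covered by any edit log, or None if the
    logs cover everything from the newest checkpoint up to seen_txid."""
    checkpoints = [int(m.group(1)) for n in file_names if (m := FSIMAGE.match(n))]
    if not checkpoints:
        raise ValueError("no fsimage checkpoint found")
    # Edits are needed starting right after the newest checkpoint.
    next_txid = max(checkpoints) + 1

    segments = []
    for name in file_names:
        if m := FINALIZED_EDITS.match(name):
            segments.append((int(m.group(1)), int(m.group(2))))
        elif m := INPROGRESS_EDITS.match(name):
            # An in-progress segment has no end in its name; this sketch
            # optimistically assumes it reaches seen_txid.
            start = int(m.group(1))
            segments.append((start, max(start, seen_txid)))

    for start, end in sorted(segments):
        if start > next_txid:
            break  # hole before this segment
        next_txid = max(next_txid, end + 1)

    return next_txid if next_txid <= seen_txid else None


def check_metadata_dir(current_dir):
    """Run the check against a real directory, e.g. <name dir>/current."""
    current = Path(current_dir)
    seen_txid = int((current / "seen_txid").read_text().strip())
    names = [p.name for p in current.iterdir()]
    return find_first_missing_txid(names, seen_txid)


if __name__ == "__main__":
    # Hypothetical listing: a checkpoint at txid 221560 and no edit logs.
    listing = ["VERSION", "seen_txid",
               "fsimage_0000000000000221560",
               "fsimage_0000000000000221560.md5"]
    print(find_first_missing_txid(listing, seen_txid=293929))
```

With the hypothetical listing in the `__main__` block, the function reports 221561 as the first missing txid, which matches the shape of the error in the question: seen_txid says 293929 should be reachable, but nothing after the checkpoint is available. Whether your directory actually looks like this is something only a listing of `current/` can confirm.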

It's difficult to say exactly how this could have happened in your environment without a deep review of configuration, logs and operations procedures. A potential explanation would be if the NameNode metadata was restored from a backup, and that backup contained the most recent fsimage (the checkpoint) but did not include the edit logs (the subsequent transactions).

You might be interested in these additional resources that give further explanation of the NameNode metadata and suggestions on a possible backup plan.

http://hortonworks.com/blog/hdfs-metadata-director...

https://community.hortonworks.com/questions/4694/p...


4 REPLIES 4

Master Mentor
@Nilesh

I believe there was a recent crash or reboot of the servers, or some other operation, that caused the gap.

The most recent txid is 293929, but the NN is looking for edit logs containing txid 221561.

You have to provide the missing edit logs. If it's a dummy or lab cluster, you may be able to restart the NN by formatting it. **Formatting can cause data loss.**

Expert Contributor

Yes, it is a test server. But what would the solution be if the same error occurred in production?

Master Mentor

@Nilesh The solution is to treat production as production 🙂 and keep backups of the NameNode directories.
