Created 12-16-2015 10:25 AM
Dear Team,
The following error occurs when the NameNode service starts:
ERROR namenode.NameNode (NameNode.java:main(1657)) - Failed to start namenode. java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 293929 but unable to find any edit logs containing txid 221561
Regards,
Nilesh
Created 12-16-2015 07:06 PM
In HDFS, the NameNode metadata consists of fsimage files (checkpoints of the entire file system state) and edit logs (a sequence of transactions to be applied that alter the base file system state represented in the most recent checkpoint). There are various consistency checks performed by the NameNode when it reads these metadata files. The error message indicates that one of these consistency checks has failed.
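To make the relationship between a checkpoint and its edit log concrete, here is a minimal toy model in Python. It is not the HDFS on-disk format or the NameNode's actual code: the `Edit` class, the `replay` function and the two operations (`mkdir`, `delete`) are hypothetical names invented for illustration. It only shows the idea that the current state is the checkpoint plus every later transaction, applied in order with no missing transaction IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One logged transaction in this toy model (hypothetical, not the HDFS format)."""
    txid: int
    op: str    # "mkdir" or "delete" in this sketch
    path: str


def replay(checkpoint_paths, checkpoint_txid, edits):
    """Apply every edit newer than the checkpoint, in txid order.

    Returns the resulting set of paths and the last txid applied.
    Raises ValueError if a transaction ID is missing from the sequence.
    """
    state = set(checkpoint_paths)
    last_applied = checkpoint_txid
    for edit in sorted(edits, key=lambda e: e.txid):
        if edit.txid <= checkpoint_txid:
            continue  # already reflected in the checkpoint
        if edit.txid != last_applied + 1:
            raise ValueError(
                f"gap in transactions: expected txid {last_applied + 1}, "
                f"found {edit.txid}"
            )
        if edit.op == "mkdir":
            state.add(edit.path)
        elif edit.op == "delete":
            state.discard(edit.path)
        else:
            raise ValueError(f"unknown operation: {edit.op}")
        last_applied = edit.txid
    return state, last_applied
```

For example, `replay({"/a"}, 10, [Edit(11, "mkdir", "/b"), Edit(12, "delete", "/a")])` yields the state `{"/b"}` at txid 12, while leaving out the edit with txid 11 makes the replay fail instead of silently producing a wrong namespace.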
Specifically, the NameNode separately tracks the last known transaction ID that was previously present in edit logs in another file named seen_txid. If the transaction ID recorded in this file is not available in the edit logs when the NameNode is trying to load metadata at startup, then it aborts.
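If you want to see where the gap is before deciding how to recover, a rough check along these lines may help. This is only a sketch, not the NameNode's own logic. It assumes the usual file naming in the `current` subdirectory of the NameNode metadata directory (`edits_<first txid>-<last txid>` for finalized segments, `edits_inprogress_<first txid>` for the open segment), and the function name `find_first_missing_txid` is made up for this example. As a simplification, an in-progress segment is treated as if it reaches `seen_txid`.

```python
import re

# Finalized segment: edits_<first txid>-<last txid>
FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
# Open segment: edits_inprogress_<first txid>
INPROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def find_first_missing_txid(filenames, start_txid, seen_txid):
    """Return the first txid in [start_txid, seen_txid] that no edit log
    segment covers, or None if the whole range is covered.

    filenames  : names of the files in the metadata directory
    start_txid : first txid needed (normally the fsimage txid + 1)
    seen_txid  : the number stored in the seen_txid file
    """
    segments = []
    for name in filenames:
        match = FINALIZED.match(name)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
            continue
        match = INPROGRESS.match(name)
        if match:
            # Assumption for this sketch: the open segment runs up to seen_txid.
            segments.append((int(match.group(1)), seen_txid))

    next_needed = start_txid
    for first, last in sorted(segments):
        if first > next_needed:
            break  # nothing covers next_needed
        next_needed = max(next_needed, last + 1)
        if next_needed > seen_txid:
            return None
    return next_needed if next_needed <= seen_txid else None
```

In practice you would pass in `os.listdir()` of the `current` directory, the number from the newest `fsimage_<txid>` file name plus one, and the contents of `seen_txid`. With the numbers from the error above, a directory that needs edits from txid 221561 up to 293929 but holds no segment containing 221561 would report 221561 as the first missing transaction.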
It's difficult to say exactly how this could have happened in your environment without a deep review of configuration, logs and operations procedures. A potential explanation would be if the NameNode metadata was restored from a backup, and that backup contained the most recent fsimage (the checkpoint) but did not include the edit logs (the subsequent transactions).
You might be interested in these additional resources that give further explanation of the NameNode metadata and suggestions on a possible backup plan.
Created 12-16-2015 12:02 PM
I believe there was a recent crash or reboot of the servers, or some other operation, that caused the gap.
The most recent txid is 293929, but the NN cannot find any edit log containing txid 221561.
You have to provide the missing edit logs. If it's a dummy or lab cluster, you may be able to restart the NN by formatting it. **Warning: this can cause data loss.**
Created 12-16-2015 01:17 PM
Yes, it is a test server. But what would the solution be if the same error occurred in production?
Created 12-16-2015 01:24 PM
@Nilesh The solution is to treat production as production 🙂 and keep backups of the NameNode directories.