I've recently set up an HA cluster in my test environment.
I was trying to take a checkpoint on my HDFS filesystem using the "hdfs namenode -checkpoint" command.
It failed, saying the directory "/tmp/hadoop-hdfs/dfs/name" is "in an inconsistent state". I checked and the folder did not exist; after I created it, I received the following error:
    java.lang.IllegalStateException: Unexpected state: OPEN_FOR_READING
        at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:245)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1235)
        at org.apache.hadoop.hdfs.server.namenode.BackupNode$BNHAContext.startActiveServices(BackupNode.java:471)
        at org.apache.hadoop.hdfs.server.namenode.BackupState.enterState(BackupState.java:51)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:847)
        at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1523)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1610)
I could not find any documentation describing a solution to this problem.
My guess is that manual checkpointing is disabled in an HA environment.
Any help on this, and on HDFS metadata backups (specifically in an HA cluster), would be much appreciated.
Are your cluster and HDFS working fine? Are you able to ingest data and run jobs?
Check the fsimage and edit log locations in the NameNode configuration (dfs.namenode.name.dir and dfs.namenode.edits.dir in hdfs-site.xml).
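A minimal sketch for checking which directories the NameNode will resolve, assuming you pass in the text of hdfs-site.xml (merge in core-site.xml if hadoop.tmp.dir is set there). The defaults are the documented ones from hdfs-default.xml and core-default.xml; the helper names load_properties and resolve are hypothetical, and the ${...} expansion is a simplified stand-in for Hadoop's own Configuration substitution. With dfs.namenode.name.dir unset, the default resolves to file:///tmp/hadoop-<user>/dfs/name, which for the hdfs user is exactly the /tmp/hadoop-hdfs/dfs/name path from the error.

```python
import xml.etree.ElementTree as ET

# Documented defaults (core-default.xml / hdfs-default.xml).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.namenode.name.dir": "file://${hadoop.tmp.dir}/dfs/name",
    "dfs.namenode.edits.dir": "${dfs.namenode.name.dir}",
}


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve(key, props, user):
    """Look up key (falling back to its default) and expand ${...} references.

    Simplified: only other properties/defaults and ${user.name} are expanded.
    """
    value = props.get(key, DEFAULTS.get(key, ""))
    for _ in range(10):  # cap expansion depth; plenty for these defaults
        start = value.find("${")
        if start == -1:
            break
        end = value.find("}", start)
        if end == -1:
            break
        ref = value[start + 2:end]
        sub = user if ref == "user.name" else props.get(ref, DEFAULTS.get(ref, ""))
        value = value[:start] + sub + value[end + 1:]
    return value
```

If resolve("dfs.namenode.name.dir", ...) still points under /tmp on your NameNode hosts, the configuration you expect is probably not being picked up, and /tmp is a risky place for metadata in any case.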
I would recommend taking a copy of your NameNode metadata before doing any manual work on it.
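One way to take that copy without touching the NameNode's own directories is "hdfs dfsadmin -fetchImage <local directory>", which downloads the most recent fsimage from the NameNode. A minimal sketch wrapping it in a timestamped backup, assuming the hdfs client is on PATH and configured for your cluster; backup_root, fetch_image_command and backup_fsimage are hypothetical names. Keep in mind that an fsimage only reflects the last checkpoint, not edits made since then.

```python
import subprocess
import time
from pathlib import Path


def fetch_image_command(backup_root, stamp=None):
    """Build the target directory and `hdfs dfsadmin -fetchImage` argv.

    backup_root is a hypothetical local path chosen for illustration.
    """
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"fsimage-{stamp}"
    return target, ["hdfs", "dfsadmin", "-fetchImage", str(target)]


def backup_fsimage(backup_root):
    """Create a fresh directory and download the latest fsimage into it."""
    target, cmd = fetch_image_command(backup_root)
    target.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the hdfs command fails.
    subprocess.run(cmd, check=True)
    return target
```

Usage: backup_fsimage("/backup/namenode") on a host with client access to the cluster; scheduling it (e.g. from cron) gives you a series of point-in-time images.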
You may try this, using the active/standby NameNodes in place of the primary/secondary NameNodes mentioned in the article.
As the official documentation says:
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.
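You should not deploy a CheckpointNode or BackupNode in an HA cluster; the Standby NameNode already performs checkpoints. Note that "hdfs namenode -checkpoint" does not take a one-off checkpoint: it starts a CheckpointNode daemon (your stack trace shows it being created via NameNode.createNameNode and BackupNode.<init>), which is exactly the kind of node the documentation says not to run in an HA cluster.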
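To see when the Standby will checkpoint on its own: it periodically checks (every dfs.namenode.checkpoint.check.period, default 60 s) whether either dfs.namenode.checkpoint.period (default 3600 s) has elapsed since the last checkpoint or dfs.namenode.checkpoint.txns (default 1,000,000) uncheckpointed transactions have accumulated. The sketch below is a simplified model of that trigger, assuming a constant write rate and ignoring the time the checkpoint itself takes; it is not the actual StandbyCheckpointer code, and the function names are hypothetical.

```python
# Defaults from hdfs-default.xml.
DEFAULT_CHECK_PERIOD_S = 60           # dfs.namenode.checkpoint.check.period
DEFAULT_CHECKPOINT_PERIOD_S = 3600    # dfs.namenode.checkpoint.period
DEFAULT_CHECKPOINT_TXNS = 1_000_000   # dfs.namenode.checkpoint.txns


def standby_should_checkpoint(seconds_since_last, uncheckpointed_txns,
                              period_s=DEFAULT_CHECKPOINT_PERIOD_S,
                              txns=DEFAULT_CHECKPOINT_TXNS):
    """True when either the time threshold or the transaction threshold is hit."""
    return uncheckpointed_txns >= txns or seconds_since_last >= period_s


def first_checkpoint_time(txns_per_second,
                          check_period_s=DEFAULT_CHECK_PERIOD_S,
                          period_s=DEFAULT_CHECKPOINT_PERIOD_S,
                          txns=DEFAULT_CHECKPOINT_TXNS):
    """Seconds until the first poll at which the model would checkpoint."""
    t = 0
    while True:
        t += check_period_s
        if standby_should_checkpoint(t, txns_per_second * t, period_s, txns):
            return t
```

With the defaults, a quiet cluster (10 txns/s) checkpoints at the one-hour mark (first_checkpoint_time(10) returns 3600), while a busy one (1000 txns/s) hits the transaction threshold first, at the 1020 s poll.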