Created on 11-18-2024 09:45 PM - edited 11-18-2024 10:31 PM
I executed the command hdfs namenode -format on my Hadoop cluster, and the NameNode data directory now has a newly generated VERSION file.
This created an inconsistency: the DataNode and NameNode VERSION files now have different clusterIDs.
I manually changed the clusterID in the DataNode VERSION file to match the NameNode's. The cluster then started successfully, but a new, empty block pool with a new blockpoolID was created. All my data is still in the old block-pool directory with the old blockpoolID, but it is no longer mapped into my Hadoop filesystem.
How can I recover my old data and make it accessible in my Hadoop cluster again?
Can I somehow just start over from the old DataNode block data?
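For anyone hitting the same mismatch, here is a quick sketch for comparing the two clusterIDs before touching anything. The directory paths are assumptions; substitute whatever your dfs.namenode.name.dir and dfs.datanode.data.dir point to.

```shell
# Assumed locations of the two VERSION files; adjust to your configuration.
NN_VERSION=/hdfs/dfs/name/current/VERSION
DN_VERSION=/hdfs/dfs/data/current/VERSION

# Extract the clusterID from each VERSION file and compare.
nn_cid=$(grep '^clusterID=' "$NN_VERSION" | cut -d= -f2)
dn_cid=$(grep '^clusterID=' "$DN_VERSION" | cut -d= -f2)

if [ "$nn_cid" = "$dn_cid" ]; then
  echo "clusterIDs match: $nn_cid"
else
  echo "MISMATCH: namenode=$nn_cid datanode=$dn_cid"
fi
```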
Created 11-18-2024 11:32 PM
@UrosCvijanovic, Welcome to our community! To help you get the best possible answer, I have tagged our HDFS experts @rki_ @willx who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 11-18-2024 11:50 PM
Thank you @VidyaSargur! Hope to hear from them soon!
Created 11-19-2024 02:45 AM
@UrosCvijanovic When you reformatted the NameNode, its metadata (e.g., fsimage and edits files) was reset, and a new VERSION file was created.
If you have a backup of the original NameNode directory from before the format, you can copy it back into the current NameNode data directory.
The clusterID and the blockpoolID should match in the VERSION files of both the NameNode and the DataNode.
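For reference, a NameNode VERSION file (under dfs.namenode.name.dir/current/) contains both IDs. The values below are purely illustrative, not from this cluster:

```properties
# dfs.namenode.name.dir/current/VERSION (illustrative values)
namespaceID=1234567890
clusterID=CID-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1475975986-127.0.0.1-1447139386120
layoutVersion=-66
```

The DataNode keeps the clusterID in its own top-level VERSION file, while the blockpoolID appears in the name of the BP-* directory and its nested VERSION file.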
Created 11-19-2024 03:13 AM
@rki_ Is there any way to recover the data if the NameNode was not backed up? I had a 3-node cluster. On each machine I now have a data directory with 11G of data in it:
11G ./BP-1475975986-127.0.0.1-1447139386120/current/finalized/subdir0
I want to save the data if possible. Is there a way to start a new single-node cluster and then map this block data into the new cluster, or should I assume my data is lost?
Basically, what I'm asking is: when the NameNode metadata is wiped by the format command, is the block data itself unusable, or is there a way to get it back?
Created 11-20-2024 06:10 AM
@UrosCvijanovic The Namenode holds the Metadata that maps the files to the blocks present in the Datanode. Without this metadata, the namenode won't be able to interpret which block belongs to which file even if we manage to report all the blocks to the Namenode.
Though the chances are very slim, you can try to start a new single node cluster and then map this block data with the new cluster. Copy the old data/current directory containing the block files into the new cluster’s DataNode storage directories.
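A sketch of that copy step, assuming the old block pool is the BP-1475975986-… directory mentioned above and the new DataNode storage dir is /hdfs/dfs/data (both paths are assumptions; stop the DataNode before copying):

```shell
# Assumed paths: old disk with the orphaned block pool, and the new
# DataNode storage directory. Adjust both to your layout.
OLD_BP=/old-disk/dfs/data/current/BP-1475975986-127.0.0.1-1447139386120
NEW_DATA_DIR=/hdfs/dfs/data/current

# Copy the whole block-pool directory, preserving permissions and structure.
cp -a "$OLD_BP" "$NEW_DATA_DIR/"

# After restarting the DataNode, check what the NameNode can account for.
hdfs fsck / -files -blocks -locations
```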
Created on 11-20-2024 10:08 PM - edited 11-20-2024 10:10 PM
Thank you @rki_, seems like the takeaway is to never format the NameNode if you have data on the cluster.
Before starting the whole process of upgrading the Debian version of the machines in the cluster, I copied all the data from the master node to another single-node Hadoop cluster running on a separate machine.
I then used that data to build a new single-node cluster, which started successfully. Now I want to extend the single-node cluster into a multi-node cluster with 3 nodes.
I stopped the single-node cluster and edited all the configuration files on all three nodes, following the setup I had previously. This is how the configuration files look:
https://codefile.io/f/jsklFb1h0M
I also copied /hdfs/dfs/name to the other two nodes, without /hdfs/dfs/data. I want to let Hadoop handle that part by auto-replicating the data from the master node.
I can successfully start the ZooKeeper service:
# start ZooKeeper
/usr/local/zookeeper/bin/zkServer.sh start
And also start the journalnode:
# start the JournalNode
hadoop/bin/hdfs --daemon start journalnode
The usual next step would be:
hadoop/bin/hdfs namenode -initializeSharedEdits
But that will prompt me with:
Re-format filesystem in QJM to [vm-ip1:8485, vm-ip2:8485, vm-ip3:8485] ? (Y or N)
I guess if I reformat the filesystem again, I will lose the data on the master node again.
How can I start this multi-node cluster and successfully replicate the data to the other two nodes from the described state?
Created 11-20-2024 11:11 PM
@UrosCvijanovic The initializeSharedEdits command will only try to format the JournalNodes. Specifically, we expect /hdfs/journalNode to be empty for it to succeed. Once this is done, the NameNode will sync the edits to the 3 JournalNodes.
Note: do take a backup of the NameNode metadata before this, to be on the safe side.
Are you able to list the files present in HDFS on the single node machine?
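A quick pre-flight check along those lines, run on each JournalNode host before initializeSharedEdits. The /hdfs/journalNode path is taken from this thread; adjust it to your dfs.journalnode.edits.dir:

```shell
# Assumed JournalNode edits dir from this thread; adjust to your config.
JN_DIR=/hdfs/journalNode

# initializeSharedEdits expects this directory to be empty.
if [ -z "$(ls -A "$JN_DIR" 2>/dev/null)" ]; then
  echo "ok: $JN_DIR is empty"
else
  echo "not empty: back up and clear $JN_DIR first"
  ls -A "$JN_DIR"
fi
```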
Created 11-21-2024 12:02 AM
Thank you @rki_! That is exactly what happened. One node's /tmp/ folder still contained old JournalNode data. After cleaning it up and running initializeSharedEdits, I managed to start the cluster.
Note: I had this exact exception on two slave nodes:
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage java.io.IOException: There appears to be a gap in the edit log. We expected txid 121994, but got txid 121998.
I ran hdfs namenode -recover on both slave nodes and was then able to start both NameNodes properly. The data is now replicated across all 3 nodes.
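The recovery sequence above, sketched with an up-front backup (the backup path and metadata dir are assumptions; adjust to your dfs.namenode.name.dir):

```shell
# Back up the NameNode metadata before attempting recovery (paths assumed).
tar -czf /root/nn-meta-backup.tar.gz /hdfs/dfs/name

# Let the NameNode repair the edit-log gap interactively.
hdfs namenode -recover

# Then start the NameNode again.
hdfs --daemon start namenode
```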
Thank you so much for the help!