Member since
05-16-2018
1
Post
0
Kudos Received
0
Solutions
01-03-2019
01:25 PM
1 Kudo
Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them. We had a system with 2 namenodes with high availability enabled. For some reason, under the data folders of the datanodes, i.e /data0x/hadoop/hdfs/data/current - we had 2 Block Pools folders listed (example of such folder is BP-1722964902-1.10.237.104-1541520732855). There was one folder containing the IP of namenode1 and another containing the IP of namenode 2. All the data was under the BlockPool of namenode 1, but inside the VERSION files of the namenodes (/data0x/hadoop/hdfs/namenode/current/) the BlockPool id and the namespace ID were of namenode 2 - the namenode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having 2 block pools folders, but we did. In order to fix the problem - and get HDFS healthy again - we just needed to update the VERSION file on all the namenode disks (on both NN machines) and on all the journal node disks (on all JN machines), to point to Namenode 1. We then restarted HDFS and made sure all the blocks are
reported and there's no more missing blocks.
... View more