
How to recover missing HDFS blocks after deleting a DataNode data directory by mistake

Explorer

Hi,

The DataNode has three directories for storing data: /data/1, /data/2, and /data/3. I deleted /data/1 on the DataNode by mistake, and HDFS now shows missing blocks. I copied the data from /data/3 to /data/1, but it didn't work.


Thanks and regards,

leezy

Accepted solution

Master Collaborator

From the context, I'm assuming you have set up a single-node test cluster?


HDFS replicates data between different nodes; /data/1, /data/2, and /data/3 are just different drives on the same DataNode. HDFS uses each of those drives to store blocks and replicates those blocks to other nodes in the cluster.
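To make the drive-versus-node distinction concrete, here is a toy Python model (not HDFS code). It assumes a simplified layout: each DataNode fills its data directories round-robin (loosely mirroring HDFS's default round-robin volume choosing policy), and each replica of a block goes to a different node, with no rack awareness. The function `place_blocks` and the node names `dn1`, `dn2`, `dn3` are hypothetical.

```python
from itertools import cycle


def place_blocks(block_ids, nodes, volumes, replication):
    """Toy model, not HDFS code: map (node, volume) -> set of block ids.

    Assumes each node fills its volumes round-robin and that no node
    stores more than one replica of the same block.
    """
    cursors = {node: cycle(volumes) for node in nodes}
    layout = {(node, vol): set() for node in nodes for vol in volumes}
    copies = min(replication, len(nodes))  # at most one replica per node
    for i, block in enumerate(block_ids):
        # Rotate the starting node per block, then use the next `copies` nodes.
        for k in range(copies):
            node = nodes[(i + k) % len(nodes)]
            layout[(node, next(cursors[node]))].add(block)
    return layout


if __name__ == "__main__":
    vols = ["/data/1", "/data/2", "/data/3"]
    single = place_blocks(range(6), ["dn1"], vols, replication=3)
    for (node, vol), blocks in sorted(single.items()):
        print(node, vol, sorted(blocks))
```

In the single-node run every block lands on exactly one drive, so /data/3 never holds a copy of what was on /data/1, and copying one directory into another cannot bring the lost blocks back.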


Deleting /data/1 deleted the blocks stored on that drive. /data/2 and /data/3 hold different blocks, not copies of the ones on /data/1, which is why copying /data/3 into /data/1 did not help. If you have more than one node, HDFS will have stored copies of the blocks from /data/1 on other nodes, likely spread out among all the available drives on those nodes. In that case, once /data/1 is deleted, HDFS detects that those blocks went missing the next time the DataNode checks in and automatically starts repairing the under-replicated blocks.
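The repair step can be sketched with another toy Python model (not the NameNode's actual logic). It assumes block locations are tracked as `(node, volume)` pairs, drops every replica on the deleted volume, and then copies each under-replicated block from a surviving replica to nodes that do not yet hold it. `rereplicate`, `NEW_VOLUME`, and the node names are hypothetical; real HDFS also applies its rack-aware placement policy when picking targets.

```python
NEW_VOLUME = "<chosen by target DataNode>"  # hypothetical placeholder label


def rereplicate(locations, nodes, lost_volume, replication):
    """Toy model: return (repaired_locations, missing_blocks).

    locations: dict block_id -> set of (node, volume) replicas.
    lost_volume: the (node, volume) pair whose data was deleted.
    """
    repaired, missing = {}, []
    for block, replicas in locations.items():
        live = {r for r in replicas if r != lost_volume}
        if not live:
            missing.append(block)  # no surviving copy to re-replicate from
            repaired[block] = live
            continue
        holders = {node for node, _ in live}
        for node in nodes:
            if len(live) >= replication:
                break
            if node not in holders:
                live.add((node, NEW_VOLUME))  # copy from a surviving replica
                holders.add(node)
        repaired[block] = live
    return repaired, sorted(missing)


if __name__ == "__main__":
    locs = {
        "blk_1": {("dn1", "/data/1"), ("dn2", "/data/2"), ("dn3", "/data/1")},
        "blk_2": {("dn1", "/data/1")},  # only copy, as on a single node
    }
    fixed, missing = rereplicate(locs, ["dn1", "dn2", "dn3"], ("dn1", "/data/1"), 3)
    print(sorted(fixed["blk_1"]))
    print(missing)
```

`blk_1` gets back to three replicas because two copies survived on other nodes, while `blk_2` ends up in the missing list: with no surviving replica there is nothing to copy from.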


"Missing blocks" means that every copy of a block has gone missing (on a single node there is only one copy), so the only way to recover them would have been to run data recovery operations on that drive. This is typically the situation on single-node test clusters, hence the assumption above.
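The difference between "under-replicated" and "missing" comes down to how many replicas survive. A minimal sketch, assuming a hypothetical `classify` helper over the same toy `(node, volume)` model and an effective replication of 1 on the single-node cluster; it is not part of any HDFS API.

```python
def classify(locations, lost_volume, replication):
    """Toy model: label each block 'healthy', 'under-replicated' or 'missing'."""
    labels = {}
    for block, replicas in locations.items():
        live = len([r for r in replicas if r != lost_volume])
        if live == 0:
            labels[block] = "missing"  # every copy is gone
        elif live < replication:
            labels[block] = "under-replicated"  # repairable from a survivor
        else:
            labels[block] = "healthy"
    return labels


if __name__ == "__main__":
    lost = ("dn1", "/data/1")
    # Single-node cluster: each block has exactly one replica on one drive.
    single = {"blk_1": {("dn1", "/data/1")}, "blk_2": {("dn1", "/data/3")}}
    print(classify(single, lost, replication=1))
    # Three-node cluster: the same loss only costs one of three replicas.
    multi = {"blk_1": {("dn1", "/data/1"), ("dn2", "/data/2"), ("dn3", "/data/1")}}
    print(classify(multi, lost, replication=3))
```

On the single-node cluster the block that lived on /data/1 is labeled missing, while on the three-node cluster the same deletion only leaves it under-replicated.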
