Created 07-04-2017 08:04 AM
Hello community!
I recently added 4 more DNs to my Hadoop cluster; there are now 46 DNs up and running.
I've been running the balancer for 5 days, and today a message appeared at the top of the NameNode web console (my_name_node_url:50070): "There are 1169 missing blocks. The following files may be corrupted:", followed by a list of some of these corrupted blocks.
After I saw that message I decided to run the command "hdfs dfsadmin -report" and the result was:
Configured Capacity: 1706034579673088 (1.52 PB)
Present Capacity: 1683943231506526 (1.50 PB)
DFS Remaining: 559797934331658 (509.13 TB)
DFS Used: 1124145297174868 (1022.40 TB)
DFS Used%: 66.76%
Under replicated blocks: 1169
Blocks with corrupt replicas: 0
Missing blocks: 1169
Missing blocks (with replication factor 1): 21062
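As a sanity check on the report above, a minimal Python sketch can parse the byte counts and recompute the usage figure. It assumes that DFS Used% is DFS Used divided by Present Capacity (this reproduces the 66.76% shown, but it is an assumption about how the report derives the value); the function name parse_report is made up for illustration.

```python
import re

# Summary lines copied from the "hdfs dfsadmin -report" output above.
REPORT = """\
Configured Capacity: 1706034579673088 (1.52 PB)
Present Capacity: 1683943231506526 (1.50 PB)
DFS Remaining: 559797934331658 (509.13 TB)
DFS Used: 1124145297174868 (1022.40 TB)
DFS Used%: 66.76%
Under replicated blocks: 1169
Blocks with corrupt replicas: 0
Missing blocks: 1169
Missing blocks (with replication factor 1): 21062
"""

def parse_report(text):
    """Map each summary label to the integer after its colon.

    Percentage lines such as "DFS Used%" are skipped on purpose,
    because the label pattern does not allow a '%' character.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^([^:%]+):\s+(\d+)\b", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

stats = parse_report(REPORT)

# Assumption: the percentage is computed against Present Capacity.
used_pct = 100 * stats["DFS Used"] / stats["Present Capacity"]
print(f"DFS Used%: {used_pct:.2f}%")

# In this report the two counters happen to be identical.
print("Under replicated == Missing:",
      stats["Under replicated blocks"] == stats["Missing blocks"])
```

Running the same check on a later report makes it easy to see whether the block counters are moving at all.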
For storage capacity reasons, a group of developers decided to ignore my advice and set the replication factor to 1 for some files.
What does "Missing blocks: 1169" mean?
Does "Missing blocks (with replication factor 1)" mean that those 21062 blocks from files with replication factor 1 cannot be recovered?
I'll be very grateful if anyone can clarify this concept.
Thanks!
Guido.
Created 07-21-2017 10:30 AM
Hello!
I tracked down the missing blocks and fortunately they belonged to a decommissioned DN, so I decided to remove them.
That's it!
Thanks for your help!
Guido.
Created 07-04-2017 06:03 PM
Created on 07-05-2017 06:02 AM - edited 07-05-2017 06:29 AM
Thanks @mbigelow for clarifying this.
I've run the command (hdfs dfsadmin -report) again, as I did yesterday, and the block counts are unchanged.
Configured Capacity: 1706034579673088 (1.52 PB)
Present Capacity: 1683460189648203 (1.50 PB)
DFS Remaining: 558365599904940 (507.83 TB)
DFS Used: 1125094589743263 (1023.27 TB)
DFS Used%: 66.83%
Under replicated blocks: 1169
Blocks with corrupt replicas: 0
Missing blocks: 1169
Missing blocks (with replication factor 1): 21062
I've a couple of questions that maybe you can help me with.
1) Is there a way to get rid of that message in the NameNode web console?
2) Is there a way to find out/list the missing files instead of the missing blocks?
3) "Under replicated blocks" is staying steady at 1169; is CDH supposed to handle this?
An important thing I forgot to mention is that HBase is present in the cluster and there are 5 region servers. Maybe this question fits better in a new post, but as far as I know HBase and the HDFS balancer don't like each other, so I'm wondering if this could be the reason why CDH is not replicating the under-replicated blocks.
Thanks again!
Guido.
Created 07-05-2017 07:52 PM
Created on 07-06-2017 12:11 PM - edited 07-06-2017 12:25 PM
Thanks @mbigelow!
I took a deep dive into those corrupt blocks and realized that they don't belong to HBase tables; they are just regular files in HDFS.
I think I can understand more or less what is happening, please feel free to correct me in case I am wrong.
1) Under replicated blocks: 1169
2) Blocks with corrupt replicas: 0
3) Missing blocks: 1169
4) Missing blocks (with replication factor 1): 21062
1) I ran "hdfs fsck / -list-corruptfileblocks" to find which files these blocks belong to. Then I listed those files, and all of them had a replication factor of 1.
The default replication factor in the cluster is 3, so no matter how long I wait for HDFS to automatically handle these under-replicated blocks, they will always be listed as under replicated. Am I right? The cluster also has lots of other files with a replication factor of 1, but they are not listed as "under replicated", and I can't understand why.
2) Nothing to add.
3) These are missing from the entire cluster; they are dead and there's no way to get them back without a backup. These blocks are the same as the under-replicated ones in 1). My question here is: why are these files not counted in "Missing blocks (with replication factor 1)"? Or maybe they are, but in that case why are there no more "under replicated" blocks?
4) Not much to add; once 3) is clarified I'll understand this better.
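On the earlier question of listing files rather than blocks: the output of "hdfs fsck / -list-corruptfileblocks" can be reduced to a list of distinct file paths. The sketch below assumes each relevant line has the form blk_<id>, a tab, then the file path; the function name, block IDs and paths are invented for illustration, so check the format against real output from your cluster before relying on it.

```python
def corrupt_files(fsck_output):
    """Return the distinct file paths named in fsck's corrupt-block listing."""
    paths = []
    for line in fsck_output.splitlines():
        # Assumed line format: "blk_<id><TAB><path>"; other lines are ignored.
        if not line.startswith("blk_"):
            continue
        _block, _sep, path = line.partition("\t")
        if path and path not in paths:
            paths.append(path)
    return paths

# Hypothetical sample; real input would be the saved output of the fsck command.
SAMPLE = (
    "blk_1073741825\t/data/example/part-00000\n"
    "blk_1073741826\t/data/example/part-00000\n"
    "blk_1073741901\t/data/example/part-00001\n"
)

for path in corrupt_files(SAMPLE):
    print(path)
```

Each resulting path can then be listed with "hdfs dfs -ls" to check its replication factor, which is how the files above were found to have a replication factor of 1.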
Thanks again!