Support Questions


HDFS Under-Replicated Blocks (High Number)

Contributor

I have run the fsck command on my HDFS and I am seeing a high number of under-replicated blocks (over 30%)!

My HDFS replication factor is set to 2.

What are the best practices / recommended methods to 'fix' this issue?
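For reference, this is the kind of fsck invocation that produces that summary (a sketch; it assumes the HDFS superuser is named 'hdfs', as on a default CDH install, and the exact output wording varies by version):

sudo -u hdfs hdfs fsck / | tail -n 25                         # summary, including the "Under-replicated blocks" count and percentage
sudo -u hdfs hdfs fsck / -files -blocks > /tmp/fsck_full.txt  # detailed per-file report for later grepping (output file name is arbitrary)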

1) Should I use "hadoop fs -setrep" to change the replication factor of certain files?

2) What's the manual way to 'force' the affected blocks to replicate themselves?

3) Should I permanently remove certain types of files?

    For instance, in the fsck report I am seeing a lot of files of this type:

/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446320640/p1=p1/p2=421 <dir>
/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446620772 <dir>
/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446620772/p1=p0 <dir>

4) How about the /tmp/logs/ files? Do I reset their replication with setrep, or periodically remove them?

5) I am also having quite a few Accumulo tables reporting under-replicated blocks!

 

 

 

8 REPLIES

Rising Star

Hi TS, are you still facing this issue? Have you changed back to 3 replicas, or are you still configured with 2?

 

1) Should I use "hadoop fs -setrep" to change the replication factor of certain files?

JMS: No. Keep it the way it is for now.
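For reference, if a specific path ever does need its replication checked or changed, a minimal sketch (/user/foo/data is just a placeholder path):

hdfs dfs -stat %r /user/foo/data       # print the file's current replication factor
hdfs dfs -setrep -w 2 /user/foo/data   # change it to 2 and wait until the change completes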

 

2) What's the manual way to 'force' the affected blocks to replicate themselves?

JMS: It depends... If they are configured to replicate 100 times, you might not have enough nodes and you cannot force that. How many nodes do you have in your cluster? Can you paste part of the fsck output here?
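A hedged way to gather both pieces of information (again assuming the HDFS superuser is 'hdfs'; the report wording varies a little by version):

sudo -u hdfs hdfs fsck / -files -blocks | grep -i "under replicated"   # affected files with their target vs. live replica counts
sudo -u hdfs hdfs dfsadmin -report | grep -i datanodes                 # how many DataNodes are live or dead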

 

3) Should I permanently remove certain types of files?

    For instance, in the fsck report I am seeing a lot of files of this type:

/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446320640/p1=p1/p2=421 <dir>
/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446620772 <dir>
/user/hue/.Trash/150507010000/user/hue/.cloudera_manager_hive_metastore_canary/hive0_hms/cm_test_table1430446620772/p1=p0 <dir>

JMS: This is the trash. If you don't need those files, why not clean out the trash?
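If the trash contents really are disposable, two hedged options (the checkpoint directory name is taken from the fsck output above):

sudo -u hue hdfs dfs -expunge                              # drop trash checkpoints older than fs.trash.interval for the hue user
hdfs dfs -rm -r -skipTrash /user/hue/.Trash/150507010000   # or remove that checkpoint immediately (irreversible)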

 

4) How about the /tmp/logs/ files? Do I reset their replication with setrep, or periodically remove them?

JMS: Same thing. Temporary files. Can you list them to make sure? You might be able to delete them.
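A quick, hedged way to check before deleting anything (on many clusters /tmp/logs holds YARN aggregated application logs, but verify on yours):

hdfs dfs -ls -R /tmp/logs | head -n 40   # sample the contents and their owners
hdfs fsck /tmp/logs | tail -n 25         # see whether this subtree accounts for much of the under-replication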

 

5) I am also having quite a few Accumulo tables reporting under-replicated blocks!

JMS: Here again, please paste the logs here. This one is the most concerning. They should have the default replication, unless Accumulo sets it to more than the factor of 2 you have.
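A hedged way to look at both sides of this (it assumes Accumulo's files live under the default /accumulo directory in HDFS, and the property name is an assumption about your Accumulo version):

hdfs fsck /accumulo | tail -n 25        # how much of the under-replication sits under Accumulo's tree
# In the Accumulo shell:
#   config -f table.file.replication    # a value of 0 normally means "use the HDFS default"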

 

JMS

Contributor

Hi JM, thank you again!

 

The issue (under-replicated & corrupt blocks) started when I added 2 new nodes into an existing CDH 5.4 cluster.

I went and selectively removed and restored files back into HDFS.

HDFS is now HEALTHY.

 

However, I haven't pinpointed the root cause!

I have opened up another thread listing more details about the corrupted-blocks issue.

I'll close this one and continue with the other one.

 

Thanks for all your help.

 

Happy Mother's Day 🙂

 

Contributor

Hi TS,

 

How come adding the nodes made blocks under-replicated?

 

Did you run the balancer?

Rising Star

Hi Siddesh,

 

Adding a node is not related to the under-replicated blocks.

 

The under-replicated blocks were most probably already there before the new nodes were added.

 

By default, MapReduce sets the replication of submitted job JARs to 10. So if there are fewer than 10 nodes in the cluster, you will almost always have under-replicated blocks.
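A hedged way to check whether the client overrides that default (/etc/hadoop/conf is the usual client config location on CDH; the MR2 property is mapreduce.client.submit.file.replication, mapred.submit.replication on MR1, and its default is 10):

grep -B 1 -A 2 submit.file.replication /etc/hadoop/conf/mapred-site.xml   # no match usually means the default of 10 applies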

 

You should check with fsck to see what is missing. Balancer will not help for that.

 

JM

Contributor

I may have missed something above, but the JARs set the replication to 10? Where is that mentioned in the post?

Rising Star

Hi Siddesh,

 

It is not mentioned, but it's a potential cause of the under-replicated blocks. It could also be something else entirely.

 

JMS

Contributor

Well absolutely. He needs to check his config files.

Master Collaborator

Hi,

 

When I run fsck on my cluster, I see that several blocks are under-replicated with a target replication of 3, even though I changed dfs.replication to 2 on the NameNode, the DataNodes, and the client server, and changed mapred.submit.replication to 2.

I also tried:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <final>true</final>
</property>
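Note that dfs.replication is a client-side default applied when a file is written, so changing it does not alter files that already exist; those keep the replication factor they were created with. A hedged sketch for bringing existing data down to 2 (run it only on paths you really want changed; /user/someproject is a placeholder):

hdfs dfs -setrep -w 2 /user/someproject   # applies to the whole directory tree; -w waits until the NameNode finishes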

I also restarted all services on my cluster, including Oozie.

Looking at one of the running jobs' configurations, I see the following properties with replication factor 3:

mapreduce.client.submit.file.replication
s3.replication
kfs.replication
dfs.namenode.replication.interval
ftp.replication
s3native.replication
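Of the properties listed, only mapreduce.client.submit.file.replication normally affects HDFS block replication here: the s3/kfs/ftp/s3native entries apply to other filesystem implementations, and dfs.namenode.replication.interval is the NameNode's replication-monitor period in seconds, not a replica count. A hedged way to see which client config file on the submitting host supplies the value (/etc/hadoop/conf is the typical client location):

grep -R -l submit.file.replication /etc/hadoop/conf 2>/dev/null   # files that set or mention the property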