Datanode missed block under replication configuration

Hi,

I am trying to find the default time interval within which under-replicated blocks are re-replicated after a DataNode failure. In my case, 2 replicas of a block were placed successfully on 2 different DataNodes, but the 3rd DataNode was down. Now, when it comes back online, it takes about 5 minutes for the missing replica to be placed on it. I want to minimize this interval. Is there any configuration property that can do this?

Re: Datanode missed block under replication configuration

Check the following link in case it helps:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

At this link you will find all the parameters.

Re: Datanode missed block under replication configuration

@Ashnee Sharma

Thanks for the response. I have already visited this link but couldn't find a property that helps me, which is why I asked here in case anybody can help me with this.

Re: Datanode missed block under replication configuration

@Viraj Vekaria

I don't think there is currently any property to minimize the interval.

I think you can try setting MTU=9000 (maximum transmission unit).

This value is the largest size that can be sent in a single packet/frame on the network. By default the MTU is 1500, and you can tune it to 9000. Frames sent with an MTU larger than the default are called jumbo frames.

Please see this link:

https://community.hortonworks.com/articles/8563/typical-hdp-cluster-network-configuration-best-pra.h...

Re: Datanode missed block under replication configuration

@Neeraj Sabharwal: Can you please help me here?

Re: Datanode missed block under replication configuration

You can force the replication of under-replicated blocks by issuing the setrep command on the file or directory. I use this technique to accelerate the re-replication of under-replicated blocks before an upgrade attempt, to get the cluster to an optimal state.

Otherwise, you're at the mercy of the namenode to schedule the process.
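
The setrep approach described above can be sketched roughly as follows. This is only a minimal illustration: `force_replication` is a hypothetical helper name, and the path and replication factor in the usage line are made-up examples. By default the helper only prints the command so it can be reviewed first; setting `RUN=1` executes it.

```shell
# Hypothetical helper (not part of Hadoop): builds a setrep command for a path.
# It prints the command by default; set RUN=1 in the environment to execute it.
force_replication() {
    replicas="$1"   # target replication factor, e.g. 3
    path="$2"       # HDFS file or directory (example value, adjust to your cluster)

    # -w asks setrep to wait until the target replication is reached,
    # which can take a long time on large directories.
    set -- hdfs dfs -setrep -w "$replicas" "$path"

    if [ "${RUN:-0}" = "1" ]; then
        "$@"          # run the command against the cluster
    else
        echo "$*"     # dry run: only show what would be executed
    fi
}
```

For example, `force_replication 3 /data/uploads` prints the command for review, and `RUN=1 force_replication 3 /data/uploads` runs it.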

Re: Datanode missed block under replication configuration

@David Streever: Well, that's not good news for me :( I have been trying to reduce the time the NameNode takes to replicate blocks to a DataNode that was offline during the file upload. So you mean there is no configuration that achieves this?

2) I have noticed that even though I have set dfs.replication to 2, Ambari still treats the replication factor as 3.

Although the blocks are placed on both DataNodes, all of the newly added blocks are displayed as under-replicated blocks in the Ambari UI.

I am very confused about these 2 points.

Re: Datanode missed block under replication configuration

The dfs.replication value is applied to each file at the time it is created. In your case, the existing files already carry the old setting, regardless of what you set in Ambari afterwards. You need to "reset" it for the directory, e.g. 'hdfs dfs -setrep -R <replication> <dir>'
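
One way to check whether a file still carries an old replication factor before resetting it is sketched below. It assumes the factor is stored per file and can be read with `hdfs dfs -stat %r`; `needs_setrep` is a hypothetical helper name, and the file path in the comments is a made-up example.

```shell
# Hypothetical helper: succeeds (exit 0) when a file's current replication
# factor differs from the desired one, i.e. when a setrep would change it.
needs_setrep() {
    current="$1"   # factor reported for the file, e.g. from 'hdfs dfs -stat %r <file>'
    target="$2"    # desired factor, e.g. the new dfs.replication value
    [ "$current" -ne "$target" ]
}

# Sketch of use on a cluster (shown as comments so nothing runs here):
#   current=$(hdfs dfs -stat %r /data/uploads/file.csv)
#   if needs_setrep "$current" 2; then
#       hdfs dfs -setrep -w 2 /data/uploads/file.csv
#   fi
```

A file created while the factor was 3 would report 3 here even after the configured default is changed to 2, which would be consistent with the under-replicated counts described in the question.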
