Created 10-04-2016 08:30 AM
Hi,
I was trying to figure out the default time interval for a datanode to re-replicate under-replicated blocks after a failure. In my case, 2 blocks were successfully placed on 2 different datanodes, but the 3rd datanode was down; when it came back online, it took about 5 minutes for the missing replica to be placed. I want to minimize this time interval. Is there any configuration property that can do this?
Created 10-04-2016 10:36 AM
Check the following link; it may help:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml. There you will find all the parameters.
Created 10-05-2016 05:15 AM
Thanks for the response, but I have already visited this link and couldn't find a property that helps me. That is why I asked here, in case anybody can help me with this.
Created 10-05-2016 04:13 PM
@ Viraj Vekaria
I don't think there is currently any property to minimize the interval.
You could try setting MTU=9000 (maximum transmission unit).
This value is the size of the packet/frame that can be sent over TCP. By default the MTU is 1500, and you can tune it up to 9000; when the MTU is greater than its default value, the frames are called jumbo frames.
Please follow the link
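As a rough sketch of the MTU suggestion above (assuming a Linux host with the `iproute2` tools; `eth0` is an example interface name, and the change requires root plus a NIC and switch path that support jumbo frames end to end):

```shell
# Check the current MTU on the interface (eth0 is an example name)
ip link show eth0

# Temporarily raise the MTU to 9000 (jumbo frames); requires root.
# The setting does not persist across reboots unless added to the
# distribution's network configuration.
sudo ip link set dev eth0 mtu 9000
```

Note this tunes raw network throughput between nodes; it does not change when the NameNode schedules re-replication.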
Created 10-05-2016 05:19 AM
@Neeraj Sabharwal : Can you please help me here ?
Created 10-07-2016 03:23 PM
You can force the "replication" of under-replicated blocks by issuing the setrep command on the file/directory. I use this technique to accelerate re-replication of under-replicated blocks before an upgrade attempt, to get to an optimal state.
Otherwise, you're at the mercy of the NameNode to schedule the process.
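The setrep technique described above might look like this (`/data/myfile` is an example path; the replication numbers shown are illustrative):

```shell
# Bump the replication factor of a file; -w blocks until the
# NameNode has actually finished placing the new replicas.
hdfs dfs -setrep -w 4 /data/myfile

# Restore the original replication factor afterwards.
hdfs dfs -setrep -w 3 /data/myfile
```

The bump-and-restore forces the NameNode to act now rather than waiting for its own scheduling of under-replicated blocks.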
Created 10-10-2016 06:48 AM
@David Streever : Well, that's not good news for me :( I have been trying to reduce the time the NameNode takes to replicate blocks onto a datanode that was offline during the file upload. So you mean there is no configuration through which this can be achieved?
2) I have also noticed that even though I set dfs.replication to 2, Ambari still considers the replication factor to be 3.
All the blocks placed on both datanodes are displayed as under-replicated blocks in the Ambari UI.
I am very confused about these 2 points.
Created 10-10-2016 12:38 PM
The replication factor (dfs.replication) is applied to a file at the time it is written. In your case, the existing files already carry the old setting, regardless of what you later set in Ambari. You need to "reset" it for the directory. IE: 'hdfs dfs -setrep -R 2 <dir>' (setrep requires the target replication number).
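Concretely, for the 2-replica case described in this thread (`/user/viraj/data` is an example directory):

```shell
# Recursively reset the replication factor of all existing files
# under the directory to 2; -w waits until replication completes.
hdfs dfs -setrep -R -w 2 /user/viraj/data

# Verify: the second column of the listing shows each file's
# replication factor.
hdfs dfs -ls -R /user/viraj/data
```

After this, blocks that Ambari flagged as under-replicated against the old factor of 3 should clear, since the files' stored target now matches the 2 available replicas.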