
How to fix under-replicated blocks faster? It takes a long time

Expert Contributor

I ran the following command:

	hadoop dfs -setrep -R -w 3 /

It works, but I have 5,114,551 under-replicated blocks and it is going to take 24 days. How can I solve this problem faster?
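The two figures in the question (5,114,551 blocks, 24 days) imply an average re-replication rate that is useful to keep in mind when tuning. A minimal Python sketch of that arithmetic; the function name is hypothetical and the rate is assumed to be constant over the whole period:

```python
SECONDS_PER_DAY = 86_400


def implied_blocks_per_second(blocks: int, days: float) -> float:
    """Average rate needed to re-replicate `blocks` within `days` (constant-rate assumption)."""
    return blocks / (days * SECONDS_PER_DAY)


rate = implied_blocks_per_second(5_114_551, 24)
print(f"{rate:.2f} blocks/s, about {rate * 3600:,.0f} blocks/hour")
```

This is cluster-wide throughput, so it is the number to compare against whatever limit HDFS places on replication scheduling.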


Super Collaborator

Hi @sivasaravanakumar k,

HDFS throttles the rate of replication work so that it does not interfere with regular cluster traffic when failures happen under normal load.

The properties controlling this are dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. The first controls how much replication work is scheduled to a DataNode at every heartbeat, and the other two further limit the maximum number of parallel network transfers a DataNode performs at a time. They are described at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
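To see why the multiplier matters, here is a rough model of the scheduling ceiling. It assumes Hadoop 2.x behaviour, where a NameNode monitor thread runs every dfs.namenode.replication.interval seconds (default 3) and schedules about (live DataNodes × multiplier) blocks per run. The class name is hypothetical, and the max-streams limits are not modelled: they cap concurrent transfers per DataNode, so real throughput can be lower than this ceiling.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class ReplicationThrottle:
    """Hypothetical model of NameNode replication scheduling (Hadoop 2.x assumptions)."""
    live_datanodes: int
    # dfs.namenode.replication.work.multiplier.per.iteration (default 2)
    work_multiplier: int = 2
    # dfs.namenode.replication.interval in seconds (default 3)
    interval_s: float = 3.0

    def blocks_per_iteration(self) -> int:
        # Roughly live DataNodes x multiplier blocks are scheduled per monitor run.
        return self.live_datanodes * self.work_multiplier

    def max_blocks_per_second(self) -> float:
        # Upper bound only: max-streams limits and bandwidth can lower real throughput.
        return self.blocks_per_iteration() / self.interval_s

    def best_case_days(self, under_replicated: int) -> float:
        # Shortest possible time if every scheduled block is copied immediately.
        return under_replicated / self.max_blocks_per_second() / SECONDS_PER_DAY


for multiplier in (2, 10):
    t = ReplicationThrottle(live_datanodes=3, work_multiplier=multiplier)
    print(multiplier, t.max_blocks_per_second(), round(t.best_case_days(5_114_551), 1))
```

Under these assumptions the ceiling grows linearly with the multiplier, but a higher multiplier only helps if the max-streams limits and the network let the DataNodes keep up with the extra work.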

/Best regards, Mats

Expert Contributor

Hi @Mats Johansson,

I have a cluster with 1 NameNode and 3 DataNodes. One of my DataNodes failed, so I removed it from the cluster and added a new DataNode.

After I added the new node, I got:

	WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks

So I removed the corrupt files from my cluster.

Then I ran hdfs fsck / to check its health:

The filesystem under path '/' is HEALTHY

That looks good, but:

Under-replicated blocks:       1572982 (95.59069 %)
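Tracking this summary line over time is a simple way to measure real progress. A minimal Python sketch that extracts the count and percentage from fsck output; it assumes the line has exactly the shape quoted above and that the percentage is relative to the total block count (the function name is hypothetical):

```python
import re
from typing import Optional, Tuple

# Matches the fsck summary line, e.g. "Under-replicated blocks:  1572982 (95.59069 %)"
UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s+(\d+)\s+\(([\d.]+)\s*%\)")


def parse_under_replicated(fsck_output: str) -> Optional[Tuple[int, float, Optional[int]]]:
    """Return (count, percent, estimated_total_blocks), or None if the line is absent."""
    match = UNDER_REPLICATED.search(fsck_output)
    if match is None:
        return None
    count = int(match.group(1))
    percent = float(match.group(2))
    # Assumes percent = count / total * 100; the total cannot be derived when percent is 0.
    total = round(count * 100 / percent) if percent > 0 else None
    return count, percent, total


sample = "Under-replicated blocks:       1572982 (95.59069 %)"
print(parse_under_replicated(sample))
```

Comparing the count from two fsck runs taken some hours apart gives the actual blocks-per-second rate, independent of any estimate printed by setrep.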

The problem now is that Hadoop automatically replicates the files from one DataNode to another at only 6 blocks per second.
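Taking the reported rate of 6 blocks per second at face value and assuming it stays constant, the time to clear the 1,572,982 under-replicated blocks from the fsck output can be estimated directly. A minimal sketch; the function name is hypothetical:

```python
SECONDS_PER_DAY = 86_400


def eta_days(blocks: int, blocks_per_second: float) -> float:
    """Days needed to re-replicate `blocks` at a constant rate."""
    if blocks_per_second <= 0:
        raise ValueError("rate must be positive")
    return blocks / blocks_per_second / SECONDS_PER_DAY


print(f"{eta_days(1_572_982, 6):.2f} days")
```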

When I execute hadoop dfs -setrep -R -w 3 /, it shows that replicating the files will take 24 days. I cannot wait 24 days.

I want to access the files and balance replication across the DataNodes.

My current setting is dfs.namenode.replication.work.multiplier.per.iteration = 2.

I don't have the following properties:

dfs.namenode.replication.max-streams

dfs.namenode.replication.max-streams-hard-limit
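A property that is missing from hdfs-site.xml normally falls back to its value in hdfs-default.xml, provided the running Hadoop version recognizes that property at all. The sketch below reads a Hadoop-style configuration file and reports the value each of the three properties would resolve to. The defaults (2, 2, 4) are taken from the 2.x hdfs-default.xml and are an assumption here; a Hadoop 1.x release may not read these names, so verify against your own version. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Assumed Hadoop 2.x defaults from hdfs-default.xml; verify for your release.
DEFAULTS = {
    "dfs.namenode.replication.work.multiplier.per.iteration": "2",
    "dfs.namenode.replication.max-streams": "2",
    "dfs.namenode.replication.max-streams-hard-limit": "4",
}


def effective_settings(xml_text: str) -> Dict[str, Tuple[str, bool]]:
    """Map each property to (value, explicitly_configured) for a <configuration> document."""
    root = ET.fromstring(xml_text)
    configured = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            configured[name.strip()] = value.strip()
    # Fall back to the assumed default when a property is not set explicitly.
    return {key: (configured.get(key, default), key in configured) for key, default in DEFAULTS.items()}


site_xml = """<configuration>
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>2</value>
  </property>
</configuration>"""

for key, (value, explicit) in effective_settings(site_xml).items():
    print(key, value, "set" if explicit else "default")
```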

I am using the Hadoop 1.x series.

What is the best way to balance my cluster?