
How to fix under-replicated blocks faster? It takes a long time


I ran the command hadoop dfs -setrep -R -w 3 /

It works, but I have 5,114,551 under-replicated blocks and it is going to take 24 days. How can I solve this problem faster?



Hi @sivasaravanakumar k,

HDFS deliberately throttles the rate of replication work so that re-replication after a failure does not interfere with regular cluster traffic under normal load.

The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. The first controls how much replication work is scheduled to a DataNode at every heartbeat, and the other two further limit the maximum number of parallel network transfers a DataNode performs at a time. They are described at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
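To see why the defaults drain a large backlog slowly, the NameNode-side scheduling ceiling can be estimated. This is a minimal Python sketch, assuming Hadoop 2.x behaviour: the replication monitor runs every dfs.namenode.replication.interval seconds (default 3, a property not mentioned above) and each pass schedules up to (live DataNodes × dfs.namenode.replication.work.multiplier.per.iteration) blocks. Hadoop 1.x may behave differently, and the max-streams limits on each DataNode can push the real rate below this ceiling.

```python
SECONDS_PER_DAY = 86_400


def scheduling_ceiling(live_datanodes: int, multiplier: int = 2,
                       interval_s: float = 3.0) -> float:
    """Upper bound on blocks per second the NameNode schedules for re-replication.

    Assumption (Hadoop 2.x-style logic): every `interval_s` seconds the NameNode
    schedules at most live_datanodes * multiplier blocks, where `multiplier` is
    dfs.namenode.replication.work.multiplier.per.iteration.
    """
    return live_datanodes * multiplier / interval_s


def min_days(under_replicated: int, live_datanodes: int, multiplier: int = 2,
             interval_s: float = 3.0) -> float:
    """Lower bound on days to schedule every under-replicated block once."""
    rate = scheduling_ceiling(live_datanodes, multiplier, interval_s)
    return under_replicated / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    # Example figures: 1,572,982 under-replicated blocks on 3 live DataNodes.
    for m in (2, 10):
        print(f"multiplier={m}: at least {min_days(1_572_982, 3, m):.1f} days")
```

With 1,572,982 blocks on 3 DataNodes this gives roughly 9.1 days at the default multiplier of 2 versus about 1.8 days at 10, which illustrates why raising the multiplier (together with the max-streams limits) can shorten the backlog, at the cost of more replication traffic.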

/Best regards, Mats


Hi @Mats Johansson

I have a cluster with 1 NameNode and 3 DataNodes. One of my DataNodes failed, so I removed it from the cluster and added a new DataNode.

After I added the new node, I got:

	WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks

So I removed the corrupt files from my cluster.

After that I ran hdfs fsck / to check the health, and it reported:

The filesystem under path '/' is HEALTHY

That part is good, but:

Under-replicated blocks:       1572982 (95.59069 %)
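To track progress while replication catches up, the under-replicated count can be pulled out of the fsck summary. This is a minimal Python sketch, assuming the summary line keeps the "Under-replicated blocks: N (P %)" format shown above; the function and pattern names are hypothetical helpers, not part of Hadoop.

```python
import re
from typing import Optional, Tuple

# Matches summary lines such as "Under-replicated blocks:       1572982 (95.59069 %)"
UNDER_REPL_RE = re.compile(r"Under-replicated blocks:\s+(\d+)\s+\(([\d.]+)\s*%\)")


def parse_under_replicated(fsck_output: str) -> Optional[Tuple[int, float]]:
    """Return (block_count, percent) from hdfs fsck output, or None if not found."""
    match = UNDER_REPL_RE.search(fsck_output)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    sample = "Under-replicated blocks:       1572982 (95.59069 %)"
    print(parse_under_replicated(sample))
```

In practice the full output of hdfs fsck / could be saved to a file and its contents passed in, so the count can be compared between runs.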

Now the problem is that Hadoop automatically replicates the files from one DataNode to another at only about 6 per second.

When I run hadoop dfs -setrep -R -w 3 /, it shows that replication will take 24 days. I cannot wait 24 days.
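Instead of relying on a single estimate, the remaining time can be extrapolated from two readings of the under-replicated count (for example, two runs of hdfs fsck / taken some minutes apart). This is a minimal Python sketch, assuming the drain rate stays roughly constant between and after the samples; the sample numbers in the usage block are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def eta_seconds(count_before: int, count_after: int, elapsed_s: float) -> float:
    """Extrapolate seconds until the under-replicated count reaches zero.

    Assumes the rate observed between the two samples stays constant.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    drained = count_before - count_after
    if drained <= 0:
        raise ValueError("count did not decrease between samples")
    rate = drained / elapsed_s  # blocks cleared per second
    return count_after / rate


if __name__ == "__main__":
    # Hypothetical readings taken 10 minutes (600 s) apart.
    secs = eta_seconds(1_572_982, 1_569_382, 600)
    print(f"about {secs / SECONDS_PER_DAY:.1f} days remaining")
```

Re-sampling after changing the replication settings shows directly whether the change sped things up.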

I want to be able to access the files and balance replication across the DataNodes.

My current setting is dfs.namenode.replication.work.multiplier.per.iteration = 2.

I don't have the following properties:

dfs.namenode.replication.max-streams

dfs.namenode.replication.max-streams-hard-limit

I am using the Hadoop 1.x series.

What is the best way to balance my cluster?