
How to fix under-replicated blocks faster? It takes a long time

Contributor

I ran the command hadoop dfs -setrep -R -w 3 /

It works fine, but I have 5,114,551 under-replicated blocks and it is going to take 24 days. How can I solve this problem faster?


Re: How to fix under-replicated blocks faster? It takes a long time

Expert Contributor

Hi @sivasaravanakumar k,

The rate of replication work is throttled by HDFS to not interfere with cluster traffic when failures happen during regular cluster load.

Some of the properties controlling this are dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. The first controls how much replication work is scheduled to a DataNode at every heartbeat, and the other two further limit the maximum number of parallel threaded network transfers a DataNode performs at a time. These properties are described at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
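
To get a feel for how much the multiplier matters, here is a rough back-of-the-envelope sketch in Python. It is not an official formula. The hdfs-default.xml description says the number of block transfers scheduled is the multiplier times the number of live DataNodes. The 3-second scheduling interval, the count of three live DataNodes and the function names are assumptions made for illustration. The real rate is further capped by the max-streams limits, the network and the disks, so treat the result as a best case.

```python
SECONDS_PER_DAY = 86_400


def blocks_per_second(live_datanodes, work_multiplier=2, interval_seconds=3.0):
    """Best-case number of blocks scheduled for replication per second.

    Assumes one scheduling round every `interval_seconds` (assumed 3 s here),
    each round scheduling live_datanodes * work_multiplier blocks.
    """
    if live_datanodes <= 0 or work_multiplier <= 0 or interval_seconds <= 0:
        raise ValueError("all inputs must be positive")
    return live_datanodes * work_multiplier / interval_seconds


def estimate_days(under_replicated_blocks, live_datanodes,
                  work_multiplier=2, interval_seconds=3.0):
    """Best-case days needed to schedule every under-replicated block."""
    rate = blocks_per_second(live_datanodes, work_multiplier, interval_seconds)
    return under_replicated_blocks / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    blocks = 5_114_551  # figure quoted in the question
    datanodes = 3       # assumption: three live DataNodes
    for multiplier in (2, 10):
        days = estimate_days(blocks, datanodes, work_multiplier=multiplier)
        print(f"multiplier={multiplier}: about {days:.1f} days")
```

Under these assumptions the estimate scales inversely with the multiplier, which is why raising it (with caution, since it increases replication traffic) shortens the catch-up time.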

/Best regards, Mats


Re: How to fix under-replicated blocks faster? It takes a long time

Contributor

Hi @Mats Johansson

I have a cluster with 1 NameNode and 3 DataNodes. One of my DataNodes failed, so I removed that DataNode from my cluster and added a new DataNode.

After I added the new node, I got:

	WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks

So I removed the corrupt files from my cluster.

After that, I executed hdfs fsck / to check the health, and it reported:

The filesystem under path '/' is HEALTHY

That part is good, but:

Under-replicated blocks:       1572982 (95.59069 %)

Now the problem is that Hadoop automatically re-replicates the blocks from one DataNode to another at only 6 per second.

When I execute hadoop dfs -setrep -R -w 3 /, it shows that replicating the files will take 24 days. I cannot wait 24 days.

I want to accelerate the file replication and balance it across the DataNodes.

dfs.namenode.replication.work.multiplier.per.iteration 2

I don't have the properties below:

dfs.namenode.replication.max-streams

dfs.namenode.replication.max-streams-hard-limit

I am using Hadoop 1.x.

What is the best way to balance my cluster?
