Created 09-22-2016 05:11 AM
I executed the command hadoop dfs -setrep -R -w 3 /
It works fine, but I have 5,114,551 under-replicated blocks and it will take 24 days to finish. How can I solve this problem faster?
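For reference, the -w flag makes setrep block until the replication finishes; without it the command returns immediately and the NameNode schedules the re-replication in the background, which can then be watched through the fsck summary. A minimal sketch (assuming the fsck summary prints an 'Under-replicated blocks:' line; adjust the grep pattern for your version):

    # set the target replication factor without waiting for completion
    hadoop dfs -setrep -R 3 /
    # watch the remaining under-replicated block count from the fsck summary
    hadoop fsck / 2>/dev/null | grep -i 'under-replicated blocks'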
Created 09-22-2016 02:54 PM
The rate of replication work is throttled by HDFS to not interfere with cluster traffic when failures happen during regular cluster load.
Some properties controlling this are dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. The first controls the rate of work scheduled to a DataNode at every heartbeat, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. Some description of these is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
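For example, higher values could be set in hdfs-site.xml on the NameNode. The values below are illustrative only, not tuned recommendations, and the change typically requires a NameNode restart to take effect:

    <!-- hdfs-site.xml on the NameNode: example values only -->
    <property>
      <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
      <value>10</value>
    </property>
    <property>
      <name>dfs.namenode.replication.max-streams</name>
      <value>10</value>
    </property>
    <property>
      <name>dfs.namenode.replication.max-streams-hard-limit</name>
      <value>20</value>
    </property>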
/Best regards, Mats
Created 09-22-2016 03:19 PM
Hi @Mats Johansson
I have a cluster with 1 NameNode and 3 DataNodes. One of my DataNodes failed, so I removed that node from the cluster and added a new DataNode.
After I added the new node I got:
WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks
So I removed the corrupt files from my cluster.
After that I executed hdfs fsck / and it now reports:
The filesystem under path '/' is HEALTHY
which is good, but:
Under-replicated blocks: 1572982 (95.59069 %)
Now the problem is that Hadoop automatically replicates blocks from one DataNode to another at only about 6 per second.
When I execute hadoop dfs -setrep -R -w 3 / it shows the replication will take 24 days, and I cannot wait 24 days.
I want to speed up the replication and balance it across the DataNodes.
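Here is roughly how I estimate that rate (a rough sketch; it assumes the fsck summary prints an 'Under-replicated blocks: N ...' line and that running a full fsck twice is acceptable on this namespace):

    # sample the under-replicated block count twice and estimate blocks/second
    before=$(hadoop fsck / 2>/dev/null | grep -i 'under-replicated blocks' | grep -o '[0-9]\+' | head -1)
    sleep 600
    after=$(hadoop fsck / 2>/dev/null | grep -i 'under-replicated blocks' | grep -o '[0-9]\+' | head -1)
    echo "approx rate: $(( (before - after) / 600 )) blocks/sec"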
dfs.namenode.replication.work.multiplier.per.iteration is set to 2.
I don't have the properties below:
dfs.namenode.replication.max-streams
dfs.namenode.replication.max-streams-hard-limit
I am using the Hadoop 1.x series.
What is the best way to balance my cluster?