
Force block redistribution for some particular file or directory

SOLVED

Rising Star

Hi, dear experts!

 

I'm wondering: is there any way to force block redistribution for a particular file or directory?

My case is:

1) Load a file, with replication factor 1, from a node that runs a DataNode process.

2) Increase the replication factor by executing: hdfs dfs -setrep 3 /tmp/path/to/my/file

3) Check the distribution with a specific Java tool:

hadoop jar FileDistribution.jar /tmp/path/to/my/file

 

and got:
-----------------------------------
-----------------------------------
Files distribution in directory across cluster is : {scaj31bda05.us.oracle.com=400, scaj31bda03.us.oracle.com=183, scaj31bda04.us.oracle.com=156, scaj31bda01.us.oracle.com=151, scaj31bda02.us.oracle.com=154, scaj31bda06.us.oracle.com=156}

 

It's obvious that the first node contains 400 blocks, while the other 400*2=800 block replicas are evenly distributed across the remaining nodes.

Is there any way to force block redistribution to make it even?
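For reference, block placement can also be inspected without the custom jar, using the standard fsck command:

hdfs fsck /tmp/path/to/my/file -files -blocks -locations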

 

Thanks!

 

1 ACCEPTED SOLUTION


Re: Force block redistribution for some particular file or directory

Cloudera Employee
When you ingest the data from an edge node that is also running the DataNode role, the first copy will always be written to that DataNode, and it will use space much faster than any other DataNode. To redistribute space usage among all DataNodes, you must run the HDFS balancer.
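For example, to balance until each DataNode's utilization is within 10% of the cluster average (10 is the default threshold; adjust it for your cluster):

hdfs balancer -threshold 10

Note that the balancer evens out overall space usage across DataNodes; it does not target a specific file or directory.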
2 REPLIES

Re: Force block redistribution for some particular file or directory

Master Guru
This is an expected side-effect of loading data from a DN host. While there's no 'even distribution' tool today, you can perhaps try to get a more random effect going by raising the replication factor (to 4 or 5) and then lowering it back again.
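A sketch of that approach, using the path from the question (the temporary factor of 5 is just one of the suggested values):

# Temporarily raise the replication factor; -w waits until the extra replicas are written
hdfs dfs -setrep -w 5 /tmp/path/to/my/file

# Drop it back to 3; the NameNode removes the excess replicas, not necessarily from the original node
hdfs dfs -setrep 3 /tmp/path/to/my/file

Which replicas are deleted is up to the NameNode, so a perfectly even layout is not guaranteed.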