
Force block redistribution for some particular file or directory

SOLVED

Rising Star

Hi, dear experts!

 

I'm wondering: is there any way to force block redistribution for a particular file or directory?

My case is:

1) Load a file, with replication factor 1, from a node that runs a DataNode process.

2) Increase the replication factor by executing: hdfs dfs -setrep 3 /tmp/path/to/my/file

3) Check the distribution with a specific Java tool:

hadoop jar FileDistribution.jar /tmp/path/to/my/file

 

and got:
-----------------------------------
-----------------------------------
Files distribution in directory across cluster is : {scaj31bda05.us.oracle.com=400, scaj31bda03.us.oracle.com=183, scaj31bda04.us.oracle.com=156, scaj31bda01.us.oracle.com=151, scaj31bda02.us.oracle.com=154, scaj31bda06.us.oracle.com=156}

 

It's obvious that the first node contains 400 blocks, while the other 400*2=800 block replicas are evenly distributed across the remaining nodes.

Is there any way to force block redistribution to make it even?
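For reference, block placement can also be inspected without the custom jar, using the standard fsck command:

hdfs fsck /tmp/path/to/my/file -files -blocks -locations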

 

Thanks!

 

1 ACCEPTED SOLUTION


Re: Force block redistribution for some particular file or directory

Cloudera Employee
When you ingest the data from an edge node that is also running the DataNode role, the first copy will always be written to that DataNode, and it will use space much faster than any other DataNode. To redistribute space usage among all DataNodes, you must run the HDFS balancer.
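For example, to balance until each DataNode's utilization is within 10% of the cluster average (10 is the default threshold; adjust it for your cluster):

hdfs balancer -threshold 10

Note that the balancer evens out overall space usage across DataNodes; it does not target a specific file or directory.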
2 REPLIES

Re: Force block redistribution for some particular file or directory

Master Guru
This is an expected side-effect of loading data from a DN host. While there's no 'even distribution' tool today, you can perhaps try to get a more random effect going by raising the replication factor (to 4 or 5) and then lowering it back again.
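A sketch of that approach, using the path from the question (the temporary factor of 5 is just one of the suggested values):

# Temporarily raise the replication factor; -w waits until the extra replicas are written
hdfs dfs -setrep -w 5 /tmp/path/to/my/file

# Drop it back to 3; the NameNode removes the excess replicas, not necessarily from the original node
hdfs dfs -setrep 3 /tmp/path/to/my/file

Which replicas are deleted is up to the NameNode, so a perfectly even layout is not guaranteed.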