Support Questions

HDFS rebalancing


Hi,

Is there any way to run the balancer automatically every day?

3 REPLIES

Re: HDFS rebalancing

Master Guru
At present there is no inbuilt way to do this.

You can, however, schedule the balancer command as a Linux cron job
to run daily at a preferred time.
Or, if you use Cloudera Manager (CM), you can trigger its balancer
actions via the REST API documented at
http://cloudera.github.io/cm_api/apidocs/v7/path__clusters_-clusterName-_services_-serviceName-_comm...
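
For the cron route, a minimal sketch of a wrapper script might look like the one below. It assumes the `hdfs` CLI is on the PATH of the user running it, and uses the balancer's `-threshold` option (a percentage of disk capacity). The function names and the script path are made up for illustration, and the valid range check is only a sanity check.

```python
import subprocess


def balancer_command(threshold_percent=10):
    """Build the argument list for the HDFS balancer CLI."""
    if not 1 <= threshold_percent <= 100:
        raise ValueError("threshold must be a percentage between 1 and 100")
    return ["hdfs", "balancer", "-threshold", str(threshold_percent)]


def crontab_line(hour, minute, script_path):
    """Format a crontab entry that runs script_path once a day."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # crontab field order: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {script_path}"


def run_balancer(threshold_percent=10):
    """Run the balancer and return its exit code (needs the hdfs CLI installed)."""
    return subprocess.run(balancer_command(threshold_percent)).returncode
```

Calling `crontab_line(2, 30, "/usr/local/bin/run_balancer.py")` gives an entry that starts the (hypothetical) script at 02:30 every day; the script itself would just call `run_balancer()`. The entry would normally go into the crontab of a user that is allowed to run the balancer, typically the HDFS superuser.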


Re: HDFS rebalancing

Here is my scenario:

I have three Hadoop boxes and no replication.

When data comes into HDFS, all of it is stored on box 1, even though there is free space on boxes 2 and 3.

Please help me understand: is this issue caused by replication?

By default the framework stores data in a round-robin manner, right?

Re: HDFS rebalancing

Master Guru
The default policy of an HDFS client is to store its first block
replica on the same node as the client, if the client host is also
serving a DataNode. This optimises writes.

If you want a more randomized write, write from outside of the cluster
nodes (i.e. from a host that is not a DataNode).
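
To see why everything lands on one box in the scenario above, here is a toy model of the rule just described. It is not HDFS code: the real placement policy also takes racks, load and free space into account, and the uniform random choice for an outside client is a simplification. The host names and function names are made up for illustration.

```python
import random
from collections import Counter


def choose_first_replica(client_host, datanodes, rng):
    """Toy rule: use the client's own host if it runs a DataNode, else pick one at random."""
    if client_host in datanodes:
        return client_host
    return rng.choice(datanodes)


def simulate_writes(client_host, datanodes, num_blocks, seed=0):
    """Count how many first replicas each DataNode receives for num_blocks blocks."""
    rng = random.Random(seed)
    return Counter(
        choose_first_replica(client_host, datanodes, rng)
        for _ in range(num_blocks)
    )


datanodes = ["box1", "box2", "box3"]

# Writer runs on box1, which is also a DataNode: every block stays local.
local = simulate_writes("box1", datanodes, 300)

# Writer runs on a host that is not a DataNode: blocks are spread out.
remote = simulate_writes("edge-host", datanodes, 300)

print(local)
print(remote)
```

With only one replica per block, the first replica is the only copy, so in this model a writer on box1 puts all 300 blocks on box1, while a writer on the outside host spreads them over the three boxes.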
