"Rebalance HDFS" - Executing from Ambari UI in a production cluster with running jobs?

New Contributor

Hi, I am new to Hadoop and big data. If I run "Rebalance HDFS" from the Ambari UI on a production cluster, is there any negative impact on running jobs? Do I need to take care of anything before doing that? Thanks.

1 ACCEPTED SOLUTION

Super Guru

@Husnain Bustam

Yes. Running the balancer will start moving blocks from nodes with higher disk utilization to nodes with lower utilization. How much it moves depends on a number of factors. For example, you likely have the balancing threshold set to 10%, which means each DataNode's utilization may differ from the cluster-wide average by up to 10 percentage points (if the cluster is 60% full, a node anywhere between 50% and 70% is acceptable, with no need to balance further).
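To make the threshold concrete, here is a minimal Python sketch of the comparison. It is a simplification for illustration, not the balancer's actual code: the node names and sizes are made up, and the real balancer uses finer-grained categories than the three labels shown here.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Label each DataNode against the cluster-wide average utilization.

    nodes: dict of hypothetical node name -> (used_gb, capacity_gb).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            labels[name] = "over"       # candidate source of block moves
        elif node_pct < cluster_pct - threshold_pct:
            labels[name] = "under"      # candidate target of block moves
        else:
            labels[name] = "balanced"   # within the threshold, left alone
    return labels


# Illustrative cluster: 60% full overall, threshold of 10 percentage points.
example = {"dn1": (900, 1000), "dn2": (550, 1000), "dn3": (350, 1000)}
print(classify_nodes(example))
```

With these made-up numbers, dn1 (90%) is above the 70% upper bound, dn3 (35%) is below the 50% lower bound, and dn2 (55%) is left alone.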

When the balancer runs, it takes up network resources. Make sure you are not running it while the cluster is under heavy load, as those jobs will be affected. There are also checks against moving blocks that running jobs may currently be using.

Check the following settings and adjust them for your needs:

dfs.datanode.balance.bandwidthPerSec, named dfs.balance.bandwidthPerSec in older releases (network bandwidth, in bytes per second, that each DataNode may use for balancing)

dfs.datanode.balance.max.concurrent.moves (how many blocks each DataNode may move concurrently).
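As a rough sketch of how these knobs are applied from the command line, the Python snippet below only builds (and prints) the two commands rather than running them. The `hdfs dfsadmin -setBalancerBandwidth` and `hdfs balancer -threshold` commands are standard HDFS CLI; the helper name and the example values are assumptions for illustration. The concurrent-moves property is set in hdfs-site.xml and is not covered here.

```python
def balancer_commands(bandwidth_bytes_per_sec, threshold_pct):
    """Return argv lists for a throttled balancer run (nothing is executed).

    Hypothetical helper: it only assembles the command lines.
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be a positive number of bytes per second")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold is a percentage")

    # Adjusts the per-DataNode balancing bandwidth at runtime.
    set_bandwidth = ["hdfs", "dfsadmin", "-setBalancerBandwidth",
                     str(bandwidth_bytes_per_sec)]
    # Starts the balancer with the given utilization threshold.
    run_balancer = ["hdfs", "balancer", "-threshold", str(threshold_pct)]
    return [set_bandwidth, run_balancer]


# Example values only: 10 MB/s per DataNode, 10% threshold.
for argv in balancer_commands(10 * 1024 * 1024, 10):
    print(" ".join(argv))
```

Review the printed commands before running them on a production cluster, and pick a bandwidth that leaves headroom for your running jobs.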

You should also check the following thread. The accepted answer has good suggestions on how to get started and get good performance.

https://community.hortonworks.com/questions/49959/even-when-i-ran-balancer-load-one-data-node-is-84....

Master Guru

@Husnain Bustam

It is safe to run the balancer while other jobs are running if you keep the default value of dfs.datanode.balance.bandwidthPerSec=1048576 (bytes per second, i.e. 1 MB/s). It is recommended to run the balancer periodically (perhaps once a week) when the cluster is lightly loaded (preferably on weekends).
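For a feel of what 1 MB/s means in practice, here is a back-of-the-envelope Python sketch. It is a rough lower bound under simplifying assumptions: a single DataNode shedding the data at exactly the configured rate, with no other limits in play. The 100 GiB figure is made up for illustration.

```python
def hours_to_move(bytes_to_move, bandwidth_bytes_per_sec=1048576):
    """Rough time for one DataNode to move data at the balancer bandwidth."""
    seconds = bytes_to_move / bandwidth_bytes_per_sec
    return seconds / 3600.0


# Hypothetical case: one node needs to shed 100 GiB at the 1 MB/s default.
gib = 1024 ** 3
print(round(hours_to_move(100 * gib), 1))  # 28.4 (hours)
```

This is why the default is gentle on running jobs but slow, and why a weekly run during a quiet window is a reasonable routine.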

It is also safe to kill a running balancer at any time if it is affecting running jobs.

Below are some helpful links:

https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html

https://community.hortonworks.com/articles/26518/hadoop-cluster-maintenance.html

https://community.hortonworks.com/articles/43615/hdfs-balancer-1-100x-performance-improvement.html