Bug --> BUG-45960
Search for BUG-45960 once you click the following link
@Neeraj Sabharwal: Thanks a lot for your help. I read the JIRA and the bug doc, but I noticed the issue was reported in Ambari 2.1.2 and resolved in Ambari 2.2.0. I am already on Ambari 2.2.0 and I am still facing this error.
The timeout period for the rebalance operation is hard-wired in the stack definition (the value used is the one set for the NameNode command script, which is 1800 s). When the rebalance is issued, the server increases this value by 10 more minutes to give the operation a chance to finish.
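To make the numbers above concrete, here is a small sketch of the arithmetic. The constant and function names are made up for illustration and are not Ambari identifiers; only the 1800 s value and the 10-minute extension come from the description above.

```python
# Values taken from the explanation above; the names are hypothetical.
STACK_TIMEOUT_S = 1800      # timeout set for the NameNode command script
SERVER_GRACE_S = 10 * 60    # extra 10 minutes added by the server


def effective_timeout_s(stack_timeout_s, grace_s=SERVER_GRACE_S):
    """Return the total time (in seconds) a rebalance may run before it is killed."""
    return stack_timeout_s + grace_s


def effective_timeout_min(stack_timeout_s, grace_s=SERVER_GRACE_S):
    """Same as effective_timeout_s, expressed in minutes."""
    return effective_timeout_s(stack_timeout_s, grace_s) / 60
```

With the default stack value this gives 2400 s, i.e. 40 minutes, so any rebalance that needs longer than that would be expected to hit the timeout.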
If the rebalance operation takes longer, the server times out the operation by killing the process (some resources created by the rebalancer remain on HDFS, however) and eventually retries the command. When the command is issued again (regardless of whether it is retried by the server or triggered manually), the rebalancer notices that another balancer appears to be running, because it finds the /system/balancer.id file on HDFS.
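One way to see whether a killed rebalance left this file behind is to test for it from a host with an HDFS client. The following is only a sketch: it assumes the `hdfs` command is on the PATH and that the user running it may read /system; the function name and the injectable `run` parameter are illustrative choices, not part of Ambari.

```python
import subprocess

# Path named in the explanation above.
BALANCER_ID_PATH = "/system/balancer.id"


def balancer_id_exists(run=subprocess.run):
    """Return True if the balancer id file is present on HDFS.

    `hdfs dfs -test -e <path>` exits with status 0 when the path exists.
    The `run` parameter exists only so the call can be swapped out for testing.
    """
    result = run(["hdfs", "dfs", "-test", "-e", BALANCER_ID_PATH])
    return result.returncode == 0
```

Calling `balancer_id_exists()` while no balancer process is running, and getting `True`, would suggest the file is a leftover from a timed-out run.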
The problem is addressed here: https://issues.apache.org/jira/browse/AMBARI-20175
The only workaround until the fix is available is to raise the timeout in the stack definition to a value large enough that the timeout is effectively disabled, or at least long enough for the rebalance to finish. (The Ambari server needs to be restarted for this to take effect; note that all NameNode operation timeouts will be set to the new value.)
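As a rough illustration of that workaround, the sketch below rewrites the NameNode command script timeout in a stack definition. It assumes the timeout lives in a `<timeout>` element under the NAMENODE component's `<commandScript>` in the HDFS service's metainfo.xml; the sample fragment is a simplified stand-in, so check the actual file on your Ambari server before changing anything. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed fragment of an HDFS metainfo.xml (not copied from a real stack).
SAMPLE_METAINFO = """<metainfo>
  <services>
    <service>
      <name>HDFS</name>
      <components>
        <component>
          <name>NAMENODE</name>
          <commandScript>
            <script>scripts/namenode.py</script>
            <scriptType>PYTHON</scriptType>
            <timeout>1800</timeout>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>"""


def set_namenode_timeout(metainfo_xml, timeout_s):
    """Return the XML text with the NAMENODE command script timeout replaced."""
    root = ET.fromstring(metainfo_xml)
    changed = 0
    for component in root.iter("component"):
        if component.findtext("name") == "NAMENODE":
            timeout = component.find("commandScript/timeout")
            if timeout is not None:
                timeout.text = str(timeout_s)
                changed += 1
    if changed == 0:
        raise ValueError("no NAMENODE commandScript timeout found")
    return ET.tostring(root, encoding="unicode")
```

For example, `set_namenode_timeout(SAMPLE_METAINFO, 7200)` returns the fragment with a two-hour timeout. Note that ElementTree drops XML comments when it re-serializes a file, so for a real metainfo.xml it may be simpler to edit the value by hand and keep a backup, then restart the Ambari server.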