The same issue occurs with Cloudera Manager 6.3.3 and CDH 6.3.3. When a few of the DataNodes/RegionServers are decommissioned or forced into an offline state, the "toggle balancer" step fails with this error. This temporary file never gets created.
It's Andor from Cloudera Support; let me provide some guidance on this. First of all, I quickly tested this behavior on CDH 6.3 by selecting one of the worker nodes in a 5-node cluster through: CM > All Hosts > selected the worker node by ticking its box > Actions for Selected (1) > Begin Maintenance (Suppress Alerts/Decommission) > kept "decommission host(s)" on, and selected the option to take the DataNode offline (screenshot attached).
It worked fine and got past the point at which your attempt failed. To help us narrow down the issue in your environment, could you share more about the deployment: the CDH version, the CM version, the number of worker nodes in the cluster, and how much data it has (running the $ hbase hbck -details command would show the number of regions). By default the HBase balancer switch should be turned on, so that HMaster can assign and re-assign HBase regions as needed if any RegionServer (RS) goes down or behaves slower than the others. In my test case the balancer switch was on, so the "toggle balancer" step could turn it off as expected.
Could you share this diagnostic information and let us know whether the balancer switch had been manually turned off earlier? Or is it possible that no other RS was available at the time the command ran, so that HMaster had nowhere to move the regions?
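
For reference, the checks mentioned above could be gathered with something like the sketch below. It is only an illustration: the helper function names are hypothetical, and it assumes the hbase command is on the PATH and is run as a user allowed to use the HBase shell (on a Kerberized cluster, after kinit). It uses the HBase shell commands balancer_enabled and status 'simple' in non-interactive mode (-n), plus the hbck command mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of CM or HBase.

# Print whether the balancer switch is currently on, as reported by the HBase shell.
balancer_state() {
  echo "balancer_enabled" | hbase shell -n 2>/dev/null
}

# Print the live and dead RegionServers, to see whether any other RS
# is available for HMaster to move regions to.
regionserver_status() {
  echo "status 'simple'" | hbase shell -n 2>/dev/null
}

# Print the detailed consistency report, which includes the number of regions.
region_report() {
  hbase hbck -details
}
```

After sourcing the file, running balancer_state, regionserver_status and region_report one after another and attaching their output to your reply would cover the questions above. If the balancer turns out to be off, it can be re-enabled from the HBase shell with balance_switch true.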