Rebalancer fails with "No block has been moved for 5 iterations. Exiting..."

Hi, 

 

The DFS directories on our cluster's data disks have become unevenly distributed, which I confirmed with hdfs dfsadmin -report. One datanode has DFS Used%: 60.20% while the rest have DFS Used%: 36.32%. All datanodes are in the same default rack. We run CDH 5.10.1 (5.10.1-1.cdh5.10.1.p0.10) on a Kerberized cluster.

 

However, when I run the rebalancer, either from the Cloudera UI or from the command line, it starts normally but fails after anywhere from a few seconds to a few minutes with the following error:

 

 

Thu Sep 14 12:39:37 CEST 2017
Current working directory: /run/cloudera-scm-agent/process/5092-hdfs-BALANCER
Launching one-off process: /usr/lib64/cmf/service/hdfs/hdfs.sh balancer -threshold 10.0 -policy DataNode
Thu Sep 14 12:39:37 CEST 2017
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
using /usr/java/jdk1.7.0_67-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER as CONF_DIR
using  as SECURE_USER
using  as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/5092-hdfs-BALANCER
CMF_CONF_DIR=/etc/cloudera-scm-agent
unlimited
/bin/kinit
using hdfs/hadoop-master01.example.net@EXAMPLE.NET as Kerberos principal
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER/krb5cc_994 as Kerberos ticket cache
2017-09-14 12:39:39,707 INFO  [main] balancer.Balancer (Balancer.java:parse(829)) - Using a threshold of 10.0
2017-09-14 12:39:39,710 INFO  [main] balancer.Balancer (Balancer.java:run(644)) - namenodes  = [hdfs://nameservice1]
2017-09-14 12:39:39,712 INFO  [main] balancer.Balancer (Balancer.java:run(645)) - parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
2017-09-14 12:39:39,712 INFO  [main] balancer.Balancer (Balancer.java:run(646)) - included nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:run(647)) - excluded nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:run(648)) - source nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:checkKeytabAndInit(694)) - Keytab is configured, will login using keytab.
2017-09-14 12:39:39,906 INFO  [main] security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1138)) - Login successful for user hdfs/hadoop-master01.example.net@EXAMPLE.NET using keytab file hdfs.keytab
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
2017-09-14 12:39:41,078 INFO  [main] balancer.KeyManager (KeyManager.java:<init>(68)) - Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
2017-09-14 12:39:41,084 INFO  [main] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,086 INFO  [main] balancer.KeyManager (KeyManager.java:<init>(142)) - Update block keys every 2hrs, 30mins, 0sec
2017-09-14 12:39:41,334 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:41,334 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:41,335 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:41,335 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:41,336 INFO  [org.apache.hadoop.hdfs.server.balancer.KeyManager$BlockKeyUpdater@6e6b28b4] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,344 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:41,367 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:41,367 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:41,369 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:41,387 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:41,387 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,389 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
2017-09-14 12:39:41,554 INFO  [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(289)) - Start moving blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
2017-09-14 12:39:41,569 INFO  [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(325)) - Successfully moved blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
Sep 14, 2017 12:39:41 PM          0                 74 B           624.55 GB              10 GB
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:50,592 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:50,596 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:50,596 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:50,597 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:50,597 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:50,597 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:50,598 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,602 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,602 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:50 PM          1                 74 B           624.55 GB              10 GB
2017-09-14 12:39:59,725 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:59,725 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:59,730 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:59,730 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:59,731 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:59,731 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:59,731 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:59,732 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:59,735 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:59,735 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:59 PM          2                 74 B           624.55 GB              10 GB
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:08,819 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:08,819 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:08,822 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:08,823 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:08,823 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:08,827 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:08,827 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:08 PM          3                 74 B           624.55 GB              10 GB
2017-09-14 12:40:17,929 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:17,931 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:17,934 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:17,934 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:17,935 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:17,935 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:17,935 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:17,936 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:18 PM          4                 74 B           624.55 GB              10 GB
2017-09-14 12:40:27,031 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,043 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,043 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
No block has been moved for 5 iterations. Exiting...
Sep 14, 2017 12:40:27 PM          5                 74 B           624.55 GB              10 GB
Sep 14, 2017 12:40:27 PM Balancing took 48.137 seconds
Exit code: 253

I tried adjusting the balancer settings by increasing and decreasing values in the Cloudera UI, to no avail.

 

Note that the cluster has only 3 datanodes in total, with a replication factor of 3. Could this be preventing the balancer from finding a node to place the blocks on without breaking the replication factor?

 

1 ACCEPTED SOLUTION

As suspected, there were no available datanodes to place replicas on, as I had the default replication factor of 3 and only 3 datanodes in total.

 

The balancer started working fine after adding a fourth datanode to the cluster.
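
To illustrate why the fourth datanode helped, here is a minimal Python sketch of the constraint. It is not the balancer's actual code. It assumes the rule that a datanode cannot hold two replicas of the same block; the function name `eligible_targets` and the fourth node's address are hypothetical.

```python
def eligible_targets(block_replicas, source, datanodes):
    """Return the datanodes that could receive a replica moved off `source`.

    Assumption: a node that already holds a replica of the block cannot
    take another one, so it is excluded from the candidate list.
    """
    return [dn for dn in datanodes if dn != source and dn not in block_replicas]


# The three datanodes that appear in the balancer log.
three_nodes = ["10.10.10.212", "10.10.10.213", "10.10.10.214"]
# Hypothetical address for the datanode added later.
four_nodes = three_nodes + ["10.10.10.215"]

# With replication factor 3 on 3 nodes, every node holds a replica.
replicas = set(three_nodes)

print(eligible_targets(replicas, "10.10.10.212", three_nodes))  # []
print(eligible_targets(replicas, "10.10.10.212", four_nodes))   # ['10.10.10.215']
```

A block with fewer than three replicas would still have a target in the three-node case, which may be why the log shows a single 74-byte block being moved.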

REPLIES

Hi,

 

Try running it manually (as the HDFS user):

 

hdfs balancer -threshold 5

The HDFS balancer skips tiny blocks; check whether that is your case (see JIRA HDFS-8824).

 

Regards, 
Marc Casajús

Still the same. I don't think that changing the threshold will have any effect.
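
A rough Python sketch of how the threshold affects node classification may show why it made no difference here. It is a simplification, not the balancer's implementation: it assumes all three datanodes have equal capacity, so the cluster average is a plain mean of the DFS Used% values from the report, and the function name `classify` is hypothetical.

```python
def classify(utilization, threshold):
    """Group nodes by how far their DFS Used% is from the cluster average.

    Assumption: equal capacity per node, so the average is a simple mean.
    """
    avg = sum(utilization.values()) / len(utilization)
    groups = {"over": [], "above_avg": [], "below_avg": [], "under": []}
    for node, used_pct in utilization.items():
        if used_pct > avg + threshold:
            groups["over"].append(node)
        elif used_pct > avg:
            groups["above_avg"].append(node)
        elif used_pct >= avg - threshold:
            groups["below_avg"].append(node)
        else:
            groups["under"].append(node)
    return avg, groups


# DFS Used% values quoted in the question.
used = {"10.10.10.212": 60.20, "10.10.10.213": 36.32, "10.10.10.214": 36.32}

for threshold in (10.0, 5.0):
    avg, groups = classify(used, threshold)
    print(threshold, round(avg, 2), groups)
```

Under these assumptions, a threshold of 10 gives one over-utilized node and no under-utilized nodes, which matches the log. A threshold of 5 only relabels the other two nodes as under-utilized. In both cases 10.10.10.212 is the source, and the threshold does nothing to free up a node that lacks a replica of the block being moved.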
