Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Rebalancer fails with "No block has been moved for 5 iterations. Exiting..."

Contributor

Hi,

Data in the DFS directories on our cluster's data disks has become unevenly distributed across the DataNodes, which I confirmed with hdfs dfsadmin -report. One DataNode shows DFS Used%: 60.20%, while the other two show DFS Used%: 36.32%. All DataNodes are in the same default rack. We run 5.10.1-1.cdh5.10.1.p0.10 on a Kerberized cluster.
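The percentages above can be checked against the balancer's -threshold rule. The sketch below is a simplified model, not the Balancer source. It assumes the three DataNodes have equal capacity (the post does not give capacities) and assigns the 60.20% figure to 10.10.10.212 because that is the node the log reports as over-utilized. It computes the cluster average as total used over total capacity and sorts each node into the buckets the log prints.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One DataNode's DFS usage; `used` and `capacity` share one unit."""
    name: str
    used: float
    capacity: float


def classify(nodes, threshold):
    """Bucket DataNodes the way the balancer log labels them.

    Simplified model: the cluster average is total used / total capacity,
    and `threshold` is in percentage points, as with -threshold 10.0.
    """
    avg = 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)
    buckets = {}
    for n in nodes:
        diff = 100.0 * n.used / n.capacity - avg
        if diff > threshold:
            buckets[n.name] = "over-utilized"
        elif diff > 0:
            buckets[n.name] = "above average"
        elif diff < -threshold:
            buckets[n.name] = "under-utilized"
        else:
            buckets[n.name] = "below average"
    return avg, buckets


# Utilizations from the post; equal capacities of 100 are an assumption.
nodes = [
    Node("10.10.10.212", 60.20, 100.0),
    Node("10.10.10.213", 36.32, 100.0),
    Node("10.10.10.214", 36.32, 100.0),
]
avg, buckets = classify(nodes, threshold=10.0)
print(f"average = {avg:.2f}%")  # average = 44.28%
for name, bucket in buckets.items():
    print(name, bucket)
```

Under these assumptions, 10.10.10.212 sits about 15.9 points above the 44.28% average (over-utilized), and the other two sit about 8 points below it (below average, not under-utilized). That matches the log's "1 over-utilized" / "0 underutilized" lines and its overUtilized => belowAvgUtilized pairing. With threshold=5.0 the two lighter nodes move into the under-utilized bucket, but the source and the candidate targets are still the same three nodes.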

 

However, when I run the rebalancer, from either the Cloudera Manager UI or the command line, it starts normally but fails after anywhere from a few seconds to a few minutes with the following error:

Thu Sep 14 12:39:37 CEST 2017
Current working directory: /run/cloudera-scm-agent/process/5092-hdfs-BALANCER
Launching one-off process: /usr/lib64/cmf/service/hdfs/hdfs.sh balancer -threshold 10.0 -policy DataNode
Thu Sep 14 12:39:37 CEST 2017
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
using /usr/java/jdk1.7.0_67-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER as CONF_DIR
using  as SECURE_USER
using  as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/5092-hdfs-BALANCER
CMF_CONF_DIR=/etc/cloudera-scm-agent
unlimited
/bin/kinit
using hdfs/hadoop-master01.example.net@EXAMPLE.NET as Kerberos principal
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER/krb5cc_994 as Kerberos ticket cache
2017-09-14 12:39:39,707 INFO  [main] balancer.Balancer (Balancer.java:parse(829)) - Using a threshold of 10.0
2017-09-14 12:39:39,710 INFO  [main] balancer.Balancer (Balancer.java:run(644)) - namenodes  = [hdfs://nameservice1]
2017-09-14 12:39:39,712 INFO  [main] balancer.Balancer (Balancer.java:run(645)) - parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
2017-09-14 12:39:39,712 INFO  [main] balancer.Balancer (Balancer.java:run(646)) - included nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:run(647)) - excluded nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:run(648)) - source nodes = []
2017-09-14 12:39:39,713 INFO  [main] balancer.Balancer (Balancer.java:checkKeytabAndInit(694)) - Keytab is configured, will login using keytab.
2017-09-14 12:39:39,906 INFO  [main] security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1138)) - Login successful for user hdfs/hadoop-master01.example.net@EXAMPLE.NET using keytab file hdfs.keytab
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
2017-09-14 12:39:41,078 INFO  [main] balancer.KeyManager (KeyManager.java:<init>(68)) - Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
2017-09-14 12:39:41,084 INFO  [main] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,086 INFO  [main] balancer.KeyManager (KeyManager.java:<init>(142)) - Update block keys every 2hrs, 30mins, 0sec
2017-09-14 12:39:41,334 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:41,334 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:41,335 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:41,335 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:41,336 INFO  [org.apache.hadoop.hdfs.server.balancer.KeyManager$BlockKeyUpdater@6e6b28b4] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,344 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:41,365 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:41,367 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:41,367 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:41,369 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:41,387 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:41,387 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,389 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
2017-09-14 12:39:41,554 INFO  [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(289)) - Start moving blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
2017-09-14 12:39:41,569 INFO  [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(325)) - Successfully moved blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
Sep 14, 2017 12:39:41 PM          0                 74 B           624.55 GB              10 GB
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:50,590 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:50,592 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:50,596 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:50,596 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:50,597 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:50,597 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:50,597 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:50,598 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,602 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,602 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:50 PM          1                 74 B           624.55 GB              10 GB
2017-09-14 12:39:59,725 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:59,725 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:59,726 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:59,730 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:59,730 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:59,731 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:59,731 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:59,731 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:59,732 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:59,735 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:59,735 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:59 PM          2                 74 B           624.55 GB              10 GB
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:08,818 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:08,819 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:08,819 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:08,822 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:08,823 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:08,823 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:08,824 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:08,827 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:08,827 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:08 PM          3                 74 B           624.55 GB              10 GB
2017-09-14 12:40:17,929 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:17,930 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:17,931 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:17,934 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:17,934 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:17,935 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:17,935 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:17,935 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:17,936 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,939 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:18 PM          4                 74 B           624.55 GB              10 GB
2017-09-14 12:40:27,031 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:27,032 INFO  [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:27,037 INFO  [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:27,038 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,043 INFO  [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,043 INFO  [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
No block has been moved for 5 iterations. Exiting...
Sep 14, 2017 12:40:27 PM          5                 74 B           624.55 GB              10 GB
Sep 14, 2017 12:40:27 PM Balancing took 48.137 seconds
Exit code: 253

I tried adjusting the balancer settings in the Cloudera Manager UI, both increasing and decreasing their values, to no avail.

Note also that there are only 3 DataNodes in total, with a replication factor of 3. Could this be preventing the balancer from finding a node to place blocks on without violating the replication factor?

1 ACCEPTED SOLUTION

Contributor

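As suspected, there were no DataNodes available to receive the replicas. With a default replication factor of 3 and only 3 DataNodes, every fully replicated block already had a replica on every node, and the balancer will not move a replica to a DataNode that already holds a replica of the same block.

The balancer started working fine after adding a fourth DataNode to the cluster.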
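The dead end can be reproduced with a toy model. This is a hypothetical illustration, not the Balancer's code. It represents block locations as sets of DataNode names (dn1 to dn4 are made-up names), assumes every block is fully replicated, and applies one rule the real balancer also enforces: a block is never moved to a DataNode that already holds a replica of it.

```python
def movable_pairs(source, targets, locations):
    """Return the (block, target) moves available to an over-utilized node.

    source:    name of the over-utilized DataNode
    targets:   names of candidate DataNodes
    locations: dict mapping block id -> set of DataNode names with a replica
    A move is allowed only if `source` holds the block and the target
    does not already hold a replica of that same block.
    """
    pairs = []
    for block, holders in sorted(locations.items()):
        if source not in holders:
            continue
        for target in targets:
            if target != source and target not in holders:
                pairs.append((block, target))
    return pairs


# Hypothetical layout: three DataNodes and replication factor 3, so every
# block has a replica on every node.
three_nodes = ["dn1", "dn2", "dn3"]
locations = {f"blk_{i}": set(three_nodes) for i in range(4)}
print(movable_pairs("dn1", ["dn2", "dn3"], locations))  # []

# Same blocks after adding an empty fourth DataNode: every block on dn1
# can now move, and dn4 is the only valid target.
print(movable_pairs("dn1", ["dn2", "dn3", "dn4"], locations))
```

With three nodes the list of moves is empty in every iteration, which is consistent with the balancer giving up after 5 idle iterations. The fourth node is the first target that lacks a replica of the source's blocks.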


3 REPLIES

Contributor

Hi,

Try running it manually (as the hdfs user):

hdfs balancer -threshold 5

The HDFS balancer skips tiny blocks; check whether this applies to your case (see JIRA HDFS-8824).
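The small-block skipping mentioned above amounts to filtering the candidate blocks on a source node by size. The sketch below shows only that idea. The min_block_size cut-off is a hypothetical parameter. Check the actual property name, its default, and whether CDH 5.10.1 includes the change against HDFS-8824 and the release's hdfs-default.xml. For illustration, the list includes the 74-byte block from the log.

```python
MIB = 1024 * 1024


def candidate_blocks(blocks, min_block_size):
    """Keep only blocks of at least `min_block_size` bytes.

    blocks:         list of (block_id, size_in_bytes) on a source node
    min_block_size: hypothetical cut-off; smaller blocks are skipped
                    because moving them does little to reduce utilization
    """
    return [(b, size) for b, size in blocks if size >= min_block_size]


# Hypothetical block list for one source DataNode.
blocks = [("blk_1074640031", 74), ("blk_a", 128 * MIB), ("blk_b", 5 * MIB)]
print(candidate_blocks(blocks, min_block_size=10 * MIB))
# [('blk_a', 134217728)]
```

A filter like this only shrinks the candidate list. It cannot produce a valid target when every DataNode already holds a replica, so it does not change the outcome described in the accepted solution.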

 

Regards, 
Marc Casajús

Contributor

Still the same. I don't think that changing the threshold will have any effect.
