Created on 09-14-2017 03:34 AM - edited 09-14-2017 04:39 AM
Hi,
The DFS directories on our cluster's data disks have become unevenly distributed, which I confirmed with hdfs dfsadmin -report. One datanode has DFS Used%: 60.20%, while the rest have DFS Used%: 36.32%. All datanodes are in the same default rack. We run CDH 5.10.1 (5.10.1-1.cdh5.10.1.p0.10) on a kerberized cluster.
However, when I run the rebalancer, either from the Cloudera UI or from the command line, it starts normally but fails within a few seconds to a few minutes with the following error:
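For anyone who wants to compare nodes without reading the report by eye, here is a minimal Python sketch that pulls the per-datanode DFS Used% values out of report text. The sample text is an illustrative excerpt, not the real output from this cluster: the field layout is assumed to match the usual hdfs dfsadmin -report format, the hostnames are hypothetical, and parse_dfs_used is a helper name made up for this example.

```python
import re

# Illustrative excerpt only. The layout is assumed to follow the usual
# `hdfs dfsadmin -report` output; the hostnames are hypothetical.
SAMPLE_REPORT = """\
DFS Used%: 44.28%

Name: 10.10.10.212:1004 (dn1.example.net)
DFS Used%: 60.20%

Name: 10.10.10.213:1004 (dn2.example.net)
DFS Used%: 36.32%

Name: 10.10.10.214:1004 (dn3.example.net)
DFS Used%: 36.32%
"""


def parse_dfs_used(report):
    """Map each datanode address to its 'DFS Used%' value."""
    usage = {}
    current = None
    for line in report.splitlines():
        name = re.match(r"Name:\s+(\S+)", line)
        if name:
            current = name.group(1)
            continue
        used = re.match(r"DFS Used%:\s+([\d.]+)%", line)
        # Skip the cluster-wide summary line that appears before any "Name:".
        if used and current is not None:
            usage[current] = float(used.group(1))
    return usage


usage = parse_dfs_used(SAMPLE_REPORT)
for node, pct in sorted(usage.items()):
    print(f"{node}: {pct:.2f}%")
```

In practice the report text would come from running the command itself (for example via subprocess) rather than from a hard-coded string.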
Thu Sep 14 12:39:37 CEST 2017
Current working directory: /run/cloudera-scm-agent/process/5092-hdfs-BALANCER
Launching one-off process: /usr/lib64/cmf/service/hdfs/hdfs.sh balancer -threshold 10.0 -policy DataNode
Thu Sep 14 12:39:37 CEST 2017
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
using /usr/java/jdk1.7.0_67-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER as CONF_DIR
using as SECURE_USER
using as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/5092-hdfs-BALANCER
CMF_CONF_DIR=/etc/cloudera-scm-agent
unlimited
/bin/kinit
using hdfs/hadoop-master01.example.net@EXAMPLE.NET as Kerberos principal
using /run/cloudera-scm-agent/process/5092-hdfs-BALANCER/krb5cc_994 as Kerberos ticket cache
2017-09-14 12:39:39,707 INFO [main] balancer.Balancer (Balancer.java:parse(829)) - Using a threshold of 10.0
2017-09-14 12:39:39,710 INFO [main] balancer.Balancer (Balancer.java:run(644)) - namenodes = [hdfs://nameservice1]
2017-09-14 12:39:39,712 INFO [main] balancer.Balancer (Balancer.java:run(645)) - parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
2017-09-14 12:39:39,712 INFO [main] balancer.Balancer (Balancer.java:run(646)) - included nodes = []
2017-09-14 12:39:39,713 INFO [main] balancer.Balancer (Balancer.java:run(647)) - excluded nodes = []
2017-09-14 12:39:39,713 INFO [main] balancer.Balancer (Balancer.java:run(648)) - source nodes = []
2017-09-14 12:39:39,713 INFO [main] balancer.Balancer (Balancer.java:checkKeytabAndInit(694)) - Keytab is configured, will login using keytab.
2017-09-14 12:39:39,906 INFO [main] security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1138)) - Login successful for user hdfs/hadoop-master01.example.net@EXAMPLE.NET using keytab file hdfs.keytab
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
2017-09-14 12:39:41,078 INFO [main] balancer.KeyManager (KeyManager.java:<init>(68)) - Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
2017-09-14 12:39:41,084 INFO [main] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,086 INFO [main] balancer.KeyManager (KeyManager.java:<init>(142)) - Update block keys every 2hrs, 30mins, 0sec
2017-09-14 12:39:41,334 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:41,334 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:41,335 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:41,335 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:41,336 INFO [org.apache.hadoop.hdfs.server.balancer.KeyManager$BlockKeyUpdater@6e6b28b4] block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2017-09-14 12:39:41,344 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:41,365 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:41,365 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:41,365 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:41,367 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:41,367 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:41,369 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:41,387 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:41,387 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:39:41,388 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,388 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:41,388 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:41,388 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:41,389 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
2017-09-14 12:39:41,554 INFO [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(289)) - Start moving blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
2017-09-14 12:39:41,569 INFO [pool-4-thread-1] balancer.Dispatcher (Dispatcher.java:dispatch(325)) - Successfully moved blk_1074640031_900008 with size=74 from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK through 10.10.10.212:1004
Sep 14, 2017 12:39:41 PM 0 74 B 624.55 GB 10 GB
2017-09-14 12:39:50,590 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:50,590 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:50,590 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:50,590 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:50,592 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:50,596 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:50,596 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:50,597 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:50,597 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:50,597 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:50,598 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:50,601 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:50,602 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:50,602 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:50 PM 1 74 B 624.55 GB 10 GB
2017-09-14 12:39:59,725 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:39:59,725 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:39:59,726 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:39:59,726 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:39:59,726 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:39:59,730 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:39:59,730 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:39:59,731 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:39:59,731 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:39:59,731 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:39:59,732 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:39:59,735 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:39:59,735 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:39:59,736 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:39:59 PM 2 74 B 624.55 GB 10 GB
2017-09-14 12:40:08,818 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:08,818 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:08,818 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:08,819 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:08,819 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:08,822 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:08,823 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:08,823 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:08,824 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:08,824 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:08,824 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:08,827 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:08,827 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:08,828 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:08 PM 3 74 B 624.55 GB 10 GB
2017-09-14 12:40:17,929 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:17,930 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:17,930 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:17,930 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:17,931 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:17,934 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:17,934 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:17,935 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:17,935 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:17,935 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:17,936 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:17,939 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:17,939 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,939 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.213:1004:DISK
2017-09-14 12:40:17,940 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:17,940 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:17,940 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:17,940 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
Sep 14, 2017 12:40:18 PM 4 74 B 624.55 GB 10 GB
2017-09-14 12:40:27,031 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-09-14 12:40:27,032 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.moverThreads = 1000 (default=1000)
2017-09-14 12:40:27,032 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.balancer.dispatcherThreads = 200 (default=200)
2017-09-14 12:40:27,032 INFO [main] balancer.Balancer (Balancer.java:getInt(236)) - dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-09-14 12:40:27,032 INFO [main] balancer.Balancer (Balancer.java:getLong(227)) - dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-09-14 12:40:27,037 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.214:1004
2017-09-14 12:40:27,037 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.213:1004
2017-09-14 12:40:27,037 INFO [main] net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default/10.10.10.212:1004
2017-09-14 12:40:27,038 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 1 over-utilized: [10.10.10.212:1004:DISK]
2017-09-14 12:40:27,038 INFO [main] balancer.Balancer (Balancer.java:logUtilizationCollection(405)) - 0 underutilized: []
2017-09-14 12:40:27,038 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(578)) - Need to move 624.55 GB to make the cluster balanced.
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:matchSourceWithTargetToMove(500)) - Decided to move 10 GB bytes from 10.10.10.212:1004:DISK to 10.10.10.214:1004:DISK
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(434)) - chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
2017-09-14 12:40:27,042 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(442)) - chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
2017-09-14 12:40:27,043 INFO [main] balancer.Balancer (Balancer.java:chooseStorageGroups(450)) - chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
2017-09-14 12:40:27,043 INFO [main] balancer.Balancer (Balancer.java:runOneIteration(602)) - Will move 10 GB in this iteration
No block has been moved for 5 iterations. Exiting...
Sep 14, 2017 12:40:27 PM 5 74 B 624.55 GB 10 GB
Sep 14, 2017 12:40:27 PM Balancing took 48.137 seconds
Exit code: 253
I tried adjusting the balancer settings by increasing and decreasing the values in the Cloudera UI, to no avail.
Note that there are only 3 datanodes in total, with a replication factor of 3. Could this be preventing the balancer from finding a node to place the blocks on without breaking the replication factor?
Created 10-04-2017 07:48 AM
As suspected, there were no available datanodes to place replicas on, because I had the default replication factor of 3 and only 3 datanodes in total.
The balancer started working fine after adding a fourth datanode to the cluster.
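The reasoning can be sketched in a few lines of Python. This is a simplified model, not the balancer's actual code: it relies only on the fact that a datanode holds at most one replica of a given block, so a node that already stores the block cannot receive another copy. The function name candidate_targets and the fourth node's address are hypothetical, and rack placement rules are ignored because all nodes here sit in the same rack.

```python
def candidate_targets(block_replicas, source, datanodes):
    """Datanodes that could receive a replica moved off `source`.

    A datanode stores at most one replica of a given block, so any node
    that already holds the block is not a valid target.
    """
    if source not in block_replicas:
        return set()
    return set(datanodes) - set(block_replicas)


datanodes = {"10.10.10.212", "10.10.10.213", "10.10.10.214"}

# Replication factor 3 on 3 datanodes: every node already holds the block.
replicas = {"10.10.10.212", "10.10.10.213", "10.10.10.214"}
print(candidate_targets(replicas, "10.10.10.212", datanodes))  # set()

# With a (hypothetical) fourth datanode there is finally somewhere to move it.
with_fourth = datanodes | {"10.10.10.215"}
print(candidate_targets(replicas, "10.10.10.212", with_fourth))  # {'10.10.10.215'}
```

Under this model the only blocks that can move on a 3-node cluster are those with fewer than 3 replicas. That would be consistent with the single 74-byte block that did move in the log, though the thread does not confirm that block's replication.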
Created 09-20-2017 05:41 AM
Hi,
Try running it manually (as the HDFS user):
hdfs balancer -threshold 5
The HDFS balancer skips tiny blocks; check whether this applies in your case. See JIRA HDFS-8824.
Regards,
Marc Casajús
Created 09-20-2017 12:37 PM
Still the same. I don't think that changing the threshold will have any effect.
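To illustrate why the threshold should not matter here, below is a rough Python sketch of how nodes fall into the four groups named in the balancer log. It is a simplified model under stated assumptions, not the Balancer source: all datanodes are assumed to have equal capacity, so the cluster average is taken as the plain mean of the per-node DFS Used% values, and classify is a made-up helper name.

```python
def classify(usage, threshold):
    """Group datanodes the way the balancer log labels them."""
    # Assumption: equal capacities, so the cluster average is the plain mean.
    average = sum(usage.values()) / len(usage)
    groups = {
        "overUtilized": [],
        "aboveAvgUtilized": [],
        "belowAvgUtilized": [],
        "underUtilized": [],
    }
    for node, pct in sorted(usage.items()):
        diff = pct - average
        if diff > threshold:
            groups["overUtilized"].append(node)
        elif diff >= 0:
            groups["aboveAvgUtilized"].append(node)
        elif diff >= -threshold:
            groups["belowAvgUtilized"].append(node)
        else:
            groups["underUtilized"].append(node)
    return groups


# DFS Used% values reported in this thread.
usage = {"10.10.10.212": 60.20, "10.10.10.213": 36.32, "10.10.10.214": 36.32}

for threshold in (10.0, 5.0):
    print(threshold, classify(usage, threshold))
```

With a threshold of 10 this gives one over-utilized node and two below-average nodes, which matches the "1 over-utilized" and "0 underutilized" lines in the log. Lowering the threshold to 5 only relabels the two lighter nodes as under-utilized. A source and targets are selected either way, so the threshold alone does not explain why no blocks move.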