HDFS is not rebalancing after adding new DataNode

New Contributor

Hi,

I have a problem with rebalancing HDFS after adding a new DataNode to the cluster. My configuration had 4 DataNodes and I added a new one (the 5th).

Below is the report from dfsadmin:

[hdfs@snr-prod-master0 ~]$ hdfs dfsadmin -report
Configured Capacity: 21563228579840 (19.61 TB)
Present Capacity: 20460562895805 (18.61 TB)
DFS Remaining: 20290148094909 (18.45 TB)
DFS Used: 170414800896 (158.71 GB)
DFS Used%: 0.83%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0


-------------------------------------------------
Live datanodes (5):


Name: 172.17.2.61:50010 (snr-prod-slave1)
Hostname: snr-prod-slave1
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 35358969856 (32.93 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4056646234773 (3.69 TB)
DFS Used%: 0.82%
DFS Remaining%: 94.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 12
Last contact: Wed Dec 20 20:52:23 UTC 2017




Name: 172.17.2.64:50010 (snr-prod-slave4)
Hostname: snr-prod-slave4
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 47864344576 (44.58 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4044275077691 (3.68 TB)
DFS Used%: 1.11%
DFS Remaining%: 93.78%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 10
Last contact: Wed Dec 20 20:52:26 UTC 2017




Name: 172.17.2.62:50010 (snr-prod-slave2)
Hostname: snr-prod-slave2
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 221184 (216 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4092407638196 (3.72 TB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 6
Last contact: Wed Dec 20 20:52:26 UTC 2017




Name: 172.17.2.65:50010 (snr-prod-slave5)
Hostname: snr-prod-slave5
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 44406976512 (41.36 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4047866664447 (3.68 TB)
DFS Used%: 1.03%
DFS Remaining%: 93.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 8
Last contact: Wed Dec 20 20:52:23 UTC 2017




Name: 172.17.2.60:50010 (snr-prod-slave0)
Hostname: snr-prod-slave0
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 42784288768 (39.85 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4048952479802 (3.68 TB)
DFS Used%: 0.99%
DFS Remaining%: 93.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 16
Last contact: Wed Dec 20 20:52:23 UTC 2017


After adding the new node to the cluster I ran a rebalance operation to distribute the data evenly, but the balancer says the cluster is already balanced ("The cluster is balanced. Exiting..."):

[hdfs@snr-prod-master0 ~]$ hdfs balancer -threshold 5
17/12/20 20:57:36 INFO balancer.Balancer: Using a threshold of 5.0
17/12/20 20:57:36 INFO balancer.Balancer: namenodes  = [hdfs://snr-prod-master0:8020]
17/12/20 20:57:36 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/12/20 20:57:36 INFO balancer.Balancer: included nodes = []
17/12/20 20:57:36 INFO balancer.Balancer: excluded nodes = []
17/12/20 20:57:36 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/12/20 20:57:37 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/12/20 20:57:38 INFO block.BlockTokenSecretManager: Setting block keys
17/12/20 20:57:38 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/12/20 20:57:38 INFO block.BlockTokenSecretManager: Setting block keys
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.61:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.60:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.64:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.62:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.65:50010
17/12/20 20:57:38 INFO balancer.Balancer: 0 over-utilized: []
17/12/20 20:57:38 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Dec 20, 2017 8:57:38 PM           0                  0 B                 0 B                0 B
Dec 20, 2017 8:57:38 PM  Balancing took 1.714 seconds

What am I missing?

Thanks for your reply!

Robert

1 ACCEPTED SOLUTION

Super Collaborator

Hi @Robert Jonczy,

The report you got is accurate. The key is the "threshold" parameter you used:

-threshold <threshold>    Percentage of disk capacity.

This is how far, in percentage points above or below the cluster's average DFS usage, a DataNode's own usage may be before the balancer moves blocks to or from it.

Usage here means: DFS Used / Configured Capacity, as a percentage.

In your scenario the average usage is below 1% and every node sits between 0.00% and 1.11%, so no node is anywhere near 5 points away from the average. With -threshold 5, a node has to be more than 5 percentage points above or below the average (a band 10 points wide) before it is treated as over- or under-utilized. That is not your case, so the balancer has nothing to move.
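
To make the arithmetic concrete, here is a rough Python sketch of that check, using the per-node numbers from the dfsadmin report in the question. The `classify` function and its labels are my own simplification for illustration, not the Balancer's actual code, and it assumes utilization is simply DFS Used divided by Configured Capacity.

```python
# Per-node figures copied from the `hdfs dfsadmin -report` output in the question.
CAPACITY = 4312645715968  # bytes, identical on all five DataNodes

DFS_USED = {
    "snr-prod-slave0": 42784288768,
    "snr-prod-slave1": 35358969856,
    "snr-prod-slave2": 221184,
    "snr-prod-slave4": 47864344576,
    "snr-prod-slave5": 44406976512,
}


def classify(dfs_used, capacity, threshold):
    """Label each node by its distance from the cluster average utilization.

    Simplified model (an assumption, not the Balancer source): a node is
    over- or under-utilized only when its utilization differs from the
    average by more than `threshold` percentage points.
    """
    total_used = sum(dfs_used.values())
    total_capacity = capacity * len(dfs_used)
    average = 100.0 * total_used / total_capacity

    result = {}
    for node, used in dfs_used.items():
        utilization = 100.0 * used / capacity
        diff = utilization - average
        if diff > threshold:
            label = "over-utilized"
        elif diff < -threshold:
            label = "under-utilized"
        else:
            label = "within threshold"
        result[node] = (utilization, label)
    return average, result


average, nodes = classify(DFS_USED, CAPACITY, threshold=5.0)
print(f"cluster average: {average:.2f}%")
for node, (utilization, label) in sorted(nodes.items()):
    print(f"{node}: {utilization:.2f}% -> {label}")
```

With a 5-point threshold every node in this report lands in the "within threshold" bucket, which lines up with the "0 over-utilized" and "0 underutilized" lines in the balancer log.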

Hope this clarifies!

3 REPLIES 3

Note that the HDFS balancer does not balance data across the disks within a single DataNode.

This is a well-known and much-discussed limitation of the HDFS balancer.

See the Apache JIRA: https://issues.apache.org/jira/browse/HDFS-1312

The Apache documentation for the disk balancer is at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html

The JIRA tracks this issue and its resolution.

New Contributor

@bkosaraju, your explanation makes sense. Thanks for clarifying! My understanding of the threshold was different.

Robert
