Support Questions

robert_jonczy · ‎12-20-2017

Hi,

I have a problem with rebalancing HDFS after adding a new DataNode to the cluster. My configuration had 4 DataNodes, and I added a new one (the 5th).

Below is the report from dfsadmin:

[hdfs@snr-prod-master0 ~]$ hdfs dfsadmin -report
Configured Capacity: 21563228579840 (19.61 TB)
Present Capacity: 20460562895805 (18.61 TB)
DFS Remaining: 20290148094909 (18.45 TB)
DFS Used: 170414800896 (158.71 GB)
DFS Used%: 0.83%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0


-------------------------------------------------
Live datanodes (5):


Name: 172.17.2.61:50010 (snr-prod-slave1)
Hostname: snr-prod-slave1
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 35358969856 (32.93 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4056646234773 (3.69 TB)
DFS Used%: 0.82%
DFS Remaining%: 94.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 12
Last contact: Wed Dec 20 20:52:23 UTC 2017




Name: 172.17.2.64:50010 (snr-prod-slave4)
Hostname: snr-prod-slave4
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 47864344576 (44.58 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4044275077691 (3.68 TB)
DFS Used%: 1.11%
DFS Remaining%: 93.78%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 10
Last contact: Wed Dec 20 20:52:26 UTC 2017




Name: 172.17.2.62:50010 (snr-prod-slave2)
Hostname: snr-prod-slave2
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 221184 (216 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4092407638196 (3.72 TB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 6
Last contact: Wed Dec 20 20:52:26 UTC 2017




Name: 172.17.2.65:50010 (snr-prod-slave5)
Hostname: snr-prod-slave5
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 44406976512 (41.36 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4047866664447 (3.68 TB)
DFS Used%: 1.03%
DFS Remaining%: 93.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 8
Last contact: Wed Dec 20 20:52:23 UTC 2017




Name: 172.17.2.60:50010 (snr-prod-slave0)
Hostname: snr-prod-slave0
Decommission Status : Normal
Configured Capacity: 4312645715968 (3.92 TB)
DFS Used: 42784288768 (39.85 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 4048952479802 (3.68 TB)
DFS Used%: 0.99%
DFS Remaining%: 93.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 16
Last contact: Wed Dec 20 20:52:23 UTC 2017

After adding the new node to the cluster, I ran a rebalance operation to distribute the data evenly, but the balancer reports that the cluster is already balanced ("The cluster is balanced. Exiting..."):

[hdfs@snr-prod-master0 ~]$ hdfs balancer -threshold 5
17/12/20 20:57:36 INFO balancer.Balancer: Using a threshold of 5.0
17/12/20 20:57:36 INFO balancer.Balancer: namenodes  = [hdfs://snr-prod-master0:8020]
17/12/20 20:57:36 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/12/20 20:57:36 INFO balancer.Balancer: included nodes = []
17/12/20 20:57:36 INFO balancer.Balancer: excluded nodes = []
17/12/20 20:57:36 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/12/20 20:57:37 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/12/20 20:57:38 INFO block.BlockTokenSecretManager: Setting block keys
17/12/20 20:57:38 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/12/20 20:57:38 INFO block.BlockTokenSecretManager: Setting block keys
17/12/20 20:57:38 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/12/20 20:57:38 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.61:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.60:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.64:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.62:50010
17/12/20 20:57:38 INFO net.NetworkTopology: Adding a new node: /default-rack/172.17.2.65:50010
17/12/20 20:57:38 INFO balancer.Balancer: 0 over-utilized: []
17/12/20 20:57:38 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Dec 20, 2017 8:57:38 PM           0                  0 B                 0 B                0 B
Dec 20, 2017 8:57:38 PM  Balancing took 1.714 seconds

What am I missing?

Thanks for any reply!

Robert

bkosaraju · ‎12-20-2017

Hi @Robert Jonczy,

The report you got is accurate. The key is the "threshold" parameter you passed:

-threshold <threshold>    Percentage of disk capacity.

The balancer treats a DataNode as balanced when its utilization is within plus or minus this many percentage points of the cluster's "average DFS usage",

which is: DFS Used / total capacity, expressed as a percentage.

In your scenario the average is below 1%, and every node sits between 0.00% and 1.11%. With the threshold you specified (5%), data is only moved when a node differs from the average by more than 5 percentage points in either direction (a 10-point band, +/- 5%). That is not the case here, so the balancer does not move any data.
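To make the arithmetic concrete, here is a minimal Python sketch of the threshold check, using the byte counts from the dfsadmin report in the question. It is a simplified model of the rule described above, not the balancer's actual code: the function name `classify` and the labels are hypothetical, and it assumes per-node utilization is DFS Used divided by Configured Capacity.

```python
# Figures copied from the `hdfs dfsadmin -report` output in the question.
CAPACITY = 4312645715968  # configured capacity of each DataNode, in bytes
DFS_USED = {
    "snr-prod-slave0": 42784288768,
    "snr-prod-slave1": 35358969856,
    "snr-prod-slave2": 221184,  # the newly added, nearly empty node
    "snr-prod-slave4": 47864344576,
    "snr-prod-slave5": 44406976512,
}


def classify(dfs_used, capacity, threshold):
    """Simplified model of the threshold rule (an assumption, not Hadoop code).

    A node counts as over- or under-utilized only when its utilization
    differs from the cluster average by more than `threshold` percentage
    points. Returns (average utilization in percent, per-node results).
    """
    total_capacity = capacity * len(dfs_used)
    average = 100.0 * sum(dfs_used.values()) / total_capacity
    result = {}
    for node, used in dfs_used.items():
        utilization = 100.0 * used / capacity
        if utilization > average + threshold:
            label = "over-utilized"
        elif utilization < average - threshold:
            label = "under-utilized"
        else:
            label = "within threshold"
        result[node] = (utilization, label)
    return average, result


if __name__ == "__main__":
    average, result = classify(DFS_USED, CAPACITY, threshold=5.0)
    print(f"cluster average: {average:.2f}%")
    for node, (utilization, label) in sorted(result.items()):
        print(f"{node}: {utilization:.2f}% -> {label}")
```

With a threshold of 5, every node, including the empty snr-prod-slave2, falls inside the band around the cluster average of roughly 0.79%, which is consistent with the "0 over-utilized" and "0 underutilized" lines in the balancer log.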

Hope this clarifies!

namaheshwari · ‎12-20-2017

Note that the HDFS balancer currently doesn't balance data across the disks within a single DataNode; it only balances between DataNodes.

This is a well-known and much-discussed limitation of the HDFS balancer.

See the Apache JIRA: https://issues.apache.org/jira/browse/HDFS-1312

The Apache documentation for the disk balancer: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html

That JIRA tracks and resolves this issue.

robert_jonczy · ‎12-21-2017

@bkosaraju, your explanation makes sense. Thanks for clarifying! I had understood the threshold differently.

Robert
