Created 05-05-2016 09:23 AM
I have a four-node cluster (HDP 2.4). On the Ambari Hosts page, I can see that space consumption on one of the nodes is very high. First, I don't understand what is causing this. I would also like to understand how easy it is to distribute the data evenly across all the nodes, so that every node holds the same amount of DFS data.
Created on 05-05-2016 09:32 AM - edited 08-19-2019 02:12 AM
1. Go to the data directory on that DataNode and run du -sh * to check how much space each entry uses.
It might be that non-DFS data (files stored outside HDFS) is filling up that node.
2. You can distribute data evenly across DataNodes with the HDFS Balancer (the "Rebalance HDFS" service action in Ambari).
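For item 1, the DataNode's data directories are whatever dfs.datanode.data.dir lists: possibly several comma-separated paths, each optionally prefixed with a storage type such as [DISK] or written as a file:// URI. A minimal sketch, assuming a bash-compatible shell and paths without spaces or commas: the hypothetical helper hdfs_data_dirs turns that property value into plain local paths you can pass to du or df.

```shell
# Hypothetical helper: turn a dfs.datanode.data.dir value into one local path per line.
# Handles comma separation, surrounding whitespace, an optional [STORAGE_TYPE]
# prefix, and an optional file:// scheme. Assumes paths contain no commas.
hdfs_data_dirs() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
          -e 's/^\[[A-Za-z_]*]//' -e 's#^file://##' \
    | grep -v '^$'
}

# Example use on a DataNode host:
#   for d in $(hdfs_data_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)"); do
#     du -sh "$d"   # space used by the DataNode directory itself
#     df -h "$d"    # usage of the whole mount, including non-DFS files
#   done
```

Comparing df (the whole mount) with du (the DataNode directory only) is a quick way to see how much of a disk is taken by non-DFS files.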
Created 05-05-2016 09:50 AM
@Sagar Shimpi Thanks for pointing me to the "Rebalance HDFS" utility. After I clicked Rebalance HDFS, the progress bar finished almost immediately and reported success. Shouldn't this be a long procedure, with lots of data being sent from one node to another to balance the cluster? If it has not actually finished, how do I know when it will, given that I see no change right after clicking the link?
Created 05-05-2016 10:00 AM
Please check the value above. It seems you have very little data in HDFS, so the balancer took very little time to complete. Please let me know which Ambari and HDP versions you are using.
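Why a run can end almost immediately: in Hadoop 2.x the balancer compares each DataNode's DFS utilization (DFS Used as a percentage of its capacity) with the cluster-wide average, and it only has work to do when some node sits more than the threshold (10 percentage points by default) above or below that average. Space taken by non-DFS files is not counted as DFS Used, so it never makes a node look over-utilized. The hypothetical classify_datanodes function below is a simplified sketch of that check, not the balancer's actual code; it reads "host dfs_used_bytes capacity_bytes" lines on stdin.

```shell
# Hypothetical helper: classify DataNodes roughly the way the HDFS balancer does.
# Input (stdin): one line per DataNode, "host dfs_used_bytes capacity_bytes".
# Argument: threshold in percentage points (defaults to 10, the balancer default).
classify_datanodes() {
  awk -v t="${1:-10}" '
    { host[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
    END {
      if (tc <= 0) exit 1                       # no capacity reported
      avg = 100 * tu / tc                       # cluster-wide DFS utilization (%)
      for (i = 1; i <= NR; i++) {
        u = (cap[i] > 0) ? 100 * used[i] / cap[i] : 0   # per-node DFS utilization (%)
        if (u > avg + t)      s = "over-utilized"
        else if (u < avg - t) s = "under-utilized"
        else                  s = "within-threshold"
        printf "%s %.2f %s\n", host[i], u, s
      }
    }'
}
```

With only a few GB of DFS data per node, every node is likely to fall within the threshold, so the balancer finds nothing to move and finishes right away.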
Created 05-05-2016 10:21 AM
@Sagar Shimpi I have checked the NameNode UI. For this node, "Non-DFS Used" shows 77.15 GB while "Used" shows just 1.25 GB, and 77.15 GB is far higher than on the other three nodes. What should I do next, and how do I free up space on this node? As for versions, HDP is 2.4 and Ambari is 2.2.1.1.
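The same numbers are available from the command line: hdfs dfsadmin -report prints a section per DataNode that includes "DFS Used" and "Non DFS Used" lines. On Hadoop 2.7-based releases such as HDP 2.4, Non DFS Used is roughly Configured Capacity minus DFS Used minus DFS Remaining, i.e. whatever else occupies those disks. A minimal sketch, assuming that report layout: the hypothetical summarize_report helper condenses the report to one line per DataNode, in GiB.

```shell
# Hypothetical helper: condense "hdfs dfsadmin -report" (read on stdin) into
# one line per DataNode with DFS Used and Non DFS Used in GiB.
# Assumes the Hadoop 2.x layout, where each DataNode section starts with
# "Name: ..." and values look like "DFS Used: <bytes> (<human-readable>)".
summarize_report() {
  awk -F': ' '
    /^Name: /         { name = $2 }          # start of a DataNode section
    /^DFS Used: /     { dfs = $2 + 0 }       # keep the leading byte count only
    /^Non DFS Used: / {
      if (name != "")                        # skip anything before the first DataNode
        printf "%s dfs_used_gib=%.2f non_dfs_used_gib=%.2f\n", name, dfs / 2^30, ($2 + 0) / 2^30
    }'
}

# Example use:
#   sudo -u hdfs hdfs dfsadmin -report | summarize_report
```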
Created 05-05-2016 10:47 AM
I found out what was occupying the non-DFS space: it was the log files under /var/log/hive, which held around 67 GB of logs! I removed them, and the space has been reclaimed. Thanks for your help. (I used the command du -kscx * to see the size of each folder, running it in the log folder.)
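The same search can be wrapped in a small function that ranks entries by size, so a 67 GB directory stands out at the top of the list. A minimal sketch using only standard du, sort and head options; the function name largest_entries is hypothetical, and like du * it skips hidden entries.

```shell
# Hypothetical helper: list the N largest entries directly under DIR (defaults:
# current directory, 10 entries), sizes in KiB, largest first. -x keeps du on
# the same filesystem; permission errors are discarded.
largest_entries() {
  local dir="${1:-.}" n="${2:-10}"
  du -skx "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Example use on the node that is short of space:
#   largest_entries /var/log 5
```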
Created 05-06-2016 02:51 PM
There are a number of things that cause HDFS imbalance. This post explains some of those causes in more detail. The balancer should be run regularly in a production system (you can kick it off from the command line, so you can schedule it using cron, for example). The balancer can take a while to complete if there are a lot of blocks to move.
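A sketch of running the balancer by hand and on a schedule, assuming a non-Kerberized cluster and that the hdfs service user is the HDFS superuser (as on a default HDP install). The hdfs balancer command and its -threshold option are standard Hadoop; the weekly schedule and the log path are placeholders to adapt.

```shell
# One-off run as the HDFS superuser. -threshold is the allowed deviation, in
# percentage points, of each DataNode's DFS utilization from the cluster
# average (default 10); a lower value balances more tightly but moves more data.
sudo -u hdfs hdfs balancer -threshold 10

# Example crontab entry for the hdfs user (edit with: sudo -u hdfs crontab -e).
# Runs every Saturday at 01:00. The log path is a placeholder, and hdfs must be
# on cron's PATH (otherwise use the full path to the hdfs script on your hosts).
# 0 1 * * 6 hdfs balancer -threshold 10 >> /tmp/hdfs-balancer.log 2>&1
```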
Note that when HDFS moves a block, the old replica is "marked for deletion" rather than deleted immediately. HDFS removes these unused replicas over time, so space on the source node may not be freed right away.