In my HDP cluster, the data directory on one particular DataNode is almost full (98%), while the data directories on the other DataNodes are below 60%. How can I find out why HDFS is writing data to that one DataNode? I am worried that this might affect cluster performance, and I would like to know how to distribute the data across the other DataNodes. Can I use Rebalance HDFS under HDFS > Service Actions? This is also triggering a NodeManager unhealthy alert, as the threshold limit is set to 90%. If I need to clean up the disk, what kind of data should I consider removing? Kindly respond.
Thanks in advance.
HDFS data might not always be distributed uniformly across DataNodes. If the data is not evenly balanced across the DataNodes, you can run the HDFS Balancer from the Ambari UI.
Ambari UI --> HDFS --> "Service Actions" (Drop Down) --> Rebalance HDFS
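Before running the balancer, it can help to confirm which DataNodes are actually over-utilized. Here is a rough Python sketch that parses the text of "hdfs dfsadmin -report". It assumes each DataNode section of the report contains a "Hostname:" line followed later by a "DFS Used%:" line, and it compares each node against the simple mean of the per-node percentages. That is only an approximation of the cluster-wide utilization the balancer works with. The function names are made up for illustration.

```python
import re
import subprocess


def parse_datanode_usage(report_text):
    """Return {hostname: dfs_used_percent} from `hdfs dfsadmin -report` text.

    Assumes each DataNode section has a "Hostname:" line followed by a
    "DFS Used%:" line; the cluster summary at the top is skipped because
    no hostname has been seen yet.
    """
    usage = {}
    hostname = None
    for line in report_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("DFS Used%:") and hostname is not None:
            match = re.search(r"([\d.]+)%", line)
            if match:
                usage[hostname] = float(match.group(1))
            hostname = None  # wait for the next DataNode section
    return usage


def over_utilized_nodes(usage, threshold=10.0):
    """Nodes whose usage exceeds the mean by more than `threshold` points."""
    if not usage:
        return {}
    mean = sum(usage.values()) / len(usage)
    return {host: pct for host, pct in usage.items() if pct - mean > threshold}


def fetch_report():
    """Run the report on a host with the HDFS client (needs HDFS admin rights)."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a cluster node you would call over_utilized_nodes(parse_datanode_usage(fetch_report())). A node at 98% among peers below 60% should show up clearly.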
You mentioned that you are also getting a NodeManager unhealthy alert with the threshold limit set to 90%.
Do you mean the "NodeManager is unhealthy" alert caused by the local-dirs (that is, "local-dirs are bad" errors)?
If so, it may be because the YARN property "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" is set to 90% by default.
If the utilization of the YARN disk (in this case /data) is above the limit set by yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, try these options:
1. Free up some disk space.
2. Increase the value of "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" through Ambari, then restart the NodeManager.
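To see how close the disk is to that limit, something like the following Python sketch can be run on the affected node. It computes the used percentage as 100 minus the share of space still usable, which I believe is roughly what the NodeManager disk health checker does, but treat that as an assumption. The helper names and the default of 90.0 are only illustrative.

```python
import shutil


def disk_utilization_percent(path):
    """Used percentage of the filesystem holding `path` (100 - usable share)."""
    usage = shutil.disk_usage(path)
    return 100.0 * (1.0 - usage.free / usage.total)


def exceeds_limit(path, max_percent=90.0):
    """True if utilization is above the configured per-disk limit."""
    return disk_utilization_percent(path) > max_percent
```

For example, exceeds_limit("/data") would be expected to return True for a disk at 98% with the default limit.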
@jsensharma Thanks for the quick response. I have learned that increasing the percentage for "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" is not recommended. Moreover, the disk usage has already reached 98%, so I think I need to free up some disk space. I am curious what kind of data I can safely clear from the data directory. Any suggestions?
2. Could a rogue YARN application be filling up the disk on that particular DataNode? If so, how can I check that?
Thanks in Advance.
There are some recommendations from the HDFS Balancer perspective to make sure it runs as fast as possible. These include some of the parameters described in the link, such as "dfs.datanode.balance.max.concurrent.moves", "dfs.balancer.max-size-to-move", "dfs.balancer.moverThreads" and "dfs.datanode.balance.bandwidthPerSec".
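If you run the balancer from the command line instead of Ambari, the balancer-side properties can be passed as -D options ahead of -threshold. A small Python sketch for assembling such a command is below. The helper name and the values in the example are illustrative only, not tuning recommendations, and the DataNode-side properties normally have to be set in the DataNode configuration rather than on the balancer command line.

```python
def balancer_command(threshold=10, overrides=None):
    """Build the argv for `hdfs balancer` with optional -D property overrides.

    Generic -D options are placed before the balancer-specific -threshold.
    """
    cmd = ["hdfs", "balancer"]
    for key, value in sorted((overrides or {}).items()):
        cmd.append(f"-D{key}={value}")
    cmd += ["-threshold", str(threshold)]
    return cmd


# Illustrative values only; choose settings that suit your cluster.
example = balancer_command(
    threshold=10,
    overrides={
        "dfs.balancer.moverThreads": 1000,
        "dfs.balancer.max-size-to-move": 10737418240,
    },
)
print(" ".join(example))
```

The resulting list can be passed to subprocess.run on a host with the HDFS client installed, run as the HDFS superuser.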
Regarding the heavy usage of the YARN "local-dirs", please refer to the following article, which might give a better idea. You can also review the following yarn-site properties to tune this further.
The "yarn.nodemanager.local-dirs" is the property that points to the location where the intermediate data (temporary data) is written on the nodes where the NodeManager runs. The NodeManager service runs on all worker nodes. Please check if this dir has enough space.
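To check whether a single application is filling up the local-dirs, one option is to add up the space used per application ID. The Python sketch below assumes the usual NodeManager layout of <local-dir>/usercache/<user>/appcache/<application_id>; the function names are hypothetical, and it should be run as a user who can read those directories.

```python
import os


def directory_size(path):
    """Total size in bytes of all files under `path` (unreadable files skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is not readable
    return total


def appcache_usage(local_dir):
    """Return {application_id: bytes_used} for one NodeManager local dir."""
    usage = {}
    usercache = os.path.join(local_dir, "usercache")
    if not os.path.isdir(usercache):
        return usage
    for user in os.listdir(usercache):
        appcache = os.path.join(usercache, user, "appcache")
        if not os.path.isdir(appcache):
            continue
        for app_id in os.listdir(appcache):
            app_path = os.path.join(appcache, app_id)
            if os.path.isdir(app_path):
                usage[app_id] = usage.get(app_id, 0) + directory_size(app_path)
    return usage
```

Sorting the result by size shows the largest consumers first. Any suspicious application ID can then be looked up with "yarn application -status <application_id>".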
The "yarn.nodemanager.localizer.cache.target-size-mb" property defines the maximum disk space to be used for localizing resources. Once the total size of the cache exceeds this value, the deletion service tries to remove files that are not used by any running container.
The "yarn.nodemanager.localizer.cache.cleanup.interval-ms" property defines the interval at which unused resources are deleted if the total cache size exceeds the configured maximum. Unused resources are those that are not referenced by any running container.