Created on 03-26-2021 04:25 AM - edited 09-16-2022 07:41 AM
Hello,
We need some assistance or guidance on reducing Non-HDFS space usage. We currently see Non-HDFS usage of around 270, and because we are facing a space crunch, we would like to explore ways to reduce it.
I have cleared all YARN logs for applications that were killed, failed, etc. (our /data mount point houses DFS, YARN, Kudu, and Impala), yet this has not solved the issue. Any assistance or guidance is much appreciated.
Thanks
Amn
Created 04-01-2021 05:31 AM
Hello @Amn_468, as you explained, the /data mount point is used for YARN, Kudu, and Impala in addition to the DataNode storage volumes.
HDFS counts the disk usage of /data/dfs/dn as DFS used and all other usage on the mount point as non-DFS usage. If the /data mount point also holds the YARN local directory (/data/yarn/nm), the Kudu data/WAL directories (/data/kudu/*), or the Impala scratch directory (/data/impala/*), the usage of those directories is counted as non-DFS usage. In general, the YARN local directory and the Impala scratch directory are emptied after a job completes successfully. If files remain from a previous job run that was killed or aborted, you need to remove them manually to recover the disk space. Kudu's space, however, stays in use for as long as the mount point is used by the Kudu service.
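To find files left behind by killed or aborted jobs, something like the following minimal sketch may help. It assumes the usual NodeManager layout `<local-dir>/usercache/<user>/appcache/<application_id>` under /data/yarn/nm, and that you supply the IDs of applications that are still running yourself (for example from `yarn application -list`); `leftover_app_dirs` and `dir_size` are hypothetical helpers. It only reports candidates and deletes nothing, so review the output before removing anything by hand.

```python
import os
from pathlib import Path


def dir_size(path):
    """Sum the sizes (bytes) of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished while scanning (e.g. a container finished)
    return total


def leftover_app_dirs(nm_local_dir, running_app_ids):
    """Return (app_dir, size_bytes) for appcache dirs whose app is not running.

    Assumes the layout <local-dir>/usercache/<user>/appcache/<application_id>.
    """
    usercache = Path(nm_local_dir) / "usercache"
    leftovers = []
    for app_dir in sorted(usercache.glob("*/appcache/application_*")):
        if app_dir.is_dir() and app_dir.name not in running_app_ids:
            leftovers.append((app_dir, dir_size(app_dir)))
    return leftovers


if __name__ == "__main__":
    # Hypothetical: fill this with the IDs of applications that are still running.
    running = set()
    for app_dir, size in leftover_app_dirs("/data/yarn/nm", running):
        print(f"{size / 2**30:8.2f} GiB  {app_dir}")
```

Excluding running applications matters: their appcache directories are still in use by live containers.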
Calculate the disk usage of each service, and from that work out how much space you could recover if the data in the YARN local directory and the Impala scratch directory were removed entirely.
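The per-service calculation can be sketched as below. This is an illustration only: the directory map reuses the paths mentioned in this thread and should be adjusted to your configured directories, and `usage_report` is a hypothetical helper. The non-DFS figure covers only the listed directories (HDFS's own non-DFS number also includes anything else on the mount), and the "recoverable" figure is an upper bound, since files of still-running jobs must not be removed.

```python
import os


def dir_size(path):
    """Sum the sizes (bytes) of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                pass  # file removed while scanning
    return total


def usage_report(service_dirs, dfs_key, recoverable_keys):
    """Return per-service sizes, their non-DFS total, and the recoverable part."""
    sizes = {name: dir_size(path) for name, path in service_dirs.items()}
    non_dfs = sum(size for name, size in sizes.items() if name != dfs_key)
    recoverable = sum(sizes[name] for name in recoverable_keys if name in sizes)
    return sizes, non_dfs, recoverable


if __name__ == "__main__":
    # Directory layout taken from the thread; adjust to your configured paths.
    dirs = {
        "hdfs_dn": "/data/dfs/dn",
        "yarn_local": "/data/yarn/nm",
        "kudu": "/data/kudu",
        "impala_scratch": "/data/impala",
    }
    sizes, non_dfs, recoverable = usage_report(
        dirs, "hdfs_dn", {"yarn_local", "impala_scratch"})
    for name, size in sizes.items():
        print(f"{name:15s} {size / 2**30:10.2f} GiB")
    print(f"non-DFS (these dirs only): {non_dfs / 2**30:.2f} GiB")
    print(f"recoverable at most:       {recoverable / 2**30:.2f} GiB")
```

Walking /data/dfs/dn can take a while on a full DataNode; it is included only so the DFS share can be compared with the rest.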
If you are running on an ext4 file system and are low on available space, consider lowering the superuser reserved-block percentage from 5% to 1% (using the "tune2fs -m 1" option) on that file system, which frees up some additional space on the mount point.
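To estimate beforehand how much space lowering the reservation would make available, a rough sketch using `os.statvfs` follows. It assumes the current reservation can be approximated as free blocks minus blocks available to unprivileged users (`f_bfree - f_bavail`), which only holds while free space exceeds the reservation, and that the new reservation is `new_percent` of `f_blocks`. `tune2fs` computes the percentage from the superblock's total block count, so treat the result as approximate; `tune2fs -l` on the device shows the exact "Reserved block count". The helper names are hypothetical.

```python
import os


def estimate_reserved_gain(total_blocks, free_blocks, avail_blocks,
                           block_size, new_percent=1.0):
    """Approximate bytes made available to non-root users by a lower reservation.

    The current reservation is approximated as free_blocks - avail_blocks,
    which is only accurate while free space exceeds the reservation.
    """
    reserved_now = (free_blocks - avail_blocks) * block_size
    reserved_new = int(total_blocks * new_percent / 100) * block_size
    return max(reserved_now - reserved_new, 0)


def reserved_gain_for_mount(mount_point, new_percent=1.0):
    """Read block counts for mount_point via statvfs and estimate the gain."""
    st = os.statvfs(mount_point)
    return estimate_reserved_gain(st.f_blocks, st.f_bfree, st.f_bavail,
                                  st.f_frsize, new_percent)


if __name__ == "__main__":
    mount = "/data"  # mount point from the thread
    if os.path.isdir(mount):
        gain = reserved_gain_for_mount(mount, new_percent=1.0)
        print(f"approx. {gain / 2**30:.2f} GiB would become available")
```

For example, on a file system of 1,000,000 blocks of 4096 bytes with a 50,000-block reservation, moving to 1% keeps 10,000 reserved blocks and frees roughly 40,000 blocks (about 164 MB).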
Created 04-01-2021 09:38 PM
@PabitraDas Thanks for your reply