
Reduce Non-HDFS Space

Rising Star

Hello,
Need some assistance/guidance on how we can reduce non-HDFS space. We see around 270 of non-HDFS space used, and since we are facing a space crunch we would like to explore possibilities for reducing it.
I have cleared all YARN logs for applications that were killed/failed, etc. (our /data mount point houses DFS, YARN, Kudu, and Impala), yet this has not solved the issue. Any assistance/guidance is much appreciated.

 

Thanks 

Amn

1 ACCEPTED SOLUTION

Expert Contributor

Hello @Amn_468, as you explained, the /data mount point is used for YARN, Kudu, and Impala in addition to the DataNode (DN) storage volumes.

 

Here, HDFS considers the disk usage of /data/dfs/dn as HDFS/DFS used and all remaining disk usage as non-HDFS usage. If the /data mount point is also used as the YARN local directory (/data/yarn/nm), Kudu data/WAL directory (/data/kudu/*), or Impala scratch directory (/data/impala/*), then the usage of those directories is counted as non-DFS usage. In general, the YARN local directory and Impala scratch directory are emptied after a successful job run. If files remain from a previous job run that was killed/aborted, you need to remove those files manually to recover the disk space. However, Kudu space will remain in use as long as the mount point is used by the Kudu service.
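A minimal sketch of how you could look for such leftovers, assuming the directory layout mentioned above (/data/yarn/nm as the YARN local directory and /data/impala as the Impala scratch directory; adjust to your actual yarn.nodemanager.local-dirs and Impala scratch_dirs settings) and assuming no containers or queries are currently running on the node:

# Application directories left behind in the YARN local directory
sudo find /data/yarn/nm -type d -name 'application_*' -ls

# Impala scratch files older than one day (the age threshold is only an example)
sudo find /data/impala -type f -mtime +1 -ls

# Review the listings above first, then delete, e.g.:
# sudo find /data/impala -type f -mtime +1 -delete

Only remove files you are sure belong to killed/aborted jobs; anything under the application directory of a still-running job is in use.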

 

You can calculate the disk usage of each service and then work out how much space you can recover if the YARN local directory and Impala scratch directory contents are fully removed.
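A rough way to get that per-service breakdown (paths assumed from the post; only /data/dfs/dn counts as DFS used, the rest is non-DFS):

# Overall usage of the mount point
df -h /data

# Usage per service directory
sudo du -sh /data/dfs/dn /data/yarn/nm /data/kudu /data/impala

Comparing these numbers against the non-DFS figure reported for the DataNode shows how much is YARN/Impala data you could reclaim versus Kudu data you cannot.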

 

If you are running on an ext4 file system and are low on available space, consider lowering the superuser block reservation from 5% to 1% (using "tune2fs -m 1") on the file system, which will give you some more free space on the mount point.
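For example (the device name /dev/sdb1 is only a placeholder for whichever block device actually backs /data):

# Identify the device backing /data and check the current reservation
df -h /data
sudo tune2fs -l /dev/sdb1 | grep -i 'reserved block count'

# Lower the superuser reservation from the default 5% to 1%
sudo tune2fs -m 1 /dev/sdb1

Note that this only makes already-reserved space available; it does not free any data.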

 

 




Rising Star

@PabitraDas Thanks for your reply.