Support Questions
Find answers, ask questions, and share your expertise

Cleaning /dfs/dn sub-directories to free disk spaces

Solved

Rising Star

Hello everyone,

 

I am running into an issue: /dfs/dn is consuming all the disk space in my distributed cluster. The cluster has four nodes, each with 100 GB of HDD space. Cloudera Manager reports about 200 GB consumed, but when I check usage on HDFS itself, only about 50 GB is used. Can anyone help me clean up these directories? Or, if cleaning is not an option, how can I compress the data without having to scale up?

 

Thanks

1 ACCEPTED SOLUTION

Re: Cleaning /dfs/dn sub-directories to free disk spaces

Master Guru
This may be a very basic question, but I ask because it is unclear from the data you've posted: have you accounted for replication? 50 GiB of summed HDFS file lengths (the values reported by hdfs dfs -du), with 3x replication, would occupy ~150 GiB of actual physical storage.
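A quick way to compare the two numbers is to look at logical file sizes versus what the DataNodes report. The commands below are a sketch (run them on a cluster node; the root path is just an example), followed by the arithmetic behind the 150 GiB figure:

```shell
# Summed file lengths, i.e. logical size (replication not factored in
# to the first column):
#   hdfs dfs -du -s -h /
#
# Raw capacity and usage as the DataNodes see it (replication included):
#   hdfs dfsadmin -report

# The arithmetic: 50 GiB of files at the default 3x replication
# consumes roughly 150 GiB of physical disk.
logical_gib=50
replication=3
echo "$((logical_gib * replication)) GiB on disk"
```

If the dfsadmin report shows roughly 3x your du total, the space is accounted for by replication rather than by stray data.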

The /dfs/dn directory is where HDFS stores the file block replicas. Nothing unnecessary is retained in HDFS; however, a commonly overlooked item is older snapshots retaining data blocks that are no longer needed. Deleting such snapshots frees the space occupied by HDFS files that were deleted after the snapshot was taken.
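To check whether snapshots are holding on to deleted blocks, you can list snapshottable directories and remove old snapshots. A sketch (the /data path and snapshot name are illustrative; these commands need appropriate HDFS permissions):

```shell
# List directories that have snapshots enabled:
#   hdfs lsSnapshottableDir
#
# List the snapshots of one such directory:
#   hdfs dfs -ls /data/.snapshot
#
# Delete an old snapshot so its retained blocks can be reclaimed:
#   hdfs dfs -deleteSnapshot /data snap-2019-01-01
```

Space from deleted files is reclaimed only once no remaining snapshot references their blocks.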

If you're unable to grow your cluster but need to store more data, you can sacrifice data availability by lowering your default replication to 2x or 1x (via the dfs.replication configuration for new data writes, and hdfs dfs -setrep n for existing data).
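Concretely, lowering replication on existing data might look like the following (the /data path is an example; -w makes the command wait until re-replication completes):

```shell
# Reduce existing files under /data to 2 replicas, freeing ~1/3 of
# the disk they currently occupy at 3x:
#   hdfs dfs -setrep -w 2 /data
#
# For new writes, set dfs.replication=2 in hdfs-site.xml, or via the
# equivalent replication-factor setting in Cloudera Manager's HDFS
# configuration.
```

Note that at 1x replication a single disk or node failure means permanent data loss, so weigh this against how re-creatable the data is.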