
Free up HDFS disk space

Explorer

My HDFS has a total disk space of 28.2 TB, of which 15.1 TB is useful data. After a while, Ambari reported the disk space as 75% full, so I started "Balance HDFS" from Ambari. Since then, the available disk space has decreased slowly until it was all gone. Now I have no more usable disk space. How can I reclaim the unused disk space?

hdfs@msl-dpe-perf88:/$ hdfs dfs -du -h -s /
15.1 T  /
hdfs@msl-dpe-perf88:/$ hdfs dfs -df -h
Filesystem                            Size    Used  Available  Use%
hdfs://msl-dpe-perf88.msl.lab:8020  28.2 T  27.1 T          0   96%


4 REPLIES

Master Mentor

@Harry Li

When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory. The file can be restored quickly as long as it remains in /trash. The retention time in the /trash is configurable. After the expiry of its life in /trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
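
To see whether trash is actually what is holding your space, you can check the size of each user's trash directory. This is only a sketch: /user/<name>/.Trash is the default trash location, so adjust the paths if your cluster uses a different layout.

# total size currently held in each user's trash
$ hdfs dfs -du -h -s /user/*/.Trash
# list the checkpoints inside the hdfs user's trash
$ hdfs dfs -ls /user/hdfs/.Trash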

If you want to change the default setting, it needs to be updated in the core-site properties, which you can find in Ambari. Simply follow this path: from the Ambari Dashboard, click HDFS -> Configs -> Advanced -> Advanced core-site. Then set 'fs.trash.interval' to 0 to disable trash. This will require a restart of the related components to pick up the change.
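
Once the related components have restarted, you can confirm the active value with hdfs getconf (a quick sketch, assuming you run it on a node with the HDFS client configuration; the value is in minutes, and 0 means trash is disabled):

$ hdfs getconf -confKey fs.trash.interval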

Check the HDFS structure to see where the most data is held. The command below gives you a breakdown of the cluster and the space used on each data node:

$ hdfs dfsadmin -report 
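
If the report is long, you can filter it down to the per-node figures (a sketch; the field labels below match recent HDFS releases and may vary slightly between versions):

$ hdfs dfsadmin -report | grep -E 'Name:|DFS Used%:|DFS Remaining:'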

To force a trash cleanup, run the command below; it creates a new trash checkpoint and removes any checkpoints older than the retention interval. Give it some time to complete.

$ hdfs dfs -expunge 

By default, HDFS moves deleted files to trash. You can bypass this when cleaning up your data by using the -skipTrash flag:

$ hdfs dfs -rm -R -skipTrash /folder-path
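
Keep in mind that after the -rm completes, the data nodes free the underlying blocks asynchronously, so re-check the available space after a short wait:

$ hdfs dfs -df -h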

HTH

Explorer

Hi Geoffrey,

I have been using the -skipTrash option when deleting files, and the /user/hdfs/.Trash directory is empty. I also ran the -expunge command 24 hours ago. I still do not see any disk space being freed. Here are the results from the dfsadmin command:

hdfs@msl-dpe-perf88:/$ hdfs dfs -ls /user/hdfs/.Trash
hdfs@msl-dpe-perf88:/$ 

hdfs@msl-dpe-perf88:/$  hdfs dfsadmin -report 
Configured Capacity: 31048107810816 (28.24 TB)
Present Capacity: 29767722012672 (27.07 TB)
DFS Remaining: 0 (0 B)
DFS Used: 29767722012672 (27.07 TB)
DFS Used%: 100.00%
Under replicated blocks: 97449
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0


-------------------------------------------------



Master Mentor

@Harry Li

How many data nodes do you have in your cluster?

Can you try to isolate the culprit?

$ hdfs dfs -du -h / 
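
To rank the top-level directories by size, you can sort the raw byte counts (a small sketch; plain -du prints sizes in bytes, which sort more reliably than the human-readable -h form):

# largest directories under / first
$ hdfs dfs -du / | sort -n -r | head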

If you have enabled snapshots, that could be one reason; can you check whether any snapshottable directories exist?

$ hdfs lsSnapshottableDir 
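
If that command does list any snapshottable directories, the snapshots themselves live under a hidden .snapshot directory and can keep deleted blocks alive (a sketch; /snapshottable-dir below is just a placeholder for whatever path the previous command returns):

$ hdfs dfs -ls /snapshottable-dir/.snapshot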

HTH

Explorer

Hi Geoffrey,

I have 4 data nodes and no snapshots set. Here is the output from the commands:

hdfs@msl-dpe-perf88:/$ hdfs dfs -df -h
Filesystem                            Size    Used  Available  Use%
hdfs://msl-dpe-perf88.msl.lab:8020  28.2 T  27.1 T          0   96%

hdfs@msl-dpe-perf88:/$ hdfs lsSnapshottableDir 


hdfs@msl-dpe-perf88:/$