Why HDFS does not free physical space after emptying trash

New Contributor

I started using the HDP 2.5 sandbox, and currently I'm puzzled by behavior that looks like this:

0. I'm using a freshly set-up distribution, with everything at defaults; no properties changed.

1. I put some large files into HDFS.

2. I remove them (without sending them to trash).

3. I see that directory sizes (per -du) are back to what they were before step 1, but the available free space stays reduced.

4. Eventually this leads to HDFS becoming full, with no apparent way to clear it 😞

5. This does not reproduce on my standalone Hadoop/HDFS install on Ubuntu, where free space returns to normal after deletion.

This sequence of steps looks like this on the command line:

# hdfs dfs -df 
   ... shows 7% usage
# hdfs dfs -put large-data-directory /large-data-directory
# hdfs dfs -df 
   ... shows 95% usage
# hdfs dfs -rm -r -skipTrash /large-data-directory
# hdfs dfs -du /user/root
   ... to make sure nothing sticks in /user/root/.Trash
# hdfs dfs -df 
   ... still shows 95% usage
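
To rule out leftovers anywhere else in the tree (not just in .Trash), the same mismatch can be checked at the filesystem root; a minimal check, with flags as in stock Hadoop 2.x:

# hdfs dfs -du -s -h /
   ... total size of everything still referenced in HDFS (should be small if nothing is left over)
# hdfs dfs -df -h /
   ... yet this still reports the old usage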

So could anyone please enlighten me on how this can be fixed? I haven't found any properties yet that might cause such behavior...

UPD

> how long afterwards did you run the "hdfs dfs -du /user/root" command? (immediately, or a few seconds/minutes later?)

about 3 minutes later (and the result doesn't change after about a day)

> Is it still consuming the same 95% usage (even after a long time)?

yes, even 24 hours later nothing changes 😞

> fs.trash.interval

It was 360 during testing. Later I changed it to 0, but that does not seem to help.

> fs.trash.checkpoint.interval

This is not set in the configs and I did not add it, so I believe it should be at its default value?
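
For the record, the effective values can be read back with hdfs getconf (a sketch; this reads the configuration visible to the client, which may lag behind what the NameNode is actually running with):

# hdfs getconf -confKey fs.trash.interval
   ... prints the effective trash interval in minutes
# hdfs getconf -confKey fs.trash.checkpoint.interval
   ... 0 here means "same as fs.trash.interval"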

UPD2: the dfsadmin -report output can be seen in this GitHub gist: https://gist.github.com/anonymous/4e6c81c920700251aad1d33748afb29d

2 REPLIES

Master Mentor

@Rodion Gork

- When you deleted the large data directory from HDFS, how long afterwards did you run the "hdfs dfs -du /user/root" command? (immediately, or a few seconds/minutes later?)

- Also what does the following command show?

# su - hdfs -c "hdfs dfsadmin -report"
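
The summary lines worth comparing in that report (field names as printed by stock Hadoop 2.x; "Non DFS Used" appears in the per-DataNode sections):

Configured Capacity   ... total raw capacity of the cluster
DFS Used              ... bytes HDFS still accounts for in blocks
Non DFS Used          ... space consumed by other files on the same disks
DFS Remaining         ... what "hdfs dfs -df" reports as free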

- Is it still consuming the same 95% usage (even after a long time)?

- Although you are using "-skipTrash", by any chance have you altered either of the following parameter values:

---> Deletion interval specifies how long (in minutes) a checkpoint is retained before it is deleted. It is the value of fs.trash.interval. The NameNode runs a thread to periodically remove expired checkpoints from the file system.

---> Emptier interval specifies how long (in minutes) the NameNode waits between runs of the thread that manages checkpoints. The NameNode deletes checkpoints that are older than fs.trash.interval and creates a new checkpoint from /user/${username}/.Trash/Current. This frequency is determined by the value of fs.trash.checkpoint.interval, and it must not be greater than the deletion interval; this ensures that in each emptier window there are one or more checkpoints in the trash.
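
If trash checkpoints were the culprit, you could also force a sweep right away instead of waiting for the emptier thread (a sketch; "hdfs dfs -expunge" creates a new checkpoint of the current user's trash and deletes any checkpoints older than fs.trash.interval):

# hdfs dfs -expunge
   ... forces checkpoint rotation and removes expired checkpoints
# hdfs dfs -df -h /
   ... then re-check the reported usage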

New Contributor

Thanks for the response! I've updated my post with answers to your questions.