Safe to delete under /tmp in HDFS (how about /tmp/hive/****)?

I recently realized that more than half of all our HDFS usage is under /tmp.

I wrote a script to go find all the data and it looks like the vast majority of it is under /tmp/hive/***, for example:

/tmp/hive/root

/tmp/hive/hdfs

/tmp/hive/my_user

Each of these holds tens of TB, and quite a lot of it is very old.

Is it safe to delete this data? Say, anything older than 30 days? Would 14 days be safe?

Any best practices here?

It seems odd that there is nothing built-in to maintain this space...

1 ACCEPTED SOLUTION

@Zack Riesland

Yes, it is safe to remove these folders and clean up. Cleanup scripts for this already exist. When a client runs a query through HiveServer2, Hive first creates these temporary folders to store intermediate/temporary data. For most queries they are cleaned up when the query ends, but sometimes, because of a problem with the query, the files are left behind and you have to clean them up manually. Check this link for more details:

https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-...

The following link might also give you some ideas on how to clean up:

https://community.hortonworks.com/questions/19204/do-we-have-any-script-which-we-can-use-to-clean-tm...
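
As a rough illustration of the manual cleanup, here is a minimal Python sketch that picks out entries older than a cutoff from the output of `hdfs dfs -ls` and builds the matching delete commands. It is not the script from the link above. The function names (`hdfs_ls`, `stale_paths`, `rm_commands`) are made up for this example, the 30-day default is just the figure from the question, and it assumes the usual eight-column `-ls` listing (permissions, replication, owner, group, size, date, time, path). It also assumes that a directory's modification time is a good enough sign that the session behind it is gone, which may not hold for long-running sessions, so review the list before deleting anything.

```python
import shlex
import subprocess
from datetime import datetime, timedelta


def hdfs_ls(path):
    """Return the raw text output of `hdfs dfs -ls <path>` (needs the hdfs CLI on PATH)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def stale_paths(ls_output, now, max_age_days=30):
    """Return the paths in an `hdfs dfs -ls` listing last modified before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path.
        # Splitting at most 7 times keeps paths that contain spaces in one piece.
        parts = line.split(None, 7)
        if len(parts) != 8:
            continue  # skips the "Found N items" header and blank lines
        modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            stale.append(parts[7])
    return stale


def rm_commands(paths):
    """Build (but do not run) one recursive delete command per path."""
    return ["hdfs dfs -rm -r -skipTrash " + shlex.quote(p) for p in paths]
```

A dry run would look something like `print("\n".join(rm_commands(stale_paths(hdfs_ls("/tmp/hive/my_user"), datetime.now()))))`. Nothing is deleted until you run the printed commands yourself. Leave out `-skipTrash` if you would rather have the data go to the HDFS trash first.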
