Safe to delete under /tmp in HDFS (how about /tmp/hive/****)?

Super Collaborator

I recently realized that more than half of all our HDFS usage is under /tmp.

I wrote a script to go find all the data and it looks like the vast majority of it is under /tmp/hive/***, for example:

/tmp/hive/root

/tmp/hive/hdfs

/tmp/hive/my_user

These have tens of TB in each of them and quite a lot of it is very old.

Is it safe to delete this data? Say, anything older than 30 days? Would 14 days be safe?

Any best practices here?

It seems odd that there is nothing built-in to maintain this space...

1 ACCEPTED SOLUTION

Re: Safe to delete under /tmp in HDFS (how about /tmp/hive/****)?

Super Guru
@Zack Riesland

Yes, it is safe to remove these folders and clean up, as long as no running query is still using them. Cleanup scripts for this already exist. When a client runs a query through HiveServer2, Hive first creates these temporary folders to store intermediate data. For most queries they are removed when the query finishes, but when a query runs into problems the files can be left behind, and you have to clean them up manually. Check this link for more details.

https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-...

The following link might also give you some ideas on how to clean up.

https://community.hortonworks.com/questions/19204/do-we-have-any-script-which-we-can-use-to-clean-tm...
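For illustration, here is a minimal sketch of what such a manual cleanup could look like. It is not the script from the links above. It assumes the `hdfs` command is on the PATH, that each entry directly under a per-user folder such as /tmp/hive/root can be judged by the modification time that `hdfs dfs -ls` reports, and that the 30-day cutoff from the question is acceptable. The function names (`stale_paths`, `list_dir`, `clean_scratch`) are made up for this example.

```python
import subprocess
from datetime import datetime, timedelta


def stale_paths(ls_output, max_age_days, now=None):
    """Return the paths in `hdfs dfs -ls` output last modified more than max_age_days ago."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        # Entry lines have 8 fields: perms, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and blank lines
        try:
            modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not an entry line in the expected format
        if modified < cutoff:
            stale.append(fields[7])
    return stale


def list_dir(path):
    """Run `hdfs dfs -ls <path>` and return its raw output."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def clean_scratch(user_dir, max_age_days=30, dry_run=True):
    """Delete (or, by default, only print) stale entries directly under user_dir."""
    # Assumption: an old modification time means no running query still needs the entry.
    for path in stale_paths(list_dir(user_dir), max_age_days):
        if dry_run:
            print("would delete", path)
        else:
            # -skipTrash frees the space immediately instead of moving data to the trash.
            subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path], check=True)
```

Calling `clean_scratch("/tmp/hive/my_user")` only prints what would be removed. Pass `dry_run=False` once the list looks right, and run it as a user with permission to delete under that folder.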
