I recently realized that more than half of our total HDFS usage is under /tmp.
I wrote a script to find all of that data, and the vast majority of it is under /tmp/hive/***, for example:
/tmp/hive/root
/tmp/hive/hdfs
/tmp/hive/my_user
Each of these holds tens of TB, and quite a lot of it is very old.
Is it safe to delete this data? Say, anything older than 30 days? Would 14 days be safe?
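To make the age cutoff concrete, here is a minimal sketch of the kind of check in question. It is not the script mentioned above. It assumes the usual `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path), and the function name `stale_paths` and the sample session directories are made up for illustration. It only lists candidates and deletes nothing.

```python
from datetime import datetime, timedelta


def stale_paths(ls_lines, max_age_days, now):
    """Return paths from `hdfs dfs -ls` output older than max_age_days.

    Assumes each entry line has 8 whitespace-separated columns:
    permissions, replication, owner, group, size, date, time, path.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_lines:
        # maxsplit=7 keeps a path containing spaces in one piece
        parts = line.split(None, 7)
        if len(parts) != 8:
            # Skips the "Found N items" header and malformed lines
            continue
        try:
            mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue
        if mtime < cutoff:
            stale.append(parts[7])
    return stale


# Hypothetical listing, as it might look for /tmp/hive/root
sample = [
    "Found 2 items",
    "drwx------   - root hdfs          0 2020-01-01 10:00 /tmp/hive/root/old_session",
    "drwx------   - root hdfs          0 2020-03-01 10:00 /tmp/hive/root/new_session",
]
candidates = stale_paths(sample, 30, datetime(2020, 3, 10))
```

One caveat with this approach: an HDFS directory's modification time only changes when entries directly inside it are created, removed, or renamed, so a directory can look old while files deeper inside are still in use.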
Any best practices here?
It seems odd that there is nothing built in to maintain this space...