Support Questions
Find answers, ask questions, and share your expertise

Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Solved

Guru

Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

I have gone through the one below, but I am looking for a shell script.

https://github.com/nmilford/clean-hadoop-tmp/blob/master/clean-hadoop-tmp

1 ACCEPTED SOLUTION


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Explorer

You can do:

#!/bin/bash

usage="Usage: dir_diff.sh [days]"

if [ ! "$1" ]
then
  echo "$usage"
  exit 1
fi

now=$(date +%s)
hadoop fs -ls -R /tmp/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    hadoop fs -rm -r "$(echo "$f" | awk '{print $8}')"
  fi
done

Replace the directory paths with the ones you need to clean up.
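The core of the script is the age computation, and that part can be checked in isolation without a cluster. A minimal sketch (the date value is illustrative, and GNU date -d is assumed):

```shell
# Compute a directory's age in days from its listing date, as the script
# above does. The date here is illustrative; in the script it comes from
# field 6 of the "hadoop fs -ls -R" output.
now=$(date +%s)
dir_date="2016-01-01"
difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
echo "$difference"
```

Directories whose computed age exceeds the day threshold are the ones the script passes to hadoop fs -rm -r.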


13 REPLIES

Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Create a file /scripts/myLogCleaner.sh (or whatever you like), and

add the following command, which deletes all files that have "log" in their name and are older than a day (the pattern is quoted so the shell does not expand it):

find /tmp/hive -name '*log*' -mtime +1 -exec rm {} \;

and crontab it.

crontab -e

0 0 * * * /scripts/myLogCleaner.sh

This will start the cleaner every day at midnight.

( obviously just one out of approximately 3 million different ways to do it :-) )
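Put together, the find-based approach above can be tried safely on a throwaway directory before pointing it at /tmp/hive. A sketch (GNU touch -d is assumed):

```shell
#!/bin/bash
# Demonstrate the cleanup on a temporary directory instead of /tmp/hive.
dir=$(mktemp -d)
touch "$dir/app.log" "$dir/keep.txt"
touch -d "3 days ago" "$dir/app.log"                # make the log file old
find "$dir" -name '*log*' -mtime +1 -exec rm {} \;  # deletes only app.log
ls "$dir"                                           # keep.txt remains
```

Once the output looks right, the same find line goes into the cron script.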

Edit: ah, not the logs of the Hive CLI but the scratch dir of Hive. That makes it a bit harder, since there is no hadoop find. Weird that it grows so big; it should clean up after itself unless the command-line interface or task gets killed.


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Mentor

This is on HDFS, Benjamin. I mean the same approach, just with hdfs commands instead of the local fs.


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Guru

@Benjamin Leonhardi: I can do it easily on the local filesystem, but I am looking at the HDFS /tmp/hive dir.

So do we have anything like this for HDFS?


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

That would be the point where I would start writing some Python magic to parse the timestamp from the hadoop fs -ls output. Or, to be faster, a small Java program doing the same with the FileSystem API.

Someone apparently already did the first approach with a shell script. Replace the echo with hadoop fs -rm -r -f and you might be good. But I didn't test it, obviously ...

http://stackoverflow.com/questions/12613848/finding-directories-older-than-n-days-in-hdfs


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Mentor

@Benjamin Leonhardi yep, I've done that a while ago with the Java HDFS API: look up the paths, identify the age of the files, delete.


Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Rising Star

@Saurabh Kumar To add to this, you could investigate third-party dev projects such as https://github.com/nmilford/clean-hadoop-tmp



Re: Do we have any script which we can use to clean /tmp/hive/ dir frequently on hdfs. Because it is consuming space in TB.

Guru

@Gurmukh Singh: I tried this script but I am not getting anything, just the output below.

[user@server2~]$ ./cleanup.sh

Usage: dir_diff.sh [30]

I have the same thing in my script as the one you mentioned.
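That usage line appears because the script exits whenever it is run without a day count; it needs the threshold as its first argument, e.g. ./cleanup.sh 30. A sketch of the check (the function name check_args is mine, for illustration):

```shell
#!/bin/bash
# Reproduce the script's first test: with no argument it prints the usage
# line and stops; with a day count it proceeds to the cleanup loop.
check_args() {
  if [ ! "$1" ]; then
    echo "Usage: dir_diff.sh [days]"
    return 1
  fi
  echo "cleaning directories older than $1 days"
}
check_args || true   # no argument: prints the usage line, as in the output above
check_args 30        # with an argument the script proceeds
```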
