Labels:
- Apache Hadoop
- Apache Hive
Created ‎02-24-2016 01:12 PM
Do we have any script that we can use to clean the /tmp/hive/ directory on HDFS frequently? It is consuming terabytes of space.
I have gone through the one below, but I am looking for a shell script.
https://github.com/nmilford/clean-hadoop-tmp/blob/master/clean-hadoop-tmp
Created ‎02-24-2016 01:19 PM
Create a file /scripts/myLogCleaner.sh (or whatever) and add the following command, which deletes all files that have "log" in the name and are older than a day:
find /tmp/hive -name '*log*' -mtime +1 -exec rm {} \;
Then schedule it with cron:
crontab -e
0 0 * * * /scripts/myLogCleaner.sh
This will start the cleaner every day at midnight.
( obviously just one out of approximately 3 million different ways to do it 🙂 )
Edit: ah, not the logs of the Hive CLI but the scratch dir of Hive. That makes it a bit harder, since there is no "hadoop find". It is weird that it grows so big; it should clean up after itself unless the CLI or task gets killed.
Created ‎02-24-2016 01:23 PM
This is on HDFS, Benjamin. I mean the same approach, just with HDFS commands instead of the local filesystem.
Created ‎02-24-2016 01:37 PM
@Benjamin Leonhardi: I can do it easily on the local filesystem, but I am looking at the /tmp/hive dir on HDFS.
So do we have anything like this for HDFS?
Created ‎02-24-2016 01:43 PM
That would be the time when I would start writing some Python magic to parse the timestamps from the hadoop fs -ls output. Or, to be faster, a small Java program doing the same with the FileSystem API.
Someone apparently already did the first approach with a shell script. Replace the echo with hadoop fs -rm -r -f and you might be good. But I didn't test it, obviously...
http://stackoverflow.com/questions/12613848/finding-directories-older-than-n-days-in-hdfs
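For illustration, a minimal sketch of that substitution against /tmp/hive (the 7-day cutoff is an assumption, and like the linked script it is untested):
# Column 6 of the "hadoop fs -ls" output is the modification date,
# column 8 is the full path. Delete directories older than the cutoff.
cutoff=$(date -d '7 days ago' +%s)
hadoop fs -ls /tmp/hive/ | grep "^d" | awk '{print $6, $8}' | \
while read dir_date dir_path; do
  if [ "$(date -d "$dir_date" +%s)" -lt "$cutoff" ]; then
    # was: echo "$dir_path" in the linked script
    hadoop fs -rm -r -f "$dir_path"
  fi
done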
Created ‎02-24-2016 01:44 PM
@Benjamin Leonhardi yep, I did that a while ago with the Java HDFS API: look up the paths, identify the age of the files, delete.
Created ‎04-08-2016 03:05 PM
@Saurabh Kumar To add to this, you could investigate third-party dev projects such as https://github.com/nmilford/clean-hadoop-tmp
Created ‎08-30-2016 11:24 PM
You can do:
#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]; then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Column 6 of the "hadoop fs -ls -R" output is the modification date,
# column 8 is the full path. Delete directories older than [days].
hadoop fs -ls -R /tmp/ | grep "^d" | while read f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    hadoop fs -rm -r "$(echo "$f" | awk '{print $8}')"
  fi
done
Replace the directories or files you need to clean up appropriately.
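To run it daily, the same crontab approach from earlier in the thread should work. Assuming the script is saved as /scripts/hdfs_tmp_cleaner.sh (a hypothetical path) and you want a 7-day threshold:
crontab -e
0 0 * * * /scripts/hdfs_tmp_cleaner.sh 7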
Created ‎09-21-2016 10:16 AM
@Gurmukh Singh: I tried this script and I am not getting anything, just the output below.
[user@server2~]$ ./cleanup.sh
Usage: dir_diff.sh [30]
I have the same thing in my script that you mentioned.
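(For reference, the script exits with that usage message whenever the day argument is missing, so it needs to be invoked with a number of days, e.g.:)
./cleanup.sh 30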
