Support Questions


Do we have any script we can use to clean the /tmp/hive/ directory on HDFS frequently? It is consuming terabytes of space.

Guru


I have gone through the one below, but I am looking for a shell script.

https://github.com/nmilford/clean-hadoop-tmp/blob/master/clean-hadoop-tmp

1 ACCEPTED SOLUTION

Contributor

You can do:

#!/bin/bash

usage="Usage: dir_diff.sh [days]"

if [ ! "$1" ]
then
  echo $usage
  exit 1
fi

now=$(date +%s)

hadoop fs -ls -R /tmp/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -rm -r `echo $f | awk '{ print $8 }'`
  fi
done

Replace the directories or files you need to clean up appropriately.
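The age arithmetic the loop relies on can be checked in isolation. This is just the core expression from the script above, assuming GNU date (standard on Linux cluster nodes) and an illustrative date value in place of the `awk '{print $6}'` column:

```shell
# How the script turns a listing date into an age in days (GNU date assumed).
now=$(date +%s)                 # current time, in epoch seconds
dir_date="2020-01-01"           # stands in for the date column from hadoop fs -ls
age_days=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
echo "$age_days days old"
```

The script then compares `age_days` against the threshold you pass as the first argument.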


13 REPLIES

Guru

@Gurmukh Singh: Thanks, I just tested it in the following way and it works fine. We can change hadoop fs -ls to hadoop fs -rm -r with the required directory.

#!/bin/bash

usage="Usage: dir_diff.sh [days]"

if [ ! "$1" ]
then
  echo $usage
  exit 1
fi

now=$(date +%s)

hadoop fs -ls /zone_encr2/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -ls `echo $f | awk '{ print $8 }'`
  fi
done
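To see the -ls to -rm -r swap end to end without touching a real cluster, here is a self-contained sketch of the same loop. The `hadoop` shell function below is a stub standing in for the real CLI (the listing lines and /tmp/hive paths are made up for illustration); with a real client installed you would delete the stub and the loop would hit HDFS:

```shell
#!/bin/bash
# Stub standing in for the real hadoop CLI so the loop can be traced locally.
# Remove this function on a real cluster to operate on actual HDFS paths.
hadoop() {
  if [ "$1 $2" = "fs -ls" ]; then
    # Fake listing: one old directory, one created today.
    echo "drwxr-xr-x  - hive hadoop 0 2019-01-01 10:00 /tmp/hive/old_dir"
    echo "drwxr-xr-x  - hive hadoop 0 $(date +%Y-%m-%d) 10:00 /tmp/hive/new_dir"
  else
    echo "RM: $*"   # in real life this branch is the actual hadoop fs -rm -r
  fi
}

days=30
now=$(date +%s)

hadoop fs -ls /tmp/hive/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  age=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$age" -gt "$days" ]; then
    hadoop fs -rm -r "$(echo "$f" | awk '{print $8}')"
  fi
done
```

Only the old directory crosses the 30-day threshold, so only it reaches the -rm -r branch; the one created today is left alone.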

Contributor

@Saurabh

Yes, the script I gave used the "hadoop fs -ls" command, because many people do not understand what it does; they simply copy the script, run it, and then complain that they lost data.

The problem is that many people call themselves Hadoop admins but have never worked as Linux system admins/engineers 🙂

Contributor

@Saurabh the script takes an argument: the number of days 🙂

So, if you want to look for files older than 10 days, run: ./cleanup.sh 10
