Created 02-24-2016 01:12 PM
Do we have any script we can use to clean the /tmp/hive/ directory on HDFS frequently? It is consuming space in TBs.
I have gone through the one below, but I am looking for a shell script.
https://github.com/nmilford/clean-hadoop-tmp/blob/master/clean-hadoop-tmp
Created 08-30-2016 11:24 PM
You can do:
#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]
then
  echo $usage
  exit 1
fi
now=$(date +%s)
hadoop fs -ls -R /tmp/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -rm -r `echo $f | awk '{ print $8 }'`
  fi
done
Replace the directory or file paths with the ones you need to clean up.
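The age calculation at the heart of the script can be checked on its own, without a Hadoop cluster. A minimal sketch using two fixed, made-up dates in place of "now" and the HDFS listing date, so the arithmetic is deterministic:

```shell
#!/bin/bash
# Reproduce the script's day-difference arithmetic with two fixed dates
# (hypothetical examples) instead of $now and a directory's listing date.
ref=$(date -u -d "2016-09-30" +%s)   # stands in for now=$(date +%s)
old=$(date -u -d "2016-09-20" +%s)   # stands in for the $6 date field
difference=$(( ( ref - old ) / (24 * 60 * 60) ))
echo "$difference"   # → 10
```

Directories whose computed difference exceeds the day threshold passed as $1 are the ones the real script removes.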
Created 09-24-2016 07:18 AM
@Gurmukh Singh: Thanks. I just tested it in the following way and it is working fine. We can change hadoop fs -ls to hadoop fs -rm -r with the required directory.
#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]
then
  echo $usage
  exit 1
fi
now=$(date +%s)
hadoop fs -ls /zone_encr2/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -ls `echo $f | awk '{ print $8 }'`
  fi
done
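For reference, the awk calls in the script assume the standard hadoop fs -ls output layout, where field 6 is the modification date and field 8 is the path. A minimal sketch on a sample listing line (the line itself is a made-up example):

```shell
#!/bin/bash
# One line in the layout printed by `hadoop fs -ls`:
# permissions, replication, owner, group, size, date, time, path.
line="drwxr-xr-x   - hive hadoop          0 2016-09-20 11:24 /tmp/hive/hive"
dir_date=$(echo "$line" | awk '{print $6}')   # modification date
dir_path=$(echo "$line" | awk '{print $8}')   # full HDFS path
echo "$dir_date $dir_path"   # → 2016-09-20 /tmp/hive/hive
```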
Created 10-11-2016 06:52 AM
@SaurabhSaurabh
Yes, the script I gave used the "hadoop fs -ls" command, because many people do not understand what it does; they would simply copy the script, run it, and then complain that they lost data.
The problem is that most people call themselves Hadoop admins but have never worked as Linux system admins/engineers 🙂
Created 09-22-2016 12:51 PM
@Saurabh the script takes an argument: the number of days 🙂
So, if you want to look for files older than 10 days, run: # ./cleanup.sh 10
Created 04-11-2019 03:27 PM
Can someone help me with this as well?
https://community.hortonworks.com/questions/243908/major-compaction-failure.html