Created 02-24-2016 01:12 PM
Do we have any script we can use to clean the /tmp/hive/ directory on HDFS frequently? It is consuming space in TBs.
I have gone through the one below, but I am looking for a shell script.
https://github.com/nmilford/clean-hadoop-tmp/blob/master/clean-hadoop-tmp
Created 08-30-2016 11:24 PM
You can do:
#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]
then
  echo $usage
  exit 1
fi
now=$(date +%s)
hadoop fs -ls -R /tmp/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -rm -r `echo $f | awk '{ print $8 }'`
  fi
done
Replace the directory or file paths with the ones you need to clean up.
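The age calculation at the heart of the script can be checked on its own, without a Hadoop cluster. A minimal sketch using two fixed, made-up dates in place of "now" and the HDFS listing date, so the arithmetic is deterministic:

```shell
#!/bin/bash
# Reproduce the script's day-difference arithmetic with two fixed dates
# (hypothetical examples) instead of $now and a directory's listing date.
ref=$(date -u -d "2016-09-30" +%s)   # stands in for now=$(date +%s)
old=$(date -u -d "2016-09-20" +%s)   # stands in for the $6 date field
difference=$(( ( ref - old ) / (24 * 60 * 60) ))
echo "$difference"   # → 10
```

Directories whose computed difference exceeds the day threshold passed as $1 are the ones the real script removes.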
Created 09-24-2016 07:18 AM
@Gurmukh Singh: Thanks. I just tested it in the following way and it is working fine. We can change hadoop fs -ls to hadoop fs -rm -r with the required directory.
#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]
then
  echo $usage
  exit 1
fi
now=$(date +%s)
hadoop fs -ls /zone_encr2/ | grep "^d" | while read f; do
  dir_date=`echo $f | awk '{print $6}'`
  difference=$(( ( $now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ $difference -gt $1 ]; then
    hadoop fs -ls `echo $f | awk '{ print $8 }'`
  fi
done
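For reference, the awk calls in the script assume the standard hadoop fs -ls output layout, where field 6 is the modification date and field 8 is the path. A minimal sketch on a sample listing line (the line itself is a made-up example):

```shell
#!/bin/bash
# One line in the layout printed by `hadoop fs -ls`:
# permissions, replication, owner, group, size, date, time, path.
line="drwxr-xr-x   - hive hadoop          0 2016-09-20 11:24 /tmp/hive/hive"
dir_date=$(echo "$line" | awk '{print $6}')   # modification date
dir_path=$(echo "$line" | awk '{print $8}')   # full HDFS path
echo "$dir_date $dir_path"   # → 2016-09-20 /tmp/hive/hive
```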
Created 10-11-2016 06:52 AM
@SaurabhSaurabh
Yes, the script I gave used the "hadoop fs -ls" command, because many people do not understand what it does; they would simply copy the script, run it, and then complain that they lost data.
The problem is that most people call themselves Hadoop admins but have never worked as Linux system admins/engineers 🙂
Created 09-22-2016 12:51 PM
@Saurabh the script takes an argument: the number of days 🙂
So, if you want to look for files older than 10 days, run: # ./cleanup.sh 10
Created 04-11-2019 03:27 PM
Can someone help me with this as well?
https://community.hortonworks.com/questions/243908/major-compaction-failure.html