Created 03-30-2017 09:07 PM
How can I clean up HDFS files older than a certain date using a bash script?
I am just looking for a general strategy.
Created 03-30-2017 09:26 PM
Here is an example script that finds directories older than a given number of days (replace the final `hadoop fs -ls` with a delete command to actually remove them):

#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ -z "$1" ]
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
hadoop fs -ls /zone_encr2/ | grep "^d" | while read f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( (now - $(date -d "$dir_date" +%s)) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    hadoop fs -ls "$(echo "$f" | awk '{print $8}')"
  fi
done
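The age calculation above can be factored into a small helper that is testable without a cluster. This is a sketch: the `age_in_days` function is plain bash/GNU date, while the commented-out HDFS loop shows one assumed way to wire it up for actual deletion (the example path `/some/dir` and the 30-day threshold are placeholders).

```shell
#!/bin/bash
# age_in_days: compute how many whole days old a date string is,
# e.g. the date column (field 6) of `hdfs dfs -ls` output.
age_in_days() {
  local now file_ts
  now=$(date +%s)
  file_ts=$(date -d "$1" +%s)     # parse e.g. "2017-03-30" to epoch seconds
  echo $(( (now - file_ts) / 86400 ))
}

# Hypothetical cluster-side usage (requires an HDFS client on PATH):
# hdfs dfs -ls /some/dir | grep -v '^d' | grep -v '^Found ' | while read -r line; do
#   file_date=$(echo "$line" | awk '{print $6}')
#   path=$(echo "$line" | awk '{print $8}')
#   if [ "$(age_in_days "$file_date")" -gt 30 ]; then
#     hdfs dfs -rm -skipTrash "$path"   # -skipTrash bypasses the HDFS trash
#   fi
# done
```

Leaving the `hdfs dfs -rm` line commented out and first printing the candidate paths is a cheap dry run before destructive deletes.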
Created on 09-28-2022 06:59 AM - edited 09-28-2022 07:05 AM
The script in the accepted solution was not working for me, so I modified it:
#!/bin/bash
usage="Usage: dir_diff.sh [path] [-gt|-lt] [days]"
if (( $# < 3 ))
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Skip directories and the "Found N items" header line.
hdfs dfs -ls "$1" | grep -v "^d" | grep -v '^Found ' | while read f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( (now - $(date -d "$dir_date" +%s)) / (24 * 60 * 60) ))
  # $2 is the comparison operator (-gt or -lt), $3 the number of days.
  if [ "$difference" "$2" "$3" ]; then
    echo "$f"
    # hdfs dfs -ls "$(echo "$f" | awk '{print $8}')"
  fi
done
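The `[ $difference $2 $3 ]` line works because `test`/`[` receives its comparison operator as an ordinary argument, so `-gt` or `-lt` can be passed straight through from the command line. A minimal illustration of that pattern in isolation (the `compare_days` name is just for this sketch):

```shell
#!/bin/bash
# compare_days AGE OPERATOR THRESHOLD
# The operator ($2) is forwarded verbatim to the [ builtin.
compare_days() {
  if [ "$1" "$2" "$3" ]; then
    echo "match"
  else
    echo "no match"
  fi
}
```

So `compare_days 40 -gt 30` prints "match" while `compare_days 10 -gt 30` prints "no match", which is exactly how the script selects files older or newer than the threshold.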