
How to clean up HDFS files older than a certain date using a bash script


How can I clean up HDFS files older than a certain date using a bash script?

I am just looking for a general strategy.

1 ACCEPTED SOLUTION


The post below has an example script that deletes files older than a certain number of days:

https://community.hortonworks.com/questions/19204/do-we-have-any-script-which-we-can-use-to-clean-tm...

#!/bin/bash
usage="Usage: dir_diff.sh [days]"
if [ ! "$1" ]
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# List directories under /zone_encr2/ and print those older than the given number of days.
hadoop fs -ls /zone_encr2/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    hadoop fs -ls "$(echo "$f" | awk '{ print $8 }')"
  fi
done
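
The script above only lists the matching paths; it does not delete anything. If the goal is to actually remove old directories, a minimal sketch along the same lines could look like the following (the /zone_encr2/ path, the script name, and the use of -skipTrash are assumptions for illustration, not part of the original answer):

#!/bin/bash
# Sketch: delete HDFS directories under /zone_encr2/ older than N days.
# Usage: hdfs_cleanup.sh [days]
days="${1:?Usage: hdfs_cleanup.sh [days]}"
now=$(date +%s)
hadoop fs -ls /zone_encr2/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  dir_path=$(echo "$f" | awk '{print $8}')
  age_days=$(( ( now - $(date -d "$dir_date" +%s) ) / 86400 ))
  if [ "$age_days" -gt "$days" ]; then
    # -r removes the directory recursively; -skipTrash bypasses the HDFS trash
    # (drop -skipTrash to keep a recovery window).
    hadoop fs -rm -r -skipTrash "$dir_path"
  fi
done

A safe way to test it is to replace the hadoop fs -rm line with an echo of "$dir_path" first and confirm that only the intended directories are printed.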


2 REPLIES



The script in the accepted solution was not working for me, so I modified it:

 

#!/bin/bash
usage="Usage: dir_diff.sh [path] [-gt|-lt] [days]"
if (( $# < 3 ))
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# List files (not directories) under the given path and print those matching the age test.
hdfs dfs -ls "$1" | grep -v "^d" | grep -v '^Found ' | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" "$2" "$3" ]; then
    echo "$f"
    # hdfs dfs -ls $(echo "$f" | awk '{ print $8 }')
  fi
done
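
For example, assuming the modified script is saved as dir_diff.sh and made executable, it could be run like this (the /tmp/logs path is only an illustration):

# Print files under /tmp/logs older than 30 days
./dir_diff.sh /tmp/logs -gt 30

# Print files under /tmp/logs modified within the last 7 days
./dir_diff.sh /tmp/logs -lt 7

Once the printed output looks right, the commented-out hdfs dfs -ls line (or an hdfs dfs -rm) can be substituted for the echo to act on the matches.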