Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

How to clean up HDFS files older than a certain date using a bash script

How can I clean up HDFS files older than a certain date using a bash script?

I am just looking for a general strategy.

1 ACCEPTED SOLUTION

The post below has an example script for removing files older than a given number of days:

https://community.hortonworks.com/questions/19204/do-we-have-any-script-which-we-can-use-to-clean-tm...

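#!/bin/bash
# Lists directories under /zone_encr2/ that are older than [days] days.
usage="Usage: dir_diff.sh [days]"
if [ -z "$1" ]; then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Keep only directory entries (permission string starts with "d")
hadoop fs -ls /zone_encr2/ | grep "^d" | while read -r f; do
  # Column 6 is the modification date (parsed below with GNU date -d)
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    # Column 8 is the path; change -ls to -rm -r to actually delete it
    hadoop fs -ls "$(echo "$f" | awk '{ print $8 }')"
  fi
done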
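For the general strategy asked about, one alternative to calling `date` once per line is to compute a single cutoff date and compare the date column as a plain string, since `YYYY-MM-DD` dates sort lexicographically in chronological order. The sketch below assumes the default `hdfs dfs -ls` column layout (date in column 6, path in column 8) and uses a canned listing so it runs without a cluster; the function name `older_than` and the `/data/...` paths are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints the
# path (column 8) of every entry whose date (column 6, YYYY-MM-DD) sorts
# strictly before the cutoff date given as $1. Skips the "Found N items" header.
older_than() {
  local cutoff="$1"
  awk -v cutoff="$cutoff" '$1 != "Found" && NF >= 8 && $6 < cutoff { print $8 }'
}

# Canned listing standing in for a live `hdfs dfs -ls /data` call.
listing='Found 3 items
drwxr-xr-x   - hdfs hdfs          0 2016-01-10 10:00 /data/old_dir
-rw-r--r--   3 hdfs hdfs       1024 2016-03-01 12:30 /data/new_file
-rw-r--r--   3 hdfs hdfs       2048 2015-12-31 09:15 /data/old_file'

echo "$listing" | older_than 2016-02-01
# prints /data/old_dir and /data/old_file, one per line
```

On a real cluster the cutoff would typically come from GNU `date`, for example `hdfs dfs -ls -R /some/path | older_than "$(date -d '30 days ago' +%F)"`, where `/some/path` is a placeholder. Like the scripts in this thread, it splits on whitespace, so paths containing spaces would be cut off at column 8.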

New Contributor

The script in the accepted solution was not working for me, so I modified it:

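#!/bin/bash
usage="Usage: dir_diff.sh [path] [-gt|-lt] [days]"
if (( $# < 3 )); then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Skip directories (lines starting with "d") and the "Found N items" header
hdfs dfs -ls "$1" | grep -v "^d" | grep -v '^Found ' | while read -r f; do
  # Column 6 is the modification date, column 8 is the path
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  # $2 is the comparison operator (-gt or -lt), $3 is the age in days
  if [ "$difference" "$2" "$3" ]; then
    echo "$f"
    # Use -rm instead of -ls here to delete the matching file
    # hdfs dfs -ls "$(echo "$f" | awk '{ print $8 }')"
  fi
done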
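Both scripts above stop at printing the matching entries, which is a reasonable first pass. When moving to real deletion, one common pattern is a dry-run switch so the same loop can be previewed before anything is removed. This is a minimal sketch, assuming paths arrive one per line on stdin; `DRY_RUN`, `run`, and `delete_paths` are hypothetical names and the demo paths are made up.

```shell
#!/bin/bash
# Hypothetical dry-run guard: with DRY_RUN=1 (the default here), print the
# command instead of executing it; with DRY_RUN=0, execute it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Read one HDFS path per line from stdin and remove each one (skips blank lines).
delete_paths() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    run hdfs dfs -rm -r "$path"
  done
}

# Demo input; in real use this would be the output of the listing step.
printf '%s\n' /data/old_dir /data/old_file | delete_paths
```

In real use the `printf` demo line would be replaced by the listing step, and running with `DRY_RUN=0` would execute the removals. If HDFS trash is enabled, `hdfs dfs -rm` moves entries into the user's `.Trash` directory rather than deleting them immediately; `-skipTrash` bypasses that and should only be added deliberately.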