How to do a cleanup of hdfs files older than a certain date using a bash script

How can I clean up HDFS files older than a certain date using a bash script?

I am just looking for a general strategy.


The script in the accepted solution did not work for me, so I modified it:

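#!/bin/bash
# List the HDFS files under a path whose age in days satisfies the given
# comparison, e.g.: dir_diff.sh /some/path -gt 30
usage="Usage: dir_diff.sh [path] [-gt|-lt] [days]"
if (( $# < 3 )); then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Skip directories (lines starting with "d") and the "Found N items" header.
hdfs dfs -ls "$1" | grep -v "^d" | grep -v '^Found ' | while read -r f; do
  # Field 6 of the listing is the modification date (YYYY-MM-DD).
  file_date=$(echo "$f" | awk '{print $6}')
  # Age in whole days; "date -d" requires GNU date.
  difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
  if [ "$difference" "$2" "$3" ]; then
    echo "$f"
    # The path is field 8 if you want to act on the file instead:
    # hdfs dfs -ls "$(echo "$f" | awk '{print $8}')"
  fi
done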
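
If it helps, here is a rough sketch of how the same idea could be turned into an actual cleanup. It is not the script from the accepted solution. The function name filter_older_than and its optional second argument are made up for illustration, and it assumes GNU date plus the usual 8-column output of hdfs dfs -ls (permissions, replication, owner, group, size, date, time, path). It only prints paths, so you can review the list before deleting anything.

```shell
#!/bin/bash
# filter_older_than DAYS [NOW_EPOCH]   (hypothetical helper, not part of HDFS)
# Reads `hdfs dfs -ls` output on stdin and prints the path of every regular
# file whose modification time is more than DAYS whole days in the past.
# NOW_EPOCH defaults to the current time; passing it makes runs repeatable.
filter_older_than() {
  local days=$1 now=${2:-$(date +%s)}
  local perms repl owner group size mdate mtime path age
  # The last variable (path) receives the rest of the line, spaces included.
  while read -r perms repl owner group size mdate mtime path; do
    # Skip the "Found N items" header and directory entries.
    [[ $perms == Found || $perms == d* ]] && continue
    # Skip anything that does not have all eight columns.
    [[ -z $path ]] && continue
    # Age in whole days; "date -d" is a GNU date feature.
    age=$(( (now - $(date -d "$mdate $mtime" +%s)) / 86400 ))
    if (( age > days )); then
      printf '%s\n' "$path"
    fi
  done
}

# Dry run, printing what would be removed (the directory is a placeholder):
#   hdfs dfs -ls /some/dir | filter_older_than 30
# Removal, once the list looks right:
#   hdfs dfs -ls /some/dir | filter_older_than 30 |
#     while read -r p; do hdfs dfs -rm "$p"; done
```

Keeping the filter separate from the delete step means the destructive command only ever sees paths you have already had a chance to inspect.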