Created 03-30-2017 09:07 PM
How can I clean up HDFS files older than a certain date using a bash script?
I am just looking for a general strategy.
Created 03-30-2017 09:26 PM
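Here is an example script from another post. It lists directories older than a given number of days; to delete them instead, replace the final `hadoop fs -ls` with `hadoop fs -rm -r`: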
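#!/bin/bash
# List HDFS directories under /zone_encr2/ older than the given number of days.
usage="Usage: dir_diff.sh [days]"
if [ -z "$1" ]
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Directory entries start with "d"; field 6 is the date, field 8 the path.
hadoop fs -ls /zone_encr2/ | grep "^d" | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" -gt "$1" ]; then
    hadoop fs -ls "$(echo "$f" | awk '{ print $8 }')"
  fi
done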
Created on 09-28-2022 06:59 AM - edited 09-28-2022 07:05 AM
The script in the accepted solution was not working for me, so I modified it:
#!/bin/bash
usage="Usage: dir_diff.sh [path] [-gt|-lt] [days]"
if (( $# < 3 ))
then
  echo "$usage"
  exit 1
fi
now=$(date +%s)
# Skip directories and the "Found N items" header so only files remain.
hdfs dfs -ls "$1" | grep -v "^d" | grep -v '^Found ' | while read -r f; do
  dir_date=$(echo "$f" | awk '{print $6}')
  difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
  if [ "$difference" "$2" "$3" ]; then
    echo "$f"
    # hdfs dfs -ls "$(echo "$f" | awk '{ print $8 }')"
  fi
done
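To go from listing to actual cleanup, one possible approach is to filter the `hdfs dfs -ls` output down to bare paths and review that list before passing it to a delete command. The sketch below is only an illustration under some assumptions: `older_than` is a hypothetical helper name, GNU `date` is assumed (as in the scripts above), `/some/dir` is a placeholder, and paths containing spaces are not handled because only field 8 is printed. It compares calendar dates as strings rather than computing a day difference, so results at the boundary may differ slightly from the scripts above.

```shell
#!/bin/bash
# Sketch: read `hdfs dfs -ls` output on stdin and print the paths of files
# whose modification date is earlier than N days ago.
# older_than is a hypothetical helper, not part of any Hadoop tooling.
older_than() {
  local days=$1
  local cutoff
  # GNU date assumed: the calendar date N days ago, in YYYY-MM-DD form.
  cutoff=$(date -d "-${days} days" +%Y-%m-%d)
  # Skip the "Found N items" header and directory entries.
  # Field 6 is the date, field 8 the path. Appending "" forces a string
  # comparison, which orders YYYY-MM-DD dates correctly.
  awk -v cutoff="$cutoff" \
    '!/^Found / && !/^d/ && ($6 "") < (cutoff "") { print $8 }'
}

# Dry run: review the candidates first.
#   hdfs dfs -ls /some/dir | older_than 30
# Delete only after the list looks right (xargs -r is a GNU extension):
#   hdfs dfs -ls /some/dir | older_than 30 | xargs -r hdfs dfs -rm
```

Keeping the filter separate from the delete step makes it easy to inspect exactly what would be removed before anything is touched.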