How to remove files beyond a count of 100 in a folder or remove all files older than 10 days in a Hadoop shell script

Hello all,

 

I am new to both shell scripting and Hadoop.

I would like to remove files in a folder that satisfy either of the following conditions:

Condition 1: remove files beyond a count of 100 in the folder.

Condition 2: remove all files older than 10 days in the same folder.

I have a shell script that removes files older than 10 days only, but I don't know how to modify it so that it also removes files once the folder exceeds 100 files.

 

 

#!/bin/bash
# Current time in seconds since the epoch, used to work out each file's age.
now=$(date +%s)
PATH_ARCH=/folder1/arch

# Show a count/size summary for the .txt files in the archive folder.
hadoop fs -count $PATH_ARCH/*.txt

# List the files, sort by the modification date/time columns (newest first),
# and take the last 100 entries; column 8 of the -ls output is the full path.
for FILE in $(hdfs dfs -ls $PATH_ARCH/File_*.txt | sort -rk6,7 | tail -n 100 | awk '{print $8}'); do
    # Modification date and time of this file (columns 6 and 7 of -ls output).
    fileTimestamp=$(hadoop fs -ls "$FILE" | awk '{print $6,$7}')
    fileTimestampsec=$(date -d "$fileTimestamp" +'%s')
    # Age of the file in whole days.
    time_difference=$(( (now - fileTimestampsec) / (60*60*24) ))
    if [[ $time_difference -gt 10 ]]; then
        hadoop fs -rm -skipTrash "$FILE"
    fi
done
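
For condition 1, the rough idea I have (untested, so I am not sure it is correct) is to sort the listing newest first, skip the first 100 entries, and remove whatever is left. Here is a sketch; MAX_FILES is just a name I made up, and it assumes the full path is in column 8 of the -ls output, the same as in the script above:

#!/bin/bash
# Untested sketch for condition 1: keep only the newest 100 File_*.txt files
# in the folder and remove the rest. Assumes the modification date/time are
# in columns 6 and 7 of "hdfs dfs -ls" output and the full path is in
# column 8. MAX_FILES is a name I made up for this example.
PATH_ARCH=/folder1/arch
MAX_FILES=100

# Sort newest first, keep only the path column, drop the blank line that
# comes from the "Found N items" header, skip the newest MAX_FILES entries,
# and delete everything that remains.
hdfs dfs -ls $PATH_ARCH/File_*.txt | sort -rk6,7 | awk '{print $8}' \
    | sed '/^$/d' | tail -n +$((MAX_FILES + 1)) | while read -r FILE; do
        hadoop fs -rm -skipTrash "$FILE"
done

If that looks reasonable, would running it after the 10-day loop above be the right way to combine the two conditions, or is there a cleaner way to do both in one pass?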

 

Thank you.

 
