Hello all,
I am new to both shell scripting and Hadoop.
I would like to remove files in a folder when either of two conditions is met:
Condition 1: if the folder contains more than 100 files, remove the oldest files so that only the newest 100 remain.
Condition 2: remove all files older than 10 days in the same folder.
I have a shell script that already removes files older than 10 days, but I don't know how to modify it to also handle the file-count condition. Here is the script:
#!/bin/bash
# Current time in seconds since the epoch, used to compute each file's age.
now=$(date +%s)

PATH_ARCH=/folder1/arch

# Print the current file count for reference.
hadoop fs -count "$PATH_ARCH"/*.txt

# Column 8 of the "hdfs dfs -ls" output is the file path; lines with fewer
# than 8 columns (such as the "Found N items" header) are skipped.
for FILE in $(hdfs dfs -ls "$PATH_ARCH"/File_*.txt | awk 'NF >= 8 {print $8}'); do
    # Columns 6 and 7 are the modification date and time.
    fileTimestamp=$(hadoop fs -ls "$FILE" | awk 'NF >= 8 {print $6, $7}')
    fileTimestampsec=$(date -d "$fileTimestamp" +'%s')
    # File age in whole days.
    time_difference=$(( (now - fileTimestampsec) / (60*60*24) ))
    if [[ $time_difference -gt 10 ]]; then
        hadoop fs -rm -skipTrash "$FILE"
    fi
done
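For the count condition, the best I have come up with is the sketch below, which would run before the 10-day loop above. It assumes that "hdfs dfs -ls" prints the modification date and time in columns 6 and 7 (so sorting on those columns orders the files) and the path in column 8, as in my script, and that "more than 100 files" means keeping only the newest 100. I am not sure this is the right approach:

PATH_ARCH=/folder1/arch
MAX_FILES=100

# Drop the "Found N items" header, sort the files newest first on the
# date and time columns, keep only the path column, then skip the newest
# $MAX_FILES entries; whatever remains is over the count limit.
hdfs dfs -ls "$PATH_ARCH"/File_*.txt \
    | awk 'NF >= 8' \
    | sort -rk6,7 \
    | awk '{print $8}' \
    | tail -n +"$((MAX_FILES + 1))" \
    | while read -r FILE; do
          hadoop fs -rm -skipTrash "$FILE"
      done

Would it be correct to put this between the "hadoop fs -count" line and the for loop, or is there a better way to combine the two conditions in one script?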
Thank you.