Hello all,
 
I am new to both shell scripting and Hadoop.
I would like to remove files from a folder when they satisfy either of two conditions.
Condition 1: remove the files in excess of 100 when the folder holds more than 100 files.
Condition 2: remove all files older than 10 days in the same folder.
 
I have a shell script that only removes files older than 10 days. I don't know how to modify it to add the other condition, removing the files in excess of 100.
 
 
#!/bin/bash
# bash is needed for [[ ]]; date -d assumes GNU date.
# Current time in seconds since the epoch.
now=$(date +%s)
PATH_ARCH=/folder1/arch
# Print the directory, file, and byte counts for the matching files.
hadoop fs -count "$PATH_ARCH/*.txt"
# In the listing, field 6 is the date, field 7 the time, field 8 the path.
# The reverse sort plus tail -n 100 keeps the 100 oldest entries;
# grep wav then keeps only the paths that contain "wav".
for FILE in $(hdfs dfs -ls "$PATH_ARCH/File_*.txt" | sort -rk6,7 | tail -n 100 | grep wav | awk '{print $8}'); do
    fileTimestamp=$(hadoop fs -ls "$FILE" | awk '{print $6, $7}')
    fileTimestampsec=$(date -d "$fileTimestamp" +%s)
    # Age of the file in whole days.
    time_difference=$(( (now - fileTimestampsec) / (60*60*24) ))
    if [[ $time_difference -gt 10 ]]; then
        hadoop fs -rm -skipTrash "$FILE"
    fi
done
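
For reference, here is a minimal sketch of how both conditions might be combined. It assumes condition 1 means keeping only the 100 newest files, and that the listing has the usual hdfs dfs -ls layout (date in field 6, time in field 7, path in field 8). The function name select_for_removal and its three arguments are hypothetical names made up for this illustration, and date -d again assumes GNU date. The function only prints candidate paths and deletes nothing itself.

```shell
#!/bin/bash
# Sketch only: print the paths that satisfy EITHER condition.
# stdin: lines in the `hdfs dfs -ls` layout (field 6 date, 7 time, 8 path).
# $1 = max files to keep, $2 = max age in days, $3 = "now" in epoch seconds.
# (All three argument meanings are assumptions made for this example.)
select_for_removal() {
    local max_files="$1"
    local max_age_days="$2"
    local now="$3"
    local rank=0
    # Sort newest first, so a rank above max_files means
    # "beyond the newest max_files entries".
    LC_ALL=C sort -r -k6,7 |
    while read -r perms repl owner group size day time path; do
        # Skip blank lines and the "Found N items" header (no 8th field).
        [ -n "$path" ] || continue
        rank=$((rank + 1))
        ts=$(date -d "$day $time" +%s)
        age_days=$(( (now - ts) / 86400 ))
        if [ "$rank" -gt "$max_files" ] || [ "$age_days" -gt "$max_age_days" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage sketch (needs a Hadoop client; -skipTrash deletes permanently):
# hdfs dfs -ls "/folder1/arch/File_*.txt" \
#     | select_for_removal 100 10 "$(date +%s)" \
#     | while IFS= read -r f; do hadoop fs -rm -skipTrash "$f"; done
```

Running the pipeline first without the final hadoop fs -rm step shows which files would be removed before anything is actually deleted.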
 
Thank you.