09-28-2022
02:44 PM
Hello all, I am new to both shell scripting and Hadoop. I would like to remove files in a folder that satisfy either of two conditions. Condition 1: remove files beyond a count of 100 in the folder. Condition 2: remove files older than 10 days in the same folder. I have a shell script that removes files older than 10 days, but I don't know how to modify it to also remove files once the folder holds more than 100.

#!/bin/bash
start_time=`date`
processStart=`date -d "$start_time" '+%Y-%m-%d %H:%M:%S'`
psnew=`date -d "$processStart" '+%Y-%m-%d %H:%M'`
now=`date -d "$psnew" +'%s'`
PATH_ARCH=/folder1/arch

hadoop fs -count $PATH_ARCH/*.txt
for FILE in `hdfs dfs -ls $PATH_ARCH/File_*.txt | sort -rk6,7 | tail -n 100 | grep wav | awk '{print $8}'`; do
    filename=(${FILE//// })
    filename_split=${filename[6]}
    fileTimestamp=`hadoop fs -ls $FILE | awk '{print $6,$7}'`
    fileTimestampsec=`date -d "$fileTimestamp" +'%s'`
    time_difference=$((($now - $fileTimestampsec)/(60*60*24)))
    if [[ $time_difference -gt 10 ]]; then
        hadoop fs -rm -skipTrash $FILE
    fi
done

Thank you.
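One way to combine both conditions is to list the folder once, walk the listing newest-first, delete everything past the first 100 entries, and delete anything older than 10 days among the rest. The sketch below is untested against a real cluster; `MAX_FILES`, `MAX_AGE_DAYS`, and the temp-file name are illustrative, and it assumes GNU `date -d` plus the `hdfs`/`hadoop` CLIs on `PATH`, with `hdfs dfs -ls` printing the date, time, and path in columns 6, 7, and 8 as in the original script.

```shell
#!/bin/sh
# Sketch: remove files that are EITHER beyond the newest 100 in the
# folder OR older than 10 days. Values below are illustrative.
PATH_ARCH=/folder1/arch
MAX_FILES=100
MAX_AGE_DAYS=10
now=$(date +%s)

# hdfs dfs -ls prints: perms repl owner group size date time path.
# Sort newest first on the date (col 6) and time (col 7) columns.
hdfs dfs -ls "$PATH_ARCH"/File_*.txt | sort -rk6,7 > /tmp/arch_list.$$

count=0
while read -r line; do
    FILE=$(echo "$line" | awk '{print $8}')
    [ -z "$FILE" ] && continue        # skip the "Found N items" header
    count=$((count + 1))

    # Condition 1: anything past the newest $MAX_FILES goes.
    if [ "$count" -gt "$MAX_FILES" ]; then
        hadoop fs -rm -skipTrash "$FILE"
        continue
    fi

    # Condition 2: anything older than $MAX_AGE_DAYS days goes.
    fileTimestamp=$(echo "$line" | awk '{print $6,$7}')
    fileTimestampsec=$(date -d "$fileTimestamp" +%s)
    age_days=$(( (now - fileTimestampsec) / 86400 ))
    if [ "$age_days" -gt "$MAX_AGE_DAYS" ]; then
        hadoop fs -rm -skipTrash "$FILE"
    fi
done < /tmp/arch_list.$$
rm -f /tmp/arch_list.$$
```

Reading the sorted listing line by line (instead of word-splitting a backtick expansion, as in the original `for` loop) keeps the date and time columns paired with their path, so each file's timestamp comes from the single `-ls` call rather than one `hadoop fs -ls` per file.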
Labels:
- Apache Hadoop