09-28-2022
02:44 PM
Hello all, I am new to both shell scripting and Hadoop. I would like to remove files in a folder that satisfy either of two conditions. Condition 1: remove files beyond a count of 100 in the folder. Condition 2: remove files older than 10 days in the same folder. I have a shell script that removes files older than 10 days, but I don't know how to modify it to also remove files once the folder holds more than 100.

#!/bin/bash
start_time=`date`
processStart=`date -d "$start_time" '+%Y-%m-%d %H:%M:%S'`
psnew=`date -d "$processStart" '+%Y-%m-%d %H:%M'`
now=`date -d "$psnew" +'%s'`
PATH_ARCH=/folder1/arch

hadoop fs -count $PATH_ARCH/*.txt
for FILE in `hdfs dfs -ls $PATH_ARCH/File_*.txt | sort -rk6,7 | tail -n 100 | grep wav | awk '{print $8}'`; do
    filename=(${FILE//// })
    filename_split=${filename[6]}
    fileTimestamp=`hadoop fs -ls $FILE | awk '{print $6,$7}'`
    fileTimestampsec=`date -d "$fileTimestamp" +'%s'`
    time_difference=$((($now - $fileTimestampsec)/(60*60*24)))
    if [[ $time_difference -gt 10 ]]; then
        hadoop fs -rm -skipTrash $FILE
    fi
done

Thank you.
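One way to combine both conditions is to list the folder once, walk the listing newest-first, delete everything past the first 100 entries, and delete anything older than 10 days among the rest. The sketch below is untested against a real cluster; `MAX_FILES`, `MAX_AGE_DAYS`, and the temp-file name are illustrative, and it assumes GNU `date -d` plus the `hdfs`/`hadoop` CLIs on `PATH`, with `hdfs dfs -ls` printing the date, time, and path in columns 6, 7, and 8 as in the original script.

```shell
#!/bin/sh
# Sketch: remove files that are EITHER beyond the newest 100 in the
# folder OR older than 10 days. Values below are illustrative.
PATH_ARCH=/folder1/arch
MAX_FILES=100
MAX_AGE_DAYS=10
now=$(date +%s)

# hdfs dfs -ls prints: perms repl owner group size date time path.
# Sort newest first on the date (col 6) and time (col 7) columns.
hdfs dfs -ls "$PATH_ARCH"/File_*.txt | sort -rk6,7 > /tmp/arch_list.$$

count=0
while read -r line; do
    FILE=$(echo "$line" | awk '{print $8}')
    [ -z "$FILE" ] && continue        # skip the "Found N items" header
    count=$((count + 1))

    # Condition 1: anything past the newest $MAX_FILES goes.
    if [ "$count" -gt "$MAX_FILES" ]; then
        hadoop fs -rm -skipTrash "$FILE"
        continue
    fi

    # Condition 2: anything older than $MAX_AGE_DAYS days goes.
    fileTimestamp=$(echo "$line" | awk '{print $6,$7}')
    fileTimestampsec=$(date -d "$fileTimestamp" +%s)
    age_days=$(( (now - fileTimestampsec) / 86400 ))
    if [ "$age_days" -gt "$MAX_AGE_DAYS" ]; then
        hadoop fs -rm -skipTrash "$FILE"
    fi
done < /tmp/arch_list.$$
rm -f /tmp/arch_list.$$
```

Reading the sorted listing line by line (instead of word-splitting a backtick expansion, as in the original `for` loop) keeps the date and time columns paired with their path, so each file's timestamp comes from the single `-ls` call rather than one `hadoop fs -ls` per file.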
Labels:
- Apache Hadoop