The idea of this article is to help admins detect artifacts (files/folders) in the cluster that are older than a certain number of days. In some cases, there may also be empty directories lying around in the cluster that are no longer used but still take up NameNode namespace, contributing to the small-files problem. The attached script therefore performs the following operations:

1. Identifies files older than X days.

2. Identifies folders older than X days.

3. Deletes empty folders.
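Operations 1 and 2 both come down to listing everything under a path and keeping the entries last modified before a cutoff date. The internals of the attached script are not reproduced here, so the following is only a minimal sketch of one way to do this, not the script's actual logic. It assumes the default `hdfs dfs -ls -R` output layout (permissions, replication, owner, group, size, modification date, time, path), paths without spaces, and a cutoff date already in yyyy-MM-dd form; the function name `list_old_entries` is hypothetical.

```shell
# Hypothetical helper, not part of the attached findAll.sh (POSIX sh + awk).
# Reads `hdfs dfs -ls -R` output on stdin and prints the paths of entries of
# the requested type ("d" = directory, "f" = file) whose modification date
# is strictly earlier than the cutoff (yyyy-MM-dd).
# Usage: list_old_entries TYPE CUTOFF
list_old_entries() {
  awk -v type="$1" -v cutoff="$2" '
    # Skip header lines such as "Found 3 items" and anything malformed.
    NF < 8 { next }
    {
      # Directories have a permission string starting with "d".
      is_dir = (substr($1, 1, 1) == "d")
      if (type == "d" && !is_dir) next
      if (type == "f" && is_dir) next
      # yyyy-MM-dd dates sort correctly as plain strings.
      # Field 8 is the path; paths containing spaces are not handled.
      if ($6 < cutoff) print $8
    }'
}
```

For example, `hdfs dfs -ls -R /tmp/hive/hive | list_old_entries d 2020-04-25` would list the directories under /tmp/hive/hive last modified before 25 April 2020. Keep in mind that HDFS updates a directory's modification time when entries are created in or removed from it, so an "old" directory is one whose direct contents have not changed recently; files deeper down may still be newer.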

Script Execution

The script, "findAll.sh", expects two parameters:

1. Age of the artifact (file/folder) in terms of days.

2. Actual location of the artifact (file/folder) in HDFS.

Once started, the script prompts you to choose one of the three operations, depending on the type of artifact and the kind of operation you need.
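A script with this interface would typically validate its two arguments and turn the day count into a date before showing the menu. The sketch below is hypothetical (the function names `validate_args` and `cutoff_date` are not taken from findAll.sh) and assumes GNU `date`, which accepts relative dates such as "9 days ago" through `-d`.

```shell
# Hypothetical argument handling for a findAll.sh-style script (POSIX sh
# plus GNU date); not taken from the attached script.

# Succeeds only for exactly two arguments: a non-negative integer number of
# days and a non-empty HDFS path.
validate_args() {
  [ "$#" -eq 2 ] || return 1
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$2" ]
}

# Prints the date that was DAYS days ago as yyyy-MM-dd, the format
# `hdfs dfs -ls` uses for modification dates. Requires GNU date.
cutoff_date() {
  date -d "$1 days ago" +%Y-%m-%d
}

# Example use at the top of the script:
#   validate_args "$@" || { echo "Usage: $0 <days> <hdfs_path>" >&2; exit 1; }
#   cutoff=$(cutoff_date "$1")
```

Rejecting a non-numeric day count up front avoids passing a malformed date expression to `date` and silently comparing against the wrong cutoff.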

NOTE:

1. Make sure the user running the script has the permissions needed to run its commands on the artifacts passed as parameters to the script.

2. A single run of the script may take some time, depending on the size and depth of the folder hierarchy. Once the list is generated, however, you can act on it as needed. I recommend testing the script in a lower environment first and running it in PROD when the load on HDFS is low.

3. Exercise caution when choosing the folders you run the script against.
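Because the script produces a list, acting on that list can be kept as a separate, reviewable step. As one cautious pattern, the hypothetical helper below (`print_delete_commands` is not part of findAll.sh) reads HDFS paths from a file, assumed to hold one path per line (the actual layout of the script's output files may differ), and only prints the matching `hdfs dfs -rm -r` commands so they can be checked before anything is deleted.

```shell
# Hypothetical dry-run helper (POSIX sh); it never deletes anything itself.
# Reads HDFS paths, one per line, from the file given as $1 and prints the
# corresponding delete commands for manual review. Blank lines are skipped,
# and a final line without a trailing newline is still processed.
# Paths are wrapped in single quotes; paths containing a single quote are
# not handled.
print_delete_commands() {
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    printf "hdfs dfs -rm -r '%s'\n" "$path"
  done < "$1"
}
```

For example, `print_delete_commands ./OldFolders-202054.txt > review.sh` writes the commands to a file that can be inspected and trimmed before running it with `sh review.sh`. Without `-skipTrash`, `hdfs dfs -rm` moves paths to the user's trash when HDFS trash is enabled, which leaves some room to recover from a mistake. Since an "old" folder can still contain recently modified files further down, review each entry before removing it recursively.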

Example executions:

Execution 1: To list the old folders:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
1
Please check the output in ./OldFolders-202054.txt ;




Execution 2: To list the old files:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
2
Please check the output in file ./Oldfiles.txt-202148




Execution 3: To delete empty folders:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
3
rmdir: `/tmp/hive/hive/_tez_session_dir': Directory is not empty
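The rmdir message above is expected for a directory that still has contents: HDFS `-rmdir` only removes empty directories, so such directories are left in place (Hadoop's `-rmdir` also accepts `--ignore-fail-on-non-empty` to suppress that error). To find the empty ones up front instead, one approach is to look for directories that no other entry in a recursive listing sits under. The following is a minimal sketch under the same assumptions about the `hdfs dfs -ls -R` column layout and space-free paths; `list_empty_dirs` is a hypothetical name, not part of the attached script.

```shell
# Hypothetical helper, not part of the attached findAll.sh (POSIX sh + awk).
# Reads `hdfs dfs -ls -R` output on stdin and prints, sorted, every
# directory that has no entries beneath it (leaf directories only).
list_empty_dirs() {
  awk '
    NF < 8 { next }
    {
      path = $8
      if (substr($1, 1, 1) == "d") dirs[path] = 1
      # Strip the last path component to get the parent, and mark that
      # parent as non-empty.
      parent = path
      sub("/[^/]*$", "", parent)
      has_child[parent] = 1
    }
    END {
      for (d in dirs) if (!(d in has_child)) print d
    }' | sort
}
```

Each reported directory could then be passed to `hdfs dfs -rmdir`, for example `hdfs dfs -ls -R /tmp/hive/hive | list_empty_dirs | while IFS= read -r d; do hdfs dfs -rmdir "$d"; done`. Because `-rmdir` refuses to remove a directory that has gained contents since the listing was taken, this stays safe against concurrent writers. Only leaf directories are reported, so a parent that becomes empty once its children are removed is picked up by a second pass.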


Please feel free to tweak and extend the script's functionality. The script is attached to this article as findall.tar.gz.
