Cloudera Employee

This article is meant to help admins detect artifacts (files/folders) in the cluster that are older than a given number of days. In some cases, unused empty directories also linger in the cluster and add to the small file issue, since every directory, like every file, is an object the NameNode has to track. The attached script therefore does the following:

1. Identifies files older than X days.

2. Identifies folders older than X days.

3. Deletes empty folders.
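
As a rough illustration of the first two operations, the sketch below filters a recursive HDFS listing by modification date. It is not the attached script: `filter_older_than` is a hypothetical helper, and it assumes the usual eight-column output of `hdfs dfs -ls -R` (permissions, replication, owner, group, size, date, time, path).

```shell
#!/usr/bin/env bash
# Sketch only: the real findAll.sh is not reproduced in this article.
# filter_older_than is a hypothetical helper. It reads `hdfs dfs -ls -R`
# style lines on stdin and prints the paths last modified before a cutoff.
#   $1 = cutoff date as YYYY-MM-DD
#   $2 = "d" for directories, "-" for regular files
filter_older_than() {
  awk -v cutoff="$1" -v kind="$2" '
    # Columns: perms, replication, owner, group, size, date, time, path.
    # YYYY-MM-DD dates sort correctly as plain strings.
    NF >= 8 && substr($1, 1, 1) == kind && ($6 "") < (cutoff "") {
      path = $8
      for (i = 9; i <= NF; i++) path = path " " $i   # keep paths containing spaces
      print path
    }'
}

# Example wiring (assumes GNU date and an hdfs client on the PATH):
#   cutoff=$(date -d "-9 days" +%Y-%m-%d)
#   hdfs dfs -ls -R /tmp/hive/hive | filter_older_than "$cutoff" d
```

Keeping the filter separate from the `hdfs` call means the listing, which is the slow part on a large tree, can be saved once and then filtered for both folders and files.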

Script Execution

The script is named "findAll.sh" and expects two parameters:

1. Age of the artifact (file/folder) in terms of days.

2. Actual location of the artifact (file/folder) in HDFS.

Depending on the type of artifact and the operation you want, you then choose one of the three options the script prompts for.
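
A minimal sketch of how the two parameters and the menu answer could be checked before any HDFS command runs. The function names `check_args` and `pick_action`, the action labels, and the validation rules (whole-number age, absolute path) are assumptions for illustration, not the behavior of the attached script.

```shell
#!/usr/bin/env bash
# Sketch only: check_args and pick_action are hypothetical names, and the
# validation rules are assumptions rather than what findAll.sh actually does.

# Succeeds only for a whole-number age and an absolute HDFS path.
check_args() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: findAll.sh <age_in_days> <hdfs_path>" >&2
    return 1
  fi
  case "$1" in
    ''|*[!0-9]*) echo "Age must be a whole number of days" >&2; return 1 ;;
  esac
  case "$2" in
    /*) return 0 ;;
    *)  echo "HDFS path must be absolute" >&2; return 1 ;;
  esac
}

# Map the menu answer (1, 2 or 3) to an action label.
pick_action() {
  case "$1" in
    1) echo "list-old-folders" ;;
    2) echo "list-old-files" ;;
    3) echo "delete-empty-folders" ;;
    *) echo "Unknown option: $1" >&2; return 1 ;;
  esac
}
```

Rejecting bad input up front matters most for option 3, where a mistyped path would otherwise point a delete operation at the wrong part of the namespace.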

NOTE:

1. Please make sure the user running the script has permission to run the chosen operation on the artifacts under the path passed to the script.

2. A single run may take some time, depending on the size and depth of the folder hierarchy. Once the list has been produced, you can act on it as needed. I recommend testing the script in a lower environment first and running it in PROD when the load on HDFS is low.

3. Please be careful about which folders you run the script on, especially when deleting empty folders.

Example executions:

Execution 1: To list the old folders:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
1
Please check the output in ./OldFolders-202054.txt ;




Execution 2: To list the old files:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
2
Please check the output in file ./Oldfiles.txt-202148




Execution 3: To delete empty folders:


[hive@c2187-node2 tmp]$ ./findAll.sh 9 /tmp/hive/hive
Please select your option
1. Identify folders/directories that are older than 9 days
2. Identify files that are older than 9 days
3. Delete empty folders
3
rmdir: `/tmp/hive/hive/_tez_session_dir': Directory is not empty
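
The "Directory is not empty" line in the run above is the message `hdfs dfs -rmdir` gives when asked to remove a directory that still has contents, so it is harmless here. One way such a cleanup could be written is sketched below. This is an assumption about the approach, not the attached script, and both function names are hypothetical. Directories are tried deepest first so that an empty child is removed before its parent is attempted.

```shell
#!/usr/bin/env bash
# Sketch only, not the attached script. Both function names are hypothetical.

# Print the directories from `hdfs dfs -ls -R` style input, deepest first,
# so that children are attempted before their parents.
dirs_deepest_first() {
  awk 'NF >= 8 && substr($1, 1, 1) == "d" {
         path = $8
         for (i = 9; i <= NF; i++) path = path " " $i
         depth = gsub("/", "/", path)        # number of slashes = nesting depth
         print depth "\t" path
       }' | sort -t "$(printf '\t')" -k1,1nr -k2 | cut -f2-
}

# Try to remove every directory under $1. `hdfs dfs -rmdir` removes only empty
# directories; for the others it reports an error and the loop carries on.
remove_empty_dirs() {
  hdfs dfs -ls -R "$1" | dirs_deepest_first | while IFS= read -r dir; do
    hdfs dfs -rmdir "$dir" </dev/null
  done
}
```

Because `-rmdir` refuses non-empty directories, this never deletes files, which makes it a safer choice for this task than a recursive `-rm -r`.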


Please feel free to tweak and extend the functionality of the script. The script is attached as findall.tar.gz.

Version history: revision 1 of 1, last updated 10-29-2018 06:13 PM