How to clean up temporary Hive folders/files in local filesystem "/tmp"?

Hello,

 

In /tmp on our local filesystem we see thousands of subdirectories named like "????????-????-????-????-????????????_resources" that are older than 5 days (so they certainly do not belong to any process that is still running).

Additionally, there are thousands of files and hundreds of subdirectories in "/tmp/hive" older than 5 days.

Where do these leftovers come from?

How can we get rid of them in an automated way?

 

$ find -O1 /tmp -type d -name "????????-????-????-????-????????????_resources" -mtime +5 | wc -l
26263
$ find /tmp/hive -type d -mtime +5 | wc -l
538
$ find /tmp/hive -type f -mtime +5 | wc -l
5784



 

Best Regards
Carsten

1 ACCEPTED SOLUTION

@caisch Temp tables are created as intermediate data while an application runs. If the application fails and its cleanup never happens, these intermediate tables are left behind. Another possible cause: if you run queries through beeline and drop the session abruptly instead of exiting properly with '!q', the files created under '/tmp/hive' during beeline initialisation are not removed.

To clean up these directories automatically, add the following properties to custom-hive-site.xml:

hive.start.cleanup.scratchdir - true // Cleans up the Hive scratch directory when HiveServer2 starts.

hive.server2.clear.dangling.scratchdir - true // Starts a thread in HiveServer2 that clears dangling scratch directories from the HDFS location.

hive.server2.clear.dangling.scratchdir.interval - 1800s // How often that thread runs.

After adding the properties, restart the Hive service.
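
If you edit hive-site.xml by hand instead of through a management UI, the same three settings would be written as property elements. This is only a sketch of the usual Hadoop-style configuration layout, using the values listed above; where the file lives depends on your installation.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of hive-site.xml -->
<property>
  <name>hive.start.cleanup.scratchdir</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir.interval</name>
  <value>1800s</value>
</property>
```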

Reference link:

https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Scratch... 

 

Alternatively, you can run a cron job to delete the files periodically.

Reference Link:

 https://community.cloudera.com/t5/Support-Questions/Do-we-have-any-script-which-we-can-use-to-clean-... 
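
A minimal sketch of such a cron script, assuming GNU find and the paths and 5-day threshold from the question. The function name cleanup_hive_tmp and the script layout are hypothetical, not taken from the linked post. Test it against a copy of the directory first, since it deletes without asking.

```shell
#!/bin/sh
# Hypothetical cleanup helper; usage: hive_tmp_cleanup.sh <tmp-root> <max-age-days>

cleanup_hive_tmp() {
    root="$1"   # e.g. /tmp
    days="$2"   # e.g. 5; find's "-mtime +N" matches entries older than N full days

    # Session resource directories: <uuid>_resources directly under the root.
    find "$root" -maxdepth 1 -type d \
        -name '????????-????-????-????-????????????_resources' \
        -mtime +"$days" -exec rm -rf -- {} +

    if [ -d "$root/hive" ]; then
        # Stale files below <root>/hive.
        find "$root/hive" -type f -mtime +"$days" -delete
        # Old directories that are already empty. Deleting files refreshes the
        # parent's mtime, so directories emptied above are only removed by a
        # later run, once they have aged again.
        find "$root/hive" -mindepth 1 -depth -type d -mtime +"$days" -empty -delete
    fi
}

# Only act when both arguments are given explicitly.
if [ "$#" -eq 2 ]; then
    cleanup_hive_tmp "$1" "$2"
fi
```

Saved under a hypothetical path such as /usr/local/bin/hive_tmp_cleanup.sh, it could be scheduled nightly with a crontab entry like `0 3 * * * /usr/local/bin/hive_tmp_cleanup.sh /tmp 5`, run as a user that owns the leftover files.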

 

Please 'Accept as Solution' if my answers are really helpful to you.

Thanks!
