What is the best way to delete temporary folder (.hive-staging) after killing the query by yarn


Hive creates temporary directories like the one below to store staging files while a query executes, specifically when it runs as a Tez job. When the YARN application is killed manually, the directory is not deleted. I am looking for the best way to delete these directories periodically across the whole cluster.


.hive-staging_hive_2019-08-12_01-01-01_001_2555353356674536244134522-



Re: What is the best way to delete temporary folder (.hive-staging) after killing the query by yarn

Hi @khosrucse ,

 

Removal of these staging files is part of the YARN application's execution.
But when the YARN application itself is killed, there is no way for it to remove these files.

A new query on the same table also has no reference to these staging files, so later runs cannot remove them either: the files were generated by a YARN application that has already been killed.

For now, the only option is to manually remove the files.
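
If the manual removal needs to run periodically, one option is a small script scheduled from cron. The following is only a sketch under several assumptions that are not confirmed in this thread: the timestamp embedded in the directory name reflects when the query started, a directory older than `max_age` no longer belongs to a running query, and the `hdfs` CLI is on the PATH. The function names, the seven-day threshold, and the dry-run default are all hypothetical choices for illustration.

```python
import re
import subprocess
from datetime import datetime, timedelta

# Matches a path that ends in a staging directory such as
# .hive-staging_hive_2019-08-12_01-01-01_001_<digits>-<n>
# and captures the embedded timestamp. Paths inside a staging
# directory do not match, so only the top-level directory is selected.
STAGING_RE = re.compile(
    r"/\.hive-staging_hive_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_\d{3}_[^/]*$"
)


def stale_staging_dirs(paths, now, max_age=timedelta(days=7)):
    """Return the staging directories whose embedded timestamp is older than max_age."""
    stale = []
    for path in paths:
        match = STAGING_RE.search(path)
        if match is None:
            continue
        created = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
        if now - created > max_age:
            stale.append(path)
    return stale


def list_hdfs_dirs(root):
    """List directories under root recursively via `hdfs dfs -ls -R`."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", root],
        check=True, capture_output=True, text=True,
    )
    # Directory entries start with 'd'; the path is the last field.
    # Assumption: paths contain no whitespace.
    return [line.split()[-1] for line in result.stdout.splitlines()
            if line.startswith("d")]


def cleanup(root, dry_run=True):
    """Remove stale staging directories under root; only print them when dry_run is set."""
    for path in stale_staging_dirs(list_hdfs_dirs(root), datetime.now()):
        if dry_run:
            print("would remove", path)
        else:
            subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path],
                           check=True)
```

Running `cleanup("/path/to/warehouse")` with the default `dry_run=True` only prints the candidates, which makes it possible to check the selection before anything is deleted. The age threshold should be longer than the longest query expected on the cluster, otherwise the staging directory of a query that is still running could be removed.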

However, the link below describes a different approach to removing these directories. Please note that this is a workaround from an end user, so evaluate it against your own environment before implementing it.

https://stackoverflow.com/questions/33844381/hive-overwrite-directory-move-process-as-distcp/3558336...

(for your reference)