What is the best way to delete temporary folders (.hive-staging) after a query is killed through YARN?

Temporary directories like the one below are created to hold staging files while Hive queries run, specifically when they run as Tez jobs. When the query is terminated manually by killing its YARN application, the directory is not deleted. I am looking for the best way to delete directories of this type periodically across the whole cluster.


.hive-staging_hive_2019-08-12_01-01-01_001_2555353356674536244134522-


1 REPLY

Hi @khosrucse ,

Removing these staging files is part of the YARN application's execution.
However, when the YARN application itself is killed, there is no way for it to remove these files.

A new query on the same table also has no reference to these staging files (so a later run cannot remove them), because they were generated by the YARN application that has since been killed.

For now, the only option is to manually remove the files.
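If you want to script the manual removal, here is a minimal sketch in Python. It is not an official Hive or Cloudera tool. It assumes the directory name embeds its creation time in the form shown in the question (.hive-staging_hive_YYYY-MM-DD_HH-MM-SS_...), that this time is in the same time zone as the machine running the script, that paths contain no spaces, and that the hdfs command is on the PATH. The function names and the 7-day threshold are made up for illustration; choose a threshold longer than your longest-running query so that staging directories of live queries are not touched.

```python
import re
import subprocess
from datetime import datetime, timedelta

# Matches the timestamp embedded in names such as
# .hive-staging_hive_2019-08-12_01-01-01_001_...
STAGING_RE = re.compile(
    r"\.hive-staging_hive_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_"
)


def staging_created_at(path):
    """Return the timestamp encoded in a staging directory name, or None."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = STAGING_RE.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def stale_staging_dirs(paths, now, max_age=timedelta(days=7)):
    """Keep only staging directories whose embedded timestamp is older than max_age."""
    stale = []
    for path in paths:
        created = staging_created_at(path)
        if created is not None and now - created > max_age:
            stale.append(path)
    return stale


def list_hdfs_dirs(root):
    """List directories under root with `hdfs dfs -ls -R` (slow on large trees)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", root],
        check=True, capture_output=True, text=True,
    )
    # Directory lines start with 'd'; the path is the last field
    # (assumption: no spaces in paths).
    return [
        line.split()[-1]
        for line in result.stdout.splitlines()
        if line.startswith("d")
    ]


def cleanup(root, dry_run=True):
    """Print (dry run) or delete stale staging directories under root."""
    for path in stale_staging_dirs(list_hdfs_dirs(root), datetime.now()):
        if dry_run:
            print("would remove", path)
        else:
            subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
```

Calling cleanup() with your warehouse root and the default dry_run=True only prints the candidates, so you can review the list before running it with dry_run=False from a scheduler such as cron.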

However, you can refer to the link below for a different approach to removing these directories. Please note that this is a workaround from an end user, so it should be implemented with your environment in mind.

https://stackoverflow.com/questions/33844381/hive-overwrite-directory-move-process-as-distcp/3558336...

(for your reference)