What is the best way to delete the temporary folder (.hive-staging) after a query is killed by YARN?
- Labels: Apache Hive, Apache Tez, Apache YARN
Created ‎08-15-2019 04:37 PM
A temporary directory like the one below is created to hold staging files while Hive queries run, specifically while a Tez job is running. When the query is killed manually through its YARN application, the directory is not deleted. I am looking for the best way to delete directories of this type across the whole cluster periodically.
.hive-staging_hive_2019-08-12_01-01-01_001_2555353356674536244134522-
Created ‎08-18-2019 09:32 PM
Hi @khosrucse ,
Removal of these staging files is part of the YARN application's execution.
However, when the YARN application itself is killed, there is no way for it to remove these files.
A new query on the same table has no reference to these staging files either (so later runs cannot remove them), because they were generated by the YARN application that has since been killed.
For now, the only option is to remove the files manually.
You can, however, refer to the link below for a different approach to removing these directories. Please note that this is a workaround from an end user, so it should be implemented with your environment in mind.
https://stackoverflow.com/questions/33844381/hive-overwrite-directory-move-process-as-distcp/3558336...
(for your reference)
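If you end up scripting the manual removal, a minimal sketch of the selection step might look like the following. It is not an official Hive or YARN tool. It assumes the staging directory names carry a creation timestamp in the form shown in the question (`.hive-staging_hive_2019-08-12_01-01-01_...`), that you have already collected candidate paths yourself (for example from a recursive HDFS listing), and that anything older than a threshold you choose is safe to delete. The function names and the 7-day default are hypothetical. The script only builds `hdfs dfs -rm -r` argument lists and does not run them.

```python
import re
from datetime import datetime, timedelta

# Matches names such as .hive-staging_hive_2019-08-12_01-01-01_001_<id>
# (pattern inferred from the example in the question; adjust it if your
# cluster uses a different staging directory name).
STAGING_RE = re.compile(
    r"\.hive-staging_hive_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_"
)


def staging_created_at(path):
    """Return the timestamp embedded in a staging directory name, or None."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = STAGING_RE.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def stale_staging_dirs(paths, now, max_age=timedelta(days=7)):
    """Keep only staging directories whose embedded timestamp is older than max_age."""
    stale = []
    for path in paths:
        created = staging_created_at(path)
        if created is not None and now - created > max_age:
            stale.append(path)
    return stale


def removal_commands(paths):
    """Build (but do not run) one 'hdfs dfs -rm -r' argument list per path."""
    return [["hdfs", "dfs", "-rm", "-r", path] for path in paths]
```

Pick `max_age` comfortably longer than your longest-running query, since a directory that belongs to a query still in progress matches the same name pattern. Reviewing the generated commands before executing them (for example from a scheduled job) is a sensible first step.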
