
Hive staging directory not getting cleaned up


In CDH 5.8.0, inserting data with spark-sql leaves many .hive-staging directories piling up; they are never deleted or removed, even though the insert itself completes successfully.


Please let me know the reason for this behaviour and how I can get rid of the .hive-staging directories. Is there any property we need to set?
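One property that often comes up in this context is hive.exec.stagingdir (added via HIVE-8750 in recent Hive releases; verify it exists in your CDH version). It does not stop the leftovers, but relocating the staging directories out of the table directory makes any strays easy to find and sweep. A hive-site.xml sketch, with a placeholder path:

```xml
<!-- Sketch only: relocate staging dirs out of the table directory.
     The /tmp path here is a placeholder; confirm the property name
     against your Hive/CDH version's documentation. -->
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/.hive-staging</value>
</property>
```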



Hi Jais,


Can you please let me know where you run your Hive query? Do you run it through Hue?


If you run it through Hue, in most cases the staging directory will be left over even after the query finishes. This is because Hue holds the query handle open so that users can get back to the results, and the cleanup of staging directories is only triggered when the query handle is closed.


So the first thing I would like to check is where you run your Hive query.
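For comparison, a minimal sketch of the non-Hue route: running the insert through a short-lived beeline session means the query handle closes, and the staging cleanup fires, as soon as the client exits. Host, port, and table names below are placeholders:

```shell
# Placeholder connection URL and query; adjust for your cluster.
HS2_URL='jdbc:hive2://hs2-host:10000/default'
QUERY='INSERT INTO TABLE target_tbl SELECT * FROM source_tbl'
# Shown as a dry run here; drop the leading `echo` to actually execute.
echo beeline -u "$HS2_URL" -e "$QUERY"
```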



New Contributor

I'm also having this issue on 5.7 while executing a Spark action through Oozie. Any thoughts on where to start looking?


Can you please provide additional details on the use case? Are you using the Oozie hive1 action or the hive2 action? Are these jobs failing? Please provide a brief reproducer if you can. Thank you.




We run Hive queries using a beeline action through an Oozie workflow.

As a workaround, I boot a daemon thread with scheduleAtFixedRate that cleans up these "empty" staging directories (the ones containing a _SUCCESS file), plus another thread that runs the Hive command "alter xxx concatenate":


import java.util.Date
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable {
  override def run(): Unit = {
    val fs = FileSystem.get(new Configuration())
    val status = fs.listStatus(new Path(s"hdfs://nameservice/user/xxx/warehouse/$tableName/"))
    status.foreach { stat =>
      // only near-empty hive-staging directories are candidates
      if (stat.isDirectory && stat.getPath.getName.contains("hive-staging") &&
          fs.getContentSummary(stat.getPath).getSpaceConsumed < 1024) {
        println("empty path: " + stat.getPath)
        val now = new Date().getTime
        // delete only when a _SUCCESS marker exists and the directory
        // has been untouched for at least 5 minutes
        if (directoryHasSuccess(stat.getPath, fs) &&
            now - stat.getModificationTime > 5 * 60 * 1000 &&
            now - stat.getAccessTime > 5 * 60 * 1000) {
          println("delete path " + stat.getPath)
          fs.delete(stat.getPath, true)
        }
      }
    }
  }
}, 5, interval, TimeUnit.SECONDS)
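The same idea can be sketched as a shell sweep. The version below is a hypothetical local-filesystem analogue (using find/rm and GNU touch to simulate an old directory); on a real cluster you would use `hdfs dfs -ls` and `hdfs dfs -rm -r` against the warehouse path instead, and all the names here are made up for the demo:

```shell
# Build a throwaway fake warehouse with one stale staging dir.
WAREHOUSE=$(mktemp -d)/warehouse
mkdir -p "$WAREHOUSE/mytable/.hive-staging_hive_demo"
touch "$WAREHOUSE/mytable/.hive-staging_hive_demo/_SUCCESS"
# Backdate the directory so it looks idle for more than 5 minutes.
touch -d '10 minutes ago' "$WAREHOUSE/mytable/.hive-staging_hive_demo"
# Delete staging dirs that are old enough and contain a _SUCCESS marker.
find "$WAREHOUSE" -mindepth 2 -maxdepth 2 -type d -name '.hive-staging*' -mmin +5 |
while read -r dir; do
  if [ -e "$dir/_SUCCESS" ]; then
    echo "delete path $dir"
    rm -rf "$dir"
  fi
done
```

The _SUCCESS check and the age threshold play the same role as in the Scala snippet: they guard against deleting a staging directory that an in-flight query is still using.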



New Contributor

Hi, does anyone have a solid answer for this? How can we get rid of this issue? We have thousands of such folders created.


New Contributor

Hi everybody,


I'm experiencing the same issue on CDH 5.5 (Spark 1.6.0) with my Spark Streaming job. Data is read from a Kafka broker and then inserted into a Hive table, partitioned by year/month/day/hour. All the data is present in the table after the insertInto() call, but the 'hive-staging....' directory created during the batch is still there, and empty...


The resources are allocated by YARN, and there are no error logs about file creation/deletion in the executor logs. I have tested a lot of settings without any success (regarding log persistence etc.).


The micro-batch runs every 10 seconds, so the job produces a lot of useless empty directories.

New Contributor
Still have the problem on CDH 5.7.

Have the same problem on CDH 5.4.7, after a streaming job with HiveContext.

New Contributor

One more post where Cloudera doesn't give a **bleep** about a solution.
