Support Questions
Find answers, ask questions, and share your expertise

Disk space exhaustion from the temporary files created by Spark in YARN Client Mode


Explorer

Hello all,

 

I am writing a Spark program, but after running it several times, the temporary files created during Spark's shuffle phase exhaust the disk space. I am using Spark in YARN client mode.

 

In spark-env.sh, I tried to put:

 

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=10 -Dspark.worker.cleanup.appDataTtl=10"

 

But those settings appear to apply only in standalone mode; they have no effect on my system.

 

Can anyone please give me an idea about this? Thank you.

 

 

3 REPLIES

Re: Disk space exhaustion from the temporary files created by Spark in YARN Client Mode

Super Collaborator

Those settings apply only to Spark standalone mode, so it is expected that they have no effect on YARN. See the Spark standalone documentation.

On YARN the cleanup should be automatic: it is triggered when the application shuts down and the SparkContext is stopped properly.
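If files are still being left behind, it is worth checking the NodeManager settings that govern the local scratch space, since on YARN that is where the shuffle files live (not in the Spark worker directories). A sketch of the relevant `yarn-site.xml` properties — the property names are standard YARN ones, but the values here are only illustrative, not recommendations:

```xml
<!-- Directories where the NodeManager stores container scratch data,
     including Spark shuffle files. These are the paths that fill up. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/yarn/local</value>
</property>

<!-- Delay (in seconds) before a finished application's local files are
     deleted. The default of 0 deletes them as soon as the application
     completes; a large debug value here is a common cause of leftover
     temporary files. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>
```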

 

Which version of CDH are you running, and how have you configured the shuffle?

 

Wilfred

Re: Disk space exhaustion from the temporary files created by Spark in YARN Client Mode

Explorer

Thanks for your help. I am not using CDH; I set up Spark on YARN myself with Hadoop 2.7.1 and Spark 1.5.0. So far I have not found an answer anywhere.
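In the meantime, one workaround I am considering is purging the leftover scratch directories by hand. A minimal sketch, assuming the temporary directories follow Spark's usual `spark-*` naming and live under `/tmp` (adjust `LOCAL_DIR` to your actual `yarn.nodemanager.local-dirs` / `spark.local.dir` paths; `SPARK_LOCAL_DIR` here is just a hypothetical environment variable for the sketch):

```shell
# Delete leftover spark-* scratch directories older than one day.
# Only top-level directories matching the spark-* pattern are touched.
LOCAL_DIR="${SPARK_LOCAL_DIR:-/tmp}"
find "$LOCAL_DIR" -maxdepth 1 -type d -name 'spark-*' -mmin +1440 \
    -exec rm -rf {} +
```

Obviously this should only be run when no Spark application is actively using those directories.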

 

Re: Disk space exhaustion from the temporary files created by Spark in YARN Client Mode

Master Collaborator
This forum is for CDH users. You should post questions about generic Spark or Hadoop setups on their respective mailing lists.