I am writing a Spark program, but after running it several times I run into disk space exhaustion from the temporary files created by Spark's shuffle phase. I am using Spark in YARN client mode.
In spark-env.sh, I tried to put:
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=10 -Dspark.worker.cleanup.appDataTtl=10"
But those options appear to apply only to standalone mode; they have had no effect on my system.
Can anyone please give me an idea about this? Thank you.
Those settings apply only to Spark standalone mode, so having no effect on YARN is expected. See the Spark standalone documentation.
On YARN, cleanup should be automatic, triggered by a clean shutdown and proper teardown of the SparkContext when the application finishes.
Which version of CDH are you running, and how have you configured the shuffle?
Thanks for your help. I am not using CDH; I set up Spark on YARN myself with Hadoop 2.7.1 and Spark 1.5.0. So far I have not found an answer anywhere.