Every time new data arrives and updates run on our cluster, an unwanted file is created in every worker's directory. To clean these up automatically, I used Cloudera Manager to change the value of "Spark (Standalone) Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh" under Gateway Default Group > Advanced Settings to:
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60"
After restarting the cluster, the change shows up in spark/conf/spark-env.sh, but no cleanup happens. Does anyone know where the mistake is, or another way to clean up automatically?
I am using CDH 4 and Spark 1.2.2 on the cluster.
Specifying worker options in the client configuration does not really make sense. The cleanup runs on the worker, so the worker is what needs to know what to clean up, and the settings must be configured there.
Try adding the whole string (the part you have between the quotes) to the "Additional Worker args" setting for the worker.
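For a standalone worker whose spark-env.sh is maintained by hand rather than through Cloudera Manager, the worker-side equivalent could look like the sketch below. It reuses the values from the question. According to the Spark standalone documentation, the interval and TTL are both in seconds, and only the work directories of applications that have stopped are removed. The location of spark-env.sh on each worker host depends on the installation and is an assumption here.

```shell
# spark-env.sh on EACH standalone worker host (not the client/gateway config).
# Assumption: this file is sourced when the worker daemon starts, so the
# worker must be restarted for the change to take effect.
#
#   spark.worker.cleanup.enabled    - turn on periodic work-dir cleanup
#   spark.worker.cleanup.interval   - how often the cleanup runs, in seconds
#   spark.worker.cleanup.appDataTtl - how long an app's work dir is kept, in seconds
#
# Only directories belonging to applications that are no longer running are deleted.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60"
```

With a 60-second TTL, a stopped application's work directory (including its logs) disappears almost immediately, so a longer TTL may be preferable if those logs are still needed.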
I am not able to find these properties in CM 5.8.2. Could you please let me know where I can see or add them in Cloudera Manager?
Those settings apply to Spark standalone clusters only. I would strongly advise you not to run standalone and to use Spark on YARN instead.
When you use YARN, the problem does not exist, because YARN handles the cleanup for you.
For us, such files are being generated in /tmp.
Those directories should be cleaned up in current releases via SPARK-7503 and/or SPARK-7705. Both fixes are specific to YARN-based setups. The cleanup should still happen automatically.
Thanks for your reply, but we are using Spark 1.6.0 on CDH 5.8.2 and we still see that these directories are not removed automatically.
Could you please confirm whether this is a known bug in CDH 5.8.2 and Spark 1.6.0?
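To see which leftover directories are actually piling up in /tmp before opening a case, a read-only check along these lines may help. It is a minimal sketch that assumes the leftovers are named `spark-*`; the function name `list_stale_spark_dirs` is hypothetical, and it only lists directories, never deletes them.

```shell
#!/bin/sh
# Hypothetical helper: list directories named spark-* directly under a base
# directory (default /tmp) whose own modification time is more than N whole
# days in the past (find's -mtime +N semantics; default N=1).
# Assumption: leftover Spark temp dirs follow the spark-* naming pattern.
# Note: a directory's mtime changes only when entries are added to or removed
# from it, not when files deeper inside it are modified.
# Read-only: nothing is removed.
list_stale_spark_dirs() {
  base_dir="${1:-/tmp}"
  days="${2:-1}"
  find "$base_dir" -mindepth 1 -maxdepth 1 -type d -name 'spark-*' -mtime +"$days" -print
}
```

For example, `list_stale_spark_dirs /tmp 7` lists spark-* directories in /tmp that have not changed for more than a week, which makes it easier to tell stale leftovers apart from directories of jobs that are still running.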
No, as far as I know this is not a known issue. If you have a support contract, please open a support case and we can look into it further for you.