
How can I clean up a worker's directory automatically?

New Contributor

Hi everyone,

Every time new data arrives and updates occur in our cluster, an unwanted file is created in every worker's directory. To clean these up automatically, I changed the value of Spark (Standalone) Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh under Gateway Default Group -> Advanced Settings to:

# enable worker dir cleanup; interval and appDataTtl are both in seconds
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60"

using Cloudera Manager. After I restart the cluster, the change appears in spark/conf/spark-env.sh, but no cleanup happens. Does anyone know where the mistake is, or another way to clean up automatically?

I am using CDH 4 and Spark 1.2.2 in the cluster.

7 REPLIES

Super Collaborator

Specifying worker opts in the client does not really make sense. The worker needs to know what it has to clean up, so the setting must be made on the worker itself.

Try adding the whole string (everything between the quotes) to the "Additional Worker args" for the worker.
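
Concretely, and assuming that field passes its value straight through to the worker JVM, that would be the bare properties without the export wrapper:

-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60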

 

Wilfred

 


Hi Team,

 

I am not able to find these properties in CM 5.8.2. Could you please let me know where I can see or add them in CM?

 

Regards,

Umesh

Super Collaborator

Umesh,

 

Those settings are for Spark standalone clusters only. I would strongly advise you not to run standalone but to use Spark on YARN.

When you use YARN the problem does not exist, as YARN handles the cleanup for you.
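
For example, an application can be submitted in YARN client mode like this (the class and jar names are placeholders; in Spark 1.x, yarn-client is the shorthand for --master yarn with the client deploy mode):

spark-submit --master yarn-client --class com.example.MyApp myapp.jar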

 

Wilfred


Hi Wilfred,

 

  • We are using YARN client mode, not standalone, but YARN does not seem to handle that?
  • Both the client and the server generate these files in /tmp. Could you please check whether we can change it to /tmp/spark_tmp/ so that we have control over the Spark files only?

These are the kinds of files being generated in /tmp for us:

 

1) 

 

spark-cefa5a3d-bce2-45d2-9490-1ee19b9ac8b8
spark-d0806158-ece7-4d80-896b-a815e2e18e8a
spark-d0813ff3-9f4c-4fd8-8208-8596469e805e
spark-d1e55364-7207-4203-a583-df1face35096
spark-d26618a3-ba93-4d91-a5ea-b2d873735f97
spark-d49288de-a99d-4ede-af6f-fbf0a276c4e7
spark-d81273b6-f5da-4eed-a42e-913d414018cb
spark-df75486f-1f04-4838-bc07-4196172c42c8
spark-dfbbabf5-034e-47d5-9246-3edac5742558
spark-dfd16e79-6a67-4a7a-89dc-98d8c9f83df3
spark-e0783ace-5897-46a9-a073-4e4431a521f0
spark-e1429fea-160e-4d37-a349-053553a197a5

 

2) 

 

f23f0701-76d3-4449-834e-d9ce33a009c3_resources
f268f04c-38cb-4b0a-9382-3c0fa5df0486_resources
f2a04084-dd40-4bd7-9243-25ce110fe10d_resources
f2bf7f23-eb3d-490f-a55f-be4984ab6858_resources
f2c0813b-adf2-400c-9f32-4b1e8619a5ed_resources
f2cee78a-b8cc-4d24-b10e-7955064e5a94_resources
f2ee0bdf-72a2-40d1-ae97-99036b81dd3d_resources
f2f60b55-c602-4fd9-a1f9-57973abf13c3_resources
f304c39b-8801-4d82-bc83-748e0a5720a9_resources
f310055d-f431-4749-8366-d708ca1eedd8_resources
f37d51bb-14d4-4ea7-9f8b-cdb5ca4d83cc_resources
f3a60213-8d7c-4183-be97-ee5a02bb165d_resources

regards,

Umesh

Super Collaborator

Those directories should be cleaned up in current releases via SPARK-7503 and/or SPARK-7705. Both fixes are specific to YARN-based setups, and the cleanup should happen automatically.
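
On your second point, a minimal sketch for steering the scratch files out of the shared /tmp, assuming the spark-* directories honor spark.local.dir (the *_resources directories look like Hive session resources, which would be governed by a Hive setting rather than Spark):

# spark-defaults.conf (the path is only an example)
spark.local.dir /tmp/spark_tmp
# note: on YARN, executors use the NodeManager's local dirs instead,
# so this mainly affects the driver side in client mode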

 

Wilfred


Hi Wilfred,

 

Thanks for your reply, but we are using Spark 1.6.0 in CDH 5.8.2 and we still see that these dirs are not removed automatically.

 

Could you please confirm whether this is a known bug in CDH 5.8.2 with Spark 1.6.0?

 


Regards,

Umesh

Super Collaborator

No, this is not a known issue as far as I know. If you have a support contract, please open a support case and we can look into it further for you.

 

Wilfred