Spark history server cleanup .inprogress

New Contributor

I've noticed that the Spark history server doesn't clean up any (really old) .inprogress files, which makes sense in a way, as it can't distinguish between applications that are actually still running and those that are not.

Is there an easy way to automate this cleanup? We've got files here going back to 2016.

3 REPLIES

Expert Contributor

@Jan De Luyck

Set the parameters below in the “Custom spark-defaults” section in Ambari (or in spark-defaults.conf; in spark-env.sh they would have to be passed as -D options in SPARK_HISTORY_OPTS) so that the history server cleans up old event logs:

spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=5d

New Contributor

@Rajendra Manjunath

We have those set, but they seem to apply only to finished applications.

Cloudera Employee

Hi,

It is difficult to tell from the files alone which .inprogress files belong to applications that are still active.

Please look up the RUNNING applications in the RM Web UI and remove all .inprogress files that do not belong to an application listed in the RUNNING state.

To check for RUNNING applications in the RM Web UI, please follow these steps:

1. Log in to Cloudera Manager.

2. Select YARN as the service.

3. Click Web UI.

4. Choose ResourceManager Web UI.

5. A new screen is displayed showing the list of all applications.

6. On the left-hand side, there are links under the Applications heading. Click the "Running" link.

7. The "Running" view shows all applications that are currently active.

8. Remove every .inprogress file whose application is not listed there in the RUNNING state.
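If you want to automate the comparison in the steps above, here is a minimal Python sketch. It is not an official tool and rests on a few assumptions you should verify for your cluster: that the event log file names contain the YARN application ID (application_<timestamp>_<sequence>), that `yarn application -list -appStates RUNNING` and `hdfs dfs -ls` are available on the host, and that the `hdfs dfs -ls` output has the usual eight columns ending in date, time, and path. All function names and the `min_age_days` safety margin are made up for this example. The script only reports candidates; it deletes nothing.

```python
import re
import subprocess
from datetime import datetime

# Assumption: event log file names embed the YARN application ID.
APP_ID = re.compile(r"application_\d+_\d+")
SECONDS_PER_DAY = 86400


def running_app_ids(yarn_list_output):
    """Collect application IDs from the text printed by `yarn application -list`."""
    return set(APP_ID.findall(yarn_list_output))


def parse_ls_line(line):
    """Turn one `hdfs dfs -ls` line into (path, mtime_epoch_seconds), or None."""
    parts = line.split(None, 7)
    if len(parts) != 8:
        return None  # e.g. the "Found N items" header line
    try:
        mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return parts[7], mtime.timestamp()


def stale_inprogress(entries, running, now, min_age_days=7):
    """Return .inprogress paths whose application is not running.

    Files modified within the last min_age_days are skipped as a safety
    margin, and so are files without a recognizable application ID.
    """
    cutoff = now - min_age_days * SECONDS_PER_DAY
    stale = []
    for path, mtime in entries:
        if not path.endswith(".inprogress"):
            continue
        match = APP_ID.search(path.rsplit("/", 1)[-1])
        if match is None or match.group(0) in running:
            continue
        if mtime > cutoff:
            continue
        stale.append(path)
    return stale


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def find_stale(log_dir, now, min_age_days=7):
    """List the files first, then ask YARN, so newly started apps are not misjudged."""
    listing = run(["hdfs", "dfs", "-ls", log_dir]).splitlines()
    entries = [e for e in map(parse_ls_line, listing) if e is not None]
    running = running_app_ids(
        run(["yarn", "application", "-list", "-appStates", "RUNNING"])
    )
    return stale_inprogress(entries, running, now, min_age_days)
```

Call it as `find_stale(log_dir, time.time())` (after `import time`), where `log_dir` is your history server's event log directory, and print the result. Review the list by hand before passing any of the paths to `hdfs dfs -rm`.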