Support Questions

jan_deluyck · 04-25-2018

I've noticed that the Spark history server doesn't clean up any .inprogress files, not even really old ones. That makes sense in a way, as it can't distinguish between applications that are actually running and those that aren't.

Is there an easy way to automate this cleanup? We've got files here going back to 2016.

rajsyrus · 04-25-2018

@Jan De Luyck

Set the following parameters in the “Custom spark-defaults” config section in Ambari (or in spark-defaults.conf) to have the history server clean up old event logs automatically:

spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=5d

jan_deluyck · 04-25-2018

@Rajendra Manjunath

We have those set, but they seem to apply only to finished applications.

AKR · 08-26-2019

Hi,

It is difficult to tell which of the .inprogress files still belong to active applications.

Check which applications are in the RUNNING state in the ResourceManager (RM) Web UI, and remove all other .inprogress files, i.e. those that don't belong to a RUNNING application.
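Once you have a listing of the event log directory and the list of RUNNING application IDs, the filtering rule above can be sketched in Python. This is a minimal sketch, not a Spark or Cloudera tool: the function name `stale_inprogress_logs` is hypothetical, and it assumes YARN event log names start with the application ID (e.g. `application_<clusterTimestamp>_<seq>.inprogress`, optionally followed by an `_<attemptId>` suffix). It only selects names; the actual deletion (for example with `hdfs dfs -rm`) is deliberately left out so the list can be reviewed first.

```python
import re
from typing import Iterable, List, Set

# Assumed naming: event logs start with the YARN application ID, e.g.
# "application_1524647000000_0001.inprogress" or, with an attempt ID,
# "application_1524647000000_0001_1.inprogress".
_APP_ID = re.compile(r"^(application_\d+_\d+)")


def stale_inprogress_logs(log_names: Iterable[str], running_app_ids: Set[str]) -> List[str]:
    """Return .inprogress log names whose application is not RUNNING (hypothetical helper)."""
    stale = []
    for name in log_names:
        if not name.endswith(".inprogress"):
            continue  # finished logs are left to spark.history.fs.cleaner.*
        match = _APP_ID.match(name)
        if match is None:
            continue  # not a YARN application ID (e.g. "local-..."): keep it
        if match.group(1) not in running_app_ids:
            stale.append(name)
    return stale
```

Names that don't look like YARN application IDs are skipped rather than selected, which errs on the side of keeping files. An application submitted after you took the RUNNING list would also be flagged, so it is worth re-checking the RM just before deleting anything.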

To find the RUNNING applications in the RM Web UI, follow these steps:

1. Log in to Cloudera Manager.

2. Select the YARN service.

3. Click Web UI.

4. Choose ResourceManager Web UI.

5. A new page opens, listing all applications.

6. In the menu on the left-hand side, under the Applications link, click the "RUNNING" link.

7. This page lists all in-progress applications that are still active.

8. Remove all other .inprogress files, i.e. those whose application IDs are not listed in the RUNNING state.
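The manual steps above can also be scripted against the ResourceManager REST API, which exposes the same list via `GET /ws/v1/cluster/apps?states=RUNNING`. This is a minimal sketch, assuming the RM web address (`rm_url`, e.g. `http://rm-host:8088`) is reachable over plain HTTP without authentication; on a Kerberos-secured cluster the request would need SPNEGO, which this sketch does not handle. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Set


def parse_running_app_ids(payload: dict) -> Set[str]:
    """Collect application IDs from a /ws/v1/cluster/apps JSON response."""
    # The RM may return "apps": null when no application matches the filter;
    # treat that (or a missing key) as an empty list.
    apps = payload.get("apps") or {}
    return {app["id"] for app in apps.get("app", [])}


def fetch_running_app_ids(rm_url: str) -> Set[str]:
    """Query the RM for applications currently in the RUNNING state."""
    # rm_url is an assumption, e.g. "http://rm-host:8088" (no auth, no HTTPS).
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url) as resp:
        return parse_running_app_ids(json.load(resp))
```

For example, `fetch_running_app_ids("http://rm-host:8088")` returns a set of IDs such as `application_1524647000000_0001`, which can then be compared against the .inprogress file names in the event log directory. Parsing is kept in its own function so it can be checked without a live cluster.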

Cloudera Community

Spark history server cleanup .inprogress