Created on 11-29-2017 06:13 PM - edited 08-17-2019 10:05 PM
From
hdfs dfs -du -h /
we see that the Spark history takes up a lot of space on HDFS.
In the Ambari GUI I choose Spark, then Quick Links, and I get the History Server page with all applications.
I want to delete all applications from that page. How can I do it, given that I don't see a delete button?
Second, is it possible to delete the applications that use HDFS through an API or the CLI?
Created 11-29-2017 06:41 PM
To delete an application in Spark 2:
hdfs dfs -rm -R /spark2-history/{app-id}
To delete an application in Spark 1:
hdfs dfs -rm -R /spark-history/{app-id}
Restart the history servers after running the commands.
Thanks,
Aditya
Created 11-29-2017 06:35 PM
Spark has several parameters that control job history rotation.
In particular:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
Source : https://spark.apache.org/docs/latest/monitoring.html
In the example above:
- Rotation is active
- All job histories older than 12 hours are deleted
- Deletion runs at 1-hour intervals
Note that these parameters must be set at the environment level (not at the job level).
They are usually placed in the spark-defaults.conf file.
Matthieu
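Before changing anything, it can help to see what is currently set. The following is a minimal sketch, not an official tool: get_spark_conf is a hypothetical helper name, and it assumes the file uses whitespace-separated "key value" lines as in the example above. The file path in the usage comment is also an assumption and may differ on your cluster.

```shell
# Print the value of one property from a spark-defaults style file.
# Assumes whitespace-separated "key value" lines; '#' comment lines never match a key.
get_spark_conf() {
  key="$1"
  file="$2"
  # If the key appears more than once, the last value is reported.
  awk -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }' "$file"
}

# Example (the path is an assumption; adjust it to your cluster):
# get_spark_conf spark.history.fs.cleaner.maxAge /etc/spark2/conf/spark-defaults.conf
```

If the helper prints nothing, the property is not set in that file and Spark falls back to its default.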
Created 11-29-2017 06:36 PM
Created 11-29-2017 06:44 PM
@Aditya thank you -
but how do I delete all the applications that use HDFS? On the page I see around 1000 applications, so I can't delete them one by one.
Created 11-29-2017 06:45 PM
hdfs dfs -rm -R /spark2-history/* will remove all applications
Created on 11-29-2017 06:48 PM - edited 08-17-2019 10:04 PM
OK, so if I want to remove them from the Ambari GUI, how do I do it? (I ask because I don't see any delete option on the page.)
Created 11-29-2017 06:52 PM
You can use Files View if you want to delete from the GUI. I'm not sure whether the Spark History Server has a delete option.
Created 11-29-2017 06:55 PM
Is it possible to print the list of all applications from the CLI, so that I can grep out the application IDs and then remove each one with hdfs dfs -rm -R /spark2-history/{app-id} ?
Created 11-29-2017 07:07 PM
If you want to delete all applications, you can simply run
hdfs dfs -rm -R /spark2-history/*
This folder contains only Spark 2 application logs and no other files. Hope this helps.
(Or)
You can do the following. This should print all application IDs:
curl http://{spark2history server url}:18080/api/v1/applications | grep "\"id\"" > a.txt
cut -d':' -f2 a.txt | cut -d "\"" -f 2
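To go from that list to per-application deletes without an intermediate file, something along these lines may work. It is only a sketch: extract_app_ids and print_delete_commands are hypothetical helper names, and it assumes each entry under the history directory is named exactly after the application ID, as in the /spark2-history/{app-id} commands earlier in the thread. The second helper only prints the commands so they can be reviewed first.

```shell
# Read History Server JSON on stdin and print one application ID per line.
# Matches only the exact "id" key, so keys such as "attemptId" are ignored.
extract_app_ids() {
  grep -o '"id" *: *"[^"]*"' | cut -d'"' -f4
}

# Read application IDs on stdin and print (not run) one delete command per ID.
# The history directory is passed as the first argument, e.g. /spark2-history.
print_delete_commands() {
  dir="$1"
  while IFS= read -r id; do
    if [ -n "$id" ]; then
      printf 'hdfs dfs -rm -R %s/%s\n' "$dir" "$id"
    fi
  done
}

# Example, using the same URL placeholder as above:
# curl -s "http://{spark2history server url}:18080/api/v1/applications" \
#   | extract_app_ids | print_delete_commands /spark2-history
```

Compare the printed paths with the output of hdfs dfs -ls /spark2-history before running them, since the actual file names may carry a suffix (for example an attempt number or .inprogress). If HDFS trash is enabled, removed files are moved to the trash rather than freed immediately; adding -skipTrash to the rm command bypasses it.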