Created on 11-29-2017 06:13 PM - edited 08-17-2019 10:05 PM
From
hdfs dfs -du -h /
we see that the Spark history folder takes a lot of space on HDFS.
From the Ambari GUI
I choose Spark
and then Quick Links
and then I get the history server page with all the applications.
I want to delete all applications from that page.
How can I do it? I do not see a delete button.
Second,
is it possible to delete the applications that use HDFS through an API or the CLI?
Created 11-29-2017 06:41 PM
If you want to delete applications in Spark2:
hdfs dfs -rm -R /spark2-history/{app-id}
If you want to delete applications in Spark1:
hdfs dfs -rm -R /spark-history/{app-id}
Restart the history servers after running the commands.
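For example, with a purely hypothetical application ID (use one taken from your history server page instead):
hdfs dfs -rm -R /spark2-history/application_1511900000000_0001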
Thanks,
Aditya
Created 11-29-2017 06:35 PM
Spark has a bunch of parameters to deal with job history rotation.
In particular :
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
Source : https://spark.apache.org/docs/latest/monitoring.html
In the example above:
- Rotation is active
- All jobs older than 12 hours will be deleted
- Deletion runs at 1-hour intervals
Note that these parameters need to be set at the environment level (not at the job level).
They are usually placed in the spark-defaults.conf file.
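As a minimal sketch of doing that from the shell, assuming an HDP-style config path (on an Ambari-managed cluster you would normally add these properties through the Spark2 configs in Ambari instead, since Ambari regenerates this file):
# sketch: append the cleaner settings, then restart the Spark2 history server so it picks them up
cat >> /etc/spark2/conf/spark-defaults.conf <<'EOF'
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
EOF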
Matthieu
Created 11-29-2017 06:44 PM
@Aditya thank you -
but how can I delete all the applications that use HDFS? On the page I see a lot of applications, around 1000, so I can't delete them one by one.
Created 11-29-2017 06:45 PM
hdfs dfs -rm -R /spark2-history/* will remove all applications
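If you want to review first and also get the space back right away, a possible sequence (my sketch, not part of the reply above):
hdfs dfs -ls /spark2-history/                   # review what is about to be removed
hdfs dfs -rm -R -skipTrash /spark2-history/*    # -skipTrash frees the space immediately instead of moving the files to .Trash (when trash is enabled)
Then restart the history server as mentioned earlier.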
Created on 11-29-2017 06:48 PM - edited 08-17-2019 10:04 PM
OK, so if I want to remove them from the Ambari GUI, how do I do it? (I ask because I do not see any delete option on the page.)
Created 11-29-2017 06:52 PM
You can use the Files View if you want to delete from the GUI. I'm not sure if there is a delete option in the Spark history server.
Created 11-29-2017 06:55 PM
Is it possible to print the full application list from the CLI, so that I can capture the application IDs with grep and then remove them with hdfs dfs -rm -R /spark2-history/{app-id}?
Created 11-29-2017 07:07 PM
If you want to list and delete all applications, you can simply do
hdfs dfs -rm -R /spark2-history/*
This folder will have only Spark2 app logs and no other files. Hope this helps.
(Or)
You can do the below. This should print all application IDs:
curl http://{spark2history server url}:18080/api/v1/applications | grep "\"id\"" > a.txt
cut -d':' -f2 a.txt | cut -d "\"" -f 2
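Putting the listing and the delete together, a rough sketch of a cleanup loop. Assumptions on my side: the Spark2 history server from this thread at master02:18081, the status=completed filter of the REST API so that logs of still-running apps are left alone, the same grep-based ID parsing as above, and event log file names under /spark2-history/ that start with the application ID (possibly with an attempt or compression suffix, hence the trailing wildcard):
# sketch only - print and review the ID list before deleting anything
for app in $(curl -s "http://master02:18081/api/v1/applications?status=completed" | grep '"id"' | cut -d'"' -f4); do
  hdfs dfs -rm -R "/spark2-history/${app}*"
done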
Created 11-29-2017 07:34 PM
I tried this but I get errors (what is wrong in my syntax?). (master02 is the name of the Spark history server.)
.
curl -sH "X-Requested-By: ambari" -u admin:admin -i curl http://master02:8080/api/v1/applications
.
HTTP/1.1 404 Not Found
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Created 11-29-2017 07:45 PM
I see that you are using port 8080.
For the Spark history server the port is 18080 by default;
for the Spark2 history server the port is 18081 by default. You can check the port in the UI where you saw the applications.
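If you want to double-check from the shell which port the history server is actually listening on, a quick sketch (assuming ss is available on the history server host):
ss -ltn | grep -E ':1808[01]'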
Created 11-29-2017 07:57 PM
OK, now I am using this:
.
curl -sH "X-Requested-By: ambari" -u "$API_USER"":""$API_PASSWORD" -i curl http://master02:18081/api/v1/applications
.
but there is no output from the command.
What is wrong?
Created 11-29-2017 08:03 PM
Run this command as is. No need to append headers and a password. In the above command you were using curl twice.
curl http://master02:18081/api/v1/applications | grep "\"id\"" > a.txt