Created on 09-14-2023 12:44 PM - last edited on 09-21-2023 07:24 AM by cjervis
From
hdfs dfs -du -h /
we can see that the Spark history takes up a lot of space on HDFS.
I want to delete only the files I specify, for example everything from the year 2019, and keep the rest.
If I use the command
hdfs dfs -rm -R /spark2-history/*
it deletes everything, and I don't want to delete it all.
Thanks
Created on 09-14-2023 06:09 PM - edited 09-14-2023 06:32 PM
Hi @Emanuel_MXN
It is generally not recommended to keep event logs older than a few days or months; in your case, you are keeping logs for years.
To avoid accumulating old logs, add the following parameters to the spark-defaults.conf file so the Spark History Server cleaner deletes old event logs automatically, with the retention period set to your needs:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 7d
spark.history.fs.cleaner.interval 1h
I don't have a handy script to delete files from HDFS for a specific date/year. If I find one, I will definitely share it here.
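In the meantime, here is a minimal shell sketch of per-year deletion. It assumes the default `hdfs dfs -ls` output format (modification date as YYYY-MM-DD in the sixth column, path in the eighth); the helper name `list_year_logs` is hypothetical, so verify the column layout on your cluster before running the delete step:

```shell
#!/bin/sh
# list_year_logs YEAR — reads `hdfs dfs -ls`-style lines on stdin and prints
# only the paths (column 8) whose modification date (column 6) starts with
# the given year.
list_year_logs() {
  awk -v y="$1" '$6 ~ "^"y"-" {print $8}'
}

# Usage (commented out here; run only after checking the listed paths):
#   hdfs dfs -ls /spark2-history/ | list_year_logs 2019 \
#     | xargs -r -n1 hdfs dfs -rm -skipTrash
```

Piping through `xargs -r -n1` deletes one file per `hdfs dfs -rm` call; dropping `-skipTrash` would move the files to the HDFS trash instead, which is safer but does not free space immediately.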
Created 09-26-2023 02:42 AM
@Emanuel_MXN, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,