Created 04-20-2018 01:21 PM
I'm running several Spark2 Streaming applications in a YARN cluster. I have yarn.log-aggregation-enable=true and the log files stored in HDFS grow unbounded. Before running on YARN, under Spark Standalone, I used the spark.executor.logs.rolling.{interval, strategy, maxRetainedFiles} settings to manage log files and they worked great. I've tried all sorts of settings to keep the aggregated logs to a manageable size, with no luck.
Can someone direct me to the configuration setting(s) that can help define how these aggregate logs are purged? An ideal scenario would allow me to manage them by time and size.
Thanks in advance.
Created 04-20-2018 01:32 PM
The yarn log aggregation retention can be controlled by setting yarn.log-aggregation.retain-seconds property in yarn-site.xml
For example, if you want logs older than 30 days to be deleted, you can set yarn.log-aggregation.retain-seconds to 2592000
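As a quick sanity check on that figure, 30 days expressed in seconds:

```shell
# 30 days -> seconds: days * hours/day * minutes/hour * seconds/minute
echo $((30 * 24 * 60 * 60))   # prints 2592000
```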
Created 04-20-2018 01:37 PM
Thanks for the response Tarun. I've tried that setting with no luck. I currently have it set to 600 as a test, since I only want to see logs for the last 10 minutes, yet I have logs in there from yesterday. Is there a minimum I might be missing? I know the setting has a disclaimer that says not to set it too low or it will spam the name node, but it does not indicate a minimum threshold.
Does this work differently since it's a long-running (streaming) application and I'm technically using the same log file the entire time? The language in the description of this setting implies it deletes the file; in reality it would need to remove lines from a file that is still being written to.
Created 04-20-2018 02:02 PM
The retain-seconds setting will not work for an active application that is still writing files. It works by checking whether the last-modified timestamp of the application log directory is older than retain-seconds. Since your streaming job writes logs continuously, the directory timestamp never ages past 600 seconds, so your logs are not getting deleted.
Also, log aggregation in YARN doesn't work the same way as configuring log rolling/retention in log4j, which is what you are expecting.
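For completed applications, the two retention properties that drive this deletion check live in yarn-site.xml. A minimal sketch (the property names are the standard YARN ones; the values here are only illustrative, not from this thread):

```xml
<!-- yarn-site.xml: illustrative values only -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- delete aggregated logs older than 30 days -->
  <value>2592000</value>
</property>
<property>
  <!-- how often the deletion check runs; a value <= 0 defaults to
       one-tenth of the retention time -->
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>86400</value>
</property>
```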
Created 04-20-2018 02:00 PM
- Update/configure log4j in your Spark application so that your executor logs get rotated by interval/size.
- Set yarn.nodemanager.log-aggregation.debug-enabled=true and yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=<roll interval> in YARN and restart the YARN service.
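To make the first bullet concrete, here is a minimal log4j.properties sketch for rolling executor logs by size (the appender name, file size, and backup count are illustrative assumptions, not values from this thread):

```properties
# log4j.properties -- illustrative sketch for rolling Spark executor logs
log4j.rootCategory=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# ${spark.yarn.app.container.log.dir} is filled in by YARN per container
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
```

You would then ship this file to the executors with something like `spark-submit --files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" ...`, while the second bullet's properties go in yarn-site.xml before restarting the NodeManagers.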
Created 04-20-2018 03:48 PM
Thanks Sandeep, I'm working on this now and will report back once I've got it all setup.
Created 05-02-2018 07:54 PM
Sandeep, thanks for the response. As suggested, I have the following configurations established for the executors:
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.maxRetainedFiles 72
spark.executor.logs.rolling.time.interval {various settings}
I've tested both the hourly and minutely settings for the above time interval and both of those seem to work, as in they roll the executor logs as they should.
I've also set yarn.nodemanager.log-aggregation.debug-enabled=true and yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds={various settings} and restarted YARN. Sadly, the aggregated logs don't respect any of these settings; they just continue to grow and grow.
Any other tips?
Created 05-03-2018 10:54 AM
@Andrew Mills What is the value you have set for yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds={various settings}? The RM should aggregate the logs at this interval. I'm also assuming that your log4j is working as expected and the logs are being rolled.
Created 05-28-2018 02:25 AM
Hey guys, I'm also hitting this problem, but in my case the old apps (Hive and HDFS-related) have already finished, yet the cleanup function never triggers for their logs. Even with debug mode enabled I can't find any log entries showing the retention check starting or being skipped. Any idea how to trigger this?
Thank you.
To add detail: I watched my Hive app logs and the HDFS-related app logs and found that YARN never cleaned them, even though permissions are fine and the last-modified timestamps satisfy retain-seconds. With debug mode on, I still don't see any log entries showing the cleanup starting or being skipped.