Created on 02-27-2023 02:03 AM - edited on 02-27-2023 10:54 PM by VidyaSargur
When a long-running Spark application (for example, a streaming application) runs, Spark writes a single event log file that keeps growing until the application is killed or stopped. A single huge event log file is costly to maintain, and the Spark History Server needs a lot of resources to replay it on every update.
To avoid creating a single huge event log file, the Spark team introduced rolling event log files.
Step1: Enable the rolling event logs and set the max file size
CM --> Spark 3 --> Configuration --> Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m
The default value of spark.eventLog.rolling.maxFileSize is 128 MB. The minimum value is 10 MB.
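The same two properties can also be passed for a single application at submit time instead of cluster-wide. The wrapper below is a minimal sketch: it assumes the Spark 3 client is available as spark3-submit on the gateway host, and the function name, class name and jar path are hypothetical placeholders.

```shell
# Sketch: submit one application with rolling event logs enabled.
# Assumes the Spark 3 client command is spark3-submit (adjust if yours differs).
submit_with_rolling_logs() {
  spark3-submit \
    --conf spark.eventLog.rolling.enabled=true \
    --conf spark.eventLog.rolling.maxFileSize=128m \
    "$@"
}

# Example call (hypothetical class and jar):
# submit_with_rolling_logs --class com.example.StreamingJob /path/to/streaming-job.jar
```

Values passed with --conf apply only to that application, so this is a convenient way to try the setting before changing the safety valve.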
Step2: Set the rolling event log max files to retain
CM --> Spark 3 --> Configuration --> History Server Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-history-server.conf
spark.history.fs.eventLog.rolling.maxFilesToRetain=2
By default, spark.history.fs.eventLog.rolling.maxFilesToRetain is effectively unlimited (Int.MaxValue), meaning all event log files are retained. The minimum value is 1.
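Verify the result by listing the application's directory under the Spark History Server event log directory. The three listings below were taken while the application was running: once more than two event log files exist, the older ones are replaced by a single .compact file. The files in this run roll over at about 10 MB, which suggests the test used spark.eventLog.rolling.maxFileSize=10m rather than 128m.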
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 10485458 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:05 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 492014 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_1_application_1672813574470_0002.compact
-rw-rw---- 3 spark spark 10489509 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002
-rw-rw---- 3 spark spark 227068 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002
[root@c3543-node4 ~]# sudo -u spark hdfs dfs -ls -R /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002
-rw-rw---- 3 spark spark 0 2023-01-04 07:03 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/appstatus_application_1672813574470_0002.inprogress
-rw-rw---- 3 spark spark 873356 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_2_application_1672813574470_0002.compact
-rw-rw---- 3 spark spark 10484816 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_3_application_1672813574470_0002
-rw-rw---- 3 spark spark 339165 2023-01-04 07:06 /user/spark/spark3ApplicationHistory/eventlog_v2_application_1672813574470_0002/events_4_application_1672813574470_0002
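The naming pattern in the listings above can be sketched as a small helper that prints the file names to expect for a given newest file index and maxFilesToRetain value. This is only an illustration inferred from the listings, assuming compaction has already run; the function name expected_event_files is hypothetical, and the History Server may lag behind or skip compaction.

```shell
# Sketch: print the event log file names expected after compaction.
# Assumption (inferred from the listings): the newest RETAIN files are kept
# as-is, and all older files are replaced by one .compact file named after
# the newest compacted index.
# Arguments: <newest file index> <maxFilesToRetain> <application id>
expected_event_files() (
  latest="$1"; retain="$2"; app_id="$3"
  compact_idx=$(( latest - retain ))
  # A compact file exists only when there are more files than RETAIN.
  if [ "$compact_idx" -ge 1 ]; then
    echo "events_${compact_idx}_${app_id}.compact"
  fi
  i=$(( compact_idx + 1 ))
  if [ "$i" -lt 1 ]; then i=1; fi
  # The remaining files are listed unchanged, oldest first.
  while [ "$i" -le "$latest" ]; do
    echo "events_${i}_${app_id}"
    i=$(( i + 1 ))
  done
)

expected_event_files 4 2 application_1672813574470_0002
```

With a newest index of 4 and maxFilesToRetain=2, the helper prints events_2_….compact, events_3_… and events_4_…, matching the last listing above.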